Search | arXiv e-print repository

Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by CNN and Transformer characteristics, leading to capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of space and spectral channels, where a multilayer perceptron with few parameters is utilized to learn the global correlations among spectra. The synthetic and real experiments demonstrate that our proposed method outperforms state-of-the-art methods on spatial and spectral reconstruction. The code and details are available at https://github.com/HLImg/HSSD.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2403.05791 [pdf, other]

Asynchronous Microphone Array Calibration using Hybrid TDOA Information

Authors: Chengjie Zhang, Jiang Wang, He Kong

Abstract: Asynchronous microphone array calibration is a prerequisite for most audition robot applications. A popular solution to the above calibration problem is the batch form of Simultaneous Localisation and Mapping (SLAM), using the time difference of arrival measurements between two microphones (TDOA-M), and the robot (which serves as a moving sound source during calibration) odometry information. In this paper, we introduce a new form of measurement for microphone array calibration, i.e. the time difference of arrival between adjacent sound events (TDOA-S) with respect to the microphone channels. We propose to combine TDOA-S and TDOA-M, called hybrid TDOA, together with odometry measurements for batch SLAM-based calibration of asynchronous microphone arrays. Simulation and real-world experiment results consistently show that our method is more independent of microphone number, less sensitive to initial values (when using off-the-shelf algorithms such as Gauss-Newton iterations), and has better calibration accuracy and robustness under various TDOA noises. In addition, the simulation result demonstrates that our method has a lower Cramér-Rao lower bound (CRLB) for microphone parameters. To benefit the community, we open-source our code and data at https://github.com/zcj808/Hybrid-TDOA-Calib.

Submitted 19 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.
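
Illustrative note on the two measurement types named in the abstract above. This is a minimal sketch, not the authors' implementation. It assumes a simplified model in which each microphone has a fixed position and a constant unknown recording start offset (clock drift is ignored), sound travels at 343 m/s, and the sign conventions are chosen arbitrarily. All function names are hypothetical. Under these assumptions, TDOA-M between two microphones depends on the difference of their offsets, while TDOA-S at a single microphone does not depend on that microphone's offset.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for this sketch


def arrival_time(mic_pos, mic_offset, src_pos, emit_time):
    """Timestamp a microphone records for one sound event.

    mic_offset models the unknown start-time offset of an asynchronous
    microphone (assumed constant; clock drift is ignored here).
    """
    return emit_time + math.dist(mic_pos, src_pos) / SPEED_OF_SOUND + mic_offset


def tdoa_m(mic_i, mic_j, src_pos, emit_time):
    """TDOA between two microphones for the same sound event.

    Each mic is a (position, offset) pair.
    """
    return (arrival_time(*mic_i, src_pos, emit_time)
            - arrival_time(*mic_j, src_pos, emit_time))


def tdoa_s(mic, src_a, emit_a, src_b, emit_b):
    """TDOA between two adjacent sound events at the same microphone."""
    return (arrival_time(*mic, src_b, emit_b)
            - arrival_time(*mic, src_a, emit_a))


# Two source positions along a robot path, emitted one second apart.
src_a, src_b = (0.0, 0.0), (1.0, 0.5)
mic_sync = ((2.0, 3.0), 0.0)    # microphone with zero offset
mic_async = ((2.0, 3.0), 0.5)   # same position, 0.5 s start offset
other_mic = ((-1.0, 2.0), 0.0)

# TDOA-S is unchanged by the microphone's own offset in this model.
s_sync = tdoa_s(mic_sync, src_a, 0.0, src_b, 1.0)
s_async = tdoa_s(mic_async, src_a, 0.0, src_b, 1.0)

# TDOA-M shifts by exactly the offset difference between the two microphones.
m_sync = tdoa_m(mic_sync, other_mic, src_a, 0.0)
m_async = tdoa_m(mic_async, other_mic, src_a, 0.0)
```

In this toy model the emission interval between events still appears in TDOA-S, so the two measurement types carry different unknowns; that is one way to read why the abstract proposes combining them.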

arXiv:2301.07925 [pdf, other]

Communication under Mixed Gaussian-Impulsive Channel: An End-to-End Framework

Authors: Chengjie Zhao, Jun Wang, Wei Huang, Xiaonan Chen, Tianfu Qi

Abstract: In many communication scenarios, the communication signals simultaneously suffer from white Gaussian noise (WGN) and non-Gaussian impulsive noise (IN), i.e., mixed Gaussian-impulsive noise (MGIN). Under the MGIN channel, classical communication signal schemes and corresponding detection methods usually cannot achieve desirable performance as they are optimized with respect to WGN. Moreover, as the widely adopted IN model has no analytical and general closed-form expression of probability density function (PDF), it is extremely hard to obtain optimal communication signal and corresponding detection schemes based on classical stochastic signal processing theory. To circumvent these difficulties, we propose a data-driven end-to-end framework to address the communication signal design and detection under the MGIN channel in this paper. In this proposed framework, a channel noise simulator (CNS) is elaborately designed based on an improved generative adversarial net (GAN) to simulate the MGIN without requirement of any analytical PDF. Meanwhile, a multi-level wavelet convolutional neural network (MWCNN) based preprocessing network is used to mitigate the negative effect of outliers due to the IN. Compared with conventional approaches and existing end-to-end systems, extensive simulation results verify that our proposed novel end-to-end communication system can achieve better performance in terms of bit-error rate (BER) under MGIN environments.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2211.14448 [pdf, other]

How to Backpropagate through Hungarian in Your DETR?

Authors: Lingji Chen, Alok Sharma, Chinmay Shirore, Chengjie Zhang, Balarama Raju Buddharaju

Abstract: The DEtection TRansformer (DETR) approach, which uses a transformer encoder-decoder architecture and a set-based global loss, has become a building block in many transformer based applications. However, as originally presented, the assignment cost and the global loss are not aligned, i.e., reducing the former is likely but not guaranteed to reduce the latter. And the issue of gradient is ignored when a combinatorial solver such as Hungarian is used. In this paper we show that the global loss can be expressed as the sum of an assignment-independent term, and an assignment-dependent term which can be used to define the assignment cost matrix. Recent results on generalized gradients of optimal assignment cost with respect to parameters of an assignment problem are then used to define generalized gradients of the loss with respect to network parameters, and backpropagation is carried out properly. Our experiments using the same loss weights show interesting convergence properties and a potential for further performance improvements.

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2209.14435 [pdf, other]

Out-of-Distribution Detection for LiDAR-based 3D Object Detection

Authors: Chengjie Huang, Van Duong Nguyen, Vahdat Abdelzad, Christopher Gus Mannes, Luke Rowe, Benjamin Therien, Rick Salay, Krzysztof Czarnecki

Abstract: 3D object detection is an essential part of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance for this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs that are not drawn from the training distribution. Detecting OOD inputs is challenging and essential for the safe deployment of models. OOD detection has been studied extensively for the classification task, but it has not received enough attention for the object detection task, specifically LiDAR-based 3D object detection. In this paper, we focus on the detection of OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose to adapt several OOD detection methods for object detection. We accomplish this by our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique of generating OOD objects for a given object detection model. Our evaluation based on the KITTI dataset shows that different OOD detection methods have biases toward detecting specific OOD objects. It emphasizes the importance of combined OOD detection methods and more research in this direction.

Submitted 28 September, 2022; originally announced September 2022.

Comments: Accepted at ITSC 2022

arXiv:2205.05675 [pdf, other]

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR
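
Illustrative note on the rank-sum scoring described in the abstract above for sub-tracks one and two. This is a minimal sketch and not the organizers' validation code. The function name `rank_sum`, the team names, and the metric values are made up for illustration. It assumes lower is better for every metric and does not handle ties within a metric.

```python
def rank_sum(teams):
    """Order teams by the sum of their per-metric ranks (lower is better).

    teams maps a team name to a dict of metric name -> measured value.
    Ties within a metric are not handled specially in this sketch.
    """
    metrics = sorted(next(iter(teams.values())).keys())
    totals = {name: 0 for name in teams}
    for metric in metrics:
        # Rank 1 goes to the smallest value of this metric.
        ordered = sorted(teams, key=lambda name: teams[name][metric])
        for position, name in enumerate(ordered, start=1):
            totals[name] += position
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical entries for sub-track one (parameters in millions, FLOPs in G).
example = {
    "A": {"params": 0.5, "flops": 30.0},
    "B": {"params": 0.4, "flops": 25.0},
    "C": {"params": 0.6, "flops": 40.0},
}
final_ranking = rank_sum(example)
# B ranks first on both metrics (total 2), then A (total 4), then C (total 6).
```

The same function covers sub-track two if each team's dict carries all five metrics.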

arXiv:2111.08857 [pdf, other]

SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

Authors: Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie Wu, Jianye Hao, Dong Li, Pingzhong Tang

Abstract: The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards. To address the challenge, in this paper, we present SEIHAI, a Sample-efficient Hierarchical AI, that fully takes advantage of the human demonstrations and the task structure. Specifically, we split the task into several sequentially dependent subtasks, and train a suitable agent for each subtask using reinforcement learning and imitation learning. We further design a scheduler to select different agents for different subtasks automatically. SEIHAI takes the first place in the preliminary and final of the NeurIPS-2020 MineRL competition.

Submitted 16 November, 2021; originally announced November 2021.

Comments: The winner solution of NeurIPS 2020 MineRL competition (https://www.aicrowd.com/challenges/neurips-2020-minerl-competition/leaderboards). The paper has been accepted by DAI 2021 (the third International Conference on Distributed Artificial Intelligence)

arXiv:2108.13294 [pdf, other]

The missing link: Developing a safety case for perception components in automated driving

Authors: Rick Salay, Krzysztof Czarnecki, Hiroshi Kuwajima, Hirotoshi Yasuoka, Toshihiro Nakae, Vahdat Abdelzad, Chengjie Huang, Maximilian Kahn, Van Duong Nguyen

Abstract: Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system level and these efforts are missing the critical linking argument needed to integrate safety requirements at the system level with component performance requirements at the unit level. In this paper, we propose the Integration Safety Case for Perception (ISCaP), a generic template for such a linking safety argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of ISCaP with a detailed case study and discuss its use as a tool to support incremental development of perception components.

Submitted 6 September, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2107.06463 [pdf, other]

Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Authors: Haisheng Fu, Feng Liang, Jianping Lin, Bing Li, Mohammad Akbari, Jie Liang, Guohe Zhang, Dong Liu, Chengjie Tu, Jingning Han

Abstract: Recently, deep learning-based image compression methods have made significant progress and gradually outperformed traditional approaches including the latest standard Versatile Video Coding (VVC) in both PSNR and MS-SSIM metrics. Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. Various models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian. Existing schemes only use one of these models. However, due to the vast diversity of images, it is not optimal to use one model for all images, or even for different regions within one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt to different contents in different images and different regions of one image more accurately and efficiently, given the same complexity. Besides, in the encoding/decoding network design part, we propose concatenated residual blocks (CRB), where multiple residual blocks are serially connected with additional shortcut connections. The CRB can improve the learning ability of the network, which can further improve the compression performance. Experimental results using the Kodak, Tecnick-100 and Tecnick-40 datasets show that the proposed scheme outperforms all the leading learning-based methods and existing compression standards including VVC intra coding (4:4:4 and 4:2:0) in terms of the PSNR and MS-SSIM. The source code is available at https://github.com/fengyuren**sheng

Submitted 9 February, 2024; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: IEEE Transactions On Image Processing

arXiv:2012.15463 [pdf, other]

Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks

Authors: Mohammad Akbari, Jie Liang, Jingning Han, Chengjie Tu

Abstract: Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv) with built-in generalized divisive normalization (GDN) and inverse GDN (IGDN) layers. Novel GoConv- and GoTConv-based residual blocks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalar quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with variable-rate objective function outperforms the standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.

Submitted 31 December, 2020; originally announced December 2020.

Comments: 10 pages, 9 figures, 1 table; accepted to IEEE Transactions on Multimedia 2020. arXiv admin note: substantial text overlap with arXiv:1912.05688
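
Illustrative note on the "stochastic rounding-based scalar quantization" mentioned in the abstract above. This is a minimal sketch of generic stochastic rounding, not the paper's quantizer. The function name and the use of Python's `random` module are assumptions made for illustration. A value is rounded up with probability equal to its fractional part and rounded down otherwise, so the rounded value equals the input in expectation.

```python
import math
import random


def stochastic_round(x, rng=random):
    """Round x to a neighbouring integer at random.

    Rounds up with probability equal to the fractional part of x,
    so the expected value of the result equals x.
    """
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


# Averaging many stochastic roundings of 2.3 should land close to 2.3,
# whereas deterministic rounding would always return 2.
rng = random.Random(0)
samples = [stochastic_round(2.3, rng) for _ in range(20000)]
mean_value = sum(samples) / len(samples)
```

Integers pass through unchanged because their fractional part is zero.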

arXiv:2009.13074 [pdf, other]

Learned Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution

Authors: Jianping Lin, Mohammad Akbari, Haisheng Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, Feng Liang, Guohe Zhang, Chengjie Tu

Abstract: In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, and the LF components have lower resolution than HF components, which can improve the rate-distortion performance, similar to wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more spatial structure of the information, and enable more effective filtering between the HF and LF components, which further improve the performance. In addition, we develop a variable-rate scheme using the Lagrangian parameter to modulate all the internal feature maps in the auto-encoder, which allows the scheme to achieve the large bitrate range of the JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.

Submitted 25 September, 2020; originally announced September 2020.

Comments: MMSP 2020; JPEG-AI. arXiv admin note: text overlap with arXiv:2002.10032

arXiv:2006.14497 [pdf, other]

Quantumized Microwave Detection Based on $Λ$-Type Three-level Superconducting System: HMM Modeling and Performance Prediction

Authors: Junyu Zhang, Chen Gong, Shangbin Li, Shanchi Wu, Rui Ni, Chengjie Zuo, Jinkang Zhu, Ming Zhao, Zhengyuan Xu

Abstract: We adopt an artificial $Λ$-type three-level system with superconducting devices for microwave signal detection, where the signal intensity reaches the level of discrete photons instead of continuous waveform. Based on the state transition principles of the three-level system, we propose a statistical model for microwave signal detection. Moreover, we investigate the achievable transmission rate and signal detection based on the statistical model. It is predicted that the proposed detection can achieve significantly higher sensitivity compared with the currently deployed 4G/5G communication system. We further characterize the received signal considering the saturation phenomenon, which reveals negligible performance degradation caused by saturation under weak received power regime.

Submitted 27 August, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: 12 pages, 18 figures

arXiv:2006.14471 [pdf, other]

Wireless Communication Based on Microwave Photon-Level Detection With Superconducting Devices: Achievable Rate Prediction

Authors: Junyu Zhang, Chen Gong, Shangbin Li, Rui Ni, Chengjie Zuo, Jinkang Zhu, Ming Zhao, Zhengyuan Xu

Abstract: Future wireless communication system embraces physical-layer signal detection with high sensitivity, especially in the microwave photon level. Currently, the receiver primarily adopts the signal detection based on semi-conductor devices for signal detection, while this paper introduces high-sensitivity photon-level microwave detection based on superconducting structure. We first overview existing works on the photon-level communication in the optical spectrum as well as the microwave photon-level sensing based on superconducting structure in both theoretical and experimental perspectives, including microwave detection circuit model based on Josephson junction, microwave photon counter based on Josephson junction, and two reconstruction approaches under background noise. In addition, we characterize channel modeling based on two different microwave photon detection approaches, including the absorption barrier and the dual-path Hanbury Brown-Twiss (HBT) experiments, and predict the corresponding achievable rates. According to the performance prediction, it is seen that the microwave photon-level signal detection can increase the receiver sensitivity compared with the state-of-the-art standardized communication system with waveform signal reception, with gain over $10$dB.

Submitted 25 June, 2020; originally announced June 2020.

Comments: 9 pages, 13 figures

arXiv:2003.12933 [pdf, other]

Weak Radio Frequency Signal Detection Based on Piezo-Opto-Electro-Mechanical System: Architecture Design and Sensitivity Prediction

Authors: Shanchi Wu, Chen Gong, Chengjie Zuo, Shangbin Li, Junyu Zhang, Zhongbin Dai, Kai Yang, Ming Zhao, Rui Ni, Zhengyuan Xu, Jinkang Zhu

Abstract: We propose a novel radio-frequency (RF) receiving architecture based on micro-electro-mechanical system (MEMS) and optical coherent detection module. The architecture converts the received electrical signal into mechanical vibration through the piezoelectric effect and adopts an optical detection module to detect the mechanical vibration. We analyze the response function of piezoelectric film to an RF signal, the noise limited sensitivity of the optical detection module and the system transfer function in the frequency domain. Finally, we adopt simple on-off keying (OOK) modulation with bandwidth 1 kHz and carrier frequency 1 GHz, to numerically evaluate the detection sensitivity. The result shows that, considering the main noise sources in wireless channel and circuits, the signal detection sensitivity can reach around -160 dBm with a 50 $Ω$ impedance. Such sensitivity significantly outperforms that of the currently deployed Long Term Evolution (LTE) system, when normalizing the transmission bandwidth also to 1 kHz.

Submitted 8 October, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

Comments: 15 pages, 16 figures, 6 tables

arXiv:2002.10032 [pdf, other]

Generalized Octave Convolutions for Learned Multi-Frequency Image Compression

Authors: Mohammad Akbari, Jie Liang, Jingning Han, Chengjie Tu

Abstract: Learned image compression has recently shown the potential to outperform the standard codecs. State-of-the-art rate-distortion (R-D) performance has been achieved by context-adaptive entropy coding approaches in which hyperprior and autoregressive models are jointly utilized to effectively capture the spatial dependencies in the latent representations. However, the latents are feature maps of the same spatial resolution in previous works, which contain some redundancies that affect the R-D performance. In this paper, we propose the first learned multi-frequency image compression and entropy coding approach that is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components, where the low frequency is represented by a lower resolution. Therefore, its spatial redundancy is reduced, which improves the R-D performance. Novel generalized octave convolution and octave transposed-convolution architectures with internal activation layers are also proposed to preserve more spatial structure of the information. Experimental results show that the proposed scheme outperforms all existing learned methods as well as standard codecs such as the next-generation video coding standard VVC (4:2:0) on the Kodak dataset in both PSNR and MS-SSIM. We also show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks such as semantic segmentation and image denoising.

Submitted 31 December, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

Comments: 13 pages, 10 figures, 5 tables; Extended version of the paper accepted to AAAI 2021

arXiv:1912.05688 [pdf, other]

Learned Variable-Rate Image Compression with Residual Divisive Normalization

Authors: Mohammad Akbari, Jie Liang, Jingning Han, Chengjie Tu

Abstract: Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a variable-rate image compression framework, which employs more Generalized Divisive Normalization (GDN) layers than previous GDN-based methods. Novel GDN-based residual sub-networks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalable quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with variable-rate objective function outperforms all standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.

Submitted 11 December, 2019; originally announced December 2019.

Comments: 6 pages, 5 figures

arXiv:1907.06566 [pdf, other]

doi 10.1016/j.image.2019.115774

Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs

Authors: Haisheng Fu, Feng Liang, Bo Lei, Nai Bian, Qian Zhang, Mohammad Akbari, Jie Liang, Chengjie Tu

Abstract: Recently deep learning-based methods have been applied in image compression and achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework by combining deep learning and the traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results using the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of bit rates, when the images are coded in the RGB444 domain.

Submitted 15 July, 2019; originally announced July 2019.

Comments: Submitted to Signal Processing: Image Communication

Report number: 1907.06566

Journal ref: Volume 82, March 2020, 115774

Showing 1–17 of 17 results for author: Chengjie