-
Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes
Authors:
Kang You,
Kai Liu,
Li Yu,
Pan Gao,
Dandan Ding
Abstract:
Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance an…
▽ More
Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance and extremely low-decoding-latency simultaneously. Inspired by conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates with fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners.
△ Less
Submitted 21 April, 2024;
originally announced April 2024.
-
Prototy** and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map
Authors:
Zhuoyin Dai,
Di Wu,
Zhenjun Dong,
Kun Li,
Dingyang Ding,
Sihan Wang,
Yong Zeng
Abstract:
Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environmentaware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, te…
▽ More
Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environmentaware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, termed beam index map (BIM). To this end, a general CKM construction method is first presented, and an indoor BIM is constructed offline to learn the candidate transmit and receive beam index pairs for each grid in the experimental area. Furthermore, based on the location information of the receiver (or the dynamic obstacles) from the ultra-wide band (UWB) positioning system, the established BIM is used to achieve training-free beam alignment by directly providing the beam indexes for the transmitter and receiver. Three typical scenarios are considered in the experiment, including quasi-static environment with line-of-sight (LoS) link, quasistatic environment without LoS link and dynamic environment. Besides, the receiver orientation measured from the gyroscope is also used to help CKM predict more accurate beam indexes. The experiment results show that compared with the benchmark location-based beam alignment strategy, the CKM-based beam alignment strategy can achieve much higher received power, which is close to that achieved by exhaustive beam search, but with significantly reduced training overhead.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Another Way to the Top: Exploit Contextual Clustering in Learned Image Coding
Authors:
Yichi Zhang,
Zhihao Duan,
Ming Lu,
Dandan Ding,
Fengqing Zhu,
Zhan Ma
Abstract:
While convolution and self-attention are extensively used in learned image compression (LIC) for transform coding, this paper proposes an alternative called Contextual Clustering based LIC (CLIC) which primarily relies on clustering operations and local attention for correlation characterization and compact representation of an image. As seen, CLIC expands the receptive field into the entire image…
▽ More
While convolution and self-attention are extensively used in learned image compression (LIC) for transform coding, this paper proposes an alternative called Contextual Clustering based LIC (CLIC) which primarily relies on clustering operations and local attention for correlation characterization and compact representation of an image. As seen, CLIC expands the receptive field into the entire image for intra-cluster feature aggregation. Afterward, features are reordered to their original spatial positions to pass through the local attention units for inter-cluster embedding. Additionally, we introduce the Guided Post-Quantization Filtering (GuidedPQF) into CLIC, effectively mitigating the propagation and accumulation of quantization errors at the initial decoding stage. Extensive experiments demonstrate the superior performance of CLIC over state-of-the-art works: when optimized using MSE, it outperforms VVC by about 10% BD-Rate in three widely-used benchmark datasets; when optimized using MS-SSIM, it saves more than 50% BD-Rate over VVC. Our CLIC offers a new way to generate compact representations for image compression, which also provides a novel direction along the line of LIC development.
△ Less
Submitted 21 January, 2024;
originally announced January 2024.
-
Transferable Learned Image Compression-Resistant Adversarial Perturbations
Authors:
Yang Sui,
Zhuohang Li,
Ding Ding,
Xiang Pan,
Xiaozhong Xu,
Shan Liu,
Zhenzhong Chen
Abstract:
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of D…
▽ More
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as the promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbation. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Resilient Constrained Reinforcement Learning
Authors:
Dongsheng Ding,
Zhengyan Huan,
Alejandro Ribeiro
Abstract:
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and the constraint satisfaction, which is ubiquitous in constrained decision-making. To tackle this issue, we…
▽ More
We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and the constraint satisfaction, which is ubiquitous in constrained decision-making. To tackle this issue, we propose a new constrained RL approach that searches for policy and constraint specifications together. This method features the adaptation of relaxing the constraint according to a relaxation cost introduced in the learning objective. Since this feature mimics how ecological systems adapt to disruptions by altering operation, our approach is termed as resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance the constraint satisfaction and the reward maximization in notion of resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and advocate two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and the effectiveness of our approach in computational experiments.
△ Less
Submitted 29 December, 2023; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Corner-to-Center Long-range Context Model for Efficient Learned Image Compression
Authors:
Yang Sui,
Ding Ding,
Xiang Pan,
Xiaozhong Xu,
Shan Liu,
Bo Yuan,
Zhenzhong Chen
Abstract:
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression…
▽ More
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete casual context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on such analysis, we propose the \textbf{Corner-to-Center transformer-based Context Model (C$^3$M)} designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage the logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture the long-range semantic information by assigning the different window shapes in different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
Authors:
Dongsheng Ding,
Chen-Yu Wei,
Kaiqing Zhang,
Alejandro Ribeiro
Abstract:
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To…
▽ More
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual variable via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
△ Less
Submitted 16 January, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
Authors:
Dongsheng Ding,
Xiaohan Wei,
Zhuoran Yang,
Zhaoran Wang,
Mihailo R. Jovanović
Abstract:
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic util…
▽ More
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via projected gradient step and we prove that it enjoys sublinear rate $ O((|X|+|Y|) L \sqrt{T(|A|+|B|)}))$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
△ Less
Submitted 31 May, 2023;
originally announced June 2023.
-
Supervised Domain Adaptation for Recognizing Retinal Diseases from Wide-Field Fundus Images
Authors:
Qijie Wei,
**gyuan Yang,
Bo Wang,
**rui Wang,
Jianchun Zhao,
Xinyu Zhao,
Sheng Yang,
Niranchana Manivannan,
Youxin Chen,
Dayong Ding,
**g Zhou,
Xirong Li
Abstract:
This paper addresses the emerging task of recognizing multiple retinal diseases from wide-field (WF) and ultra-wide-field (UWF) fundus images. For an effective use of existing large amount of labeled color fundus photo (CFP) data and the relatively small amount of WF and UWF data, we propose a supervised domain adaptation method named Cross-domain Collaborative Learning (CdCL). Inspired by the suc…
▽ More
This paper addresses the emerging task of recognizing multiple retinal diseases from wide-field (WF) and ultra-wide-field (UWF) fundus images. For an effective use of existing large amount of labeled color fundus photo (CFP) data and the relatively small amount of WF and UWF data, we propose a supervised domain adaptation method named Cross-domain Collaborative Learning (CdCL). Inspired by the success of fixed-ratio based mixup in unsupervised domain adaptation, we re-purpose this strategy for the current task. Due to the intrinsic disparity between the field-of-view of CFP and WF/UWF images, a scale bias naturally exists in a mixup sample that the anatomic structure from a CFP image will be considerably larger than its WF/UWF counterpart. The CdCL method resolves the issue by Scale-bias Correction, which employs Transformers for producing scale-invariant features. As demonstrated by extensive experiments on multiple datasets covering both WF and UWF images, the proposed method compares favorably against a number of competitive baselines.
△ Less
Submitted 23 October, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
Lossless Point Cloud Attribute Compression Using Cross-scale, Cross-group, and Cross-color Prediction
Authors:
Jianqiang Wang,
Dandan Ding,
Zhan Ma
Abstract:
This work extends the multiscale structure originally developed for point cloud geometry compression to point cloud attribute compression. To losslessly encode the attribute while maintaining a low bitrate, accurate probability prediction is critical. With this aim, we extensively exploit cross-scale, cross-group, and cross-color correlations of point cloud attribute to ensure accurate probability…
▽ More
This work extends the multiscale structure originally developed for point cloud geometry compression to point cloud attribute compression. To losslessly encode the attribute while maintaining a low bitrate, accurate probability prediction is critical. With this aim, we extensively exploit cross-scale, cross-group, and cross-color correlations of point cloud attribute to ensure accurate probability estimation and thus high coding efficiency. Specifically, we first generate multiscale attribute tensors through average pooling, by which, for any two consecutive scales, the decoded lower-scale attribute can be used to estimate the attribute probability in the current scale in one shot. Additionally, in each scale, we perform the probability estimation group-wisely following a predefined grou** pattern. In this way, both cross-scale and (same-scale) cross-group correlations are exploited jointly. Furthermore, cross-color redundancy is removed by allowing inter-color processing for YCoCg/RGB alike multi-channel attributes. The proposed method not only demonstrates state-of-the-art compression efficiency with significant performance gains over the latest G-PCC on various contents but also sustains low complexity with affordable encoding and decoding runtime.
△ Less
Submitted 22 March, 2023;
originally announced March 2023.
-
Using Learned Indexes to Improve Time Series Indexing Performance on Embedded Sensor Devices
Authors:
David Ding,
Ivan Carvalho,
Ramon Lawrence
Abstract:
Efficiently querying data on embedded sensor and IoT devices is challenging given the very limited memory and CPU resources. With the increasing volumes of collected data, it is critical to process, filter, and manipulate data on the edge devices where it is collected to improve efficiency and reduce network transmissions. Existing embedded index structures do not adapt to the data distribution an…
▽ More
Efficiently querying data on embedded sensor and IoT devices is challenging given the very limited memory and CPU resources. With the increasing volumes of collected data, it is critical to process, filter, and manipulate data on the edge devices where it is collected to improve efficiency and reduce network transmissions. Existing embedded index structures do not adapt to the data distribution and characteristics. This paper demonstrates how applying learned indexes that develop space efficient summaries of the data can dramatically improve the query performance and predictability. Learned indexes based on linear approximations can reduce the query I/O by 50 to 90% and improve query throughput by a factor of 2 to 5, while only requiring a few kilobytes of RAM. Experimental results on a variety of time series data sets demonstrate the advantages of learned indexes that considerably improve over the state-of-the-art index algorithms.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
Dynamic Point Cloud Geometry Compression Using Multiscale Inter Conditional Coding
Authors:
Jianqiang Wang,
Dandan Ding,
Hao Chen,
Zhan Ma
Abstract:
This work extends the Multiscale Sparse Representation (MSR) framework developed for static Point Cloud Geometry Compression (PCGC) to support the dynamic PCGC through the use of multiscale inter conditional coding. To this end, the reconstruction of the preceding Point Cloud Geometry (PCG) frame is progressively downscaled to generate multiscale temporal priors which are then scale-wise transferr…
▽ More
This work extends the Multiscale Sparse Representation (MSR) framework developed for static Point Cloud Geometry Compression (PCGC) to support the dynamic PCGC through the use of multiscale inter conditional coding. To this end, the reconstruction of the preceding Point Cloud Geometry (PCG) frame is progressively downscaled to generate multiscale temporal priors which are then scale-wise transferred and integrated with lower-scale spatial priors from the same frame to form the contextual information to improve occupancy probability approximation when processing the current PCG frame from one scale to another. Following the Common Test Conditions (CTC) defined in the standardization committee, the proposed method presents State-Of-The-Art (SOTA) compression performance, yielding 78% lossy BD-Rate gain to the latest standard-compliant V-PCC and 45% lossless bitrate reduction to the latest G-PCC. Even for recently-emerged learning-based solutions, our method still shows significant performance gains.
△ Less
Submitted 28 January, 2023;
originally announced January 2023.
-
Segmentation-based Information Extraction and Amalgamation in Fundus Images for Glaucoma Detection
Authors:
Yanni Wang,
Gang Yang,
Dayong Ding,
Jianchun Zao
Abstract:
Glaucoma is a severe blinding disease, for which automatic detection methods are urgently needed to alleviate the scarcity of ophthalmologists. Many works have proposed to employ deep learning methods that involve the segmentation of optic disc and cup for glaucoma detection, in which the segmentation process is often considered merely as an upstream sub-task. The relationship between fundus image…
▽ More
Glaucoma is a severe blinding disease, for which automatic detection methods are urgently needed to alleviate the scarcity of ophthalmologists. Many works have proposed to employ deep learning methods that involve the segmentation of optic disc and cup for glaucoma detection, in which the segmentation process is often considered merely as an upstream sub-task. The relationship between fundus images and segmentation masks in terms of joint decision-making in glaucoma assessment is rarely explored. We propose a novel segmentation-based information extraction and amalgamation method for the task of glaucoma detection, which leverages the robustness of segmentation masks without disregarding the rich information in the original fundus images. Experimental results on both private and public datasets demonstrate that our proposed method outperforms all models that utilize solely either fundus images or masks.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
CARNet:Compression Artifact Reduction for Point Cloud Attribute
Authors:
Dandan Ding,
Junzhe Zhang,
Jianqiang Wang,
Zhan Ma
Abstract:
A learning-based adaptive loop filter is developed for the Geometry-based Point Cloud Compression (G-PCC) standard to reduce attribute compression artifacts. The proposed method first generates multiple Most-Probable Sample Offsets (MPSOs) as potential compression distortion approximations, and then linearly weights them for artifact mitigation. As such, we drive the filtered reconstruction as clo…
▽ More
A learning-based adaptive loop filter is developed for the Geometry-based Point Cloud Compression (G-PCC) standard to reduce attribute compression artifacts. The proposed method first generates multiple Most-Probable Sample Offsets (MPSOs) as potential compression distortion approximations, and then linearly weights them for artifact mitigation. As such, we drive the filtered reconstruction as close to the uncompressed PCA as possible. To this end, we devise a Compression Artifact Reduction Network (CARNet) which consists of two consecutive processing phases: MPSOs derivation and MPSOs combination. The MPSOs derivation uses a two-stream network to model local neighborhood variations from direct spatial embedding and frequency-dependent embedding, where sparse convolutions are utilized to best aggregate information from sparsely and irregularly distributed points. The MPSOs combination is guided by the least square error metric to derive weighting coefficients on the fly to further capture content dynamics of input PCAs. The CARNet is implemented as an in-loop filtering tool of the GPCC, where those linear weighting coefficients are encapsulated into the bitstream with negligible bit rate overhead. Experimental results demonstrate significant improvement over the latest GPCC both subjectively and objectively.
△ Less
Submitted 17 September, 2022;
originally announced September 2022.
-
Channel Estimation for Delay Alignment Modulation
Authors:
Dingyang Ding,
Yong Zeng
Abstract:
Delay alignment modulation (DAM) is a promising technology for inter-symbol interference (ISI)-free communication without relying on sophisticated channel equalization or multi-carrier transmissions. The key ideas of DAM are delay precompensation and path-based beamforming, so that the multi-path signal components will arrive at the receiver simultaneously and constructively, rather than causing t…
▽ More
Delay alignment modulation (DAM) is a promising technology for inter-symbol interference (ISI)-free communication without relying on sophisticated channel equalization or multi-carrier transmissions. The key ideas of DAM are delay precompensation and path-based beamforming, so that the multi-path signal components will arrive at the receiver simultaneously and constructively, rather than causing the detrimental ISI. However, the practical implementation of DAM requires channel state information (CSI) at the transmitter side. Therefore, in this letter, we study an efficient channel estimation method for DAM based on block orthogonal matching pursuit (BOMP) algorithm, by exploiting the block sparsity of the channel vector. Based on the imperfectly estimated CSI, the delay pre-compensations and tap-based beamforming are designed for DAM, and the resulting performance is studied. Simulation results demonstrate that with the BOMP-based channel estimation method, the CSI can be effectively acquired with low training overhead, and the performance of DAM based on estimated CSI is comparable to the ideal case with perfect CSI.
△ Less
Submitted 31 March, 2023; v1 submitted 19 June, 2022;
originally announced June 2022.
-
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Authors:
Dongsheng Ding,
Kaiqing Zhang,
Jiali Duan,
Tamer Başar,
Mihailo R. Jovanović
Abstract:
We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD)…
▽ More
We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is~dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to showcase the merits and the effectiveness of our approach.
△ Less
Submitted 17 October, 2023; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Environment-Aware Beam Selection for IRS-Aided Communication with Channel Knowledge Map
Authors:
Dingyang Ding,
Di Wu,
Yong Zeng,
Shi **,
Rui Zhang
Abstract:
Intelligent reflecting surface (IRS)-aided communication is a promising technology for beyond 5G (B5G) systems, to reconfigure the radio environment proactively. However, IRS-aided communication in practice requires efficient channel estimation or passive beam training, whose overhead and complexity increase drastically with the number of reflecting elements/beam directions. To tackle this challen…
▽ More
Intelligent reflecting surface (IRS)-aided communication is a promising technology for beyond 5G (B5G) systems, to reconfigure the radio environment proactively. However, IRS-aided communication in practice requires efficient channel estimation or passive beam training, whose overhead and complexity increase drastically with the number of reflecting elements/beam directions. To tackle this challenge, we propose in this paper a novel environment-aware joint active and passive beam selection scheme for IRS-aided wireless communication, based on the new concept of channel knowledge map (CKM). Specifically, by utilizing both the location information of the user equipment (UE), which is readily available in contemporary wireless systems with ever-increasing accuracy, and the environment information offered by CKM, the proposed scheme achieves efficient beam selection with either no real-time training required (training-free beam selection) or only moderate training overhead (light-training beam selection). Numerical results based on practical channels obtained using commercial ray tracing software are presented, which demonstrate the superior performance of the proposed scheme over various benchmark schemes.
△ Less
Submitted 22 November, 2021;
originally announced November 2021.
-
Sparse Tensor-based Multiscale Representation for Point Cloud Geometry Compression
Authors:
Jianqiang Wang,
Dandan Ding,
Zhu Li,
Xiaoxing Feng,
Chuntong Cao,
Zhan Ma
Abstract:
This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also all…
▽ More
This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also allows us to compress scale-wise MP-POVs by exploiting cross-scale and same-scale correlations extensively and flexibly. The overall compression efficiency highly depends on the accuracy of estimated occupancy probability for each MP-POV. Thus, we first design the Sparse Convolution-based Neural Network (SparseCNN) which stacks sparse convolutions and voxel sampling to best characterize and embed spatial correlations. We then develop the SparseCNN-based Occupancy Probability Approximation (SOPA) model to estimate the occupancy probability either in a single-stage manner only using the cross-scale correlation, or in a multi-stage manner by exploiting stage-wise correlation among same-scale neighbors. Besides, we also suggest the SparseCNN based Local Neighborhood Embedding (SLNE) to aggregate local variations as spatial priors in feature attribute to improve the SOPA. Our unified approach not only shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets including the dense object PCGs (8iVFB, Owlii, MUVB) and sparse LiDAR PCGs (KITTI, Ford) when compared with standardized MPEG G-PCC and other prevalent learning-based schemes, but also has low complexity which is attractive to practical applications.
△ Less
Submitted 21 October, 2022; v1 submitted 20 November, 2021;
originally announced November 2021.
-
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Authors:
Andrew Jaegle,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Carl Doersch,
Catalin Ionescu,
David Ding,
Skanda Koppula,
Daniel Zoran,
Andrew Brock,
Evan Shelhamer,
Olivier Hénaff,
Matthew M. Botvinick,
Andrew Zisserman,
Oriol Vinyals,
Joāo Carreira
Abstract:
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data f…
▽ More
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
△ Less
Submitted 15 March, 2022; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Advances In Video Compression System Using Deep Neural Network: A Review And Case Studies
Authors:
Dandan Ding,
Zhan Ma,
Di Chen,
Qingshuang Chen,
Zoe Liu,
Fengqing Zhu
Abstract:
Significant advances in video compression system have been made in the past several decades to satisfy the nearly exponential growth of Internet-scale video traffic. From the application perspective, we have identified three major functional blocks including pre-processing, coding, and post-processing, that have been continuously investigated to maximize the end-user quality of experience (QoE) un…
▽ More
Significant advances in video compression system have been made in the past several decades to satisfy the nearly exponential growth of Internet-scale video traffic. From the application perspective, we have identified three major functional blocks including pre-processing, coding, and post-processing, that have been continuously investigated to maximize the end-user quality of experience (QoE) under a limited bit rate budget. Recently, artificial intelligence (AI) powered techniques have shown great potential to further increase the efficiency of the aforementioned functional blocks, both individually and jointly. In this article, we review extensively recent technical advances in video compression system, with an emphasis on deep neural network (DNN)-based approaches; and then present three comprehensive case studies. On pre-processing, we show a switchable texture-based video coding example that leverages DNN-based scene understanding to extract semantic areas for the improvement of subsequent video coder. On coding, we present an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning. On post-processing, we demonstrate two neural adaptive filters to respectively facilitate the in-loop and post filtering for the enhancement of compressed frames. Finally, a companion website hosting the contents developed in this work can be accessed publicly at https://purdueviper.github.io/dnn-coding/.
△ Less
Submitted 15 January, 2021;
originally announced January 2021.
-
Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning
Authors:
Ming Lu,
Tong Chen,
Dandan Ding,
Fengqing Zhu,
Zhan Ma
Abstract:
Inspired by the facts that retinal cells actually segregate the visual scene into different attributes (e.g., spatial details, temporal motion) for respective neuronal processing, we propose to first decompose the input video into respective spatial texture frames (STF) at its native spatial resolution that preserve the rich spatial details, and the other temporal motion frames (TMF) at a lower sp…
▽ More
Inspired by the facts that retinal cells actually segregate the visual scene into different attributes (e.g., spatial details, temporal motion) for respective neuronal processing, we propose to first decompose the input video into respective spatial texture frames (STF) at its native spatial resolution that preserve the rich spatial details, and the other temporal motion frames (TMF) at a lower spatial resolution that retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize decoded STFs and TMFs for high-fidelity video reconstruction at the same resolution as its native input. This work simply applies the bicubic resampling in decomposition and HEVC compliant codec in compression, and puts the focus on the synthesis part. For resolution-adaptive synthesis, a motion compensation network (MCN) is devised on TMFs to efficiently align and aggregate temporal motion features that will be jointly processed with corresponding STFs using a non-local texture transfer network (NL-TTN) to better augment spatial details, by which the compression and resolution resampling noises can be effectively alleviated with better rate-distortion efficiency. Such "Decomposition, Compression, Synthesis (DCS)" based scheme is codec agnostic, currently exemplifying averaged $\approx$1 dB PSNR gain or $\approx$25% BD-rate saving, against the HEVC anchor using reference software. In addition, experimental comparisons to the state-of-the-art methods and ablation studies are conducted to further report the efficiency and generalization of DCS algorithm, promising an encouraging direction for future video coding.
△ Less
Submitted 15 January, 2024; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Multiscale Point Cloud Geometry Compression
Authors:
Jianqiang Wang,
Dandan Ding,
Zhu Li,
Zhan Ma
Abstract:
Recent years have witnessed the growth of point cloud based applications because of its realistic and fine-grained representation of 3D objects and scenes. However, it is a challenging problem to compress sparse, unstructured, and high-precision 3D points for efficient communication. In this paper, leveraging the sparsity nature of point cloud, we propose a multiscale end-to-end learning framework…
▽ More
Recent years have witnessed the growth of point cloud based applications because of its realistic and fine-grained representation of 3D objects and scenes. However, it is a challenging problem to compress sparse, unstructured, and high-precision 3D points for efficient communication. In this paper, leveraging the sparsity nature of point cloud, we propose a multiscale end-to-end learning framework which hierarchically reconstructs the 3D Point Cloud Geometry (PCG) via progressive re-sampling. The framework is developed on top of a sparse convolution based autoencoder for point cloud compression and reconstruction. For the input PCG which has only the binary occupancy attribute, our framework translates it to a downscaled point cloud at the bottleneck layer which possesses both geometry and associated feature attributes. Then, the geometric occupancy is losslessly compressed using an octree codec and the feature attributes are lossy compressed using a learned probabilistic context model.Compared to state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based PCC (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG), our method achieves more than 40% and 70% BD-Rate (Bjontegaard Delta Rate) reduction, respectively. Its encoding runtime is comparable to that of G-PCC, which is only 1.5% of V-PCC.
△ Less
Submitted 7 November, 2020;
originally announced November 2020.
-
Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian: A Lyapunov-based approach
Authors:
Dongsheng Ding,
Mihailo R. Jovanović
Abstract:
For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian. The result holds when the differentiable part of the objective function is strongly convex with a Lipschitz continuous gradient; the non-dif…
▽ More
For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian. The result holds when the differentiable part of the objective function is strongly convex with a Lipschitz continuous gradient; the non-differentiable part is proper, lower semi-continuous, and convex; and the matrix in the linear constraint is full row rank. Our quadratic Lyapunov function generalizes recent result from strongly convex problems with either affine equality or inequality constraints to a broader class of composite optimization problems with nonsmooth regularizers and it provides a worst-case lower bound of the exponential decay rate. Finally, we use computational experiments to demonstrate that our convergence rate estimate is less conservative than the existing alternatives.
△ Less
Submitted 2 October, 2019;
originally announced October 2019.
-
Two-Stream CNN with Loose Pair Training for Multi-modal AMD Categorization
Authors:
Weisen Wang,
Zhiyan Xu,
Weihong Yu,
Jianchun Zhao,
**gyuan Yang,
Feng He,
Zhikun Yang,
Di Chen,
Dayong Ding,
Youxin Chen,
Xirong Li
Abstract:
This paper studies automated categorization of age-related macular degeneration (AMD) given a multi-modal input, which consists of a color fundus image and an optical coherence tomography (OCT) image from a specific eye. Previous work uses a traditional method, comprised of feature extraction and classifier training that cannot be optimized jointly. By contrast, we propose a two-stream convolution…
▽ More
This paper studies automated categorization of age-related macular degeneration (AMD) given a multi-modal input, which consists of a color fundus image and an optical coherence tomography (OCT) image from a specific eye. Previous work uses a traditional method, comprised of feature extraction and classifier training that cannot be optimized jointly. By contrast, we propose a two-stream convolutional neural network (CNN) that is end-to-end. The CNN's fusion layer is tailored to the need of fusing information from the fundus and OCT streams. For generating more multi-modal training instances, we introduce Loose Pair training, where a fundus image and an OCT image are paired based on class labels rather than eyes. Moreover, for a visual interpretation of how the individual modalities make contributions, we extend the class activation map** technique to the multi-modal scenario. Experiments on a real-world dataset collected from an outpatient clinic justify the viability of our proposal for multi-modal AMD categorization.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.