ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an effective method for reducing memory costs and computational complexity. When quantizing diffusion transformers, we find that applying existing diffusion quantization methods designed for U-Net faces challenges in preserving quality. After analyzing the major challenges for quantizing diffusion transformers, we design an improved quantization scheme, ViDiT-Q (Video and Image Diffusion Transformer Quantization), to address these issues. Furthermore, we identify that highly sensitive layers and timesteps hinder quantization for lower bit-widths. To tackle this, we improve ViDiT-Q with a novel metric-decoupled mixed-precision quantization method (ViDiT-Q-MP). We validate the effectiveness of ViDiT-Q across a variety of text-to-image and video models. While baseline quantization methods fail at W8A8 and produce unreadable content at W4A8, ViDiT-Q achieves lossless W8A8 quantization. ViDiT-Q-MP achieves W4A8 with negligible visual quality degradation, resulting in a 2.5x memory optimization and a 1.5x latency speedup.

Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

arXiv:2405.12438 [pdf, other]

doi 10.1145/3635636.3664260

CoCo Matrix: Taxonomy of Cognitive Contributions in Co-writing with Intelligent Agents

Authors: Ruyuan Wan, Simret Gebreegziabhe, Toby Jia-Jun Li, Karla Badillo-Urquiola

Abstract: In recent years, there has been a growing interest in employing intelligent agents in writing. Previous work emphasizes the evaluation of the quality of the end product (whether it was coherent and polished), overlooking the journey that led to the product, which is an invaluable dimension of the creative process. To understand how to recognize human efforts in co-writing with intelligent writing systems, we adapt Flower and Hayes' cognitive process theory of writing and propose CoCo Matrix, a two-dimensional taxonomy of entropy and information gain, to depict the new human-agent co-writing model. We define four quadrants and situate thirty-four published systems within the taxonomy. Our research found that low entropy and high information gain systems are under-explored, yet offer promising future directions in writing tasks that benefit from the agent's divergent planning and the human's focused translation. CoCo Matrix not only categorizes different writing systems but also deepens our understanding of the cognitive processes in human-agent co-writing. By analyzing minimal changes in the writing process, CoCo Matrix serves as a proxy for the writer's mental model, allowing writers to reflect on their contributions. This reflection is facilitated through the measured metrics of information gain and entropy, which provide insights irrespective of the writing system used.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.19247 [pdf, ps, other]

Improved AutoEncoder with LSTM module and KL divergence

Authors: Wei Huang, Bingyang Zhang, Kaituo Zhang, Hua Gao, Rongchun Wan

Abstract: The task of anomaly detection is to separate anomalous data from normal data in the dataset. Models such as the deep convolutional autoencoder (CAE) network and the deep support vector data description (SVDD) model have been universally employed and have demonstrated significant success in detecting anomalies. However, the over-reconstruction ability of the CAE network for anomalous data can easily lead to a high false negative rate in detecting anomalous data. On the other hand, the deep SVDD model has the drawback of feature collapse, which leads to a decrease in detection accuracy for anomalies. To address these problems, we propose the Improved AutoEncoder with LSTM module and Kullback-Leibler divergence (IAE-LSTM-KL) model in this paper. An LSTM network is added after the encoder to memorize feature representations of normal data. Meanwhile, the phenomenon of feature collapse can also be mitigated by penalizing the featured input to the SVDD module via KL divergence. The efficacy of the IAE-LSTM-KL model is validated through experiments on both synthetic and real-world datasets. Experimental results show that the IAE-LSTM-KL model yields higher detection accuracy for anomalies. In addition, it is also found that the IAE-LSTM-KL model demonstrates enhanced robustness to contaminated outliers in the dataset.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2403.01677 [pdf, other]

Elliptic anisotropy of open-charm hadron in p--Pb collisions at LHC energies from parton scatterings

Authors: Siyu Tang, Yuan Lu, Chao Zhang, RenZhuo Wan

Abstract: The elliptic azimuthal anisotropy coefficient ($v_{2}$) of open-charm hadrons at midrapidity ($|\eta|<1$) was studied in p--Pb collisions at $\sqrt{s_{\mathrm{NN}}}=$ 8.16 TeV using a multi-phase transport model (AMPT). By implementing an additional heavy quark--antiquark pair production trigger in the AMPT, we obtained a simultaneous description of the $p_{\mathrm{T}}$ spectrum and $v_{2}$ of the $D^{0}$ meson. Then the predictions for the $v_{2}$ of charm hadrons including $D^{+}$, $D_{s}^{+}$ and $\Lambda_{c}^{+}$ in p--Pb collisions are provided for the first time. We found that the $v_{2}$ of open-charm hadrons follows the number-of-constituent-quark (NCQ) scaling in high-multiplicity p--Pb collisions, and is significantly affected by the parton scattering process. These findings further demonstrate the importance of partonic degrees of freedom in small collision systems for heavy flavors, and provide referential value for future measurements of azimuthal anisotropy at the LHC energies.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures, submitted to Physical Review C

arXiv:2402.12184 [pdf, other]

Colorizing Monochromatic Radiance Fields

Authors: Yean Cheng, Renjie Wan, Shuchen Weng, Chengxuan Zhu, Yakun Chang, Boxin Shi

Abstract: Though Neural Radiance Fields (NeRF) can produce colorful 3D representations of the world by using a set of 2D images, such ability becomes non-existent when only monochromatic images are provided. Since color is necessary in representing the world, reproducing color from monochromatic radiance fields becomes crucial. To achieve this goal, instead of manipulating the monochromatic radiance fields directly, we consider it as a representation-prediction task in the Lab color space. By first constructing the luminance and density representation using monochromatic images, our prediction stage can recreate color representation on the basis of an image colorization module. We then reproduce a colorful implicit model through the representation of luminance, density, and color. Extensive experiments have been conducted to validate the effectiveness of our approaches. Our project page: https://liquidammonia.github.io/color-nerf.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.10165 [pdf, ps, other]

Marked length spectrum rigidity in groups with contracting elements

Authors: Renxing Wan, Xiaoyu Xu, Wenyuan Yang

Abstract: This paper presents a study of the well-known marked length spectrum rigidity problem in the coarse-geometric setting. For any two (possibly non-proper) group actions $G\curvearrowright X_1$ and $G\curvearrowright X_2$ with contracting property, we prove that if the two actions have the same marked length spectrum, then the orbit map $Go_1\to Go_2$ must be a rough isometry. In the special case of cusp-uniform actions, the rough isometry can be extended to the entire space. This generalizes the existing results in hyperbolic groups and relatively hyperbolic groups. In addition, we prove a finer marked length spectrum rigidity from confined subgroups and further, geometrically dense subgroups. Our proof is based on the Extension Lemma and uses purely elementary metric geometry. This study produces new results and recovers existing ones for many more interesting groups through a unified and elementary approach.

Submitted 20 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 45 pages; updated information about reference [41]

MSC Class: 20F65

arXiv:2401.08195 [pdf, ps, other]

Three classes of propagation rules for GRS and EGRS codes and their applications to EAQECCs

Authors: Ruhao Wan, Shixin Zhu

Abstract: In this paper, we study the Hermitian hulls of (extended) generalized Reed-Solomon (GRS and EGRS) codes over finite fields. For a given class of (extended) GRS codes, by increasing the length, increasing the dimensions and increasing both the length and the dimensions, we obtain three new classes of (extended) GRS codes with Hermitian hulls of arbitrary dimensions. Furthermore, we obtain several new classes of $q^2$-ary maximum distance separable (MDS) codes with Hermitian hulls of arbitrary dimensions, and the dimension of these MDS codes can be taken from $1$ to $\frac{n}{2}$. By propagation rules, the parameters of the obtained code can be more flexible. As an application, many new (MDS) entanglement-assisted quantum error correction codes (EAQECCs) can be constructed from previously known (extended) GRS codes. We derive three new propagation rules on (MDS) EAQECCs constructed from (extended) GRS codes. Finally, we present several new classes of (MDS) EAQECCs with flexible parameters. Notably, the distance parameters of our codes can range from $2$ to $\frac{n+2}{2}$.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 23 pages, 5 tables

ACM Class: E.4

arXiv:2401.03618 [pdf, ps, other]

On sharp variational inequalities for oscillatory integral operators of Stein-Wainger type

Authors: Renhui Wan

Abstract: For any integer $n \geq 2$, we establish $L^p(\mathbb{R}^n)$ inequalities for the $r$-variations of Stein-Wainger type oscillatory integral operators with general radial phase functions. These inequalities, closely related to Carleson's theorem, are sharp up to endpoints. In particular, when the phase function is chosen as $|t|^\alpha$ with $\alpha\in (0,1)$, our results provide an affirmative answer to a question posed in Guo-Roos-Yung (Anal. PDE, 2020). In pursuit of the objective, we use an approach incorporating the method of stationary phase, a square function estimate derived from Seeger's arguments, Stein-Tomas restriction estimates, and a crucial observation enabling the derivation of useful decay estimates near endpoints. Furthermore, we obtain the restricted weak type estimates for endpoints in the specific case of homogeneous phase functions. As an application of these outcomes, we further extend analogous results to a class of ergodic integral operators.

Submitted 6 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: We made some changes, and thank Dr. Haixia Yu for his suggestions on some related references. The problem is motivated by an open question posed by Guo-Roos-Yung (Anal. PDE, 2020), but the method here is different (in fact, we need both new approaches and new observations), except for the classical procedure in which the variational inequality is reduced to the long and short variation-norm inequalities

arXiv:2401.02031 [pdf, other]

Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

Authors: Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang

Abstract: Backdoor attack aims to deceive a victim model when facing backdoor instances while maintaining its performance on benign data. Current methods use manual patterns or special perturbations as triggers, while they often overlook the robustness against data corruption, making backdoor attacks easy to defend against in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective when facing data collapse and backdoor defense. Therein, we introduce a learnable watermark embedded in the latent domain of images, serving as the trigger. Then, we search for a watermark that can withstand collapse during image decoding, cooperating with several anti-collapse operations to further enhance the resilience of our trigger against data corruption. Extensive experiments are conducted on CIFAR10, GTSRB, and ImageNet datasets, demonstrating that Spy-Watermark overtakes ten state-of-the-art methods in terms of robustness and stealthiness.

Submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP2024

arXiv:2312.15595 [pdf, other]

Zero-Inflated Bandits

Authors: Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song

Abstract: Many real applications of bandits have sparse non-zero rewards, leading to slow learning rates. A careful distribution modeling that utilizes problem-specific structures is known to be critical to estimation efficiency in the statistics literature, yet is under-explored in bandits. To fill the gap, we initiate the study of zero-inflated bandits, where the reward is modeled as a classic semi-parametric distribution called zero-inflated distribution. We carefully design Upper Confidence Bound (UCB) and Thompson Sampling (TS) algorithms for this specific structure. Our algorithms are suitable for a very general class of reward distributions, operating under tail assumptions that are considerably less stringent than the typical sub-Gaussian requirements. Theoretically, we derive the regret bounds for both the UCB and TS algorithms for multi-armed bandits, showing that they can achieve rate-optimal regret when the reward distribution is sub-Gaussian. The superior empirical performance of the proposed methods is shown via extensive numerical studies.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.12871 [pdf, other]

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

Authors: Yu Liu, Runzhe Wan, James McQueen, Doug Hains, **xiang Gu, Rui Song

Abstract: The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of data-driven AES selection for online experimentation services by introducing two solutions. The first employs a three-layer Gaussian Mixture Model considering the heteroskedasticity across experiments, and it seeks to estimate the true expected effect size among positive experiments. The second method, grounded in utility theory, aims to determine the optimal effect size by striking a balance between the experiment's cost and the precision of decision-making. Through comparisons with baseline methods using both simulated and real data, we showcase the superior performance of the proposed approaches.

Submitted 17 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2310.18715 [pdf, other]

Robust Offline Reinforcement learning with Heavy-Tailed Rewards

Authors: ** Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi

Abstract: This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.

Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: 23 pages, 6 figures. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

arXiv:2310.00214 [pdf, ps, other]

Quantum MDS Codes with length $n\equiv 0,1 \pmod{\frac{q\pm1}{2}}$

Authors: Ruhao Wan

Abstract: An important family of quantum codes is the quantum maximum-distance-separable (MDS) codes. In this paper, we construct some new classes of quantum MDS codes by generalized Reed-Solomon (GRS) codes and Hermitian construction. In addition, the length $n$ of most of the quantum MDS codes we constructed satisfies $n\equiv 0,1 \pmod{\frac{q\pm1}{2}}$, which is different from previously known code lengths. At the same time, the quantum MDS codes we construct have large minimum distances that are greater than $q/2+1$.

Submitted 29 September, 2023; originally announced October 2023.

Comments: 21 pages, 2 tables

MSC Class: 81P70

arXiv:2309.12708 [pdf, other]

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

Authors: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu

Abstract: Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes exhibit long-range perception and minimal occlusion. We develop an automated annotation pipeline leveraging Semantic Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation. The code and datasets are available at https://github.com/yyxssm/PointSSC.

Submitted 6 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: ICRA2024, oral & poster

arXiv:2309.02702 [pdf, other]

Gene-induced Multimodal Pre-training for Image-omic Classification

Authors: Ting **, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang

Abstract: Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work aims at dealing with the main challenges of multi-modality image-omic classification w.r.t. (1) the patient-level feature extraction difficulties from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion considering high-order relevance modeling. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We design a masked patch modeling paradigm (MPM) to capture the latent pathological characteristics of different tissues. The mask strategy randomly masks a fixed-length contiguous subsequence of patch embeddings of a WSI. Finally, we combine the classification tokens of paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, a simple fine-tuning can be adopted to obtain the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% in accuracy for image-omic classification. The code is publicly available at https://github.com/huangwudiduan/GIMP.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.03990 [pdf, ps, other]

NEOLAF, an LLM-powered neural-symbolic cognitive architecture

Authors: Richard Jiarui Tong, Cassie Chen Cao, Timothy Xueqian Lee, Guodong Zhao, Ray Wan, Feiyue Wang, Xiangen Hu, Robin Schmucker, **sheng Pan, Julian Quevedo, Yu Lu

Abstract: This paper presents the Never Ending Open Learning Adaptive Framework (NEOLAF), an integrated neural-symbolic cognitive architecture that models and constructs intelligent agents. The NEOLAF framework is superior to both the pure connectionist and pure symbolic approaches to constructing intelligent agents due to its explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement. The paper further presents a compelling experiment where a NEOLAF agent, built as a problem-solving agent, is fed with complex math problems from the open-source MATH dataset. The results demonstrate NEOLAF's superior learning capability and its potential to revolutionize the field of cognitive architectures and self-improving adaptive instructional systems.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.14489 [pdf, other]

SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting

Authors: Canyu Zhang, Qing Guo, Xiaoguang Li, Renjie Wan, Hongkai Yu, Ivor Tsang, Song Wang

Abstract: In this work, we introduce a challenging image restoration task, referred to as SuperInpaint, which aims to reconstruct missing regions in low-resolution images and generate completed images with arbitrarily higher resolutions. We have found that this task cannot be effectively addressed by stacking state-of-the-art super-resolution and image inpainting methods as they amplify each other's flaws, leading to noticeable artifacts. To overcome these limitations, we propose the detail-enhanced attentional implicit representation (DEAR) that can achieve SuperInpaint with a single model, resulting in high-quality completed images with arbitrary resolutions. Specifically, we use a deep convolutional network to extract the latent embedding of an input image and then enhance the high-frequency components of the latent embedding via an adaptive high-pass filter. This leads to detail-enhanced semantic embedding. We further feed the semantic embedding into an unmask-attentional module that suppresses embeddings from ineffective masked pixels. Additionally, we extract a pixel-wise importance map that indicates which pixels should be used for image reconstruction. Given the coordinates of a pixel we want to reconstruct, we first collect its neighboring pixels in the input image and extract their detail-enhanced semantic embeddings, unmask-attentional semantic embeddings, importance values, and spatial distances to the desired pixel. Then, we feed all the above terms into an implicit representation and generate the color of the specified pixel. To evaluate our method, we extend three existing datasets for this new task and build 18 meaningful baselines using SOTA inpainting and super-resolution methods. Extensive experimental results demonstrate that our method outperforms all existing methods by a significant margin on four widely used metrics.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.11526 [pdf, other]

CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields

Authors: Ziyuan Luo, Qing Guo, Ka Chun Cheung, Simon See, Renjie Wan

Abstract: Neural Radiance Fields (NeRF) have the potential to be a major representation of media. Since training a NeRF has never been an easy task, the protection of its model copyright should be a priority. In this paper, by analyzing the pros and cons of possible copyright protection solutions, we propose to protect the copyright of NeRF models by replacing the original color representation in NeRF with a watermarked color representation. Then, a distortion-resistant rendering scheme is designed to guarantee robust message extraction in 2D renderings of NeRF. Our proposed method can directly protect the copyright of NeRF models while maintaining high rendering quality and bit accuracy when compared among optional solutions.

Submitted 29 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: 11 pages, 6 figures, accepted by ICCV 2023 non-camera-ready version

arXiv:2307.08400 [pdf, ps, other]

Uniform exponential growth for groups with proper product actions on hyperbolic spaces

Authors: Renxing Wan, Wenyuan Yang

Abstract: This paper studies the locally uniform exponential growth and product set growth for a finitely generated group $G$ acting properly on a finite product of hyperbolic spaces. Under the assumption of coarsely dense orbits or shadowing property on factors, we prove that any finitely generated non-virtually abelian subgroup has uniform exponential growth. These assumptions are fulfilled in many hierarchically hyperbolic groups, including mapping class groups, specially cubulated groups and BMW groups. Moreover, if $G$ acts weakly acylindrically on each factor, we show that, with two exceptional classes of subgroups, $G$ has uniform product set growth. As corollaries, this gives a complete classification of subgroups with product set growth for any group acting discretely on a simply connected manifold with pinched negative curvature, for groups acting acylindrically on trees, and for 3-manifold groups.

Submitted 22 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 27 pages, 3 figures; the number of pages changes due to different formats

MSC Class: 20F65

arXiv:2307.04122 [pdf, other]

Enhancing Low-Light Images Using Infrared-Encoded Images

Authors: Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen

Abstract: The low-light image enhancement task is essential yet challenging, as it is intrinsically ill-posed. Previous works mainly focus on low-light images captured in the visible spectrum using pixel-wise loss, which limits the capacity to recover brightness, contrast, and texture details due to the small number of incoming photons. In this work, we propose a novel approach to increase the visibility of images captured under low-light environments by removing the in-camera infrared (IR) cut-off filter, which allows for the capture of more photons and results in improved signal-to-noise ratio due to the inclusion of information from the IR spectrum. To verify the proposed strategy, we collect a paired dataset of low-light images captured without the IR cut-off filter, with corresponding long-exposure reference images with an external filter. The experimental results on the proposed dataset demonstrate the effectiveness of the proposed method, showing better performance quantitatively and qualitatively. The dataset and code are publicly available at https://wyf0912.github.io/ELIEI/

Submitted 9 July, 2023; originally announced July 2023.

Comments: The first two authors contribute equally. The work is accepted by ICIP 2023

arXiv:2306.11503 [pdf, other]

The Age of Synthetic Realities: Challenges and Opportunities

Authors: João Phillipe Cardenuto, Jing Yang, Rafael Padilha, Renjie Wan, Daniel Moreira, Haoliang Li, Shiqi Wang, Fernanda Andaló, Sébastien Marcel, Anderson Rocha

Abstract: Synthetic realities are digital creations or augmentations that are contextually generated through the use of Artificial Intelligence (AI) methods, leveraging extensive amounts of data to construct new narratives or realities, regardless of the intent to deceive. In this paper, we delve into the concept of synthetic realities and their implications for Digital Forensics and society at large within the rapidly advancing field of AI. We highlight the crucial need for the development of forensic techniques capable of identifying harmful synthetic creations and distinguishing them from reality. This is especially important in scenarios involving the creation and dissemination of fake news, disinformation, and misinformation. Our focus extends to various forms of media, such as images, videos, audio, and text, as we examine how synthetic realities are crafted and explore approaches to detecting these malicious creations. Additionally, we shed light on the key research challenges that lie ahead in this area. This study is of paramount importance due to the rapid progress of AI generative techniques and their impact on the fundamental principles of Forensic Science.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2305.15070 [pdf, other]

Annotation Imputation to Individualize Predictions: Initial Studies on Distribution Dynamics and Model Predictions

Authors: London Lowmanstone, Ruyuan Wan, Risako Owan, Jaehyung Kim, Dongyeop Kang

Abstract: Annotating data via crowdsourcing is time-consuming and expensive. Due to these costs, dataset creators often have each annotator label only a small subset of the data. This leads to sparse datasets with examples that are marked by few annotators. The downside of this process is that if an annotator doesn't get to label a particular example, their perspective on it is missed. This is especially concerning for subjective NLP datasets where there is no single correct label: people may have different valid opinions. Thus, we propose using imputation methods to generate the opinions of all annotators for all examples, creating a dataset that does not leave out any annotator's view. We then train and prompt models, using data from the imputed dataset, to make predictions about the distribution of responses and individual annotations. In our analysis of the results, we found that the choice of imputation method significantly impacts soft label changes and distribution. While the imputation introduces noise in the prediction of the original dataset, it has shown potential in enhancing shots for prompts, particularly for low-response-rate annotators. We have made all of our code and data publicly available.

Submitted 5 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NLPerspectives - 2nd Workshop on Perspectivist Approaches to NLP, 39 pages, 13 figures, 13 tables

Journal ref: 2nd Workshop on Perspectivist Approaches to NLP 2023

arXiv:2305.09110 [pdf, ps, other]

Sharp maximal function estimates for Hilbert transforms along monomial curves in higher dimensions

Authors: Renhui Wan

Abstract: For any nonempty set $U\subset\R^+$, we consider the maximal operator $\h^U$ defined as $\h^Uf=\sup_{u\in U}|H^{(u)} f|$, where $H^{(u)}$ represents the Hilbert transform along the monomial curve $uγ(s)$. We focus on the $L^p(\mathbb{R}^d)$ operator norm of $\h^U$ for $p\in (p_\circ(d),\infty)$, where $p_\circ(d)$ is the optimal exponent known for the $L^p$ boundedness of the maximal averaging operator obtained by Ko-Lee-Oh \cite{KLO22,KLO23} and Beltran-Guo-Hickman-Seeger \cite{BGHS}. To achieve this goal, we employ a novel bootstrapping argument to establish a maximal estimate for the Mihlin-Hörmander-type multiplier, along with utilizing the local smoothing estimate for the averaging operator and its vector-valued extension to obtain crucial decay estimates. Furthermore, our approach offers an alternative means for deriving the upper bound established in \cite{Guo20}.

Submitted 3 December, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: minor corrections, proof unchanged

MSC Class: 42B25

arXiv:2304.11393 [pdf, other]

Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation

Authors: Feng Jiang, Heng Gao, Shoumeng Qiu, Haiqiang Zhang, Ru Wan, Jian Pu

Abstract: LiDAR point cloud segmentation is one of the most fundamental tasks for autonomous driving scene understanding. However, it is difficult for existing models to achieve both high inference speed and accuracy simultaneously. For example, voxel-based methods perform well in accuracy, while Bird's-Eye-View (BEV)-based methods can achieve real-time inference. To overcome this issue, we develop an effective 3D-to-BEV knowledge distillation method that transfers rich knowledge from 3D voxel-based models to BEV-based models. Our framework mainly consists of two modules: the voxel-to-pillar distillation module and the label-weight distillation module. Voxel-to-pillar distillation distills sparse 3D features to BEV features for middle layers to make the BEV-based model aware of more structural and geometric information. Label-weight distillation helps the model pay more attention to regions with more height information. Finally, we conduct experiments on the SemanticKITTI dataset and Paris-Lille-3D. The results on SemanticKITTI show more than 5% improvement on the test set, especially for classes such as motorcycle and person, with more than 15% improvement. The code can be accessed at https://github.com/fengjiang5/Knowledge-Distillation-from-Cylinder3D-to-PolarNet.

Submitted 22 April, 2023; originally announced April 2023.

Comments: ICME 2023 Accepted

arXiv:2304.00420 [pdf, other]

Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring

Authors: Runzhe Wan, Yu Liu, James McQueen, Doug Hains, Rui Song

Abstract: With the growing needs of online A/B testing to support the innovation in industry, the opportunity cost of running an experiment becomes non-negligible. Therefore, there is an increasing demand for an efficient continuous monitoring service that allows early stopping when appropriate. Classic statistical methods focus on hypothesis testing and are mostly developed for traditional high-stake problems such as clinical trials, while experiments at online service companies typically have very different features and focuses. Motivated by the real needs, in this paper, we introduce a novel framework that we developed in Amazon to maximize customer experience and control opportunity cost. We formulate the problem as a Bayesian optimal sequential decision making problem that has a unified utility function. We discuss extensively practical design choices and considerations. We further introduce how to solve the optimal decision rule via Reinforcement Learning and scale the solution. We show the effectiveness of this novel approach compared with existing methods via a large-scale meta-analysis on experiments in Amazon.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.06577 [pdf, other]

doi 10.1007/s41365-024-01387-4

Investigating the elliptic anisotropy of identified particles in p--Pb collisions with a multi-phase transport model

Authors: Siyu Tang, Liang Zheng, Xiaoming Zhang, Renzhuo Wan

Abstract: The elliptic azimuthal anisotropy coefficient ($v_{2}$) of the identified particles at midrapidity ($|η|<0.8$) was investigated in p--Pb collisions at $\sqrt{s_\mathrm{NN}}=$ 5.02 TeV using a multi-phase transport model (AMPT). The calculations of differential $v_{2}$ based on the advanced flow extraction method of light flavor hadrons (pions, kaons, protons, and $Λ$) in small collision systems were extended to a wider transverse momentum ($p_{\mathrm{T}}$) range of up to 8 GeV/$c$ for the first time. The string-melting version of the AMPT model provides a good description of the measured $p_{\mathrm{T}}$-differential $v_{2}$ of the mesons but exhibits a slight deviation from the baryon $v_{2}$. In addition, we observed the features of mass ordering at low $p_{\mathrm{T}}$ and the approximate number of constituent quarks (NCQ) scaled at intermediate $p_{\mathrm{T}}$. Moreover, we demonstrate that hadronic rescattering does not have a significant impact on $v_{2}$ in p--Pb collisions for different centrality selections, whereas partonic scattering dominates in generating the elliptic anisotropy of the final particles. This study provides further insight into the origin of collective-like behavior in small collision systems and has referential value for future measurements of azimuthal anisotropy.

Submitted 28 March, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: 9 pages, 9 figures

Journal ref: NUCL SCI TECH 35, 32 (2024)

arXiv:2302.13528 [pdf, other]

Study of jet substructure modification by differential girth at the LHC

Authors: Si-Yu Tang, Wei Zhang, Wen-Di Deng, Cheng-De Tian, Fan Yang, Ren-Zhuo Wan

Abstract: Jet substructure observables are crucial for exploring the effects of the hot, dense medium and differentiating between quark and gluon jets. In this paper, we investigate the modification of jet shape by calculating the differential girth at LHC energies, ranging from pp to Pb--Pb collisions. The differential girth distribution exhibits a wave-like pattern, which offers a better explanation for the radial spreading of jet energy and large-angle radiation effects. We fit the differential girth distribution with a sine wave function to obtain the amplitude, angular frequency, initial phase, and offset distance parameters as a function of jet radius. The results clearly differentiate between pp and heavy-ion collisions, with varying dependencies on centrality and collision energy. Our findings shed new light on the understanding of radial jet energy loss in the QGP (quark-gluon plasma) medium, and provide additional potential observables for measuring jet shape at the LHC in the future.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 7 pages, 6 figures

arXiv:2302.13251 [pdf, other]

Unsupervised Domain Adaptation for Low-dose CT Reconstruction via Bayesian Uncertainty Alignment

Authors: Kecheng Chen, Jie Liu, Renjie Wan, Victor Ho-Fun Lee, Varut Vardhanabhuti, Hong Yan, Haoliang Li

Abstract: Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.

Submitted 2 June, 2024; v1 submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2302.06169 [pdf, ps, other]

New Quantum MDS codes from Hermitian self-orthogonal generalized Reed-Solomon codes

Authors: Ruhao Wan, Shixin Zhu

Abstract: Quantum maximum-distance-separable (MDS for short) codes are an important class of quantum codes. In this paper, by using Hermitian self-orthogonal generalized Reed-Solomon (GRS for short) codes, we construct five new classes of $q$-ary quantum MDS codes with minimum distance larger than $q/2+1$. Furthermore, the parameters of our quantum MDS code cannot be obtained from the previous constructions.

Submitted 9 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 19 pages, 3 tables

MSC Class: 94B05; 81P70

arXiv:2302.05746 [pdf, other]

Removing Image Artifacts From Scratched Lens Protectors

Authors: Yufei Wang, Renjie Wan, Wenhan Yang, Bihan Wen, Lap-Pui Chau, Alex C. Kot

Abstract: A protector is placed in front of the camera lens for mobile devices to avoid damage, while the protector itself can be easily scratched accidentally, especially for plastic ones. The artifacts appear in a wide variety of patterns, making it difficult to see through them clearly. Removing image artifacts from the scratched lens protector is inherently challenging due to the occasional flare artifacts and the co-occurring interference within mixed artifacts. Though different methods have been proposed for some specific distortions, they seldom consider such inherent challenges. In our work, we consider the inherent challenges in a unified framework with two cooperative modules, which facilitate the performance boost of each other. We also collect a new dataset from the real world to facilitate training and evaluation purposes. The experimental results demonstrate that our method outperforms the baselines qualitatively and quantitatively. The code and datasets will be released after acceptance.

Submitted 14 February, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted by ISCAS 2023

arXiv:2302.01543 [pdf, other]

Multiplier Bootstrap-based Exploration

Authors: Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

Abstract: Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. We prove both instance-dependent and instance-independent rate-optimal regret bounds for MBE in sub-Gaussian multi-armed bandits. With extensive simulation and real data experiments, we show the generality and adaptivity of MBE.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.13152 [pdf, other]

STEEL: Singularity-aware Reinforcement Learning

Authors: Xiaohong Chen, Zhengling Qi, Runzhe Wan

Abstract: Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require an absolute continuity assumption (e.g., there do not exist non-overlapping regions) on the distribution induced by target policies with respect to the data distribution over either the state or action or both. We propose a new batch RL algorithm that allows for singularity for both state and action spaces (e.g., existence of non-overlapping regions between offline data distribution and the distribution induced by the target policies) in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some technical conditions, we derive a first finite-sample regret guarantee for our proposed algorithm under singularity. Compared with existing algorithms, by requiring only minimal data-coverage assumption, STEEL improves the applicability and robustness of batch RL. In addition, a two-step adaptive STEEL, which is nearly tuning-free, is proposed. Extensive simulation studies and one (semi)-real experiment on personalized pricing demonstrate the superior performance of our methods in dealing with possible singularity in batch RL.

Submitted 25 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2301.07301 [pdf, other]

PTA-Det: Point Transformer Associating Point cloud and Image for 3D Object Detection

Authors: Rui Wan, Tianyun Zhao, Wei Zhao

Abstract: In autonomous driving, 3D object detection based on multi-modal data has become an indispensable approach when facing complex environments around the vehicle. During multi-modal detection, LiDAR and camera are simultaneously applied for capturing and modeling. However, due to the intrinsic discrepancies between the LiDAR point and camera image, the fusion of the data for object detection encounters a series of problems. Most multi-modal detection methods perform even worse than LiDAR-only methods. In this investigation, we propose a method named PTA-Det to improve the performance of multi-modal detection. Accompanied by PTA-Det, a Pseudo Point Cloud Generation Network is proposed, which can convert image information including texture and semantic features by pseudo points. Thereafter, through a transformer-based Point Fusion Transition (PFT) module, the features of LiDAR points and pseudo points from image can be deeply fused under a unified point-based representation. The combination of these modules can conquer the major obstacle in feature fusion across modalities and realize a complementary and discriminative representation for proposal generation. Extensive experiments on the KITTI dataset show that PTA-Det achieves competitive results, supporting its effectiveness.

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2301.05036 [pdf, other]

Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

Authors: Ruyuan Wan, Jaehyung Kim, Dongyeop Kang

Abstract: In NLP annotation, it is common to have multiple annotators label the text and then obtain the ground truth labels based on the agreement of the majority of annotators. However, annotators are individuals with different backgrounds, and minority opinions should not be simply ignored. As annotation tasks become subjective and topics are controversial in modern NLP tasks, we need NLP systems that can represent people's diverse voices on subjective matters and predict the level of diversity. This paper examines whether the text of the task and annotators' demographic background information can be used to estimate the level of disagreement among annotators. Particularly, we extract disagreement labels from the annotators' voting histories in the five subjective datasets, and then fine-tune language models to predict annotators' disagreement. Our results show that knowing annotators' demographic information, like gender, ethnicity, and education level, helps predict disagreements. In order to distinguish the disagreement from the inherent controversy from text content and the disagreement in the annotators' different perspectives, we simulate everyone's voices with different combinations of annotators' artificial demographics and examine its variance of the finetuned disagreement predictor. Our paper aims to improve the annotation process for more efficient and inclusive NLP systems through a novel disagreement prediction mechanism. Our code and dataset are publicly available.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.14580 [pdf, ps, other]

Heterogeneous Synthetic Learner for Panel Data

Authors: Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song

Abstract: In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper, we initiate the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learner, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method that allows flexible data generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.

Submitted 29 January, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

arXiv:2212.12845 [pdf, ps, other]

Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies

Authors: Runzhe Wan, Yingying Li, Wenbin Lu, Rui Song

Abstract: Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis. However, the former approach may suffer from bias while the latter cannot incorporate additional information. We propose to bridge these two approaches while allowing the number of factor proxies to diverge, and hence make the latent factor model estimation robust, flexible, and statistically more accurate. As a bonus, the number of factors is also allowed to grow. At the heart of our method is a penalized reduced rank regression to combine information. To further deal with heavy-tailed data, a computationally attractive penalized robust reduced rank regression method is proposed. We establish faster rates of convergence compared with the benchmark. Extensive simulations and real examples are used to illustrate the advantages.

Submitted 2 January, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

arXiv:2211.01553 [pdf, other]

User or Labor: An Interaction Framework for Human-Machine Relationships in NLP

Authors: Ruyuan Wan, Naome Etori, Karla Badillo-Urquiola, Dongyeop Kang

Abstract: The bridging research between Human-Computer Interaction and Natural Language Processing is developing quickly these years. However, there is still a lack of formative guidelines to understand the human-machine interaction in the NLP loop. When researchers crossing the two fields talk about humans, they may imply a user or labor. Regarding a human as a user, the human is in control, and the machine is used as a tool to achieve the human's goals. Considering a human as a laborer, the machine is in control, and the human is used as a resource to achieve the machine's goals. Through a systematic literature review and thematic analysis, we present an interaction framework for understanding human-machine relationships in NLP. In the framework, we propose four types of human-machine interactions: Human-Teacher and Machine-Learner, Machine-Leading, Human-Leading, and Human-Machine Collaborators. Our analysis shows that the type of interaction is not fixed but can change across tasks as the relationship between the human and the machine develops. We also discuss the implications of this framework for the future of NLP and human-machine relationships.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.10562 [pdf, ps, other]

Research on Hermitian self-dual codes, GRS codes and EGRS codes

Authors: Ruhao Wan, Shixin Zhu

Abstract: MDS self-dual codes have nice algebraic structures, theoretical significance and practical implications. In this paper, we present three classes of $q^2$-ary Hermitian self-dual (extended) generalized Reed-Solomon codes with different code locators. Combining the results in Ball et al. (Designs, Codes and Cryptography, 89: 811-821, 2021), we show that if the code locators do not contain zero, $q^2$-ary Hermitian self-dual (extended) GRS codes of length $\geq 2q\ (q>2)$ do not exist. Under certain conditions, we prove Conjecture 3.7 and Conjecture 3.13 proposed by Guo and Li et al. (IEEE Communications Letters, 25(4): 1062-1065, 2021).

Submitted 14 December, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 18 pages

MSC Class: 94B05; 81p70

arXiv:2209.12254 [pdf, other]

From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion

Authors: Rui Wan, Shuangjie Xu, Wei Wu, Xiaoyi Zou, Tongyi Cao

Abstract: LiDAR and cameras are two complementary sensors for 3D perception in autonomous driving. LiDAR point clouds have accurate spatial and geometry information, while RGB images provide textural and color data for context reasoning. To exploit LiDAR and cameras jointly, existing fusion methods tend to align each 3D point to only one projected image pixel based on calibration, namely one-to-one mapping. However, the performance of these approaches highly relies on the calibration quality, which is sensitive to the temporal and spatial synchronization of sensors. Therefore, we propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping that learns multiple offsets from the initial projection towards the neighborhood and thus develops tolerance to calibration error. Moreover, a \textit{dynamic query enhancement} is proposed to perceive the model-independent calibration, which further strengthens DCA's tolerance to the initial misalignment. The whole fusion architecture named Dynamic Cross Attention Network (DCAN) exploits multi-level image features and adapts to multiple representations of point clouds, which allows DCA to serve as a plug-in fusion module. Extensive experiments on nuScenes and KITTI prove DCA's effectiveness. The proposed DCAN outperforms state-of-the-art methods on the nuScenes detection challenge.

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.05825 [pdf, ps, other]

$L^p$ estimates for Hilbert transform and maximal operator associated to variable polynomial

Authors: Renhui Wan

Abstract: We investigate the Hilbert transform and the maximal operator along a class of variable non-flat polynomial curves $(P(t),u(x)t)$ with measurable $u(x)$, and prove uniform $L^p$ estimates for $1<p<\infty$. In particular, via the change of variable, these uniform estimates are equal to the ones for the curves $(P(v(x)t),t)$ with measurable $v(x)$. To obtain the desired bound, we make full use of time-frequency techniques and establish a crucial $ε$-improving estimate for some special separate sets.

Submitted 31 May, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: make some changes including the title

arXiv:2207.11744 [pdf, ps, other]

New MDS self-dual codes over finite fields $\F_{r^2}$

Authors: Ruhao Wan, Yang Li, Shixin Zhu

Abstract: MDS self-dual codes have nice algebraic structures and are uniquely determined by lengths. Recently, the construction of MDS self-dual codes of new lengths has become an important and hot issue in coding theory. In this paper, we develop the existing theory and construct six new classes of MDS self-dual codes. Together with our constructions, the proportion of all known MDS self-dual codes relative to possible MDS self-dual codes generally exceeds 57\%. As far as we know, this is the largest known ratio. Moreover, some new families of MDS self-orthogonal codes and MDS almost self-dual codes are also constructed.

Submitted 3 October, 2022; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 16 pages, 3 tables

MSC Class: 94B05; 81p70 ACM Class: E.4

arXiv:2207.04232 [pdf, ps, other]

Construction of MDS self-dual codes from generalized Reed-Solomon codes

Authors: Ruhao Wan, Shixin Zhu, ** Li

Abstract: MDS codes and self-dual codes are important families of classical codes in coding theory. It is of interest to investigate MDS self-dual codes. The existence of MDS self-dual codes over the finite field $F_q$ is completely solved when $q$ is even. In this paper, for finite fields with odd characteristic, we construct some new classes of MDS self-dual codes by (extended) generalized Reed-Solomon codes.

Submitted 27 August, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: 24 pages, 2 tables

MSC Class: 94B05; 81p70 ACM Class: E.4

arXiv:2206.06615 [pdf, ps, other]

MDS Codes with Euclidean and Hermitian Hulls of Flexible Dimensions and Their Applications to EAQECCs

Authors: Yang Li, Ruhao Wan, Shixin Zhu

Abstract: The hull of a linear code is the intersection of itself with its dual code with respect to a certain inner product. Both Euclidean and Hermitian hulls are of theoretical and practical significance. In this paper, we construct several new classes of MDS codes via (extended) generalized Reed-Solomon (GRS) codes and determine their Euclidean or Hermitian hulls. Specifically, four new classes of MDS codes with Hermitian hulls of flexible dimensions and six new classes of MDS codes with Euclidean hulls of flexible dimensions are constructed. For the former, we further construct four new classes of entanglement-assisted quantum error-correcting codes (EAQECCs) and four new classes of MDS EAQECCs of length $n>q+1$. For the latter, we also give some examples on Euclidean self-orthogonal and one-dimensional Euclidean hull MDS codes.

Submitted 5 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 25 pages, 5 tables

MSC Class: 94B05; 81p70

arXiv:2204.12148 [pdf, other]

Morest: Model-based RESTful API Testing with Execution Feedback

Authors: Yi Liu, Yuekang Li, Gelei Deng, Yang Liu, Ruiyuan Wan, Runchao Wu, Dandan Ji, Shiheng Xu, Minli Bao

Abstract: RESTful APIs are arguably the most popular endpoints for accessing Web services. Blackbox testing is one of the emerging techniques for ensuring the reliability of RESTful APIs. The major challenge in testing RESTful APIs is the need for correct sequences of API operation calls for in-depth testing. To build meaningful operation call sequences, researchers have proposed techniques to learn and utilize the API dependencies based on OpenAPI specifications. However, these techniques either lack the overall awareness of how all the APIs are connected or the flexibility of adaptively fixing the learned knowledge. In this paper, we propose Morest, a model-based RESTful API testing technique that builds and maintains a dynamically updating RESTful-service Property Graph (RPG) to model the behaviors of RESTful-services and guide the call sequence generation. We empirically evaluated Morest and the results demonstrate that Morest can successfully request an average of 152.66%-232.45% more API operations, cover 26.16%-103.24% more lines of code, and detect 40.64%-215.94% more bugs than state-of-the-art techniques. In total, we applied Morest to 6 real-world projects and found 44 bugs (13 of them cannot be detected by existing approaches). Specifically, 2 of the confirmed bugs are from Bitbucket, a famous code management service with more than 6 million users.

Submitted 26 April, 2022; originally announced April 2022.

Journal ref: 44th International Conference on Software Engineering (ICSE 2022)

arXiv:2204.12088 [pdf, ps, other]

A physics-informed deep neural network for surrogate modeling in classical elasto-plasticity

Authors: Mahdad Eghbalian, Mehdi Pouragha, Richard Wan

Abstract: In this work, we present a deep neural network architecture that can efficiently approximate classical elasto-plastic constitutive relations. The network is enriched with crucial physics aspects of classical elasto-plasticity, including additive decomposition of strains into elastic and plastic parts, and nonlinear incremental elasticity. This leads to a Physics-Informed Neural Network (PINN) surrogate model named here as Elasto-Plastic Neural Network (EPNN). Detailed analyses show that embedding these physics into the architecture of the neural network facilitates a more efficient training of the network with less training data, while also enhancing the extrapolation capability for loading regimes outside the training data. The architecture of EPNN is model and material-independent, i.e. it can be adapted to a wide range of elasto-plastic material types, including geomaterials and metals; and experimental data can potentially be directly used in training the network. To demonstrate the robustness of the proposed architecture, we adapt its general framework to the elasto-plastic behavior of sands. We use synthetic data generated from material point simulations based on a relatively advanced dilatancy-based constitutive model for granular materials to train the neural network. The superiority of EPNN over regular neural network architectures is explored through predicting unseen strain-controlled loading paths for sands with different initial densities.

Submitted 26 April, 2022; originally announced April 2022.

Comments: 53 pages, 30 figures, preprint submitted to Elsevier

MSC Class: 74C05; 65N99 ACM Class: J.2; I.6.5

arXiv:2202.13234 [pdf, other]

Safe Exploration for Efficient Policy Evaluation and Comparison

Authors: Runzhe Wan, Branislav Kveton, Rui Song

Abstract: High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate its several representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.

Submitted 18 June, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

arXiv:2202.13227 [pdf, other]

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Authors: Runzhe Wan, Lin Ge, Rui Song

Abstract: Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the parameter space can be factorized to item-level. The novel bandit algorithm is general enough to be applied to many popular problems, scalable to the huge parameter and action spaces, and robust to the specification of the generalization model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Both theoretical analysis and numerical results support the usefulness of the proposed method.

Submitted 26 February, 2022; originally announced February 2022.

arXiv:2202.10574 [pdf, other]

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Authors: Chengchun Shi, Runzhe Wan, Ge Song, Shikai Luo, Rui Song, Hongtu Zhu

Abstract: The two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smart phones and internet of things, they have substantially transformed the transportation landscape of human beings. In this paper we consider large-scale fleet management in ride-sharing companies that involve multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in those studies because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high-dimensionality of state-action space. The proposed estimator works favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.

Submitted 26 March, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

arXiv:2201.05972 [pdf, other]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Authors: Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao

Abstract: Two major challenges of 3D LiDAR Panoptic Segmentation (PS) are that point clouds of an object are surface-aggregated and thus hard to model the long-range dependency especially for large instances, and that objects are too close to separate each other. Recent literature addresses these problems by time-consuming grouping processes such as dual-clustering, mean-shift offsets, etc., or by bird-eye-view (BEV) dense centroid representation that downplays geometry. However, the long-range geometry relationship has not been sufficiently modeled by local feature learning from the above methods. To this end, we present SCAN, a novel sparse cross-scale attention network to first align multi-scale sparse features with global voxel-encoded attention to capture the long-range relationship of instance context, which can boost the regression accuracy of the over-segmented large objects. For the surface-aggregated points, SCAN adopts a novel sparse class-agnostic representation of instance centroids, which can not only maintain the sparsity of aligned features to solve the under-segmentation on small objects, but also reduce the computation amount of the network through sparse convolution. Our method outperforms previous methods by a large margin in the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with a real-time inference speed.

Submitted 16 January, 2022; originally announced January 2022.

Comments: Accepted by the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2201.03145 [pdf, other]

Enhancing Low-Light Images in Real World via Cross-Image Disentanglement

Authors: Lanqing Guo, Renjie Wan, Wenhan Yang, Alex Kot, Bihan Wen

Abstract: Images captured in the low-light condition suffer from low visibility and various imaging artifacts, e.g., real noise. Existing supervised enlightening algorithms require a large set of pixel-aligned training image pairs, which are hard to prepare in practice. Though weakly-supervised or unsupervised methods can alleviate such challenges without using paired training images, some real-world artifacts inevitably get falsely amplified because of the lack of corresponded supervision. In this paper, instead of using perfectly aligned images for training, we creatively employ the misaligned real-world images as the guidance, which are considerably easier to collect. Specifically, we propose a Cross-Image Disentanglement Network (CIDN) to separately extract cross-image brightness and image-specific content features from low/normal-light images. Based on that, CIDN can simultaneously correct the brightness and suppress image artifacts in the feature domain, which largely increases the robustness to the pixel shifts. Furthermore, we collect a new low-light image enhancement dataset consisting of misaligned training images with real-world corruptions. Experimental results show that our model achieves state-of-the-art performances on both the newly proposed dataset and other popular low-light datasets.

Submitted 7 July, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

ACM Class: I.4.3; I.4.4

Showing 1–50 of 104 results for author: Wan, R