Showing 1–50 of 53 results for author: Sheng, X

  1. arXiv:2406.14118  [pdf, other]

    eess.IV cs.CV

    Prediction and Reference Quality Adaptation for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Temporal prediction is one of the most important technologies for video compression. Various prediction coding modes are designed in traditional video codecs. Traditional video codecs adaptively decide the optimal coding mode according to the prediction quality and reference quality. Recently, learned video codecs have made great progress. However, they ignore the prediction and reference…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2405.19203  [pdf, other]

    cs.CV

    $E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation

    Authors: Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang

    Abstract: This paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation. This task faces two major challenges: (1) The unstructured nature of 3D Gaussian makes it incompatible with current generation pipelines; (2) the expressive animation of 3D Gaussian in a generative setting that involves training with multiple subjects remains unexplored. In this paper, we pr…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: https://olivia23333.github.io/E3Gen

  3. arXiv:2404.15033  [pdf, other]

    cs.CV

    IPAD: Industrial Process Anomaly Detection Dataset

    Authors: Jinfan Liu, Yichao Yan, Junjie Li, Weiming Zhao, Pengzhi Chu, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

    Abstract: Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames, and existing large-scale VAD researches primarily focus on road traffic and human activity scenes. In industrial scenes, there are often a variety of unpredictable anomalies, and the VAD method can play a significant role in these scenarios. However, there is a lack of applicable datasets and methods…

    Submitted 23 April, 2024; originally announced April 2024.

  4. arXiv:2404.14177  [pdf, other]

    cs.CV

    Face2Face: Label-driven Facial Retouching Restoration

    Authors: Guanhua Zhao, Yu Gu, Xuhan Sheng, Yujie Hu, Jian Zhang

    Abstract: With the popularity of social media platforms such as Instagram and TikTok, and the widespread availability and convenience of retouching tools, an increasing number of individuals are utilizing these tools to beautify their facial photographs. This poses challenges for fields that place high demands on the authenticity of photographs, such as identity verification and social media. By altering fa…

    Submitted 22 April, 2024; originally announced April 2024.

  5. arXiv:2404.12611  [pdf, other]

    cs.CV

    Rethinking Clothes Changing Person ReID: Conflicts, Synthesis, and Optimization

    Authors: Junjie Li, Guanshuo Wang, Fufu Yu, Yichao Yan, Qiong Jia, Shouhong Ding, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

    Abstract: Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream researches focus on designing advanced model structures and strategies to capture identity information independent of clothing. However, the same-clothes discrimination as the standard ReID learning objective in CC-ReID is persistently ignored in previous researches.…

    Submitted 18 April, 2024; originally announced April 2024.

  6. arXiv:2404.10312  [pdf, other]

    cs.CV eess.IV

    OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

    Authors: Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

    Abstract: Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation method…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  7. arXiv:2404.09624  [pdf, other]

    cs.CV

    AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

    Authors: Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, Guangming Shi

    Abstract: The highly abstract nature of image aesthetics perception (IAP) poses a significant challenge for current multimodal large language models (MLLMs). The lack of human-annotated multi-modality aesthetic data further exacerbates this dilemma, resulting in MLLMs falling short of aesthetics perception capabilities. To address the above challenge, we first introduce a comprehensively annotated Aesthetic M…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  8. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2401.16204  [pdf]

    cs.ET cs.AR

    Computing High-Degree Polynomial Gradients in Memory

    Authors: T. Bhattacharya, G. H. Hutchinson, G. Pedretti, X. Sheng, J. Ignowski, T. Van Vaerenbergh, R. Beausoleil, J. P. Strachan, D. B. Strukov

    Abstract: Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine learning, and operations research applications. Prior work on such hardware, performed in the context of the Ising Machines and related concepts, is limited to quadr…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 36 pages, 16 figures

  10. arXiv:2401.15864  [pdf, other]

    cs.CV eess.IV

    Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Video compression performance is closely related to the accuracy of inter prediction. It tends to be difficult to obtain accurate inter prediction for the local video regions with inconsistent motion and occlusion. Traditional video coding standards propose various technologies to handle motion inconsistency and occlusion, such as recursive partitions, geometric partitions, and long-term reference…

    Submitted 28 January, 2024; originally announced January 2024.

  11. arXiv:2401.08276  [pdf, other]

    cs.CV cs.CL

    AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception

    Authors: Yipo Huang, Quan Yuan, Xiangfei Sheng, Zhichao Yang, Haoning Wu, Pengfei Chen, Yuzhe Yang, Leida Li, Weisi Lin

    Abstract: With collective endeavors, multimodal large language models (MLLMs) are undergoing a flourishing development. However, their performances on image aesthetics perception remain indeterminate, which is highly desired in real-world applications. An obvious obstacle lies in the absence of a specific benchmark to evaluate the effectiveness of MLLMs on aesthetic perception. This blind groping may impede…

    Submitted 16 January, 2024; originally announced January 2024.

  12. arXiv:2312.16051  [pdf, other]

    cs.CV

    Inter-X: Towards Versatile Human-Human Interaction Analysis

    Authors: Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

    Abstract: The analysis of the ubiquitous human-human interactions is pivotal for understanding humans as social beings. Existing human-human interaction datasets typically suffer from inaccurate body motions, lack of hand gestures and fine-grained textual descriptions. To better perceive and generate human-human interactions, we propose Inter-X, a currently largest human-human interaction dataset with accur…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Project page: https://liangxuy.github.io/inter-x/

  13. arXiv:2312.15867  [pdf, other]

    cs.CL cs.CR

    Punctuation Matters! Stealthy Backdoor Attack for Language Models

    Authors: Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li

    Abstract: Recent studies have pointed out that natural language processing (NLP) models are vulnerable to backdoor attacks. A backdoored model produces normal outputs on the clean samples while performing improperly on the texts with triggers that the adversary injects. However, previous studies on textual backdoor attack pay little attention to stealthiness. Moreover, some attack methods even cause grammat…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: NLPCC 2023

  14. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  15. arXiv:2312.10007  [pdf, other]

    cs.CL cs.LG

    Faithful Persona-based Conversational Dataset Generation with Large Language Models

    Authors: Pegah Jandaghi, XiangHai Sheng, Xinyi Bai, Jay Pujara, Hakim Sidahmed

    Abstract: High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and comprehensive persona-based…

    Submitted 15 December, 2023; originally announced December 2023.

  16. arXiv:2312.08727  [pdf, other]

    cs.IR

    Calibration-compatible Listwise Distillation of Privileged Features for CTR Prediction

    Authors: Xiaoqiang Gui, Yueyao Cheng, Xiang-Rong Sheng, Yunfeng Zhao, Guoxian Yu, Shuguang Han, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: In machine learning systems, privileged features refer to the features that are available during offline training but inaccessible for online serving. Previous studies have recognized the importance of privileged features and explored ways to tackle online-offline discrepancies. A typical practice is privileged features distillation (PFD): train a teacher model using all features (including privil…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by WSDM'24

  17. arXiv:2310.04984  [pdf, other]

    cs.IT cs.LG eess.SP math.PR stat.ML

    Model-adapted Fourier sampling for generative compressed sensing

    Authors: Aaron Berk, Simone Brugiapaglia, Yaniv Plan, Matthew Scott, Xia Sheng, Ozgur Yilmaz

    Abstract: We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each comp…

    Submitted 17 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: 12 pages, 4 figures. Submitted to the NeurIPS 2023 Workshop on Deep Learning and Inverse Problems. This revision features additional attribution of work, acknowledgments, and a correction in Definition 1.1

  18. arXiv:2309.09044  [pdf, other]

    cs.LG

    Study of Enhanced MISC-Based Sparse Arrays with High uDOFs and Low Mutual Coupling

    Authors: X. Sheng, D. Lu, Y. Li, R. C. de Lamare

    Abstract: In this letter, inspired by the maximum inter-element spacing (IES) constraint (MISC) criterion, an enhanced MISC-based (EMISC) sparse array (SA) with high uniform degrees-of-freedom (uDOFs) and low mutual-coupling (MC) is proposed, analyzed and discussed in detail. For the EMISC SA, an IES set is first determined by the maximum IES and number of elements. Then, the EMISC SA is composed of seven u…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 6 pages, 4 figures

  19. arXiv:2308.09247  [pdf, other]

    cs.CV cs.AI

    Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos

    Authors: Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao, Longguang Wang, Yulan Guo, Hehe Fan

    Abstract: We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data. Previous methods commonly conduct representation learning at the clip or frame level and cannot well capture fine-grained semantics. Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  20. arXiv:2308.09245  [pdf, other]

    cs.CV cs.AI

    Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

    Authors: Zhiqiang Shen, Xiaoxiao Sheng, Hehe Fan, Longguang Wang, Yulan Guo, Qiong Liu, Hao Wen, Xi Zhou

    Abstract: Recently, the community has made tremendous progress in developing effective methods for point cloud video understanding that learn from massive amounts of labeled data. However, annotating point cloud videos is usually notoriously expensive. Moreover, training via one or only a few traditional tasks (e.g., classification) may be insufficient to learn subtle details of the spatio-temporal structur…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  21. arXiv:2308.04768  [pdf, other]

    cs.IR

    Entire Space Cascade Delayed Feedback Modeling for Effective Conversion Rate Prediction

    Authors: Yunfeng Zhao, Xu Yan, Xiaoqiang Gui, Shuguang Han, Xiang-Rong Sheng, Guoxian Yu, Jufeng Chen, Zhao Xu, Bo Zheng

    Abstract: Conversion rate (CVR) prediction is an essential task for large-scale e-commerce platforms. However, refund behaviors frequently occur after conversion in online shopping systems, which drives us to pay attention to effective conversion for building healthier shopping services. This paper defines the probability of item purchasing without any subsequent refund as an effective conversion rate (ECVR…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to CIKM'23

  22. arXiv:2307.05092  [pdf, other]

    cs.CV eess.IV

    Offline and Online Optical Flow Enhancement for Deep Video Compression

    Authors: Chuanbo Tang, Xihua Sheng, Zhuoyuan Li, Haotian Zhang, Li Li, Dong Liu

    Abstract: Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using the motion information. The motion information is represented as optical flows in most of the existing deep video compression networks. Indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows,…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 9 pages, 6 figures

  23. arXiv:2306.10681  [pdf, other]

    eess.IV cs.CV

    VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

    Authors: Xihua Sheng, Li Li, Dong Liu, Houqiang Li

    Abstract: Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, w…

    Submitted 1 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  24. arXiv:2306.10482  [pdf, other]

    math.OC cs.CV eess.IV

    Weighted structure tensor total variation for image denoising

    Authors: Xiuhan Sheng, Lijuan Yang, Jingya Chang

    Abstract: For image denoising problems, the structure tensor total variation (STV)-based models show good performances when compared with other competing regularization approaches. However, the STV regularizer does not couple the local information of the image and may not maintain the image details. Therefore, we employ the anisotropic weighted matrix introduced in the anisotropic total variation (ATV) mode…

    Submitted 4 April, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

  25. arXiv:2306.03516  [pdf, other]

    cs.IR cs.LG

    COPR: Consistency-Oriented Pre-Ranking for Online Advertising

    Authors: Zhishan Zhao, Jingyue Gao, Yu Zhang, Shuguang Han, Siyuan Lou, Xiang-Rong Sheng, Zhe Wang, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Cascading architecture has been widely adopted in large-scale advertising systems to balance efficiency and effectiveness. In this architecture, the pre-ranking model is expected to be a lightweight approximation of the ranking model, which handles more candidates with strict latency requirements. Due to the gap in model capacity, the pre-ranking and ranking models usually generate inconsistent ra…

    Submitted 9 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  26. arXiv:2305.12959  [pdf, other]

    cs.CV

    Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning

    Authors: Xiaoxiao Sheng, Zhiqiang Shen, Gang Xiao

    Abstract: We present a new self-supervised paradigm on point cloud sequence understanding. Inspired by the discriminative and generative self-supervised methods, we design two tasks, namely point cloud sequence based Contrastive Prediction and Reconstruction (CPR), to collaboratively learn more comprehensive spatiotemporal representations. Specifically, dense point cloud segments are first input into an enc…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by AAAI2023

  27. arXiv:2305.12837  [pdf, other]

    cs.IR cs.AI cs.LG

    Capturing Conversion Rate Fluctuation during Sales Promotions: A Novel Historical Data Reuse Approach

    Authors: Zhangming Chan, Yu Zhang, Shuguang Han, Yong Bai, Xiang-Rong Sheng, Siyuan Lou, Jiacen Hu, Baolin Liu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Conversion rate (CVR) prediction is one of the core components in online recommender systems, and various approaches have been proposed to obtain accurate and well-calibrated CVR estimation. However, we observe that a well-trained CVR prediction model often performs sub-optimally during sales promotions. This can be largely ascribed to the problem of the data distribution shift, in which the conve…

    Submitted 26 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at KDD 2023. This work has already been deployed on the display advertising system in Alibaba, bringing substantial economic gains

  28. arXiv:2305.05177  [pdf, other]

    cs.CV

    Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution

    Authors: Ming Cheng, Haoyu Ma, Qiufang Ma, Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Xuhan Sheng, Shijie Zhao, Junlin Li, Li Zhang

    Abstract: Multi-stage strategies are frequently employed in image restoration tasks. While transformer-based methods have exhibited high efficiency in single-image super-resolution tasks, they have not yet shown significant advantages over CNN-based methods in stereo super-resolution tasks. This can be attributed to two key factors: first, current single-image super-resolution transformers are unable to lev…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 10 pages, 3 figures, accepted by CVPR workshop 2023

  29. arXiv:2305.04075  [pdf, other]

    cs.CV

    PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos

    Authors: Zhiqiang Shen, Xiaoxiao Sheng, Longguang Wang, Yulan Guo, Qiong Liu, Xi Zhou

    Abstract: Self-supervised learning can extract representations of good quality from solely unlabeled data, which is appealing for point cloud videos due to their high labelling cost. In this paper, we propose a contrastive mask prediction (PointCMP) framework for self-supervised learning on point cloud videos. Specifically, our PointCMP employs a two-branch structure to achieve simultaneous learning of both…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: Accepted by CVPR 2023

  30. arXiv:2304.13471  [pdf, other]

    eess.IV cs.CV

    OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

    Authors: Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

    Abstract: 360° omnidirectional images have gained research attention due to their immersive and interactive experience, particularly in AR/VR applications. However, they suffer from lower angular resolution due to being captured by fisheye lenses with the same sensor size for capturing planar images. To solve the above issues, we propose a two-stage framework for 360° omnidirectional image superresolution.…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPRW 2023

  31. arXiv:2303.05644  [pdf]

    physics.optics cs.ET cs.NE physics.app-ph

    High-Speed and Energy-Efficient Non-Volatile Silicon Photonic Memory Based on Heterogeneously Integrated Memresonator

    Authors: Bassem Tossoun, Di Liang, Stanley Cheung, Zhuoran Fang, Xia Sheng, John Paul Strachan, Raymond G. Beausoleil

    Abstract: Recently, interest in programmable photonics integrated circuits has grown as a potential hardware framework for deep neural networks, quantum computing, and field programmable arrays (FPGAs). However, these circuits are constrained by the limited tuning speed and large power consumption of the phase shifters used. In this paper, introduced for the first time are memresonators, or memristors heter…

    Submitted 25 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  32. arXiv:2212.10829  [pdf, other]

    cs.RO

    Perching on Moving Inclined Surfaces using Uncertainty Tolerant Planner and Thrust Regulation

    Authors: Sensen Liu, Wenkang Hu, Zhaoying Wang, Wei Dong, Xinjun Sheng

    Abstract: Quadrotors with the ability to perch on moving inclined surfaces can save energy and extend their travel distance by leveraging ground vehicles. Achieving dynamic perching places high demands on the performance of trajectory planning and terminal state accuracy in SE(3). However, in the perching process, uncertainties in target surface prediction, tracking control and external disturbances may cau…

    Submitted 21 December, 2022; originally announced December 2022.

  33. arXiv:2211.11958  [pdf, other]

    cs.CL cs.CR

    A Survey on Backdoor Attack and Defense in Natural Language Processing

    Authors: Xuan Sheng, Zhaoyang Han, Piji Li, Xiangmao Chang

    Abstract: Deep learning is becoming increasingly popular in real-life applications, especially in natural language processing (NLP). Users often choose training outsourcing or adopt third-party data and models due to data and computation resources being limited. In such a situation, training data and models are exposed to the public. As a result, attackers can manipulate the training process to inject some…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 12 pages, QRS2022

  34. arXiv:2211.01856  [pdf, other]

    cs.LG cs.CE eess.SP physics.bio-ph

    Conditional Generative Models for Simulation of EMG During Naturalistic Movements

    Authors: Shihan Ma, Alexander Kenneth Clarke, Kostiantyn Maksymenko, Samuel Deslauriers-Gauthier, Xinjun Sheng, Xiangyang Zhu, Dario Farina

    Abstract: Numerical models of electromyographic (EMG) signals have provided a huge contribution to our fundamental understanding of human neurophysiology and remain a central pillar of motor neuroscience and the development of human-machine interfaces. However, whilst modern biophysical simulations based on finite element methods are highly accurate, they are extremely computationally expensive and thus are…

    Submitted 5 October, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

  35. arXiv:2209.06053  [pdf, other]

    cs.IR cs.AI cs.LG

    Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models

    Authors: Zhao-Yu Zhang, Xiang-Rong Sheng, Yujing Zhang, Biye Jiang, Shuguang Han, Hongbo Deng, Bo Zheng

    Abstract: Deep learning techniques have been applied widely in industrial recommendation systems. However, far less attention has been paid to the overfitting problem of models in recommendation systems, which, on the contrary, is recognized as a critical issue for deep neural networks. In the context of Click-Through Rate (CTR) prediction, we observe an interesting one-epoch overfitting problem: the model…

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: Accepted by CIKM2022

  36. arXiv:2208.08054  [pdf, other]

    cs.RO

    Hierarchical Motion Planning Framework for Cooperative Transportation of Multiple Mobile Manipulators

    Authors: Heng Zhang, Haoyi Song, Wenhang Liu, Xinjun Sheng, Zhenhua Xiong, Xiangyang Zhu

    Abstract: Multiple mobile manipulators show superiority in the tasks requiring mobility and dexterity compared with a single robot, especially when manipulating/transporting bulky objects. When the object and the manipulators are rigidly connected, closed-chain will form and the motion of the whole system will be restricted onto a lower-dimensional manifold. However, current research on multi-robot motion p…

    Submitted 16 August, 2022; originally announced August 2022.

  37. arXiv:2208.06164  [pdf, other]

    cs.IR cs.LG

    Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model

    Authors: Xiang-Rong Sheng, Jingyue Gao, Yueyao Cheng, Siran Yang, Shuguang Han, Hongbo Deng, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the development of ranking optimization techniques, pointwise loss remains the dominating approach for click-through rate prediction. It can be attributed to the calibration ability of the pointwise loss since the prediction can be viewed as the click probability. In practice, a CTR prediction model is also commonly assessed with the ranking ability. To optimize the ranking ability, rankin…

    Submitted 28 May, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted at KDD 2023

  38. arXiv:2205.10884  [pdf, other]

    cs.CL

    Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

    Authors: Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu

    Abstract: The task of Grammatical Error Correction (GEC) has received remarkable attention with wide applications in Natural Language Processing (NLP) in recent years. While one of the key principles of GEC is to keep the correct parts unchanged and avoid over-correction, previous sequence-to-sequence (seq2seq) models generate results from scratch, which are not guaranteed to follow the original sentence st…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: accepted in AAAI 2022

  39. arXiv:2205.01289  [pdf, other]

    cs.IR

    On Ranking Consistency of Pre-ranking Stage

    Authors: Siyu Gu, Xiangrong Sheng

    Abstract: Industrial ranking systems, such as advertising systems, rank items by aggregating multiple objectives into one final objective to satisfy user demand and commercial intent. Cascade architecture, composed of retrieval, pre-ranking, and ranking stages, is usually adopted to reduce the computational cost. Each stage may employ various models for different objectives and calculate the final objective…

    Submitted 3 November, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: 9 pages, 5 figures

  40. arXiv:2204.07429  [pdf, other]

    cs.ET cs.AR cs.LG cs.NE

    Experimentally realized memristive memory augmented neural network

    Authors: Ruibin Mao, Bo Wen, Yahui Zhao, Arman Kazemi, Ann Franchesca Laguna, Michael Neimier, X. Sharon Hu, Xia Sheng, Catherine E. Graves, John Paul Strachan, Can Li

    Abstract: Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory augmented neural network has been proposed to achieve the goal, but the memory module has to be stored in an off-chip memory due to its size. Therefore the practical use has been heavily limited. Previous works on emerging memory-based implementation have diff…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: 54 pages, 21 figures, 3 tables

  41. arXiv:2203.02304  [pdf, other]

    cs.RO

    Hitchhiker: A Quadrotor Aggressively Perching on a Moving Inclined Surface Using Compliant Suction Cup Gripper

    Authors: Sensen Liu, Zhaoying Wang, Xinjun Sheng, Wei Dong

    Abstract: Perching on the surface of moving objects, like vehicles, could extend the flight time and range of quadrotors. Suction cups are usually adopted for surface attachment due to their durability and large adhesive force. To seal on a surface, suction cups must be aligned with the surface and possess proper relative tangential velocity. However, quadrotors' attitude and relative veloci…

    Submitted 13 March, 2023; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: This paper has been submitted to IEEE Transactions on Automation Science and Engineering on 22-January-2022

  42. Attribute Artifacts Removal for Geometry-based Point Cloud Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Zhiwei Xiong

    Abstract: Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still leads to serious attribute compression artifacts, especially under low bitrate scenarios. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on p…

    Submitted 28 February, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

  43. arXiv:2111.13850  [pdf, other]

    cs.CV cs.LG eess.IV

    Temporal Context Mining for Learned Video Compression

    Authors: Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, Yan Lu

    Abstract: We address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts. For temporal context mining, we propose to store not only the previously reconstructed frames, but also the propagated features into the generalized decoded picture buffer. From the stored propagated features, we propose to learn multi-scale temporal contexts, and re-fill the le…

    Submitted 30 January, 2023; v1 submitted 27 November, 2021; originally announced November 2021.

  44. An Efficient Egocentric Regulator for Continuous Targeting Problems of the Underactuated Quadrotor

    Authors: Ziying Lin, Wei Dong, Sensen Liu, Xinjun Sheng, Xiangyang Zhu

    Abstract: Flying robots such as the quadrotor could provide an efficient approach for the medical treatment of wild animals or for placing sensors on them. In these applications, continuously targeting the moving animal is a crucial requirement. Due to the underactuated characteristics of the quadrotor and the coupled kinematics with the animal, nonlinear optimal tracking approaches, other than smooth feedback control, are…

    Submitted 5 August, 2021; originally announced August 2021.

    Journal ref: IEEE/ASME Transactions on Mechatronics, vol. 28, no. 1, pp. 116-127, Feb. 2023

  45. arXiv:2104.14121  [pdf, other]

    cs.LG

    Real Negatives Matter: Continuous Training with Real Negatives for Delayed Feedback Modeling

    Authors: Siyu Gu, Xiang-Rong Sheng, Ying Fan, Guorui Zhou, Xiaoqiang Zhu

    Abstract: One of the difficulties of conversion rate (CVR) prediction is that conversions can be delayed and take place long after the clicks. The delayed feedback poses a challenge: fresh data are beneficial to continuous training but may not have complete label information at the time they are ingested into the training pipeline. To balance model freshness and label certainty, previous methods set a short…

    Submitted 12 August, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted at KDD 2021

  46. Tree-based machine learning performed in-memory with memristive analog CAM

    Authors: Giacomo Pedretti, Catherine E. Graves, Can Li, Sergey Serebryakov, Xia Sheng, Martin Foltin, Ruibin Mao, John Paul Strachan

    Abstract: Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNN). However, while easier to train, they are difficult to optimize for fast inference without accuracy loss in von Neumann architectures due to non-uniform memo…

    Submitted 17 March, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

  47. arXiv:2101.11427  [pdf, other]

    cs.IR cs.LG

    One Model to Serve All: Star Topology Adaptive Recommender for Multi-Domain CTR Prediction

    Authors: Xiang-Rong Sheng, Liqin Zhao, Guorui Zhou, Xinyao Ding, Binding Dai, Qiang Luo, Siran Yang, Jingshan Lv, Chi Zhang, Hongbo Deng, Xiaoqiang Zhu

    Abstract: Traditional industrial recommenders are usually trained on a single business domain and then serve that domain. However, in large commercial platforms, it is often the case that the recommenders need to make click-through rate (CTR) predictions for multiple business domains. Different domains have overlapping user groups and items. Thus, there exist commonalities. Since the specific user group…

    Submitted 2 November, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted at CIKM 2021

  48. arXiv:2011.05625  [pdf, other]

    cs.IR stat.ML

    CAN: Feature Co-Action for Click-Through Rate Prediction

    Authors: Weijie Bian, Kailun Wu, Lejian Ren, Qi Pi, Yujing Zhang, Can Xiao, Xiang-Rong Sheng, Yong-Nan Zhu, Zhangming Chan, Na Mou, Xinchen Luo, Shiming Xiang, Guorui Zhou, Xiaoqiang Zhu, Hongbo Deng

    Abstract: Feature interaction has been recognized as an important problem in machine learning and is essential for click-through rate (CTR) prediction tasks. In recent years, Deep Neural Networks (DNNs) can automatically learn implicit nonlinear interactions from original sparse features, and therefore have been widely used in industrial CTR prediction tasks. However, the implicit feature inter…

    Submitted 7 December, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: WSDM 2022

    MSC Class: Machine Learning (stat.ML); Information Retrieval (cs.IR); Machine Learning (cs.LG) ACM Class: I.2.6

  49. An Active Sense and Avoid System for Flying Robots in Dynamic Environments

    Authors: Gang Chen, Wei Dong, Xinjun Sheng, Xiangyang Zhu, Han Ding

    Abstract: This paper investigates a novel active-sensing-based obstacle avoidance paradigm for flying robots in dynamic environments. Instead of fusing multiple sensors to enlarge the field of view (FOV), we introduce an alternative approach that utilizes a stereo camera with an independent rotational DOF to sense the obstacles actively. In particular, the sensing direction is planned heuristically by multi…

    Submitted 17 February, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Comments: Accepted by IEEE Transactions on Mechatronics on 27 Jan 2021

  50. arXiv:1910.05758  [pdf, other]

    cs.RO cs.CV

    Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding

    Authors: Gang Chen, Hongzhe Yu, Wei Dong, Xinjun Sheng, Xiangyang Zhu, Han Ding

    Abstract: While training an end-to-end navigation network in the real world is usually costly, simulation provides a safe and cheap environment for this training stage. However, training neural network models in simulation raises the problem of how to effectively transfer the model from simulation to the real world (sim-to-real). In this work, we regard the environment representation as a crucial el…

    Submitted 11 November, 2019; v1 submitted 13 October, 2019; originally announced October 2019.

    Comments: 10 pages, 11 figures