Showing 1–50 of 235 results for author: Zheng, Q

  1. arXiv:2407.00921  [pdf, other]

    cs.CV

    PointViG: A Lightweight GNN-based Model for Efficient Point Cloud Analysis

    Authors: Qiang Zheng, Yafei Qi, Chen Wang, Chao Zhang, Jian Sun

    Abstract: In the domain of point cloud analysis, despite the significant capabilities of Graph Neural Networks (GNNs) in managing complex 3D datasets, existing approaches encounter challenges like high computational costs and scalability issues with extensive scenarios. These limitations restrict the practical deployment of GNNs, notably in resource-constrained environments. To address these issues, this st…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.19827  [pdf, other]

    cs.LG

    Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

    Authors: Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Liqiang Nie

    Abstract: The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data, prompting the development of Dataset Distillation methods to address the challenges of managing large datasets. Among these, Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data wit…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 11 pages

  3. arXiv:2406.19815  [pdf, other]

    cs.CV cs.AI

    Emotion Loss Attacking: Adversarial Attack Perception for Skeleton based on Multi-dimensional Features

    Authors: Feng Liu, Qing Xu, Qijian Zheng

    Abstract: Adversarial attack on skeletal motion is a hot topic. However, existing researches only consider part of dynamic features when measuring distance between skeleton graph sequences, which results in poor imperceptibility. To this end, we propose a novel adversarial attack method to attack action recognizers for skeletal motions. Firstly, our method systematically proposes a dynamic distance function…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.04658  [pdf, other]

    cs.CR cs.AI cs.LG

    Advanced Payment Security System:XGBoost, CatBoost and SMOTE Integrated

    Authors: Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin

    Abstract: With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model. To enhance data reliability, we meticulously processed the data sources and used SM…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: This paper is received by https://ieee-metacom.org

  6. arXiv:2405.09556  [pdf, other]

    eess.SP cs.AI cs.IT

    Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver

    Authors: Jiatong Bai, Feng Shu, Qinghe Zheng, Bo Xu, Baihua Shi, Yiwen Chen, Weibin Zhang, Xianpeng Wang

    Abstract: Due to its excellent performance in rate and resolution, fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna arrays has been widely applied in data transmission and direction of arrival (DOA) measurements, etc. But it confronts with two main challenges: high computational complexity and circuit cost. The two problems may be addressed well by hybrid analog-digital (HAD) structu…

    Submitted 12 June, 2024; v1 submitted 27 April, 2024; originally announced May 2024.

  7. arXiv:2405.04520  [pdf, other]

    cs.CL cs.LG cs.SE

    NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

    Authors: Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have manifested strong ability to generate codes for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithm and data science, insufficiently satisfying challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeB…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.03181  [pdf, other]

    cs.DC

    Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading

    Authors: Shifeng Peng, Xuefeng Hou, Zhishu Shen, Qiushi Zheng, Jiong Jin, Atsushi Tagami, Jingling Yuan

    Abstract: Satellite computing has emerged as a promising technology for next-generation wireless networks. This innovative technology provides data processing capabilities, which facilitates the widespread implementation of artificial intelligence (AI)-based applications, especially for image processing tasks involving deep neural network (DNN). With the limited computing resources of an individual satellit…

    Submitted 20 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by 29th IEEE Symposium on Computers and Communications (ISCC)

  9. arXiv:2405.02572  [pdf, other]

    cs.LG cs.AI

    Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

    Authors: Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan

    Abstract: Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important due to that they can benefit from off-policy data. However, these methods suffer from the high variance of the off-policy policy gradient (OPPG) estimator, which results in poor sample efficiency during trai…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures

  10. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  11. arXiv:2404.06692  [pdf, other]

    cs.CV

    Perception-Oriented Video Frame Interpolation via Asymmetric Blending

    Authors: Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng

    Abstract: Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates often prove to be error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to bring…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2404.05225  [pdf, other]

    cs.CV cs.CL

    LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

    Authors: Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

    Abstract: Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored and utilized the document layout information, which is vital for precise document understanding. In this paper, we propose LayoutLLM, an LLM/MLLM bas…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  13. arXiv:2404.01612  [pdf, other]

    cs.CV

    Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo

    Authors: Zongrui Li, Zhan Lu, Haojie Yan, Boxin Shi, Gang Pan, Qian Zheng, Xudong Jiang

    Abstract: Natural Light Uncalibrated Photometric Stereo (NaUPS) relieves the strict environment and light assumptions in classical Uncalibrated Photometric Stereo (UPS) methods. However, due to the intrinsic ill-posedness and high-dimensional ambiguities, addressing NaUPS is still an open question. Existing works impose strong assumptions on the environment lights and objects' material, restricting the effe…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Paper accepted by CVPR2024

  14. arXiv:2404.00934  [pdf, other]

    cs.CL

    ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

    Authors: Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

    Abstract: ChatGLM is a free-to-use AI service powered by the ChatGLM family of large language models (LLMs). In this paper, we present the ChatGLM-RLHF pipeline -- a reinforcement learning from human feedback (RLHF) system -- designed to enhance ChatGLM's alignment with human preferences. ChatGLM-RLHF encompasses three major components: the collection of human preference data, the training of the reward mod…

    Submitted 3 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  15. arXiv:2403.16643  [pdf, other]

    eess.IV cs.CV

    Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

    Authors: Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

    Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to artifact introduction during iterative procedures. Such…

    Submitted 25 March, 2024; originally announced March 2024.

  16. arXiv:2403.13337  [pdf, other]

    cs.CV cs.AI

    Learning Novel View Synthesis from Heterogeneous Low-light Captures

    Authors: Quan Zheng, Hao Sun, Huiyao Xu, Fanjiang Xu

    Abstract: Neural radiance field has achieved fundamental success in novel view synthesis from input views with the same brightness level captured under fixed normal lighting. Unfortunately, synthesizing novel views remains to be a challenge for input views with heterogeneous brightness level captured under low-light condition. The condition is pretty common in the real world. It causes low-contrast images w…

    Submitted 20 March, 2024; originally announced March 2024.

  17. arXiv:2403.11221  [pdf, other]

    cs.DC cs.DB

    Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)

    Authors: Qiushi Zheng, Zhanhao Zhao, Wei Lu, Chang Yao, Yuxing Chen, Anqun Pan, Xiaoyong Du

    Abstract: Distributed transaction processing often involves multiple rounds of cross-node communications, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods…

    Submitted 17 March, 2024; originally announced March 2024.

  18. arXiv:2403.09439  [pdf, other]

    cs.CV cs.AI

    3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

    Authors: Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou

    Abstract: Text-driven 3D scene generation techniques have made rapid progress in recent years. Their success is mainly attributed to using existing generative models to iteratively perform image warping and inpainting to generate 3D scenes. However, these methods heavily rely on the outputs of existing models, leading to error accumulation in geometry and appearance that prevent the models from being used i…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 7 figures

  19. arXiv:2403.06702  [pdf, other]

    cs.CV

    Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

    Authors: Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji

    Abstract: Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality. In this paper, we propose an End-to-End Efficient and Effective network for fast and accurate T3D face generation and manipulation, termed $E^3$-FaceNet. Different from existing complex generation paradigms, $E^3$-FaceNet resor…

    Submitted 23 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  20. arXiv:2403.05105  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

    Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

    Abstract: Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantical irrelevant data will remarkably harm the cross-modal retrieval performance. Previous efforts tend to mitigate this proble…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  21. arXiv:2403.02716  [pdf, other]

    cs.SE

    Pre-trained Model-based Actionable Warning Identification: A Feasibility Study

    Authors: Xiuting Ge, Chunrong Fang, Quanjun Zhang, Daoyuan Wu, Bowen Yu, Qirui Zheng, An Guo, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

    Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still face the problem of restricted performance due to the direct reliance on a limited number of labeled warnings to develo…

    Submitted 5 March, 2024; originally announced March 2024.

  22. arXiv:2403.01079  [pdf, other]

    cs.LG cs.AI

    Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

    Authors: Junxian Li, Bin Shi, Erfei Cui, Hua Wei, Qinghua Zheng

    Abstract: We study the challenging problem for inference tasks on large-scale graph datasets of Graph Neural Networks: huge time and memory consumption, and try to overcome it by reducing reliance on graph structure. Even though distilling graph knowledge to student MLP is an excellent idea, it faces two major problems of positional information loss and low generalization. To solve the problems, we propose…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 20 pages, with Appendix

  23. arXiv:2402.16594  [pdf, other]

    cs.CV

    CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition

    Authors: Qixuan Zheng, Ming Zhang, Hong Yan

    Abstract: To achieve greater accuracy, hypergraph matching algorithms require exponential increases in computational resources. Recent kd-tree-based approximate nearest neighbor (ANN) methods, despite the sparsity of their compatibility tensor, still require exhaustive calculations for large-scale graph matching. This work utilizes CUR tensor decomposition and introduces a novel cascaded second and third-or…

    Submitted 30 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  24. arXiv:2402.16255  [pdf, other]

    cs.LG cs.AI

    Watch Your Head: Assembling Projection Heads to Save the Reliability of Federated Models

    Authors: Jinqian Chen, Jihua Zhu, Qinghai Zheng, Zhongyu Li, Zhiqiang Tian

    Abstract: Federated learning encounters substantial challenges with heterogeneous data, leading to performance degradation and convergence issues. While considerable progress has been achieved in mitigating such an impact, the reliability aspect of federated models has been largely disregarded. In this study, we conduct extensive experiments to investigate the reliability of both generic and personalized fe…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted in AAAI-24

  25. arXiv:2402.14152  [pdf, other]

    cs.AR cs.CR

    ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM

    Authors: Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen

    Abstract: Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is composed of modular arithmetic, where modular multiplication takes most of the processing time. Computational complexity and memory constraints of ECC limit the performance. Therefore, hardware acceleration on ECC is an active field of research. Pr…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: DAC 2024

  26. arXiv:2402.14083  [pdf, other]

    cs.AI

    Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

    Authors: Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian

    Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ se…

    Submitted 26 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  27. arXiv:2402.10562  [pdf]

    cs.RO physics.med-ph

    Precise Hybrid-Actuation Robotic Fiber for Enhanced Cervical Disease Treatment

    Authors: Jinshi Zhao, Qindong Zheng, Ali Anil Demircali, Xiaotong Guo, Daniel Simon, Maria Paraskevaidi, Nick W F Linton, Zoltan Takats, Maria Kyrgiou, Burak Temelkuran

    Abstract: Treatment for high-grade precancerous cervical lesions and early-stage cancers, mainly affecting women of reproductive age, often involves fertility-sparing treatment methods. Commonly used local treatments for cervical precancers have shown the risk of leaving a positive cancer margin and engendering subsequent complications according to the precision and depth of excision. An intra-operative dev…

    Submitted 16 February, 2024; originally announced February 2024.

  28. arXiv:2402.03570  [pdf, other]

    cs.LG cs.AI

    Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning

    Authors: Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng

    Abstract: We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by fu…

    Submitted 16 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  29. arXiv:2401.14427  [pdf, other]

    cs.SE cs.CR cs.LG

    Beimingwu: A Learnware Dock System

    Authors: Zhi-Hao Tan, Jian-Dong Liu, Xiao-Dong Bi, Peng Tan, Qin-Cheng Zheng, Hai-Tian Liu, Yi Xie, Xiao-Chuan Zou, Yang Yu, Zhi-Hua Zhou

    Abstract: The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond models' original purposes. In this paradigm, developers worldwide can submit their high-performing models spontaneously to the learnware dock system (formerly known as learnwa…

    Submitted 24 January, 2024; originally announced January 2024.

  30. arXiv:2401.13976  [pdf, other]

    cs.CV cs.AI

    Learning to Manipulate Artistic Images

    Authors: Wei Guo, Yuqi Zhang, De Ma, Qian Zheng

    Abstract: Recent advancement in computer vision has significantly lowered the barriers to artistic creation. Exemplar-based image translation methods have attracted much attention due to flexibility and controllability. However, these methods hold assumptions regarding semantics or require semantic information as the input, while accurate semantics is not easy to obtain in artistic images. Besides, these me…

    Submitted 25 January, 2024; originally announced January 2024.

  31. arXiv:2401.01522  [pdf, other]

    cs.CV

    LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

    Authors: Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

    Abstract: Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the corresponding markup sequences from the table images. However, existing approaches either count on additional heuristic rules to recover the table structures, or…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.03730

  32. arXiv:2312.16151  [pdf, other]

    cs.CV

    Large-scale Long-tailed Disease Diagnosis on Radiology Images

    Authors: Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Lisong Dai, Hengyu Guan, Yuehua Li, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundational model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize hi…

    Submitted 16 June, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  33. arXiv:2312.15942  [pdf, other]

    cs.CV eess.IV

    Pano-NeRF: Synthesizing High Dynamic Range Novel Views with Geometry from Sparse Low Dynamic Range Panoramic Images

    Authors: Zhan Lu, Qian Zheng, Boxin Shi, Xudong Jiang

    Abstract: Panoramic imaging research on geometry recovery and High Dynamic Range (HDR) reconstruction becomes a trend with the development of Extended Reality (XR). Neural Radiance Fields (NeRF) provide a promising scene representation for both tasks without requiring extensive prior data. However, in the case of inputting sparse Low Dynamic Range (LDR) panoramic images, NeRF often degrades with under-const…

    Submitted 23 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  34. arXiv:2312.03141  [pdf, other]

    cs.AR

    NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

    Authors: Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai "Helen" Li, Yiran Chen

    Abstract: Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector database and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval augmented generation (RAG) for large language models (LLM) now. Among all the ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the size…

    Submitted 28 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  35. Towards Fast and Stable Federated Learning: Confronting Heterogeneity via Knowledge Anchor

    Authors: Jinqian Chen, Jihua Zhu, Qinghai Zheng

    Abstract: Federated learning encounters a critical challenge of data heterogeneity, adversely affecting the performance and convergence of the federated model. Various approaches have been proposed to address this issue, yet their effectiveness is still limited. Recent studies have revealed that the federated model suffers severe forgetting in local training, leading to global forgetting and performance deg…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Published in ACM MM23

    MSC Class: 68T99

  36. arXiv:2311.16876  [pdf, other]

    cs.NI cs.LG

    Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing

    Authors: Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, Xiaohu You

    Abstract: Network slicing-based communication systems can dynamically and efficiently allocate resources for diversified services. However, due to the limitation of the network interface on channel access and the complexity of the resource allocation, it is challenging to achieve an acceptable solution in the practical system without precise prior knowledge of the dynamics probability model of the service r…

    Submitted 28 November, 2023; originally announced November 2023.

  37. arXiv:2311.13443  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Guided Flows for Generative Modeling and Decision Making

    Authors: Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen

    Abstract: Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks. While it has previously demonstrated remarkable improvements for the sample quality, it has only been exclusively employed for diffusion models. In this paper, we integrate classifier-free guidance into Flow Matching (FM) models, an alternative simulation-free approach t…

    Submitted 7 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

  38. arXiv:2311.09800  [pdf, other]

    cs.CL

    $\textit{Dial BeInfo for Faithfulness}$: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning

    Authors: Evgeniia Razumovskaia, Ivan Vulić, Pavle Marković, Tomasz Cichy, Qian Zheng, Tsung-Hsien Wen, Paweł Budzianowski

    Abstract: Factuality is a crucial requirement in information seeking dialogue: the system should respond to the user's queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern large language models suffer from hallucinations, that is, they generate responses not supported by or contradicting the knowledge source. To mitigate the issue and incre…

    Submitted 4 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  39. arXiv:2311.09077  [pdf, other]

    cs.CV

    Spiking NeRF: Representing the Real-World Geometry by a Discontinuous Representation

    Authors: Zhanfeng Liao, Qian Zheng, Yan Liu, Gang Pan

    Abstract: A crucial reason for the success of existing NeRF-based methods is to build a neural density field for the geometry representation via multiple perceptron layers (MLPs). MLPs are continuous functions, however, real geometry or density field is frequently discontinuous at the interface between the air and the surface. Such a contrary brings the problem of unfaithful geometry representation. To this…

    Submitted 4 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  40. arXiv:2311.01686  [pdf, other]

    cs.CV cs.LG

    Disentangled Representation Learning with Transmitted Information Bottleneck

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang, Qinghua Zheng

    Abstract: Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models. Although significant advances have been made by regularizing the information in representations with information theory, two major challenges remain: 1) the representation compression inevitably leads to performance drop;…

    Submitted 2 November, 2023; originally announced November 2023.

  41. arXiv:2310.18859  [pdf, other]

    cs.LG cs.DC

    SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

    Authors: Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai "Helen" Li, Yiran Chen

    Abstract: Mixture-of-Experts (MoE) has emerged as a favorable architecture in the era of large models due to its inherent advantage, i.e., enlarging model capacity without incurring notable computational overhead. Yet, the realization of such benefits often results in ineffective GPU memory utilization, as large portions of the model parameters remain dormant during inference. Moreover, the memory demands o…

    Submitted 17 May, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Published on MLSys24. https://openreview.net/forum?id=q26ydTFF5j

    Journal ref: Seventh Conference on Machine Learning and Systems, (2024)

  42. arXiv:2310.15836  [pdf, other]

    cs.CL cs.AI cs.LG

    A Diffusion Weighted Graph Framework for New Intent Discovery

    Authors: Wenkai Shi, Wenbin An, Feng Tian, Qinghua Zheng, QianYing Wang, Ping Chen

    Abstract: New Intent Discovery (NID) aims to recognize both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents. Without considering structure relationships between samples, previous methods generate noisy supervisory signals which cannot strike a balance between quantity and quality, hindering the formation of new intent clusters and effective transf…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main

  43. arXiv:2310.15112  [pdf, other

    cs.HC cs.AI

    The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills

    Authors: Qingxiao Zheng, Yun Huang

    Abstract: This study explores the impact of AI-generated digital self-clones on improving online presentation skills. We carried out a mixed-design experiment involving 44 international students, comparing self-recorded videos (control) with self-clone videos (AI group) for English presentation practice. The AI videos utilized voice cloning, face swapping, lip-sync, and body-language simulation to refine pa… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 25 pages

  44. arXiv:2310.15065  [pdf, other

    cs.HC cs.AI

    Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents

    Authors: Qingxiao Zheng, Zhongwei Xu, Abhinav Choudhry, Yuting Chen, Yongming Li, Yun Huang

    Abstract: This empirical study serves as a primer for interested service providers to determine if and how Large Language Models (LLMs) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design processes,… ▽ More

    Submitted 29 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: V1.0 on Oct 25th, 2023

  45. arXiv:2310.12182  [pdf, other

    cs.AR

    Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

    Authors: Xueying Wu, Edward Hanson, Nansu Wang, Qilin Zheng, Xiaoxuan Yang, Huanrui Yang, Shiyu Li, Feng Cheng, Partha Pratim Pande, Janardhan Rao Doppa, Krishnendu Chakrabarty, Hai Li

    Abstract: Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate Deep Neural Network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the non-idealities, such as the conductance variation of ReRAM cells. The impact of these non-idealities worsens as the number of concurrently activ… ▽ More

    Submitted 27 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 12 pages, 13 figures

  46. arXiv:2310.10151  [pdf, other

    cs.LG cs.CL cs.IR

    DNA: Denoised Neighborhood Aggregation for Fine-grained Category Discovery

    Authors: Wenbin An, Feng Tian, Wenkai Shi, Yan Chen, Qinghua Zheng, QianYing Wang, Ping Chen

    Abstract: Discovering fine-grained categories from coarsely labeled data is a practical and challenging task, which can bridge the gap between the demand for fine-grained analysis and the high annotation cost. Previous works mainly focus on instance-level discrimination to learn low-level features, but ignore semantic similarities between data, which may prevent these models learning compact cluster represe… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 Main

  47. arXiv:2310.09909  [pdf, other

    cs.CV cs.CL

    Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

    Authors: Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public. In this study, we aim to assess the performance of OpenAI's newest model, GPT-4V(ision), specifically in the realm of multimodal medical diagnosis. Our evaluation encompasses 17 human body systems, including Central Nerv… ▽ More

    Submitted 4 December, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  48. Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer

    Authors: Zhihao Zhang, Yiwei Chen, Weizhan Zhang, Caixia Yan, Qinghua Zheng, Qi Wang, Wangdu Chen

    Abstract: Viewport prediction is a crucial aspect of tile-based 360 video streaming system. However, existing trajectory based methods lack of robustness, also oversimplify the process of information construction and fusion between different modality inputs, leading to the error accumulation problem. In this paper, we propose a tile classification based viewport prediction method with Multi-modal Fusion Tra… ▽ More

    Submitted 28 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: This paper is accepted by ACM-MM 2023

  49. arXiv:2309.07350  [pdf

    cs.RO

    Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation

    Authors: Lingfeng Tao, Jiucai Zhang, Qiaojie Zheng, Xiaoli Zhang

    Abstract: Simulation to Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Sim2Real uses Asymmetric Actor-Critic approaches to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted thr… ▽ More

    Submitted 13 September, 2023; originally announced September 2023.

  50. arXiv:2309.00474  [pdf, other

    cs.CV

    Asymmetric double-winged multi-view clustering network for exploring Diverse and Consistent Information

    Authors: Qun Zheng, Xihong Yang, Siwei Wang, Xinru An, Qi Liu

    Abstract: In unsupervised scenarios, deep contrastive multi-view clustering (DCMVC) is becoming a hot research spot, which aims to mine the potential relationships between different views. Most existing DCMVC algorithms focus on exploring the consistency information for the deep semantic features, while ignoring the diverse information on shallow features. To fill this gap, we propose a novel multi-view clu… ▽ More

    Submitted 1 September, 2023; originally announced September 2023.