Showing 1–50 of 130 results for author: Yan, D

Searching in archive cs.
  1. arXiv:2406.18817  [pdf, other]

    cs.CV cs.AI

    Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis

    Authors: Mingyang Zhao, Jingen Jiang, Lei Ma, Shiqing Xin, Gaofeng Meng, Dong-Ming Yan

    Abstract: This paper presents a novel non-rigid point set registration method that is inspired by unsupervised clustering analysis. Unlike previous approaches that treat the source and target point sets as separate entities, we develop a holistic framework where they are formulated as clustering centroids and clustering members, respectively. We then adopt Tikhonov regularization with an $\ell_1$-induced Lapl…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: [CVPR 2024 Highlight] Project and code at: https://github.com/zikai1/CVPR24_PointSetReg

  2. arXiv:2406.11824  [pdf, other]

    cs.CV

    Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

    Authors: Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, Jia Deng

    Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constrai…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  3. arXiv:2406.07327  [pdf, other]

    cs.AI cs.CL cs.LG

    3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

    Authors: Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan

    Abstract: Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples. Despite its efficiency, DPO has rarely been used in the state-of-the-art production-level LLMs, implying its potential pathologies. In this work, we revisit DPO with a comp…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.00347  [pdf, other]

    cs.CV

    E$^3$-Net: Efficient E(3)-Equivariant Normal Estimation Network

    Authors: Hanxiao Wang, Mingyang Zhao, Weize Quan, Zhen Chen, Dong-Ming Yan, Peter Wonka

    Abstract: Point cloud normal estimation is a fundamental task in 3D geometry processing. While recent learning-based methods achieve notable advancements in normal prediction, they often overlook the critical aspect of equivariance. This results in inefficient learning of symmetric patterns. To address this issue, we propose E3-Net to achieve equivariance for normal estimation. We introduce an efficient ran…

    Submitted 1 June, 2024; originally announced June 2024.

  5. arXiv:2405.16964  [pdf, other]

    cs.CL cs.AI

    Exploring the LLM Journey from Cognition to Expression with Linear Representations

    Authors: Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan

    Abstract: This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretrai…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published in ICML 2024

  6. arXiv:2405.12739  [pdf, other]

    cs.LG

    SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

    Authors: Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang

    Abstract: Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with the complexity of managing multiple reward models. To address these issues, we propose Sequential Preference Optimization (SPO), a method that sequentially fin…

    Submitted 21 May, 2024; originally announced May 2024.

  7. arXiv:2404.17917  [pdf, other]

    cs.CV eess.IV

    EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery

    Authors: Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou

    Abstract: Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectr…

    Submitted 12 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted at the International Joint Conference on Artificial Intelligence (IJCAI, 2024)

  8. arXiv:2404.12850  [pdf, other]

    cs.LG cs.DC

    CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Xiaofei Xie, Tianlin Li, Anran Li, Junlong Zhou, Mingsong Chen

    Abstract: Federated Learning (FL), as a promising distributed machine learning paradigm, has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL are seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL a…

    Submitted 19 April, 2024; originally announced April 2024.

  9. arXiv:2404.12846  [pdf, other]

    cs.LG

    KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Ruixuan Liu, Anran Li, Xiaofei Xie, Mingsong Chen

    Abstract: Although Split Federated Learning (SFL) is good at enabling knowledge sharing among resource-constrained clients, it suffers from the problem of low training accuracy due to the neglect of data heterogeneity and catastrophic forgetting. To address this issue, we propose a novel SFL approach named KoReA-SFL, which adopts a multi-model aggregation mechanism to alleviate gradient divergence caused by…

    Submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2404.04545  [pdf, other]

    cs.MM cs.CL

    TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

    Authors: Ming Zhou, Weize Quan, Ziqi Zhou, Kai Wang, Tong Wang, Dong-Ming Yan

    Abstract: Multimodal Sentiment Analysis (MSA) endeavors to understand human sentiment by leveraging language, visual, and acoustic modalities. Despite the remarkable performance exhibited by previous MSA approaches, the presence of inherent multimodal heterogeneities poses a challenge, with the contribution of different modalities varying considerably. Past research predominantly focused on improving repres…

    Submitted 6 April, 2024; originally announced April 2024.

  11. Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

    Authors: Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

    Abstract: Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (…

    Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures. IEEE Transactions on Multimedia, 2024

  12. arXiv:2403.19067  [pdf, other]

    cs.CV

    Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

    Authors: Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

    Abstract: Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge…

    Submitted 27 March, 2024; originally announced March 2024.

  13. arXiv:2403.10840  [pdf, other]

    cs.RO cs.CV

    MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field

    Authors: Dongyu Yan, Guanyu Huang, Fengyu Quan, Haoyao Chen

    Abstract: Panoramic observation using fisheye cameras is significant in robot perception, reconstruction, and remote operation. However, panoramic images synthesized by traditional methods lack depth information and can only provide three degrees-of-freedom (3DoF) rotation rendering in virtual reality applications. To fully preserve and exploit the parallax information within the original fisheye cameras, w…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 2024

  14. arXiv:2402.18294  [pdf, other]

    cs.RO

    Whole-body Humanoid Robot Locomotion with Human Reference

    Authors: Qiang Zhang, Peter Cui, David Yan, Jingkai Sun, Yiqun Duan, Arthur Zhang, Renjing Xu

    Abstract: Recently, humanoid robots have made significant advances in their ability to perform challenging tasks due to the deployment of Reinforcement Learning (RL). However, the inherent complexity of humanoid robots, including the difficulty of designing complicated reward functions and training entire sophisticated systems, still poses a notable challenge. To conquer these challenges, after many iterati…

    Submitted 1 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures

  15. arXiv:2402.13008  [pdf, other]

    cs.DS cs.DC

    Efficient Enumeration of Large Maximal k-Plexes

    Authors: Qihao Cheng, Da Yan, Tianhao Wu, Lyuheng Yuan, Ji Cheng, Zhongyi Huang, Yang Zhou

    Abstract: Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. A clique is often too strict a cohesive structure, since communities or biological modules rarely form cliques for various reasons such as data noise. Therefore, the $k$-plex is introduced as a popular clique relaxation, which is a graph where every vertex is adjace…

    Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by EDBT2025. Camera-ready version

  16. arXiv:2402.10821  [pdf, other]

    cs.CV

    Training Class-Imbalanced Diffusion Model Via Overlap Optimization

    Authors: Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

    Abstract: Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance o…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Technical Report

  17. arXiv:2402.10184  [pdf, other]

    cs.LG cs.AI cs.CL cs.DM

    Reward Generalization in RLHF: A Topological Perspective

    Authors: Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang

    Abstract: Existing alignment methods share a common topology of information flow, where reward information is collected from humans, modeled with preference learning, and used to tune language models. However, this shared topology has not been systematically characterized, nor have its alternatives been thoroughly explored, leaving the problems of low data efficiency and unreliable generalization unaddresse…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  18. arXiv:2402.06088  [pdf, other]

    cs.CV

    Animated Stickers: Bringing Stickers to Life with Video Diffusion

    Authors: David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

    Abstract: We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion. Due to the domain gap, i.e. differences in visual and motion style, a model which performed well on generating natural videos can n…

    Submitted 8 February, 2024; originally announced February 2024.

  19. arXiv:2402.01441  [pdf, ps, other]

    q-fin.TR cs.LG

    Learning the Market: Sentiment-Based Ensemble Trading Agents

    Authors: Andrew Ye, James Xu, Yi Wang, Yifan Yu, Daniel Yan, Ryan Chen, Bosheng Dong, Vipin Chaudhary, Shuai Xu

    Abstract: We propose the integration of sentiment analysis and deep-reinforcement learning ensemble algorithms for stock trading, and design a strategy capable of dynamically altering its employed agent given concurrent market sentiment. In particular, we create a simple-yet-effective method for extracting news sentiment and combine this with general improvements upon existing works, resulting in automated…

    Submitted 2 February, 2024; originally announced February 2024.

  20. arXiv:2401.03914  [pdf, other]

    cs.CV

    D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement

    Authors: Danqi Yan, Qing Gao, Yuepeng Qian, Xinxing Chen, Chenglong Fu, Yuquan Leng

    Abstract: Three-dimensional (3D) human pose estimation using a monocular camera has gained increasing attention due to its ease of implementation and the abundance of data available from daily life. However, owing to the inherent depth ambiguity in images, the accuracy of existing monocular camera-based 3D pose estimation methods remains unsatisfactory, and the estimated 3D poses usually include much noise.…

    Submitted 8 January, 2024; originally announced January 2024.

  21. arXiv:2401.03395  [pdf, other]

    cs.CV

    Deep Learning-based Image and Video Inpainting: A Survey

    Authors: Weize Quan, Jiaxi Chen, Yanli Liu, Dong-Ming Yan, Peter Wonka

    Abstract: Image and video inpainting is a classic problem in computer vision and computer graphics, aiming to fill in the plausible and realistic content in the missing areas of images and videos. With the advance of deep learning, this problem has achieved significant progress recently. The goal of this paper is to comprehensively review the deep learning-based methods for image and video inpainting. Speci…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: accepted to IJCV

  22. arXiv:2401.01456  [pdf, other]

    cs.CV

    ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text

    Authors: Dingkun Yan, Liang Yuan, Yuma Nishioka, Issei Fujishiro, Suguru Saito

    Abstract: Recently, diffusion models have demonstrated their effectiveness in generating extremely high-quality images and have found wide-ranging applications, including automatic sketch colorization. However, most existing models use text to guide the conditional generation, with fewer attempts exploring the potential advantages of using image tokens as conditional inputs for networks. As such, this paper…

    Submitted 2 January, 2024; originally announced January 2024.

  23. arXiv:2312.09154  [pdf, other]

    cs.CV

    CMG-Net: Robust Normal Estimation for Point Clouds via Chamfer Normal Distance and Multi-scale Geometry

    Authors: Yingrui Wu, Mingyang Zhao, Keqiang Li, Weize Quan, Tianqi Yu, Jianfeng Yang, Xiaohong Jia, Dong-Ming Yan

    Abstract: This work presents an accurate and robust method for estimating normals from point clouds. In contrast to predecessor approaches that minimize the deviations between the annotated and the predicted normals directly, leading to direction inconsistency, we first propose a new metric termed Chamfer Normal Distance to address this issue. This not only mitigates the challenge but also facilitates netwo…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  24. arXiv:2312.04883  [pdf, other]

    cs.LG cs.SI

    Understanding Community Bias Amplification in Graph Representation Learning

    Authors: Shengzhong Zhang, Wenjie Yang, Yimin Zhang, Hongwei Zhang, Divin Yan, Zengfeng Huang

    Abstract: In this work, we discover a phenomenon of community bias amplification in graph representation learning, which refers to the exacerbation of performance bias between different classes by graph representation learning. We conduct an in-depth theoretical study of this phenomenon from a novel spectral perspective. Our analysis suggests that structural bias between communities results in varying local…

    Submitted 8 December, 2023; originally announced December 2023.

  25. arXiv:2311.13841  [pdf, other]

    cs.CR

    Adversarial defense based on distribution transfer

    Authors: Jiahao Chen, Diqun Yan, Li Dong

    Abstract: The presence of adversarial examples poses a significant threat to deep learning models and their applications. Existing defense methods provide certain resilience against adversarial examples, but often suffer from decreased accuracy and generalization performance, making it challenging to achieve a trade-off between robustness and generalization. To address this, our paper interprets the adversa…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 27 pages

  26. arXiv:2311.13163  [pdf, other]

    cs.LG cs.DC

    Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

    Authors: Dengke Yan, Ming Hu, Zeke Xia, Yanxin Yang, Jun Xia, Xiaofei Xie, Mingsong Chen

    Abstract: Due to its advantages in resource-constrained scenarios, Split Federated Learning (SFL) is promising in AIoT systems. However, due to data heterogeneity and stragglers, SFL suffers from the challenges of low inference accuracy and low efficiency. To address these issues, this paper presents a novel SFL approach, named Sliding Split Federated Learning (S$^2$FL), which adopts an adaptive sliding mode…

    Submitted 8 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  27. arXiv:2311.10794  [pdf, other]

    cs.CV

    Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

    Authors: Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

    Abstract: We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that r…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  28. arXiv:2310.18823  [pdf, other]

    cs.LG

    Successfully Applying Lottery Ticket Hypothesis to Diffusion Model

    Authors: Chao Jiang, Bo Hui, Bohan Liu, Da Yan

    Abstract: Despite the success of diffusion models, the training and inference of diffusion models are notoriously expensive due to the long chain of the reverse process. In parallel, the Lottery Ticket Hypothesis (LTH) claims that there exist winning tickets (i.e., a properly pruned sub-network together with original weight initialization) that can achieve performance competitive to the original dense neura…

    Submitted 28 October, 2023; originally announced October 2023.

  29. arXiv:2310.18765  [pdf, other]

    cs.LG

    Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

    Authors: Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang

    Abstract: This paper introduces a new approach to address the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage a graph augmentation technique to estimate the variance,…

    Submitted 5 February, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems. (NeurIPS 2023)

  30. arXiv:2310.15590  [pdf, other]

    cs.CR cs.CV

    Facial Data Minimization: Shallow Model as Your Privacy Filter

    Authors: Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao Li, Diqun Yan, Xuhong Zhang, Shouling Ji

    Abstract: Face recognition service has been used in many fields and brings much convenience to people. However, once the user's facial data is transmitted to a service provider, the user will lose control of his/her private data. In recent years, there exist various security and privacy issues due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail w…

    Submitted 12 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 14 pages, 11 figures

  31. arXiv:2310.13548  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Towards Understanding Sycophancy in Language Models

    Authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

    Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that…

    Submitted 27 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 32 pages, 20 figures

    ACM Class: I.2.6

  32. arXiv:2310.06234  [pdf, other]

    cs.CV cs.LG

    Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

    Authors: Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang

    Abstract: The advent of high-capacity pre-trained models has revolutionized problem-solving in computer vision, shifting the focus from training task-specific models to adapting pre-trained models. Consequently, effectively adapting large pre-trained models to downstream tasks in an efficient manner has become a prominent research area. Existing solutions primarily concentrate on designing lightweight adapt…

    Submitted 16 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Paper is accepted to NeurIPS 2023

  33. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

  34. arXiv:2309.05073  [pdf, other]

    cs.CV

    FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

    Authors: Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang

    Abstract: Estimating the 3D structure of the human body from natural scenes is a fundamental aspect of visual perception. 3D human pose estimation is a vital step in advancing fields like AIGC and human-robot interaction, serving as a crucial technique for understanding and interacting with human actions in real-world settings. However, the current datasets, often collected under single laboratory condition…

    Submitted 3 April, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: CVPR2024 camera ready version. 19 pages, 16 figures. Project page: https://wangjiongw.github.io/freeman/ ; API: https://github.com/wangjiongw/FreeMan_API

  35. arXiv:2309.01480  [pdf, other]

    cs.SD cs.AI eess.AS

    BadSQA: Stealthy Backdoor Attacks Using Presence Events as Triggers in Non-Intrusive Speech Quality Assessment

    Authors: Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan

    Abstract: Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting the mean opinion score (MOS) of speech without requiring the reference speech. In practical NISQA scenarios, untrusted third-party resources are often employed during deep neural network training to reduce costs. However, it would introduce a potential security vulnerability as specially designed untrus…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 5 pages, 6 figures, conference

  36. arXiv:2308.04179  [pdf, other]

    cs.CR cs.SD eess.AS eess.SP

    Breaking Speaker Recognition with PaddingBack

    Authors: Zhe Ye, Diqun Yan, Li Dong, Kailai Shen

    Abstract: Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transfo…

    Submitted 11 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

  37. Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition

    Authors: JiaCheng Deng, Li Dong, Jiahao Chen, Diqun Yan, Rangding Wang, Dengpan Ye, Lingchen Zhao, Jinyu Tian

    Abstract: Optical Character Recognition (OCR) enables automatic text extraction from scanned or digitized text images, but it also makes it easy to pirate valuable or sensitive text from these images. Previous methods to prevent OCR piracy by distorting characters in text images are impractical in real-world scenarios, as pirates can capture arbitrary portions of the text images, rendering the defenses inef…

    Submitted 4 August, 2023; originally announced August 2023.

  38. arXiv:2307.05075  [pdf, other]

    cs.CV cs.AI

    Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images

    Authors: Yongheng Zhang, Danfeng Yan, Yuanqiang Cai

    Abstract: Removing multiple degradations, such as haze, rain, and blur, from real-world images poses a challenging and ill-posed problem. Recently, unified models that can handle different degradations have been proposed and yield promising results. However, these approaches focus on synthetic images and experience a significant performance drop when applied to real-world images. In this paper, we introduce U…

    Submitted 11 July, 2023; originally announced July 2023.

  39. arXiv:2306.15875  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

    Authors: Zhe Ye, Terui Mao, Li Dong, Diqun Yan

    Abstract: Deep speech classification has achieved tremendous success and greatly promoted the emergence of many real-world applications. However, backdoor attacks present a new security threat to it, particularly with untrustworthy third-party platforms, as pre-defined triggers set by the attacker can activate the backdoor. Most of the triggers in existing speech backdoor attacks are sample-agnostic, and ev…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, pp. 4923-4927

  40. arXiv:2305.02190  [pdf, other]

    cs.LG cs.AI

    Rethinking Graph Lottery Tickets: Graph Sparsity Matters

    Authors: Bo Hui, Da Yan, Xiaolong Ma, Wei-Shinn Ku

    Abstract: Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance to the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matr…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: ICLR 2023

  41. arXiv:2303.16739  [pdf, other]

    cs.RO cs.CV

    Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization

    Authors: Dongyu Yan, Jianheng Liu, Fengyu Quan, Haoyao Chen, Mengmeng Fu

    Abstract: Actively planning sensor views during object reconstruction is crucial for autonomous mobile robots. An effective method should be able to strike a balance between accuracy and efficiency. In this paper, we propose a seamless integration of the emerging implicit representation with the active reconstruction task. We build an implicit occupancy field as our geometry proxy. While training, the prior…

    Submitted 28 May, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 8 pages, 11 figures, Submitted to IEEE Robotics and Automation Letters (RA-L)

  42. arXiv:2303.10613  [pdf, other]

    cs.CV cs.GR cs.LG

    SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations

    Authors: Pu Li, Jianwei Guo, Xiaopeng Zhang, Dong-ming Yan

    Abstract: Reverse engineering CAD models from raw geometry is a classic but strenuous research problem. Previous learning-based methods rely heavily on labels due to the supervised design patterns or reconstruct CAD shapes that are not easily editable. In this work, we introduce SECAD-Net, an end-to-end neural network aimed at reconstructing compact and easy-to-edit CAD models in a self-supervised manner. D…

    Submitted 19 March, 2023; originally announced March 2023.

  43. arXiv:2303.09152  [pdf, other]

    cs.CV

    Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation

    Authors: Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi

    Abstract: Implicit neural rendering, which uses signed distance function (SDF) representation with geometric priors (such as depth or surface normal), has led to impressive progress in the surface reconstruction of large-scale scenes. However, applying this method to reconstruct a room-level scene from images may miss structures in low-intensity areas or small and thin objects. We conducted experiments on t…

    Submitted 16 March, 2023; originally announced March 2023.

  44. arXiv:2303.05092  [pdf, other]

    cs.LG

    Task Aware Dreamer for Task Generalization in Reinforcement Learning

    Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Songming Liu, Dong Yan, Jun Zhu

    Abstract: A long-standing goal of reinforcement learning is to acquire agents that can learn on training tasks and generalize well on unseen tasks that may share a similar dynamic but with different reward functions. The ability to generalize across tasks is important as it determines an agent's adaptability to real-world scenarios where reward mechanisms might vary. In this work, we first show that trainin…

    Submitted 2 February, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

  45. arXiv:2303.04086  [pdf, other]

    cs.GR cs.CV cs.DC

    NEPHELE: A Neural Platform for Highly Realistic Cloud Radiance Rendering

    Authors: Haimin Luo, Siyuan Zhang, Fuqiang Zhao, Haotian Jing, Penghao Wang, Zhenxiao Yu, Dongxue Yan, Junran Ding, Boyuan Zhang, Qiang Hu, Shu Yin, Lan Xu, Jingyi Yu

    Abstract: We have recently seen tremendous progress in neural rendering (NR) advances, i.e., NeRF, for photo-real free-view synthesis. Yet, as a local technique based on a single computer/GPU, even the best-engineered Instant-NGP or i-NGP cannot reach real-time performance when rendering at a high resolution, and often requires huge local computing resources. In this paper, we resort to cloud rendering and…

    Submitted 7 March, 2023; originally announced March 2023.

  46. arXiv:2302.14363  [pdf, other]

    cs.RO cs.CV

    Efficient Implicit Neural Reconstruction Using LiDAR

    Authors: Dongyu Yan, Xiaoyang Lyu, Jieqi Shi, Yi Lin

    Abstract: Modeling scene geometry using implicit neural representation has revealed its advantages in accuracy, flexibility, and low memory usage. Previous approaches have demonstrated impressive results using color or depth images but still have difficulty handling poor light conditions and large-scale scenes. Methods taking global point cloud as input require accurate registration and ground truth coordin…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 6+2 pages, 8 figures, Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023

  47. arXiv:2302.11707  [pdf]

    cs.LG cs.AI

    A Deep Neural Network Based Approach to Building Budget-Constrained Models for Big Data Analysis

    Authors: Rui Ming, Haiping Xu, Shannon E. Gibbs, Donghui Yan, Ming Shao

    Abstract: Deep learning approaches require collection of data on many different input features or variables for accurate model training and prediction. Since data collection on input features could be costly, it is crucial to reduce the cost by selecting a subset of features and developing a budget-constrained model (BCM). In this paper, we introduce an approach to eliminating less important features for bi…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 8 pages

  48. arXiv:2302.04626  [pdf, other]

    cs.LG

    Self-Supervised Node Representation Learning via Node-to-Neighbourhood Alignment

    Authors: Wei Dong, Dawei Yan, Peng Wang

    Abstract: Self-supervised node representation learning aims to learn node representations from unlabelled graphs that rival the supervised counterparts. The key towards learning informative node representations lies in how to effectively gain contextual information from the graph structure. In this work, we present simple-yet-effective self-supervised node representation learning via aligning the hidden rep…

    Submitted 9 February, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.12265

  49. arXiv:2301.06281  [pdf, other]

    cs.CV

    DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

    Authors: Youxin Pang, Yong Zhang, Weize Quan, Yanbo Fan, Xiaodong Cun, Ying Shan, Dong-ming Yan

    Abstract: One-shot video-driven talking face generation aims at producing a synthetic talking video by transferring the facial motion from a video to an arbitrary portrait image. Head pose and facial expression are always entangled in facial motion and transferred simultaneously. However, the entanglement sets up a barrier for these methods to be used in video portrait editing directly, where it may require…

    Submitted 1 March, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: https://carlyx.github.io/DPE/

  50. arXiv:2212.09251  [pdf, other]

    cs.CL cs.AI cs.LG

    Discovering Language Model Behaviors with Model-Written Evaluations

    Authors: Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, et al. (38 additional authors not shown)

    Abstract: As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from inst…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: for associated data visualizations, see https://www.evals.anthropic.com/model-written/; for full datasets, see https://github.com/anthropics/evals