Showing 1–50 of 419 results for author: Deng, J

Searching in archive cs.
  1. arXiv:2407.00696  [pdf, other]

    cs.LG

    Graph in Graph Neural Network

    Authors: Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

    Abstract: Existing Graph Neural Networks (GNNs) are limited to process graphs each of whose vertices is represented by a vector or a single value, limited their representing capability to describe complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given…

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 68T05

  2. arXiv:2407.00056  [pdf, other]

    cs.IR cs.AI cs.SI

    MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion

    Authors: Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng

    Abstract: Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a…

    Submitted 15 June, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024

  3. arXiv:2406.17841  [pdf, other]

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang, et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures + 14 pages, 6 figures

  4. arXiv:2406.17442  [pdf, other]

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.16623  [pdf, other]

    cs.CV

    Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

    Authors: Jianning Deng, Kartic Subr, Hakan Bilen

    Abstract: We propose a novel unsupervised method to learn the pose and part-segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method learns the geometry and appearance of object parts by using an implicit model from the first observation, distils the part segmentation and articulation from the second observation while rendering th…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages for the main content, excluding references and supplementary material

  6. arXiv:2406.16372  [pdf, other]

    cs.CL

    UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding

    Authors: Dongyang Li, Taolin Zhang, Jiali Deng, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages. However, previous works rely on shallow unsupervised data generated by token surface matching, regardless of the global context-aware semantics of the surrounding text tokens. In this paper, we propose an Unsupervised Pseu…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.13149  [pdf, other]

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12709  [pdf, other]

    cs.LG cs.AI

    Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

    Authors: Du Yin, Jinliang Deng, Shuang Ao, Zechen Li, Hao Xue, Arian Prabowo, Renhe Jiang, Xuan Song, Flora Salim

    Abstract: Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in perfo…

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.11824  [pdf, other]

    cs.CV

    Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

    Authors: Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, Jia Deng

    Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constrai…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  10. arXiv:2406.11793  [pdf, other]

    cs.RO

    FetchBench: A Simulation Benchmark for Robot Fetching

    Authors: Beining Han, Meenal Parakh, Derek Geng, Jack A Defay, Luyang Gan, Jia Deng

    Abstract: Fetching, which includes approaching, grasping, and retrieving, is a critical challenge for robot manipulation tasks. Existing methods primarily focus on table-top scenarios, which do not adequately capture the complexities of environments where both grasping and planning are essential. To address this gap, we propose a new benchmark FetchBench, featuring diverse procedural scenes that integrate b…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11721  [pdf, other]

    cs.CL cs.AI cs.LG

    Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

    Authors: Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun

    Abstract: Understanding alignment techniques begins with comprehending zero-shot generalization brought by instruction tuning, but little of the mechanism has been understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. This line of research has been limited to examining transfe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 33 pages, 14 figures

  12. arXiv:2406.11711  [pdf, other]

    cs.CV

    OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations

    Authors: Yiming Zuo, Jia Deng

    Abstract: Depth completion is the task of generating a dense depth map given an image and a sparse depth map as inputs. It has important applications in various downstream tasks. In this paper, we present OGNI-DC, a novel framework for depth completion. The key to our method is "Optimization-Guided Neural Iterations" (OGNI). It consists of a recurrent unit that refines a depth gradient field and a different…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  15. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  16. arXiv:2406.08920  [pdf, other]

    cs.SD cs.AI eess.AS

    AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

    Authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

    Abstract: Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  17. arXiv:2406.08001  [pdf, other]

    cs.CV cs.LG

    Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

    Authors: Jiaxin Deng, Junbiao Pang, Baochang Zhang

    Abstract: Sharpness-Aware Minimization (SAM) has emerged as a promising approach for effectively reducing the generalization error. However, SAM incurs twice the computational cost compared to base optimizer (e.g., SGD). We propose Asymptotic Unbiased Sampling with respect to iterations to accelerate SAM (AUSAM), which maintains the model's generalization capacity while significantly enhancing computational…

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.06973  [pdf, other]

    cs.CV

    RWKV-CLIP: A Robust Vision-Language Representation Learner

    Authors: Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng

    Abstract: Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites. This paper further explores CLIP from the perspectives of data and model architecture. To address the prevalence of noisy data and enhance the quality of large-scale image-text data crawled from the internet, w…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures

  19. arXiv:2406.04983  [pdf, other]

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  20. arXiv:2406.03868  [pdf, other]

    cs.DC

    PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

    Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

    Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages

  21. arXiv:2406.01597  [pdf, other]

    cs.CV cs.GR

    End-to-End Rate-Distortion Optimized 3D Gaussian Representation

    Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible…

    Submitted 9 April, 2024; originally announced June 2024.

  22. arXiv:2405.20910  [pdf, other]

    physics.app-ph cs.AI cs.CV physics.data-an

    Predicting ptychography probe positions using single-shot phase retrieval neural network

    Authors: Ming Du, Tao Zhou, Junjing Deng, Daniel J. Ching, Steven Henke, Mathew J. Cherukara

    Abstract: Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization appro…

    Submitted 31 May, 2024; originally announced May 2024.

    MSC Class: 94A08 ACM Class: I.4.0

  23. arXiv:2405.17991  [pdf, other]

    cs.CV cs.AI

    VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

    Authors: Roy Miles, Pradyumna Reddy, Ismail Elezi, Jiankang Deng

    Abstract: Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. Despite their success, training and fine-tuning these models is still far too computationally and memory intensive. In this paper, we identify and characterise the important components needed for effective model convergence using gradient descent. In doing so we find that the intermedi…

    Submitted 28 May, 2024; originally announced May 2024.

  24. arXiv:2405.17293  [pdf, other]

    cs.LG cs.AI

    Efficient Ensembles Improve Training Data Attribution

    Authors: Junwei Deng, Ting-Wei Li, Shichang Zhang, Jiaqi Ma

    Abstract: Training data attribution (TDA) methods aim to quantify the influence of individual training data points on the model predictions, with broad applications in data-centric AI, such as mislabel detection, data selection, and copyright compensation. However, existing methods in this field, which can be categorized as retraining-based and gradient-based, have struggled with the trade-off between compu…

    Submitted 27 May, 2024; originally announced May 2024.

  25. arXiv:2405.17221  [pdf, other]

    cs.AI cs.AR

    Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

    Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

    Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticat…

    Submitted 21 May, 2024; originally announced May 2024.

  26. arXiv:2405.16570  [pdf, other]

    cs.CV cs.AI

    ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling

    Authors: Francesca Babiloni, Alexandros Lattas, Jiankang Deng, Stefanos Zafeiriou

    Abstract: We propose ID-to-3D, a method to generate identity- and text-guided 3D human heads with disentangled expressions, starting from even a single casually captured in-the-wild image of a subject. The foundation of our approach is anchored in compositionality, alongside the use of task-specific 2D diffusion models as priors for optimization. First, we extend a foundational model with a lightweight expr…

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Explore our 3D results at: https://idto3d.github.io ; fixed broken url to project page

  27. arXiv:2405.16036  [pdf, other]

    cs.LG cs.CR cs.CV

    Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness

    Authors: Jieren Deng, Hanbin Hong, Aaron Palmer, Xin Zhou, Jinbo Bi, Kaleel Mahmood, Yuan Hong, Derek Aguiar

    Abstract: Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against l_{p}-norm adversarial perturbations. Current approaches for achieving certified robustness, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for different Gaussian noise levels and thus cannot leverage h…

    Submitted 24 May, 2024; originally announced May 2024.

  28. arXiv:2405.15349  [pdf, other]

    cs.CL

    UnKE: Unstructured Knowledge Editing in Large Language Models

    Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by l…

    Submitted 24 May, 2024; originally announced May 2024.

  29. arXiv:2405.14793  [pdf, other]

    cs.CV

    SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow

    Authors: Yihan Wang, Lahav Lipson, Jia Deng

    Abstract: We introduce SEA-RAFT, a more simple, efficient, and accurate RAFT for optical flow. Compared with RAFT, SEA-RAFT is trained with a new loss (mixture of Laplace). It directly regresses an initial flow for faster convergence in iterative refinements and introduces rigid-motion pre-training to improve generalization. SEA-RAFT achieves state-of-the-art accuracy on the Spring benchmark with a 3.69 end…

    Submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.14292  [pdf, other]

    cs.CV cs.RO

    A New Method in Facial Registration in Clinics Based on Structure Light Images

    Authors: Pengfei Li, Ziyue Ma, Hong Wang, Juan Deng, Yan Wang, Zhenyu Xu, Feng Yan, Wenjun Tu, Hong Sha

    Abstract: Background and Objective: In neurosurgery, fusing clinical images and depth images that can improve the information and details is beneficial to surgery. We found that the registration of face depth images was invalid frequently using existing methods. To abundant traditional image methods with depth information, a method in registering with depth images and traditional clinical images was investi…

    Submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.10864  [pdf, other]

    cs.CV cs.LG

    Improving face generation quality and prompt following with synthetic captions

    Authors: Michail Tarasiou, Stylianos Moschoglou, Jiankang Deng, Stefanos Zafeiriou

    Abstract: Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to generate photorealistic images of humans. Without s…

    Submitted 17 May, 2024; originally announced May 2024.

  32. arXiv:2405.10800  [pdf, other]

    cs.LG

    Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting

    Authors: Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, Xuan Song

    Abstract: Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heter…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD'24 Research Track

  33. arXiv:2405.03255  [pdf, other]

    cs.LG

    Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning

    Authors: Jiewen Deng, Renhe Jiang, Jiaqi Zhang, Xuan Song

    Abstract: Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST fore…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024 Main Track

  34. arXiv:2405.01990  [pdf, other]

    cs.LG

    Soft Label PU Learning

    Authors: Puning Zhao, Jintao Deng, Xu Cheng

    Abstract: PU learning refers to the classification problem in which only part of positive samples are labeled. Existing PU learning methods treat unlabeled samples equally. However, in many real tasks, from common sense or domain knowledge, some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels acc…

    Submitted 3 May, 2024; originally announced May 2024.

  35. arXiv:2404.19149  [pdf, other]

    cs.CV

    SAGS: Structure-Aware 3D Gaussian Splatting

    Authors: Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou

    Abstract: Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inhe…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 3 tables

  36. An Element-Wise Weights Aggregation Method for Federated Learning

    Authors: Yi Hu, Hanchi Ren, Chen Hu, Jingjing Deng, Xianghua Xie

    Abstract: Federated learning (FL) is a powerful Machine Learning (ML) paradigm that enables distributed clients to collaboratively learn a shared global model while keeping the data on the original device, thereby preserving privacy. A central challenge in FL is the effective aggregation of local model weights from disparate and potentially unbalanced participating clients. Existing methods often treat each…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

  37. arXiv:2404.15263  [pdf, other]

    cs.CV

    Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization

    Authors: Lahav Lipson, Jia Deng

    Abstract: We introduce a new system for Multi-Session SLAM, which tracks camera motion across multiple disjoint videos under a single global reference. Our approach couples the prediction of optical flow with solver layers to estimate camera pose. The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. The full system can connect disjoint sequences, perform vi…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  38. arXiv:2404.14693  [pdf, other]

    cs.CR cs.CV eess.IV

    Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition

    Authors: Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

    Abstract: The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a…

    Submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.12699  [pdf, other]

    cs.LG

    SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models

    Authors: Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Liangming Xia, Yijie Bai, Haiqin Weng, Wenyuan Xu

    Abstract: Instead of building deep learning models from scratch, developers are more and more relying on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misused for unethical or illegal tasks, e.g., privacy inference and unsafe content generation. In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-t…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Symposium on Security and Privacy 2024

  40. arXiv:2404.10985  [pdf, ps, other]

    cs.CV stat.ML

    Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images

    Authors: Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang

    Abstract: Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages, 10 figures, 6 tables

  41. arXiv:2404.06666  [pdf, other]

    cs.CV cs.AI cs.CL cs.CR

    SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

    Authors: Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

    Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing impro…

    Submitted 9 April, 2024; originally announced April 2024.

    Journal ref: ACM Conference on Computer and Communications Security (CCS 2024)

  42. arXiv:2404.06211  [pdf, other]

    cs.CV

    Unified Physical-Digital Attack Detection Challenge

    Authors: Haocheng Yuan, Ajian Liu, Junze Zheng, Jun Wan, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Isabelle Guyon, Zhen Lei

    Abstract: Face Anti-Spoofing (FAS) is crucial to safeguard Face Recognition (FR) Systems. In real-world scenarios, FRs are confronted with both physical and digital attacks. However, existing algorithms often address only one type of attack at a time, which poses significant limitations in real-world scenarios where FR systems face hybrid physical-digital threats. To facilitate the research of Unified Attac…

    Submitted 18 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  43. arXiv:2404.04990  [pdf, other]

    cs.CL

    MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

    Authors: Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: The extensive utilization of large language models (LLMs) underscores the crucial necessity for precise and contemporary knowledge embedded within their intrinsic parameters. Existing research on knowledge editing primarily concentrates on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces…

    Submitted 7 April, 2024; originally announced April 2024.

  44. arXiv:2404.02663  [pdf]

    eess.SP cs.IT

    Ground-to-UAV sub-Terahertz channel measurement and modeling

    Authors: Da Li, Peian Li, Jiabiao Zhao, Jianjian Liang, Jiacheng Liu, Guohao Liu, Yuanshuai Lei, Wenbo Liu, Jianqin Deng, Fuyong Liu, Jianjun Ma

    Abstract: Unmanned Aerial Vehicle (UAV) assisted terahertz (THz) wireless communications have been expected to play a vital role in the next generation of wireless networks. UAVs can serve as either repeaters or data collectors within the communication link, thereby potentially augmenting the efficacy of communication systems. Despite their promise, the channel analysis and modeling specific to THz wireless…

    Submitted 28 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to Optics Express

  45. arXiv:2404.02078  [pdf, other]

    cs.AI cs.CL cs.LG

    Advancing LLM Reasoning Generalists with Preference Trees

    Authors: Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun

    Abstract: We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 1…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Models and data are available at https://github.com/OpenBMB/Eurus

  46. arXiv:2404.01127  [pdf, other]

    cs.CV cs.AI

    Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

    Authors: Yulin Chen, Guoheng Huang, Kai Huang, Zijin Lin, Guo Zhong, Shenghong Luo, Jie Deng, Jian Zhou

    Abstract: Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and…

    Submitted 1 April, 2024; originally announced April 2024.

  47. arXiv:2404.00549  [pdf]

    eess.IV cs.CV

    Pneumonia App: a mobile application for efficient pediatric pneumonia diagnosis using explainable convolutional neural networks (CNN)

    Authors: Jiaming Deng, Zhenglin Chen, Minjiang Chen, Lulu Xu, Jiaqi Yang, Zhendong Luo, Peiwu Qin

    Abstract: Mycoplasma pneumoniae pneumonia (MPP) poses significant diagnostic challenges in pediatric healthcare, especially in regions like China where it's prevalent. We introduce PneumoniaAPP, a mobile application leveraging deep learning techniques for rapid MPP detection. Our approach capitalizes on convolutional neural networks (CNNs) trained on a comprehensive dataset comprising 3345 chest X-ray (CXR)…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 27 pages, 7 figures

    MSC Class: 68; ACM Class: J.3

  48. arXiv:2404.00301  [pdf, other]

    cs.CV

    Monocular Identity-Conditioned Facial Reflectance Reconstruction

    Authors: Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang

    Abstract: Recent 3D face reconstruction methods have made remarkable advancements, yet there remain huge challenges in monocular high-quality facial reflectance reconstruction. Existing methods rely on a large amount of light-stage captured data to learn facial reflectance models. However, the lack of subject diversity poses challenges in achieving good generalization and widespread applicability. In this p…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  49. arXiv:2403.19708  [pdf, other]

    cs.CL cs.LG

    Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention

    Authors: Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, Pengfei Zuo

    Abstract: Interacting with humans through multi-turn conversations is a fundamental feature of large language models (LLMs). However, existing LLM serving engines executing multi-turn conversations are inefficient due to the need to repeatedly compute the key-value (KV) caches of historical tokens, incurring high serving costs. To address the problem, this paper proposes CachedAttention, a new attention mec…

    Submitted 30 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to USENIX Annual Technical Conference (ATC) 2024

  50. arXiv:2403.18493  [pdf, other]

    cs.CV

    VersaT2I: Improving Text-to-Image Models with Versatile Reward

    Authors: Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance with multiple rewards of…

    Submitted 27 March, 2024; originally announced March 2024.