Showing 1–50 of 243 results for author: Liang, D

Searching in archive cs.
  1. arXiv:2407.01016  [pdf, other]

    cs.CV

    SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection

    Authors: Dingkang Liang, Wei Hua, Chunsheng Shi, Zhikang Zou, Xiaoqing Ye, Xiang Bai

    Abstract: Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving multi-oriented objects common in aerial images unexplored. At the same time, the annotation cost of multi-oriented objects is significantly higher than that of their horizontal counterparts. Ther…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.14952  [pdf, other]

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  3. arXiv:2406.11397  [pdf, other]

    cs.LG cs.AI stat.ML

    DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting

    Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan

    Abstract: Traditional regression and prediction tasks often only provide deterministic point estimates. To estimate the uncertainty or distribution information of the response variable, methods such as Bayesian inference, model ensembling, or MC Dropout are typically used. These methods either assume that the posterior distribution of samples follows a Gaussian process or require thousands of forward passes…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.07594  [pdf, other]

    cs.CL cs.AI cs.CR

    MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

    Authors: Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang

    Abstract: Powered by remarkable advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks. However, the practical application scenarios of MLLMs are intricate, exposing them to potential malicious instructions and thereby posing safety risks. While current benchmarks do incorporate certain safety considerations, they often la…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.04801  [pdf, other]

    cs.CV

    MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

    Authors: Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai

    Abstract: The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency. However, training MoE models from scratch demands extensive data and computational resources. Moreover, public repositories like timm mainly provide pre-trained dense checkpoints, lacking similar resources for M…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

    ACM Class: I.2

  6. arXiv:2405.19736  [pdf, other]

    cs.AI

    Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications

    Authors: Dayang Liang, **yang Lai, Yunlong Liu

    Abstract: How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications, and current approaches usually learn task-relevant state representations within visual reinforcement learning to address this problem. While prior work typically introduces one-step behavioral similarity metrics with elements (e.g., rewards and actions) to extract task-relevant state…

    Submitted 30 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  7. arXiv:2405.13152  [pdf, other]

    cs.CV cs.AI

    Enhancing Interaction Modeling with Agent Selection and Physical Methods for Trajectory Prediction

    Authors: Shiji Huang, Lei Ye, Min Chen, Wenhai Luo, Chenqi Xu, Deyuan Liang, Dihong Wang

    Abstract: In this study, we address the limitations inherent in most existing vehicle trajectory prediction methodologies that indiscriminately incorporate all agents within a predetermined proximity when accounting for inter-agent interactions. These approaches commonly employ attention-based architecture or graph neural networks for encoding interactions, which introduces three challenges: (i) The indiscr…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/kkk00714/ASPILin

  8. arXiv:2405.12434  [pdf, other]

    cs.CL

    Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference

    Authors: Yonghao Liu, Mengyu Li, Di Liang, Ximing Li, Fausto Giunchiglia, Lan Huang, Xiaoyue Feng, Renchu Guan

    Abstract: Natural Language Inference (NLI) is a crucial task in natural language processing that involves determining the relationship between two sentences, typically referred to as the premise and the hypothesis. However, traditional NLI models solely rely on the semantic information inherent in independent sentences and lack relevant situational visual information, which can hinder a complete understandi…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: IJCAI24

  9. arXiv:2405.12119  [pdf, other]

    cs.IR cs.AI cs.CL

    Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation

    Authors: Zhankui He, Zhouhang Xie, Harald Steck, Dawen Liang, Rahul Jha, Nathan Kallus, Julian McAuley

    Abstract: Large language models (LLMs) are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item p…

    Submitted 20 May, 2024; originally announced May 2024.

  10. arXiv:2405.05763  [pdf]

    cs.CV cs.AI

    DP-MDM: Detail-Preserving MR Reconstruction via Multiple Diffusion Models

    Authors: Mengxiao Geng, Jiahao Zhu, Xiaolin Zhu, Qiqing Liu, Dong Liang, Qiegen Liu

    Abstract: Detail features of magnetic resonance images play a crucial role in accurate medical diagnosis and treatment, as they capture subtle changes that pose challenges for doctors when performing precise judgments. However, the widely utilized naive diffusion model has limitations, as it fails to accurately capture more intricate details. To enhance the quality of MRI reconstruction, we propose a com…

    Submitted 9 May, 2024; originally announced May 2024.

  11. arXiv:2404.08450  [pdf, other]

    cs.CV

    Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

    Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

    Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to dev…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

  12. arXiv:2404.08023  [pdf, other]

    q-bio.QM cs.LG

    Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis

    Authors: Zeyu Zhang, Yuanshen Zhao, **gxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li

    Abstract: The diagnosis and prognosis of cancer are typically based on multi-modal clinical data, including histology images and genomic data, due to the complex pathogenesis and high heterogeneity. Despite the advancements in digital pathology and high-throughput genome sequencing, establishing effective multi-modal fusion models for survival prediction and revealing the potential association between histo…

    Submitted 11 April, 2024; originally announced April 2024.

  13. arXiv:2404.04586  [pdf, other]

    cs.CV

    PIE: Physics-inspired Low-light Enhancement

    Authors: Dong Liang, Zhengyan Xu, Ling Li, Mingqiang Wei, Songcan Chen

    Abstract: In this paper, we propose a physics-inspired contrastive learning paradigm for low-light enhancement, called PIE. PIE primarily addresses three issues: (i) To resolve the problem of existing learning-based methods often training a LLE model with strict pixel-correspondence image pairs, we eliminate the need for pixel-correspondence paired training data and instead train with unpaired images. (ii)…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2112.06451

  14. arXiv:2404.03813  [pdf, ps, other]

    quant-ph cs.LG

    Agnostic Tomography of Stabilizer Product States

    Authors: Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang

    Abstract: We define a quantum learning task called agnostic tomography, where given copies of an arbitrary state $ρ$ and a class of quantum states $\mathcal{C}$, the goal is to output a succinct description of a state that approximates $ρ$ at least as well as any state in $\mathcal{C}$ (up to some small error $\varepsilon$). This task generalizes ordinary quantum tomography of states in $\mathcal{C}$ and is…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 20 pages

  15. Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

    Authors: Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

    Abstract: Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (…

    Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures. IEEE Transactions on Multimedia, 2024

  16. arXiv:2404.00126  [pdf, ps, other]

    quant-ph cs.CC

    Pseudoentanglement Ain't Cheap

    Authors: Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang

    Abstract: We show that any pseudoentangled state ensemble with a gap of $t$ bits of entropy requires $Ω(t)$ non-Clifford gates to prepare. This bound is tight up to polylogarithmic factors if linear-time quantum-secure pseudorandom functions exist. Our result follows from a polynomial-time algorithm to estimate the entanglement entropy of a quantum state across any cut of qubits. When run on an $n$-qubit st…

    Submitted 11 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: 15 pages; v2: slight edits to concurrent work section

  17. arXiv:2403.15132  [pdf, other]

    cs.CV eess.IV

    Transfer CLIP for Generalizable Image Denoising

    Authors: Jun Cheng, Dong Liang, Shan Tan

    Abstract: Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods have excelled in eliminating in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recent emergence of contrastive language-image pre-training (CLIP) model has showcased exceptional capabilities in open-w…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  18. arXiv:2403.14972  [pdf, other]

    cs.AI cs.CL cs.MA cs.MM

    A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

    Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

    Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the iss…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Work in progress

  19. arXiv:2403.09493  [pdf, other]

    cs.CV

    Anomaly Detection by Adapting a pre-trained Vision Language Model

    Authors: Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai

    Abstract: Recently, large vision and language models have shown their success when adapting them to many downstream tasks. In this paper, we present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model. To this end, we make two important improvements: 1) To acquire unified anomaly detection across industrial images of multiple categories, we introduce the learnable p…

    Submitted 14 March, 2024; originally announced March 2024.

  20. arXiv:2403.06323  [pdf, other]

    cs.LG

    Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL

    Authors: Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun

    Abstract: We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. Using an augmented Markov Decision Process (MDP), we propose two general meta-algorithms via reductions to standard RL: one based on optimistic algorithms and another based on policy optimization.…

    Submitted 10 March, 2024; originally announced March 2024.

  21. arXiv:2403.05385  [pdf, other]

    cs.LG

    Switching the Loss Reduces the Cost in Batch Reinforcement Learning

    Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári

    Abstract: We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving…

    Submitted 12 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  22. arXiv:2403.02151  [pdf, other]

    cs.CV

    TripoSR: Fast 3D Object Reconstruction from a Single Image

    Authors: Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, Yan-Pei Cao

    Abstract: This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architecture for fast feed-forward 3D generation, producing 3D mesh from a single image in under 0.5 seconds. Building upon the LRM network architecture, TripoSR integrates substantial improvements in data processing, model design, and training techniques. Evaluations on public datasets show that TripoSR exh…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Model: https://huggingface.co/stabilityai/TripoSR Code: https://github.com/VAST-AI-Research/TripoSR Demo: https://huggingface.co/spaces/stabilityai/TripoSR

  23. arXiv:2403.01439  [pdf, other]

    cs.CV

    Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

    Authors: Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, Xiang Bai

    Abstract: Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. However, existing methods for model adaptation usually update all model parameters, i.e., full fine-tuning paradigm, which is inefficient as it relies on high computational costs (e.g., training GPU memory) and massive storage space. In this paper, we aim to study parameter-efficient transfer…

    Submitted 5 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Code is available at https://github.com/LMD0311/DAPT

  24. arXiv:2402.17521  [pdf, other]

    cs.CV

    AVS-Net: Point Sampling with Adaptive Voxel Size for 3D Scene Understanding

    Authors: Hongcheng Yang, Dingkang Liang, Dingyuan Zhang, Zhe Liu, Zhikang Zou, Xingyu Jiang, Yingying Zhu

    Abstract: The recent advancements in point cloud learning have enabled intelligent vehicles and robots to comprehend 3D environments better. However, processing large-scale 3D scenes remains a challenging problem, such that efficient downsampling methods play a crucial role in point cloud learning. Existing downsampling methods either require a huge computational burden or sacrifice fine-grained geometric i…

    Submitted 15 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 10 pages, 7 figures

  25. arXiv:2402.14598  [pdf, other]

    cs.NE cs.LG

    Brain-inspired Distributed Memorization Learning for Efficient Feature-free Unsupervised Domain Adaptation

    Authors: Jianming Lv, Depin Liang, Zequan Liang, Yaobin Zhang, Sijun Xia

    Abstract: Compared with gradient based artificial neural networks, biological neural networks usually show a more powerful generalization ability to quickly adapt to unknown environments without using any gradient back-propagation procedure. Inspired by the distributed memory mechanism of human brains, we propose a novel gradient-free Distributed Memorization Learning mechanism, namely DML, to support quick…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 15 pages, 15 figures

  26. arXiv:2402.13188  [pdf, other]

    cs.CL

    Question Calibration and Multi-Hop Modeling for Temporal Question Answering

    Authors: Chao Xue, Di Liang, Pengfei Wang, **g Zhang

    Abstract: Many models that leverage knowledge graphs (KGs) have recently demonstrated remarkable success in question answering (QA) tasks. In the real world, many facts contained in KGs are time-constrained thus temporal KGQA has received increasing attention. Despite the fruitful efforts of previous models in temporal KGQA, they still have several limitations. (I) They adopt pre-trained language models (PL…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024

  27. arXiv:2402.10739  [pdf, other]

    cs.CV

    PointMamba: A Simple State Space Model for Point Cloud Analysis

    Authors: Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai

    Abstract: Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, making the design of a linear complexity method with global modeling appealing. In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM)…

    Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Update the architecture and performance. The code is available at https://github.com/LMD0311/PointMamba

  28. arXiv:2402.05954  [pdf, other]

    cs.LG

    EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features

    Authors: Jianming Lv, Sijun Xia, Depin Liang, Wei Chen

    Abstract: Traditional model-free feature selection methods treat each feature independently while disregarding the interrelationships among features, which leads to relatively poor performance compared with the model-aware methods. To address this challenge, we propose an efficient model-free feature selection framework via elastic expansion and compression of the features, namely EasyFS, to achieve better…

    Submitted 4 February, 2024; originally announced February 2024.

  29. arXiv:2402.05740  [pdf, other]

    cs.IR

    CounterCLR: Counterfactual Contrastive Learning with Non-random Missing Data in Recommendation

    Authors: Jun Wang, Haoxuan Li, Chi Zhang, Dongxu Liang, Enyun Yu, Wenwu Ou, Wenjia Wang

    Abstract: Recommender systems are designed to learn user preferences from observed feedback and comprise many fundamental tasks, such as rating prediction and post-click conversion rate (pCVR) prediction. However, the observed feedback usually suffer from two issues: selection bias and data sparsity, where biased and insufficient feedback seriously degrade the performance of recommender systems in terms of…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 2023 IEEE International Conference on Data Mining (ICDM)

  30. arXiv:2402.02332  [pdf, other]

    cs.LG

    Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals

    Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan, Bingzheng Zhang, Minggao Zhang

    Abstract: In this paper, we find that ubiquitous time series (TS) forecasting models are prone to severe overfitting. To cope with this problem, we embrace a de-redundancy approach to progressively reinstate the intrinsic values of TS for future intervals. Specifically, we introduce a dual-stream and subtraction mechanism, which is a deep Boosting ensemble learning method. And the vanilla Transformer is ren…

    Submitted 17 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  31. arXiv:2401.17509  [pdf, other]

    cs.CV

    Anything in Any Scene: Photorealistic Video Object Insertion

    Authors: Chen Bai, Zeman Shao, Guoxiang Zhang, Di Liang, Jie Yang, Zhuorui Zhang, Yujian Guo, Chengzhang Zhong, Yiqiao Qiu, Zhendong Wang, Yichen Guan, Xiaoyin Zheng, Tao Wang, Cheng Lu

    Abstract: Realistic video simulation has shown significant potential across diverse applications, from virtual reality to film production. This is particularly true for scenarios where capturing videos in real-world settings is either impractical or expensive. Existing approaches in video simulation often fail to accurately model the lighting environment, represent the object geometry, or achieve high level…

    Submitted 30 January, 2024; originally announced January 2024.

  32. You Only Look Bottom-Up for Monocular 3D Object Detection

    Authors: Kaixin Xiong, Dingyuan Zhang, Dingkang Liang, Zhe Liu, Hongcheng Yang, Wondimu Dikubab, Jianwei Cheng, Xiang Bai

    Abstract: Monocular 3D Object Detection is an essential task for autonomous driving. Meanwhile, accurate 3D object detection from pure images is very challenging due to the loss of depth information. Most existing image-based methods infer objects' location in 3D space based on their 2D sizes on the image plane, which usually ignores the intrinsic position clues from images, leading to unsatisfactory perfor…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Robotics and Automation Letters (RA-L)

  33. arXiv:2401.10516  [pdf, other]

    cs.LG cs.AI

    Episodic Reinforcement Learning with Expanded State-reward Space

    Authors: Dayang Liang, Yaru Zhang, Yunlong Liu

    Abstract: Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical successes in various domains, including games, health care, and autonomous driving. Despite these advancements, DRL is still identified as data-inefficient as effective policies demand vast numbers of environmental samples. Recently, episodic control (EC)-based model-free DRL methods enable s…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted at AAMAS'24

  34. arXiv:2312.09147  [pdf, other]

    cs.CV

    Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

    Authors: Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang

    Abstract: Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and opt…

    Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Project Page: https://zouzx.github.io/TriplaneGaussian/

  35. arXiv:2312.08754  [pdf, other]

    cs.CV

    UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

    Authors: Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

    Abstract: Recent advancements in text-to-3D generation technology have significantly advanced the conversion of textual descriptions into imaginative well-geometrical and finely textured 3D objects. Despite these developments, a prevalent limitation arises from the use of RGB data in diffusion or reconstruction models, which often results in models with inherent lighting and shadows effects that detract fro…

    Submitted 14 December, 2023; originally announced December 2023.

  36. arXiv:2312.06725  [pdf, other]

    cs.CV

    EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

    Authors: Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

    Abstract: Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image. Recent methods that introduce 3D global representation into diffusion models have shown the potential to generate consistent multiviews, but they have reduced generation speed and face challenges in maintaining generalizability and quality. To address this issue, we propose E…

    Submitted 2 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://huanngzh.github.io/EpiDiff/

  37. arXiv:2312.04032  [pdf, other]

    cs.CL cs.LG

    RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

    Authors: Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa

    Abstract: Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives of robustness for LMs have been studied independently, but lacking a unified consideration in multiple perspectives. In this paper, we propose Robustifying LMs…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 33 pages, accepted at EMNLP 2023 Findings

  38. arXiv:2311.14473  [pdf, other]

    eess.IV cs.CV

    Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction

    Authors: Taofeng Xie, Zhuo-Xu Cui, Chen Luo, Huayu Wang, Congcong Liu, Yuanzhi Zhang, Xuemei Wang, Yanjie Zhu, Qiyu **, Guoqing Chen, Yihang Zhou, Dong Liang, Haifeng Wang

    Abstract: Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI sy…

    Submitted 24 November, 2023; originally announced November 2023.

  39. arXiv:2311.10725  [pdf, ps, other]

    cs.CY cs.AI

    Should they? Mobile Biometrics and Technopolicy meet Queer Community Considerations

    Authors: Anaelia Ovalle, Davi Liang, Alicia Boyd

    Abstract: Smartphones are integral to our daily lives and activities, providing us with basic functions like texting and phone calls to more complex motion-based functionalities like navigation, mobile gaming, and fitness-tracking. To facilitate these functionalities, smartphones rely on integrated sensors like accelerometers and gyroscopes. These sensors provide personalized measurements that, in turn, con…

    Submitted 6 October, 2023; originally announced November 2023.

    Comments: To appear at 2023 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization

    ACM Class: K.4; K.5; I.2

  40. arXiv:2311.09544  [pdf, other]

    cs.IR cs.AI cs.LG

    Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

    Authors: Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang, Kai Wang, Zhangyuan Wang, Zhengxing Chen, Fenggang Wu, Minghai Chen, Huayu Li, Yunnan Wu, Zhan Shu, Mindi Yuan, Sri Reddy

    Abstract: Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory, often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user represe…

    Submitted 22 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 8 pages, 3 figures

    MSC Class: 68T05; 68T30 ACM Class: I.2.1; H.3.5; H.3.3

    Journal ref: Companion Proceedings of the ACM Web Conference 2024 (WWW '24 Companion), May 13–17, 2024, Singapore, Singapore

  41. arXiv:2311.03074  [pdf, other]

    eess.IV cs.CV

    A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

    Authors: Wenxin Wang, Zhuo-Xu Cui, Guanxun Cheng, Chentao Cao, Xi Xu, Ziwei Liu, Haifeng Wang, Yulong Qi, Dong Liang, Yanjie Zhu

    Abstract: Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework Two-Stage Generative Model (TSGM) that combines Cyc…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 11 pages, 9 figures, 3 tables

  42. arXiv:2311.02849

    cs.CL cs.AI

    Co-training and Co-distillation for Quality Improvement and Compression of Language Models

    Authors: Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min

    Abstract: Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed. To address this issue, we p…

    Submitted 7 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Findings of EMNLP 2023

  43. arXiv:2310.19415

    cs.CV cs.AI cs.GR

    Text-to-3D with Classifier Score Distillation

    Authors: Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi

    Abstract: Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the ro…

    Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation

  44. arXiv:2310.19019

    cs.CL cs.AI

    TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

    Authors: Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan

    Abstract: Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes annotation more than just an answer, thus allowing other models to learn "why" instead of jus…

    Submitted 31 October, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: 5 figures, 15 pages

  45. arXiv:2310.15433

    cs.LG cs.IR

    Off-Policy Evaluation for Large Action Spaces via Policy Convolution

    Authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiase…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Under review. 36 pages, 31 figures

  46. arXiv:2310.07585

    cs.CV

    A Discrepancy Aware Framework for Robust Anomaly Detection

    Authors: Yuxuan Cai, Dingkang Liang, Dongliang Luo, Xinwei He, Xin Yang, Xiang Bai

    Abstract: Defect detection is a critical research area in artificial intelligence. Recently, synthetic data-based self-supervised learning has shown great potential on this task. Although many sophisticated synthesizing strategies exist, little research has been done to investigate the robustness of models when faced with different strategies. In this paper, we focus on this issue and find that existing met…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Transactions on Industrial Informatics. Code is available at: https://github.com/caiyuxuan1120/DAF

  47. arXiv:2309.14764

    cs.CV

    InvKA: Gait Recognition via Invertible Koopman Autoencoder

    Authors: Fan Li, Dong Liang, Jing Lian, Qidong Liu, Hegui Zhu, Jizhao Liu

    Abstract: Most current gait recognition methods suffer from poor interpretability and high computational cost. To improve interpretability, we investigate gait features in the embedding space based on Koopman operator theory. The transition matrix in this space captures complex kinematic features of gait cycles, namely the Koopman operator. The diagonal elements of the operator matrix can represent the over…

    Submitted 27 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  48. arXiv:2309.13916

    eess.AS cs.SD

    Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

    Authors: Di Liang, Nian Shao, Xiaofei Li

    Abstract: This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion. To frame-wisely detect a flexible number of speakers and extract/update their corresponding attractors, we propose to leverage a causal speaker embedding encoder and an online non-autoregressive self-attention-based attractor decoder. A look-ahead mechanism is adopted to…

    Submitted 25 September, 2023; originally announced September 2023.

  49. arXiv:2309.13571

    eess.IV cs.CV

    Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI

    Authors: Chen Luo, Huayu Wang, Taofeng Xie, Qiyu Jin, Guoqing Chen, Zhuo-Xu Cui, Dong Liang

    Abstract: Recently, regularization model-driven deep learning (DL) has gained significant attention due to its ability to leverage the potent representational capabilities of DL while retaining the theoretical guarantees of regularization models. However, most of these methods are tailored for supervised learning scenarios that necessitate fully sampled labels, which can pose challenges in practical MRI app…

    Submitted 24 September, 2023; originally announced September 2023.

  50. arXiv:2309.12628

    cs.LG

    Sequential Action-Induced Invariant Representation for Reinforcement Learning

    Authors: Dayang Liang, Qihang Chen, Yunlong Liu

    Abstract: How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability for task-relevant information extraction. However, due to…

    Submitted 22 September, 2023; originally announced September 2023.