Showing 1–19 of 19 results for author: Lian, Q

  1. arXiv:2402.03757

    cs.CV cs.CL cs.LG

    The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

    Authors: Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

    Abstract: Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typi…

    Submitted 6 February, 2024; originally announced February 2024.

  2. arXiv:2401.02906

    cs.CR cs.CL cs.CV

    MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

    Authors: Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang

    Abstract: The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not cons…

    Submitted 17 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  3. arXiv:2311.09677

    cs.CL

    R-Tuning: Instructing Large Language Models to Say 'I Don't Know'

    Authors: Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang

    Abstract: Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face their challenges. A predominant issue is the propensity for these models to generate non-existent facts, a concern termed hallucination. Our research is motivated by the observation that previous instruction tuning methods force the model to complete a sentence no matter whether the m…

    Submitted 6 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  4. arXiv:2310.11346

    cs.CV

    Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing

    Authors: Hao Lu, Yunpeng Zhang, Qing Lian, Dalong Du, Yingcong Chen

    Abstract: Detecting objects in 3D space using multiple cameras, known as Multi-Camera 3D Object Detection (MC3D-Det), has gained prominence with the advent of bird's-eye view (BEV) approaches. However, these methods often struggle when faced with unfamiliar testing environments due to the lack of diverse training data encompassing various viewpoints and environments. To address this, we propose a novel meth…

    Submitted 25 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  5. arXiv:2309.09599

    cs.CV cs.LG cs.RO

    MEDL-U: Uncertainty-aware 3D Automatic Annotation based on Evidential Deep Learning

    Authors: Helbert Paat, Qing Lian, Weilong Yao, Tong Zhang

    Abstract: Advancements in deep learning-based 3D object detection necessitate the availability of large-scale datasets. However, this requirement introduces the challenge of manual annotation, which is often both burdensome and time-consuming. To tackle this issue, the literature has seen the emergence of several weakly supervised frameworks for 3D object detection which can automatically generate pseudo la…

    Submitted 15 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted at ICRA 2024. Code: https://github.com/paathelb/MEDL-U

  6. arXiv:2309.02476

    stat.ML cs.LG

    Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

    Authors: Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang

    Abstract: Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored the use of informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input ($\bx$) and o…

    Submitted 5 September, 2023; originally announced September 2023.

  7. arXiv:2309.01351

    cs.CV

    Adv3D: Generating 3D Adversarial Examples in Driving Scenarios with NeRF

    Authors: Leheng Li, Qing Lian, Ying-Cong Chen

    Abstract: Deep neural networks (DNNs) have been proven extremely susceptible to adversarial examples, which raises special safety-critical concerns for DNN-based autonomous driving stacks (i.e., 3D object detection). Although there are extensive works on image-level attacks, most are restricted to 2D pixel spaces, and such attacks are not always physically realistic in our 3D world. Here we present Adv3D, t…

    Submitted 4 September, 2023; originally announced September 2023.

  8. arXiv:2304.03526

    cs.CV

    Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field

    Authors: Leheng Li, Qing Lian, Luozhou Wang, Ningning Ma, Ying-Cong Chen

    Abstract: This work explores the use of 3D generative models to synthesize training data for 3D vision tasks. The key requirements of the generative models are that the generated data should be photorealistic to match the real-world scenarios, and the corresponding 3D attributes should be aligned with given sampling labels. However, we find that the recent NeRF-based 3D GANs hardly meet the above requiremen…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  9. arXiv:2303.16628

    cs.CV

    DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking

    Authors: Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang

    Abstract: Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation. However, they typically assume all the objects are static and directly aggregate features across frames. This work begins with a theoretical and empirical analysis to reveal that ignoring the motion of moving objects can result in serious loca…

    Submitted 18 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  10. arXiv:2211.07190

    cs.NI cs.AI cs.CV

    TriDoNet: A Triple Domain Model-driven Network for CT Metal Artifact Reduction

    Authors: Baoshun Shi, Ke Jiang, Shaolei Zhang, Qiusheng Lian, Yanwei Qin

    Abstract: Recent deep learning-based methods have achieved promising performance for computed tomography metal artifact reduction (CTMAR). However, most of them suffer from two limitations: (i) the domain knowledge is not fully embedded into the network training; (ii) metal artifacts lack effective representation models. The aforementioned limitations leave room for further performance improvement. Against…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 6 pages, 3 figures

  11. arXiv:2207.12716

    cs.CV cs.RO

    MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

    Authors: Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang

    Abstract: In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in Waymo Open Dataset Challenge 2022. For multi-view camera-only 3D detection, methods based on bird-eye-view or 3D geometric representations can leverage the stereo cues from overlapped regions between adjacent views and directly perform 3D detection without hand-crafted post-processing.…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Technical report

  12. arXiv:2203.08563

    cs.CV

    MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

    Authors: Qing Lian, Peiliang Li, Xiaozhi Chen

    Abstract: Due to the inherent ill-posed nature of 2D-3D projection, monocular 3D object detection lacks accurate depth recovery ability. Although the deep neural network (DNN) enables monocular depth-sensing from high-level learned features, the pixel-level cues are usually omitted due to the deep convolution mechanism. To benefit from both the powerful feature representation in DNN and pixel-level geometri…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  13. arXiv:2108.09076

    cs.LG cs.IR

    PASTO: Strategic Parameter Optimization in Recommendation Systems -- Probabilistic is Better than Deterministic

    Authors: Weicong Ding, Hanlin Tang, Jingshuo Feng, Lei Yuan, Sen Yang, Guangxu Yang, Jie Zheng, Jing Wang, Qiang Su, Dong Zheng, Xuezhong Qiu, Yongqi Liu, Yuxuan Chen, Yang Liu, Chao Song, Dongying Kong, Kai Ren, Peng Jiang, Qiao Lian, Ji Liu

    Abstract: Real-world recommendation systems often consist of two phases. In the first phase, multiple predictive models produce the probability of different immediate user actions. In the second phase, these predictions are aggregated according to a set of 'strategic parameters' to meet a diverse set of business goals, such as longer user engagement, higher revenue potential, or more community/network inter…

    Submitted 20 August, 2021; originally announced August 2021.

  14. arXiv:2106.00925

    cs.LG stat.ML

    Contrastive ACE: Domain Generalization Through Alignment of Causal Mechanisms

    Authors: Yunqi Wang, Furui Liu, Zhitang Chen, Qing Lian, Shoubo Hu, Jianye Hao, Yik-Chung Wu

    Abstract: Domain generalization aims to learn knowledge invariant across different distributions while semantically meaningful for downstream tasks from multiple source domains, to improve the model's generalization ability on unseen target domains. The fundamental objective is to understand the underlying "invariance" behind these observational distributions and such invariance has been shown to have a clo…

    Submitted 2 June, 2021; originally announced June 2021.

  15. arXiv:2105.00381

    cs.CV cs.AI

    AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy

    Authors: Yunxiang Li, Guodong Zeng, Yifan Zhang, Jun Wang, Qianni Zhang, Qun Jin, Lingling Sun, Qisi Lian, Neng Xia, Ruizi Peng, Kai Tang, Yaqi Wang, Shuai Wang

    Abstract: Accurate evaluation of the treatment result on X-ray images is a significant and challenging step in root canal therapy since the incorrect interpretation of the therapy results will hamper timely follow-up which is crucial to the patients' treatment outcome. Nowadays, the evaluation is performed in a manual manner, which is time-consuming, subjective, and error-prone. In this paper, we aim to aut…

    Submitted 28 October, 2021; v1 submitted 1 May, 2021; originally announced May 2021.

    Comments: under review

  16. arXiv:2104.05858

    cs.CV

    Exploring Geometric Consistency for Monocular 3D Object Detection

    Authors: Qing Lian, Botao Ye, Ruijia Xu, Weilong Yao, Tong Zhang

    Abstract: This paper investigates the geometric consistency for monocular 3D object detection, which suffers from the ill-posed depth estimation. We first conduct a thorough analysis to reveal how existing methods fail to consistently localize objects when different geometric shifts occur. In particular, we design a series of geometric manipulations to diagnose existing detectors and then illustrate their v…

    Submitted 21 May, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

  17. arXiv:2010.02637

    cs.LG cs.AI stat.ML

    Weakly Supervised Disentangled Generative Causal Representation Learning

    Authors: Xinwei Shen, Furui Liu, Hanze Dong, Qing Lian, Zhitang Chen, Tong Zhang

    Abstract: This paper proposes a Disentangled gEnerative cAusal Representation (DEAR) learning method under appropriate supervised information. Unlike existing disentanglement methods that enforce independence of the latent variables, we consider the general case where the underlying factors of interests can be causally related. We show that previous methods with independent priors fail to disentangle causal…

    Submitted 24 August, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

    Journal ref: Journal of Machine Learning Research 23(241): 1-55, 2022

  18. arXiv:1908.09547

    cs.CV

    Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach

    Authors: Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong

    Abstract: We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains. Our approach draws on an insight connecting two existing works: curriculum domain adaptation and self-training. Inspired by the former, PyCDA constructs a pyramid curriculum which c…

    Submitted 26 August, 2019; originally announced August 2019.

  19. arXiv:1905.01068

    cs.CV

    Known-class Aware Self-ensemble for Open Set Domain Adaptation

    Authors: Qing Lian, Wen Li, Lin Chen, Lixin Duan

    Abstract: Existing domain adaptation methods generally assume different domains have the identical label space, which is quite restrictive for real-world applications. In this paper, we focus on a more realistic and challenging case of open set domain adaptation. Particularly, in open set domain adaptation, we allow the classes from the source and target domains to be partially overlapped. In this case, the as…

    Submitted 3 May, 2019; originally announced May 2019.