Showing 1–50 of 259 results for author: Tan, T

Searching in archive cs.
  1. arXiv:2407.00993 [pdf, other]

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations on task evaluation. (2) Specific instructions…

    Submitted 1 July, 2024; originally announced July 2024.

  2. Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

    Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

    Submitted 26 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (IJCV), 2024

  3. arXiv:2406.15704 [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  4. arXiv:2406.11369 [pdf, other]

    cs.CG cs.DS

    Approximation Algorithms for Smallest Intersecting Balls

    Authors: Jiaqi Zheng, Tiow-Seng Tan

    Abstract: We study a general smallest intersecting ball problem and its soft-margin variant in high-dimensional Euclidean spaces, which only require the input objects to be compact and convex. These two problems link and unify a series of fundamental problems in computational geometry and machine learning, including smallest enclosing ball, polytope distance, intersection radius, $\ell_1$-loss support vecto…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.08481 [pdf, other]

    cs.CV

    Enhancing End-to-End Autonomous Driving with Latent World Model

    Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

    Abstract: End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering data scalability. To address this challenge, we propose a novel self-supervised method to e…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.07914 [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  7. arXiv:2406.01154 [pdf, other]

    cs.CV

    UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation

    Authors: Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan

    Abstract: Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namel…

    Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  8. arXiv:2405.17921 [pdf]

    cs.AI cs.CY

    Towards Clinical AI Fairness: Filling Gaps in the Puzzle

    Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Xiaoxuan Liu, Mayli Mertens, Yuqing Shang, Xin Li, Di Miao, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Narrendar RaviChandran, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

    Abstract: The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while numerous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical adva…

    Submitted 28 May, 2024; originally announced May 2024.

  9. arXiv:2405.14646 [pdf, other]

    cs.CL

    Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

    Authors: Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

    Abstract: The automatic evaluation of natural language generation (NLG) systems presents a long-standing challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Findings of ACL 2024

  10. arXiv:2405.09157 [pdf, other]

    math.OC cs.CG cs.DC cs.DS

    A Primal-Dual Framework for Symmetric Cone Programming

    Authors: Jiaqi Zheng, Antonios Varvitsiotis, Tiow-Seng Tan, Wayne Lin

    Abstract: In this paper, we introduce a primal-dual algorithmic framework for solving Symmetric Cone Programs (SCPs), a versatile optimization model that unifies and extends Linear, Second-Order Cone (SOCP), and Semidefinite Programming (SDP). Our work generalizes the primal-dual framework for SDPs introduced by Arora and Kale, leveraging a recent extension of the Multiplicative Weights Update method (MWU)…

    Submitted 15 May, 2024; originally announced May 2024.

  11. arXiv:2404.18373 [pdf, other]

    cs.NI

    6G comprehensive intelligence: network operations and optimization based on Large Language Models

    Authors: Sifan Long, Fengxiao Tang, Yangfan Li, Tiao Tan, Zhengjie Jin, Ming Zhao, Nei Kato

    Abstract: The sixth-generation mobile communication standard (6G) can promote the development of the Industrial Internet and the Internet of Things (IoT). To achieve comprehensive intelligent development of the network and provide customers with higher-quality personalized services, this paper proposes a network performance optimization and intelligent operation network architecture based on Large Language Model (L…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures, 15 references

  12. arXiv:2404.18151 [pdf, other]

    cs.LO

    Decidability of Graph Neural Networks via Logical Characterizations

    Authors: Michael Benedikt, Chia-Hsuan Lu, Boris Motik, Tony Tan

    Abstract: We present results concerning the expressiveness and decidability of a popular graph learning formalism, graph neural networks (GNNs), exploiting connections with logic. We use a family of recently-discovered decidable logics involving "Presburger quantifiers". We show how to use these logics to measure the expressiveness of classes of GNNs, in some cases getting exact correspondences between the…

    Submitted 23 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  13. arXiv:2404.17747 [pdf, other]

    cs.CV

    MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion

    Authors: Jingxue Huang, Xilai Li, Tianshu Tan, Xiaosong Li, Tao Ye

    Abstract: Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space, thereby producing an informative fused image. However, the existing fusion algorithms tend to symmetrically fuse the multi-modal images, causing the loss of shallow information or bias towards a single modality in certain regions of the fusion results. In this study, we analyzed the…

    Submitted 26 April, 2024; originally announced April 2024.

  14. arXiv:2404.09498 [pdf, other]

    cs.CV

    FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

    Authors: Xinyu Xie, Yawen Cui, Chio-In Ieong, Tao Tan, Xiaozhi Zhang, Xubin Zheng, Zitong Yu

    Abstract: Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confro…

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  15. arXiv:2404.00838 [pdf, other]

    cs.CV

    3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching

    Authors: Yibin Ye, Xichao Teng, Shuo Chen, Yijie Bian, Tao Tan, Zhang Li

    Abstract: Optical-SAR image matching is a fundamental task for image fusion and visual navigation. However, all large-scale open SAR datasets for method development are collected from a single platform, resulting in limited satellite types and spatial resolutions. Since images captured by different sensors vary significantly in both geometric and radiometric appearance, existing methods may fail to match corr…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures

  16. arXiv:2403.19278 [pdf, other]

    cs.CV

    CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

    Authors: Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

    Abstract: Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods have been proposed to address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from the class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, es…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted into CVPR 2024

  17. arXiv:2403.11172 [pdf, other]

    cs.CV

    Artifact Feature Purification for Cross-domain Detection of AI-generated Images

    Authors: Zheling Meng, Bo Peng, Jing Dong, Tieniu Tan

    Abstract: In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from performance drops when faced with out-of-domain generators and image scenes. To relieve this problem, we propose Artifact Purification Network (APN) to facilitate the artifact extraction fr…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: This work is under consideration at Computer Vision and Image Understanding

  18. arXiv:2403.07408 [pdf, other]

    cs.CV

    NightHaze: Nighttime Image Dehazing via Self-Prior Learning

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan

    Abstract: Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with s…

    Submitted 12 March, 2024; originally announced March 2024.

  19. arXiv:2403.07350 [pdf, other]

    cs.CL cs.AI cs.CV

    VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

    Authors: Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan

    Abstract: Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared to this, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls…

    Submitted 13 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 9+11 pages (main+appendix), 7 figures, 13 tables. Code and data: https://github.com/VLKEB/VLKEB

  20. arXiv:2403.05262 [pdf, other]

    cs.CV

    Debiasing Multimodal Large Language Models

    Authors: Yi-Fan Zhang, Weichen Yu, Qingsong Wen, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: In the realms of computer vision and natural language processing, Large Vision-Language Models (LVLMs) have become indispensable tools, proficient in generating textual descriptions based on visual inputs. Despite their advancements, our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the underlying Large Language Models (LLMs) prior ra…

    Submitted 27 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 38 pages, 17 figures

  21. arXiv:2402.11622 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

    Authors: Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan

    Abstract: Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image. To mitigate the object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational re…

    Submitted 28 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024; 19 pages, 15 figures, 6 tables

  22. arXiv:2402.10551 [pdf, other]

    cs.LG q-bio.QM

    Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information

    Authors: Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, Anand D. Jeyasekharan, Vaibhav Rajan

    Abstract: Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are chall…

    Submitted 16 February, 2024; originally announced February 2024.

  23. arXiv:2402.10083 [pdf]

    cs.AI

    Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4

    Authors: Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting

    Abstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We find…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 13 Pages, 1 Figure, 8 Tables

  24. arXiv:2402.04087 [pdf, other]

    cs.CV cs.AI cs.LG

    A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

    Authors: Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapters, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with lim…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  25. arXiv:2402.04050 [pdf, other]

    cs.LG cs.AI cs.CV

    Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

    Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: With the emergence of pretrained vision-language models (VLMs), considerable efforts have been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be challenging as model owners often opt to provide their models as a black box to safeguard model ownership. This paper…

    Submitted 3 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  26. arXiv:2402.01158 [pdf, other]

    cs.CL

    LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

    Authors: Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Han Ma, Yaofei Duan, Yanlan Kang, Songhua Yang, Baoyu Fan, Tao Tan

    Abstract: ChatGPT and other general large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated texts. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain over-fitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses gen…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 17 pages, 13 tables, 7 figures

  27. arXiv:2401.09336 [pdf, other]

    eess.IV cs.CV

    To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Yuan Gao, Xin Wang, Valentina Longo, Sofía Ventura-Díaz, Anna D'Angelo, Jonas Teuwen, Ritse Mann

    Abstract: Clinicians compare breast DCE-MRI after neoadjuvant chemotherapy (NAC) with pre-treatment scans to evaluate the response to NAC. Clinical evidence supports that accurate longitudinal deformable registration without deforming treated tumor regions is key to quantifying tumor changes. We propose a conditional pyramid registration network based on unsupervised keypoint detection and selective volume-…

    Submitted 17 January, 2024; originally announced January 2024.

  28. Semantic Segmentation in Multiple Adverse Weather Conditions with Domain Knowledge Retention

    Authors: Xin Yang, Wending Yan, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Semantic segmentation's performance is often compromised when applied to unlabeled adverse weather conditions. Unsupervised domain adaptation is a potential approach to enhancing the model's adaptability and robustness to adverse weather. However, existing methods encounter difficulties when sequentially adapting the model to multiple unlabeled adverse weather conditions. They struggle to acquire…

    Submitted 14 January, 2024; originally announced January 2024.

  29. arXiv:2401.06030 [pdf, other]

    cs.CR

    Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation

    Authors: Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Model adaptation tackles the distribution shift problem with a pre-trained model instead of raw data, becoming a popular paradigm due to its great privacy protection. Existing methods always assume adapting to a clean target domain, overlooking the security risks of unlabeled samples. In this paper, we explore the potential backdoor attacks on model adaptation launched by well-designed poisoning t…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

  30. arXiv:2401.02329 [pdf, other]

    cs.LG cs.CV

    Not all Minorities are Equal: Empty-Class-Aware Distillation for Heterogeneous Federated Learning

    Authors: Kuangpu Guo, Yuhe Ding, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Data heterogeneity, characterized by disparities in local data distribution across clients, poses a significant challenge in federated learning. Substantial efforts have been devoted to addressing the heterogeneity in local label distribution. As minority classes suffer from worse accuracy due to overfitting on local imbalanced data, prior methods often incorporate class-balanced learning techniqu…

    Submitted 4 January, 2024; originally announced January 2024.

  31. arXiv:2312.17538 [pdf, other]

    cs.CV cs.LG eess.IV

    Distance Guided Generative Adversarial Network for Explainable Binary Classifications

    Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan

    Abstract: Despite the potential benefits of data augmentation for mitigating data insufficiency, traditional augmentation methods primarily rely on prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classi…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 12 pages, 8 figures. This work has been submitted to the IEEE TNNLS for possible publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media

  32. arXiv:2312.17492 [pdf, other]

    cs.CV

    HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

    Authors: Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at…

    Submitted 4 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24

  33. arXiv:2312.14557 [pdf, other]

    cs.CL

    Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

    Authors: Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan

    Abstract: Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets wi…

    Submitted 1 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 10 pages, 2 figures

  34. arXiv:2312.12918 [pdf, other]

    cs.CL

    Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors

    Authors: Yi-Fan Zhang, Zhang Zhang, Liang Wang, Tieniu Tan, Rong Jin

    Abstract: To combat the potential misuse of Natural Language Generation (NLG) technology, a variety of algorithms have been developed for the detection of AI-generated texts. Traditionally, this task is treated as a binary classification problem. Although supervised learning has demonstrated promising results, acquiring labeled data for detection purposes poses real-world challenges and the risk of overfitt…

    Submitted 20 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 8 pages, 3 figures, AAAI 2024 Workshop on Responsible Language Models

  35. arXiv:2312.10921 [pdf, other]

    cs.CV cs.SD eess.AS

    AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis

    Authors: Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan

    Abstract: Audio-driven talking head synthesis is a promising topic with wide applications in digital humans, filmmaking and virtual reality. Recent NeRF-based approaches have shown superiority in quality and fidelity compared to previous studies. However, when it comes to few-shot talking head generation, a practical scenario where only a few seconds of talking video is available for one identity, two limitat…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  36. arXiv:2312.10586 [pdf, other]

    cs.CV

    Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI

    Authors: Qinqian Lei, Bo Wang, Robby T. Tan

    Abstract: Detecting human-object interactions (HOI) in a few-shot setting remains a challenge. Existing meta-learning methods struggle to extract representative features for classification due to the limited data, while existing few-shot HOI models rely on HOI text labels for classification. Moreover, some query images may display visual similarity to those outside their class, such as similar backgrounds b…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 9 pages, 4 figures

  37. arXiv:2311.16420 [pdf, other]

    cs.LG cs.CV

    Model-free Test Time Adaptation for Out-Of-Distribution Detection

    Authors: YiFan Zhang, Xue Wang, Tian Zhou, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

    Abstract: Out-of-distribution (OOD) detection is essential for the reliability of ML models. Most existing methods for OOD detection learn a fixed decision criterion from a given in-distribution dataset and apply it universally to decide if a data point is OOD. Recent work (Fang et al., 2022) shows that given only in-distribution data, it is impossible to reliably detect OOD data without extra assumptions. Mo…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 10 figures

  38. arXiv:2311.15090 [pdf, other]

    eess.IV cs.CV cs.LG

    Fine-Grained Unsupervised Cross-Modality Domain Adaptation for Vestibular Schwannoma Segmentation

    Authors: Luyi Han, Tao Tan, Ritse Mann

    Abstract: The domain adaptation approach has gained significant acceptance in transferring styles across various vendors and centers, along with filling the gaps in modalities. However, multi-center applications face the difficulty of domain adaptation arising from intra-domain differences. We focus on introducing a fine-grained unsupervised framework for domain adaptation to facilitate cro…

    Submitted 25 November, 2023; originally announced November 2023.

  39. arXiv:2311.14388 [pdf, other]

    cs.CV cs.LG

    A Parameterized Generative Adversarial Network Using Cyclic Projection for Explainable Medical Image Classification

    Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Chan-Tong Lam, Tong Tong, Hao Chen, Qinquan Gao, Wei Ke, Tao Tan

    Abstract: Although current data augmentation methods succeed in alleviating data insufficiency, conventional augmentation is primarily intra-domain, while advanced generative adversarial networks (GANs) generate images that remain uncertain, particularly in small-scale datasets. In this paper, we propose a parameterized GAN (ParaGAN) that effectively controls the changes of synthetic samples among do…

    Submitted 14 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures. This work has been submitted to the IEEE ICASSP for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  40. arXiv:2311.10349 [pdf, other]

    eess.IV cs.CV cs.LG

    Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation

    Authors: Tao Wang, Yuanbin Chen, Xinlin Zhang, Yuanbo Zhou, Junlin Lan, Bizhe Bai, Tao Tan, Min Du, Qinquan Gao, Tong Tong

    Abstract: Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose t…

    Submitted 17 November, 2023; originally announced November 2023.

  41. arXiv:2310.13289 [pdf, other]

    cs.SD cs.CL eess.AS

    SALMONN: Towards Generic Hearing Abilities for Large Language Models

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

    Abstract: Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music. In this paper, we propose SALMONN, a speech audio language music open neural network, built by integrating a pre-trained text-based large…

    Submitted 8 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  42. arXiv:2310.05863

    eess.AS cs.AI cs.CV cs.SD

    Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

    Abstract: Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs. To this end, a fine-grained audio-visual joint representation (FAVOR) learning framework for multimodal LLMs is proposed in this paper, which extends a text-based L…

    Submitted 10 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  43. arXiv:2309.13963

    eess.AS cs.CL cs.SD

    Connecting Speech Encoder and Large Language Model for ASR

    Authors: Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

    Abstract: The impressive capability and versatility of large language models (LLMs) have aroused increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head c…

    Submitted 26 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  44. arXiv:2309.12659

    cs.LG cs.DS

    OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

    Authors: Yi-Fan Zhang, Qingsong Wen, Xue Wang, Weiqi Chen, Liang Sun, Zhang Zhang, Liang Wang, Rong **, Tieniu Tan

    Abstract: Online updating of time series forecasting models aims to address the concept drifting problem by efficiently updating forecasting models based on streaming data. Many algorithms are designed for online time series forecasting, with some exploiting cross-variable dependency while others assume independence among variables. Given every data assumption has its own pros and cons in online time series…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 32 pages, 11 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  45. arXiv:2309.12183

    cs.CV

    ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding

    Authors: Yu Cheng, Bo Wang, Robby T. Tan

    Abstract: In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion, which is common in in-the-wild videos. The recent human neural rendering approaches focusing on novel view synthesis initialized by the off-the-shelf human shape and pose methods have the potential to correct the initial human shape. However, the exis…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 8 pages, 8 figures

  46. arXiv:2309.07648

    eess.AS cs.CL cs.SD

    Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

    Authors: Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

    Abstract: Despite advancements in end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding. Previous studies mainly focus on various rule-based or attention-based contextual biasing algorithms. However, their performance might be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along…

    Submitted 8 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to INTERSPEECH 2024

  47. arXiv:2308.16573

    eess.IV cs.CV

    Dual-Decoder Consistency via Pseudo-Labels Guided Data Augmentation for Semi-Supervised Medical Image Segmentation

    Authors: Yuanbin Chen, Tao Wang, Hui Tang, Longxuan Zhao, Ruige Zong, Shun Chen, Tao Tan, Xinlin Zhang, Tong Tong

    Abstract: While supervised learning has achieved remarkable success, obtaining large-scale labeled datasets in biomedical imaging is often impractical due to high costs and the time-consuming annotations required from radiologists. Semi-supervised learning emerges as an effective strategy to overcome this limitation by leveraging useful information from unlabeled datasets. In this paper, we present a novel…

    Submitted 18 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

  48. arXiv:2308.12919

    cs.CV cs.LG

    Towards Realistic Unsupervised Fine-tuning with CLIP

    Authors: Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan

    Abstract: The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsup…

    Submitted 24 August, 2023; originally announced August 2023.

  49. arXiv:2308.08942

    cs.CV

    Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

    Authors: Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

    Abstract: Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, partial body joints' coordinates are corrupte…

    Submitted 2 September, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  50. End-to-end Alternating Optimization for Real-World Blind Super Resolution

    Authors: Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, Tieniu Tan

    Abstract: Blind Super-Resolution (SR) usually involves two sub-problems: 1) estimating the degradation of the given low-resolution (LR) image; 2) super-resolving the LR image to its high-resolution (HR) counterpart. Both problems are ill-posed due to the information loss in the degrading process. Most previous methods try to solve the two problems independently, but often fall into a dilemma: a good super-r…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Extension of our previous NeurIPS paper. Accepted to IJCV

    Journal ref: International Journal of Computer Vision (IJCV), 2023