Showing 1–50 of 104 results for author: Bai, Z

Searching in archive cs.
  1. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, ** Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision areas where general pu…

    Submitted 3 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.18035  [pdf, other]

    cs.LG stat.ML

    Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

    Authors: Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai

    Abstract: Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense o…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2211.11623

  3. arXiv:2406.09196  [pdf, other]

    cs.CV cs.LG

    Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

    Abstract: Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  4. arXiv:2406.06543  [pdf, other]

    cs.AR cs.LG cs.NE eess.SP

    SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

    Authors: Zhanglu Yan, Zhenyu Bai, Tulika Mitra, Weng-Fai Wong

    Abstract: Heart disease is one of the leading causes of death worldwide. Given its high risk and often asymptomatic nature, real-time continuous monitoring is essential. Unlike traditional artificial neural networks (ANNs), spiking neural networks (SNNs) are well-known for their energy efficiency, making them ideal for wearable devices and energy-constrained edge computing platforms. However, current energy…

    Submitted 6 May, 2024; originally announced June 2024.

  5. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.17025  [pdf, other]

    cs.AR cs.AI

    SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs

    Authors: Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra

    Abstract: Efficiently supporting long context length is crucial for Transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding window-based static sparse attention mitigates the problem by limiting the attention scope of the input tokens, reducing the theoretical complexity from quadratic to linear. Although the sparsity induced by window attenti…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted paper for DAC'22

  7. arXiv:2405.16127  [pdf, other]

    cs.IR

    Finetuning Large Language Model for Personalized Ranking

    Authors: Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, motivating researchers to investigate their potential use in recommendation systems. However, directly applying LLMs to recommendation tasks has proven challenging due to the significant disparity between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this st…

    Submitted 20 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  8. arXiv:2405.13787  [pdf, other]

    cs.LG

    Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target

    Authors: Jiajie Zhao, Zhiwei Bai, Yaoyu Zhang

    Abstract: Overparameterized models like deep neural networks have the intriguing ability to recover target functions with fewer sampled data points than parameters (see arXiv:2307.08921). To gain insights into this phenomenon, we concentrate on a single-neuron target recovery scenario, offering a systematic examination of how initialization and sample size influence the performance of two-layer neural netwo…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 22 pages, 11 figures

  9. arXiv:2405.13721  [pdf, other]

    cs.LG

    Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

    Authors: Zhiwei Bai, Jiajie Zhao, Yaoyu Zhang

    Abstract: Matrix factorization models have been extensively studied as a valuable test-bed for understanding the implicit biases of overparameterized models. Although both low nuclear norm and low rank regularization have been studied for these models, a unified understanding of when, how, and why they achieve different implicit regularization effects remains elusive. In this work, we systematically investi…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 34 pages

  10. arXiv:2404.18930  [pdf, other]

    cs.CV

    Hallucination of Multimodal Large Language Models: A Survey

    Authors: Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

    Abstract: This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge k…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 140 references

  11. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  12. arXiv:2404.17466  [pdf, other]

    physics.comp-ph cs.LG physics.plasm-ph

    FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

    Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

    Abstract: Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model dem…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

    MSC Class: 76W05; 68T45 ACM Class: J.2; I.2.10

  13. arXiv:2404.15067  [pdf, other]

    cs.CL

    Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives

    Authors: Haohao Zhu, Xiaokun Zhang, Junyu Lu, Youlin Wu, Zewen Bai, Changrong Min, Liang Yang, Bo Xu, Dongyu Zhang, Hongfei Lin

    Abstract: Textual personality detection aims to identify personality characteristics by analyzing user-generated content toward social media platforms. Numerous psychological literature highlighted that personality encompasses both long-term stable traits and short-term dynamic states. However, existing studies often concentrate only on either long-term or short-term personality representations, without eff…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 11 pages, 9 figures

  14. arXiv:2404.13896  [pdf, other]

    cs.CV

    CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory

    Authors: Yunlong Ran, Yanxu Li, Qi Ye, Yuchi Huo, Zechun Bai, Jiahao Sun, Jiming Chen

    Abstract: Neural radiance field (NeRF) has achieved impressive results in high-quality 3D scene reconstruction. However, NeRF heavily relies on precise camera poses. While recent works like BARF have introduced camera pose optimization within NeRF, their applicability is limited to simple trajectory scenes. Existing methods struggle while tackling complex trajectories involving large rotations. To address t…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.01543  [pdf, other]

    cs.CV cs.GR

    Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

    Authors: Ziqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang

    Abstract: 3D head avatars built with neural implicit volumetric representations have achieved unprecedented levels of photorealism. However, the computational cost of these methods remains a significant barrier to their widespread adoption, particularly in real-time applications such as virtual reality and teleconferencing. While attempts have been made to develop fast neural rendering approaches for static…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: In CVPR2024. Project page: https://augmentedperception.github.io/monoavatar-plus

  16. arXiv:2403.05428  [pdf, other]

    cs.MM

    Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

    Authors: Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu

    Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations based on the context, necessitating the requirement for comprehensively understanding stickers and supporting multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset comprising a collected tag set with 461 tags and 13,571 sticker-tag pairs, design…

    Submitted 16 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  17. arXiv:2403.05427  [pdf, other]

    cs.MM

    Reply with Sticker: New Dataset and Model for Sticker Retrieval

    Authors: Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Kam-Fai Wong, Ruifeng Xu

    Abstract: Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in the conversation can express someone's intention/emotion/attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the curren…

    Submitted 8 March, 2024; originally announced March 2024.

  18. arXiv:2402.15627  [pdf, other]

    cs.LG cs.DC

    MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

    Authors: Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao , et al. (7 additional authors not shown)

    Abstract: We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model bl…

    Submitted 23 February, 2024; originally announced February 2024.

  19. arXiv:2402.14660  [pdf, other]

    cs.CL cs.AI

    ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

    Authors: Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: The benchmark dataset will be released soon

  20. arXiv:2402.13724  [pdf, other]

    cs.HC cs.CV

    Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters

    Authors: Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian

    Abstract: Animating virtual characters has always been a fundamental research problem in virtual reality (VR). Facial animations play a crucial role as they effectively convey emotions and attitudes of virtual humans. However, creating such facial animations can be challenging, as current methods often involve utilization of expensive motion capture devices or significant investments of time and effort from…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 9 pages. To appear in IEEE-VR

  21. arXiv:2402.11909  [pdf, other]

    cs.CV

    One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation

    Authors: Zhixuan Yu, Ziqian Bai, Abhimitra Meka, Feitong Tan, Qiangeng Xu, Rohit Pandey, Sean Fanello, Hyun Soo Park, Yinda Zhang

    Abstract: Traditional methods for constructing high-quality, personalized head avatars from monocular videos demand extensive face captures and training time, posing a significant challenge for scalability. This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user. We learn a generative model for 3D animatable photo-realistic head avatar from…

    Submitted 19 February, 2024; originally announced February 2024.

  22. arXiv:2402.03933  [pdf]

    cs.SE stat.AP

    Development of a Evaluation Tool for Age-Appropriate Software in Aging Environments: A Delphi Study

    Authors: Zhenggang Bai, Yougxiang Fang, Hongtu Chen, Xinru Chen, Ning An, Min Zhang, Guoxin Rui, **g **

    Abstract: Objective: We aimed to develop a dependable, reliable tool for assessing software age-appropriateness. Methods: We conducted a systematic review to get the indicators of technology age-appropriateness from studies from January 2000 to April 2023. This study engaged 25 experts from the fields of anthropology, sociology, and social technology research across, three rounds of Delphi consultations were con…

    Submitted 4 February, 2024; originally announced February 2024.

  23. arXiv:2402.02791  [pdf, other]

    cs.CL cs.AI cs.LG

    Rethinking Optimization and Architecture for Tiny Language Models

    Authors: Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang

    Abstract: The power of large language models (LLMs) has been demonstrated through numerous data and computing resources. However, the application of language models on mobile devices is facing huge challenge on the computation and memory costs, that is, tiny language models with high performance are urgently required. Limited by the highly complex training process, there are many details for optimizing lang…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  24. arXiv:2402.01345  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

    Authors: Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou

    Abstract: Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons of multimodal halluc…

    Submitted 7 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  25. arXiv:2401.14856  [pdf, other]

    cs.CV cs.AI

    Memory-Inspired Temporal Prompt Interaction for Text-Image Classification

    Authors: Xinyao Yu, Hao Sun, Ziwei Niu, Rui Qin, Zhenjia Bai, Yen-Wei Chen, Lanfen Lin

    Abstract: In recent years, large-scale pre-trained multimodal models (LMM) generally emerge to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models for downstream tasks. Hence, prompt-based interaction strategy i…

    Submitted 26 January, 2024; originally announced January 2024.

  26. arXiv:2401.06951  [pdf, other]

    cs.CL cs.AI

    E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

    Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

    Abstract: Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support corresponding long-context windows, where the long-context training data (e.g., 32k) is needed, and high GPU training costs are assumed. To address the aforementioned issue…

    Submitted 22 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  27. arXiv:2401.01529  [pdf, other]

    cs.CV

    Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

    Authors: Ziyi Bai, Ruiping Wang, Xilin Chen

    Abstract: Video Question Answering (VideoQA) has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors. Despite the recent success of large vision language models in many multi-modal tasks, complex situation reasoning over videos involving multiple human-object interaction events still remains challenging. In contrast, humans can easily tackle it by using a series of episod…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted in NeurIPS 2023

  28. arXiv:2312.17276  [pdf, other]

    cs.CL cs.LG

    PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation

    Authors: Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, Dacheng Tao

    Abstract: The recent trend of large language models (LLMs) is to increase the scale of both model size (a.k.a. the number of parameters) and dataset to achieve better generative ability, which is definitely proved by a lot of work such as the famous GPT and Llama. However, large models often involve massive computational costs, and practical applications cannot afford such high prices. However, the method of…

    Submitted 27 December, 2023; originally announced December 2023.

  29. arXiv:2312.13108  [pdf, other]

    cs.CV

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Authors: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Model (LLM) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper…

    Submitted 1 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project Page: https://showlab.github.io/assistgui/

  30. arXiv:2312.04008  [pdf, other]

    cs.CV

    Natural-language-driven Simulation Benchmark and Copilot for Efficient Production of Object Interactions in Virtual Road Scenes

    Authors: Kairui Yang, Zihao Guo, Gengjie Lin, Haotian Dong, Die Zuo, Jibin Peng, Zhao Huang, Zhecheng Xu, Fupeng Li, Ziyun Bai, Di Lin

    Abstract: We advocate the idea of the natural-language-driven (NLD) simulation to efficiently produce the object interactions between multiple objects in the virtual road scenes, for teaching and testing the autonomous driving systems that should take quick action to avoid collision with obstacles with unpredictable motions. The NLD simulation allows the brief natural-language description to control the obje…

    Submitted 15 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 12 pages, 6 figures

  31. Texture-Semantic Collaboration Network for ORSI Salient Object Detection

    Authors: Gongyang Li, Zhen Bai, Zhi Liu

    Abstract: Salient object detection (SOD) in optical remote sensing images (ORSIs) has become increasingly popular recently. Due to the characteristics of ORSIs, ORSI-SOD is full of challenges, such as multiple objects, small objects, low illuminations, and irregular shapes. To address these challenges, we propose a concise yet effective Texture-Semantic Collaboration Network (TSCNet) to explore the collabor…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, Accepted by IEEE Transactions on Circuits and Systems II: Express Briefs 2023

  32. arXiv:2311.16568  [pdf, ps, other]

    cs.IT eess.SP

    Active Reconfigurable Intelligent Surface Enhanced Spectrum Sensing for Cognitive Radio Networks

    Authors: Jungang Ge, Ying-Chang Liang, Sumei Sun, Yonghong Zeng, Zhidong Bai

    Abstract: In opportunistic cognitive radio networks, when the primary signal is very weak compared to the background noise, the secondary user requires long sensing time to achieve a reliable spectrum sensing performance, leading to little remaining time for the secondary transmission. To tackle this issue, we propose an active reconfigurable intelligent surface (RIS) assisted spectrum sensing system, where…

    Submitted 26 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  33. arXiv:2311.01689  [pdf, other]

    cs.CL cs.AI

    Data-Free Distillation of Language Model by Text-to-Text Transfer

    Authors: Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang

    Abstract: Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing the model when original training data is unavailable. Previous works for DFKD in NLP mainly focus on distilling encoder-only structures like BERT on classification tasks, which overlook the notable progress of generative language modeling. In this work, we propose a novel DFKD framework, namely DFKD-T$^{3}$, where the pretra…

    Submitted 2 November, 2023; originally announced November 2023.

  34. arXiv:2310.17780  [pdf, other]

    eess.IV cs.CV

    AutoCT: Automated CT registration, segmentation, and quantification

    Authors: Zhe Bai, Abdelilah Essiari, Talita Perciano, Kristofer E. Bouchard

    Abstract: The processing and analysis of computed tomography (CT) imaging is important for both basic scientific development and clinical applications. In AutoCT, we provide a comprehensive pipeline that integrates an end-to-end automatic preprocessing, registration, segmentation, and quantitative analysis of 3D CT scans. The engineered pipeline enables atlas-based CT segmentation and quantification leverag…

    Submitted 26 October, 2023; originally announced October 2023.

  35. arXiv:2310.04677  [pdf, other]

    eess.IV cs.CV

    AG-CRC: Anatomy-Guided Colorectal Cancer Segmentation in CT with Imperfect Anatomical Knowledge

    Authors: Rongzhao Zhang, Zhian Bai, Ruoying Yu, Wenrao Pang, Lingyun Wang, Lifeng Zhu, Xiaofan Zhang, Huan Zhang, Weiguo Hu

    Abstract: When delineating lesions from medical images, a human expert can always keep in mind the anatomical structure behind the voxels. However, although high-quality (though not perfect) anatomical information can be retrieved from computed tomography (CT) scans with modern deep learning algorithms, it is still an open problem how these automatically generated organ masks can assist in addressing challe…

    Submitted 30 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: under review

  36. arXiv:2309.12865  [pdf, other]

    cs.CV

    Bridging Sensor Gaps via Single-Direction Tuning for Hyperspectral Image Classification

    Authors: Xizhe Xue, Haokui Zhang, Ying Li, Liuwei Wan, Zongwen Bai, Mike Zheng Shou

    Abstract: Recently, some researchers started exploring the use of ViTs in tackling HSI classification and achieved remarkable results. However, the training of ViT models requires a considerable number of training samples, while hyperspectral data, due to its high annotation costs, typically has a relatively small number of training samples. This contradiction has not been effectively addressed. In this pap…

    Submitted 22 September, 2023; originally announced September 2023.

  37. arXiv:2309.09858  [pdf, other]

    cs.CV

    Unsupervised Open-Vocabulary Object Localization in Videos

    Authors: Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

    Abstract: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. We propose a method that first localizes objects in videos via an object-centric approach with slot attention and then assigns text to the obtained slots. The latter is achieved by an unsupervised way to…

    Submitted 26 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023; Presented at CVPR 2024 Workshop CORR; Project Page: https://github.com/amazon-science/object-centric-vol

  38. Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

    Authors: Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling

    Abstract: Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods generally follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Networ…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 13 pages, 6 figures, Accepted by IEEE Transactions on Image Processing 2023

  39. arXiv:2309.00233  [pdf, other]

    cs.CV

    Object-Centric Multiple Object Tracking

    Authors: Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

    Abstract: Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art model…

    Submitted 5 September, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: ICCV 2023 camera-ready version

  40. arXiv:2307.08921  [pdf, other]

    cs.LG stat.ML

    Optimistic Estimate Uncovers the Potential of Nonlinear Models

    Authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu

    Abstract: We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convoluti…

    Submitted 17 July, 2023; originally announced July 2023.

  41. arXiv:2305.18149  [pdf, other]

    cs.CL cs.AI

    Multiscale Positive-Unlabeled Detection of AI-Generated Texts

    Authors: Yuchuan Tian, Hanting Chen, Xutao Wang, Zheyuan Bai, Qinghua Zhang, Ruifeng Li, Chao Xu, Yunhe Wang

    Abstract: Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are astonishing at generating human-like texts, but they may impact the authenticity of texts. Previous works proposed methods to detect these AI-generated texts, including simple ML classifiers, pretrained-model-based zero-shot methods, and finetuned language classification models. However, mainstream detectors always fail on short te…

    Submitted 5 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ICLR2024 (Spotlight)

  42. arXiv:2304.09091  [pdf, other]

    cs.HC cs.AI

    Participatory Design of AI with Children: Reflections on IDC Design Challenge

    Authors: Zhen Bai, Frances Judd, Naomi Polinsky, Elmira Yadollahi

    Abstract: Children growing up in the era of Artificial Intelligence (AI) will be most impacted by the technology across their life span. Participatory Design (PD) is widely adopted by the Interaction Design and Children (IDC) community, which empowers children to bring their interests, needs, and creativity to the design process of future technologies. While PD has drawn increasing attention to human-center…

    Submitted 18 April, 2023; originally announced April 2023.

  43. arXiv:2304.08173  [pdf]

    cs.CL

    A Corpus-based Analysis of Attitudinal Changes in Lin Yutang's Self-translation of Between Tears and Laughter

    Authors: Zhi** Bai

    Abstract: Attitude is omnipresent in almost every type of text. There has yet to be any relevant research on attitudinal shifts in self-translation. The Chinese version of Between Tears and Laughter is a rare case of self-translation and co-translation in that the first 11 chapters are self-translated by Lin Yutang, and the last 12 chapters by Xu Chengbin. The current study conducted a word frequency analys…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 30 pages, 5 tables

  44. arXiv:2304.01436  [pdf, other]

    cs.CV cs.GR

    Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

    Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, ** Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

    Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/

  45. arXiv:2302.03128  [pdf, other]

    cs.CV cs.MA

    Cooperverse: A Mobile-Edge-Cloud Framework for Universal Cooperative Perception with Mixed Connectivity and Automation

    Authors: Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Emrah Akin Sisbot, Kentaro Oguchi

    Abstract: Cooperative perception (CP) is attracting increasing attention and is regarded as the core foundation to support cooperative driving automation, a potential key solution to addressing the safety, mobility, and sustainability issues of contemporary transportation systems. However, current research on CP is still at the beginning stages where a systematic problem formulation of CP is still missing,…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: 6 pages, 7 figures

  46. arXiv:2212.07060  [pdf, other]

    cs.CV

    VINet: Lightweight, Scalable, and Heterogeneous Cooperative Perception for 3D Object Detection

    Authors: Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Emrah Akin Sisbot, Kentaro Oguchi

    Abstract: Utilizing the latest advances in Artificial Intelligence (AI), the computer vision community is now witnessing an unprecedented evolution in all kinds of perception tasks, particularly in object detection. Based on multiple spatially separated perception nodes, Cooperative Perception (CP) has emerged to significantly advance the perception of automated driving. However, current cooperative object…

    Submitted 21 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  47. arXiv:2211.11891  [pdf, other]

    stat.ML cs.LG

    A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis

    Authors: Dong Min Roh, Zhaojun Bai, Ren-Cang Li

    Abstract: Much like the classical Fisher linear discriminant analysis (LDA), the recently proposed Wasserstein discriminant analysis (WDA) is a linear dimensionality reduction method that seeks a projection matrix to maximize the dispersion of different data classes and minimize the dispersion of same data classes via a bi-level optimization. In contrast to LDA, WDA can account for both global and local int…

    Submitted 27 July, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

  48. arXiv:2211.11623  [pdf, other]

    cs.LG stat.ML

    Linear Stability Hypothesis and Rank Stratification for Nonlinear Models

    Authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu

    Abstract: Models with nonlinear architectures/parameterizations such as deep neural networks (DNNs) are well known for their mysteriously good generalization performance at overparameterization. In this work, we tackle this mystery from a novel perspective focusing on the transition of the target recovery/fitting accuracy as a function of the training data size. We propose a rank stratification for general…

    Submitted 21 November, 2022; originally announced November 2022.

  49. arXiv:2210.16435  [pdf, other]

    cs.LG stat.ML

    Scalable Spectral Clustering with Group Fairness Constraints

    Authors: Ji Wang, Ding Lu, Ian Davidson, Zhaojun Bai

    Abstract: There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entiret…

    Submitted 14 April, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6613-6629, 2023

  50. arXiv:2210.05677  [pdf]

    q-bio.GN cs.LG

    Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review

    Authors: Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang

    Abstract: Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during development of complex organisms and improved our understanding o…

    Submitted 11 October, 2022; originally announced October 2022.