Showing 1–50 of 119 results for author: Cao, T

Searching in archive cs.

  1. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  2. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  3. arXiv:2405.01394  [pdf, other]

    cs.AI

    Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

    Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

    Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. We employed a modular architecture for our solution consists of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assiste…

    Submitted 21 March, 2024; originally announced May 2024.

  4. arXiv:2404.10584  [pdf, other]

    cs.CV

    ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

    Authors: Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li

    Abstract: The fusion of images from dual camera systems featuring a wide-angle and a telephoto camera has become a hotspot problem recently. By integrating simultaneously captured wide-angle and telephoto images from these systems, the resulting fused image achieves a wide field of view (FOV) coupled with high-definition quality. Existing approaches are mostly deep learning methods, and predominantly rely o…

    Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  5. arXiv:2404.06162  [pdf, other]

    cs.CL cs.AI cs.LG

    Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

    Authors: Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan

    Abstract: As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study…

    Submitted 8 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  6. arXiv:2403.15385  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

    Authors: Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

    Abstract: Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so t…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: See the project website at https://research.nvidia.com/labs/toronto-ai/LATTE3D/

    MSC Class: 68T45 ACM Class: I.2.6; I.2.7; I.3.6; I.3.7

  7. arXiv:2403.04997  [pdf, other]

    cs.CL cs.CV

    DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation

    Authors: Jiapeng Wang, Chengyu Wang, Tingfeng Cao, Jun Huang, Lianwen Jin

    Abstract: We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models (e.g., Stable Diffusion) for interactive image creation. Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt, which can be leveraged to create the target image of h…

    Submitted 7 March, 2024; originally announced March 2024.

  8. arXiv:2403.03431  [pdf, other]

    cs.CV

    Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

    Authors: Bingyan Liu, Chengyu Wang, Tingfeng Cao, Kui Jia, Jun Huang

    Abstract: Deep Text-to-Image Synthesis (TIS) models such as Stable Diffusion have recently gained significant popularity for creative Text-to-image generation. Yet, for domain-specific scenarios, tuning-free Text-guided Image Editing (TIE) is of greater importance for application developers, which modify objects or object properties in images by manipulating feature components in attention layers during the…

    Submitted 5 March, 2024; originally announced March 2024.

  9. arXiv:2403.02253  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection

    Authors: Yuexin Li, Chengyu Huang, Shumin Deng, Mei Lin Lock, Tri Cao, Nay Oo, Hoon Wei Lim, Bryan Hooi

    Abstract: Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that the…

    Submitted 15 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by USENIX Security 2024

  10. arXiv:2403.01417  [pdf, other]

    cs.LG cs.DC

    Asyn2F: An Asynchronous Federated Learning Framework with Bidirectional Model Aggregation

    Authors: Tien-Dung Cao, Nguyen T. Vuong, Thai Q. Le, Hoang V. N. Dao, Tram Truong-Huu

    Abstract: In federated learning, the models can be trained synchronously or asynchronously. Many research works have focused on developing an aggregation method for the server to aggregate multiple local models into the global model with improved performance. They ignore the heterogeneity of the training workers, which causes the delay in the training of the local models, leading to the obsolete information…

    Submitted 3 March, 2024; originally announced March 2024.

  11. arXiv:2402.13822  [pdf, other]

    cs.CV

    MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification

    Authors: Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen

    Abstract: Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design an…

    Submitted 21 February, 2024; originally announced February 2024.

  12. arXiv:2402.10631  [pdf, other]

    cs.CL

    BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

    Authors: Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

    Abstract: The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD)…

    Submitted 16 February, 2024; originally announced February 2024.

  13. arXiv:2402.05981  [pdf, other]

    cs.LG cs.PF

    Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

    Authors: Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Ying Zhang, Yun Ma, Ting Cao, Xuanzhe Liu

    Abstract: Deep Learning (DL) is increasingly being integrated into Web applications through a method known as "in-browser inference", where the DL processes occur directly within Web browsers. However, the actual performance of this method and its effect on user experience quality (QoE) is not well-understood. This gap in knowledge necessitates new forms of QoE measurement, going beyond traditional metrics…

    Submitted 8 February, 2024; originally announced February 2024.

  14. arXiv:2401.12216  [pdf, other]

    stat.ML cs.LG math.OC

    Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning

    Authors: Philip Amortila, Tongyi Cao, Akshay Krishnamurthy

    Abstract: A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ. As distribution shift typically results in a degradation in performance, much attention has been devoted to algorithmic interventions that mitigate these detrimental effects. In this paper, we study the effect of distribution shift in the pres…

    Submitted 22 January, 2024; originally announced January 2024.

  15. arXiv:2312.16199  [pdf, other]

    cs.IR cs.LG

    Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns

    Authors: Xin Liu, Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao, Zhengyang Wang, Bing Yin, Yangqiu Song

    Abstract: The goal of session-based recommendation in E-commerce is to predict the next item that an anonymous user will purchase based on the browsing and purchase history. However, constructing global or local transition graphs to supplement session data can lead to noisy correlations and user intent vanishing. In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that char…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted by NeurIPS 2023

  16. arXiv:2312.11184  [pdf, other]

    cs.CV

    View Transition based Dual Camera Image Fusion

    Authors: Tiantian Cao, Xuan Dong, Chunli Peng, Zhengqing Li, Xinyu Guo, Weixin Li

    Abstract: The dual camera system of wide-angle ($\bf{W}$) and telephoto ($\bf{T}$) cameras has been widely adopted by popular phones. In the overlap region, fusing the $\bf{W}$ and $\bf{T}$ images can generate a higher quality image. Related works perform pixel-level motion alignment or high-dimensional feature alignment of the $\bf{T}$ image to the view of the $\bf{W}$ image and then perform image/feature…

    Submitted 18 December, 2023; originally announced December 2023.

  17. arXiv:2312.09445  [pdf, other]

    eess.SP cs.CV cs.LG

    IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

    Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

    Abstract: Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha…

    Submitted 16 November, 2023; originally announced December 2023.

  18. arXiv:2312.07141  [pdf, other]

    cs.CL

    Multilingual large language models leak human stereotypes across language boundaries

    Authors: Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

    Abstract: Multilingual large language models have been increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language mode…

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  19. arXiv:2311.09708  [pdf, other]

    cs.CL

    A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection

    Authors: Thi-Nhung Nguyen, Hoang Ngo, Kiem-Hieu Nguyen, Tuan-Dung Cao

    Abstract: Our work addresses the problem of unsupervised Aspect Category Detection using a small set of seed words. Recent works have focused on learning embedding spaces for seed words and sentences to establish similarities between sentences and aspects. However, aspect representations are limited by the quality of initial seed words, and model performances are compromised by noise. To mitigate this limit…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023

  20. arXiv:2311.08100  [pdf, other]

    cs.CV cs.RO

    PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

    Authors: Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

    Abstract: We present a new interaction mechanism of prediction and planning for end-to-end autonomous driving, called PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving), which considers the timestep-wise interaction to better integrate prediction and planning. An ego vehicle performs motion planning at each timestep based on the trajectory prediction of surrounding agents (e.g., vehi…

    Submitted 27 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  21. arXiv:2311.07879  [pdf, other]

    cs.CL cs.AI

    Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

    Authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle Mazurek, Katie Shilton, Hal Daumé III

    Abstract: Extensive efforts in automated approaches for content moderation have been focused on developing models to identify toxic, offensive, and hateful content with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks have truly addressed moderators' needs in accomplishing their work. In this paper, we surface gaps between past research efforts tha…

    Submitted 16 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  22. arXiv:2311.06758  [pdf, other]

    cs.CL

    Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension

    Authors: Tingfeng Cao, Chengyu Wang, Chuanqi Tan, Jun Huang, Jinhui Zhu

    Abstract: In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cro…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  23. arXiv:2311.06752  [pdf, other]

    cs.CL

    BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

    Authors: Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, Jinhui Zhu, Jun Huang

    Abstract: Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt engineering by humans in order to produce satisfactory results for real-world applications. We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simp…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  24. arXiv:2311.01792  [pdf, other]

    cs.CL cs.AI

    AFPQ: Asymmetric Floating Point Quantization for LLMs

    Authors: Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

    Abstract: Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth. Low-bit weight quantization can save memory and accelerate inference. Although floating-point (FP) formats show good performance in LLM quantization, they tend to perform poorly with small group sizes or sub-4 bits. We find the reason is that the absence…

    Submitted 3 November, 2023; originally announced November 2023.

  25. arXiv:2310.13772  [pdf, other]

    cs.CV cs.LG

    TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

    Authors: Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin

    Abstract: We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture sy…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Videos and more results on https://research.nvidia.com/labs/toronto-ai/texfusion/

    ACM Class: I.3.3

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023) 4169-4181

  26. arXiv:2310.12551  [pdf, other]

    cs.RO eess.IV

    Iterative PnP and its application in 3D-2D vascular image registration for robot navigation

    Authors: Jingwei Song, Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari

    Abstract: This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications…

    Submitted 11 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Submitted to ICRA 2024 Errors in Eq. 4 and Eq. 6 have been corrected. Updates include some minor improvements in Section II

  27. arXiv:2310.00057  [pdf, other]

    cs.CE

    A multi-fidelity deep operator network (DeepONet) for fusing simulation and monitoring data: Application to real-time settlement prediction during tunnel construction

    Authors: Chen Xu, Ba Trung Cao, Yong Yuan, Günther Meschke

    Abstract: Ground settlement prediction during the process of mechanized tunneling is of paramount importance and remains a challenging research topic. Typically, two paradigms are existing: a physics-driven approach utilizing process-oriented computational simulation models for the tunnel-soil interaction and the settlement prediction, and a data-driven approach employing machine learning techniques to esta…

    Submitted 12 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

  28. arXiv:2309.16110  [pdf, other]

    cs.CV

    Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge

    Authors: Zheyuan Yang, Yibo Liu, Guile Wu, Tongtong Cao, Yuan Ren, Yang Liu, Bingbing Liu

    Abstract: In this technical report, we present a solution for 3D object generation of ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has made great progress and achieved promising results, but it remains a challenging task due to the difficulty of generating complex, textured and high-fidelity results. To resolve this problem, we study learning effective NeRFs and SDFs representation…

    Submitted 27 September, 2023; originally announced September 2023.

  29. arXiv:2309.08978  [pdf, other]

    cs.AI

    Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

    Authors: Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang

    Abstract: Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques and customize kernels for various client devices, leading to suboptimal performance. To address the issues, this paper presents the firs…

    Submitted 16 September, 2023; originally announced September 2023.

  30. arXiv:2308.16451  [pdf, other]

    cs.RO

    Optical flow-based vascular respiratory motion compensation

    Authors: Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari, Jingwei Song

    Abstract: This paper develops a new vascular respiratory motion compensation algorithm, Motion-Related Compensation (MRC), to conduct vascular respiratory motion compensation by extrapolating the correlation between invisible vascular and visible non-vascular. Robot-assisted vascular intervention can significantly reduce the radiation exposure of surgeons. In robot-assisted image-guided intervention, blood…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: This manuscript has been accepted by IEEE Robotics and Automation Letters

  31. arXiv:2308.13323  [pdf, other]

    cs.CV cs.RO

    SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation

    Authors: Xuechao Chen, Shuangjie Xu, Xiaoyi Zou, Tongyi Cao, Dit-Yan Yeung, Lu Fang

    Abstract: LiDAR-based semantic perception tasks are critical yet challenging for autonomous driving. Due to the motion of objects and static/dynamic occlusion, temporal information plays an essential role in reinforcing perception by enhancing and completing single-frame knowledge. Previous approaches either directly stack historical frames to the current frame or build a 4D spatio-temporal neighborhood usi…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Received by ICCV2023

  32. arXiv:2308.12066  [pdf, other]

    cs.LG cs.AI cs.AR

    Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

    Authors: Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang

    Abstract: Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size. Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which…

    Submitted 27 April, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

  33. arXiv:2308.08140  [pdf, other]

    cs.CV

    GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds

    Authors: Ziyu Li, Jingming Guo, Tongtong Cao, Liu Bingbing, Wankou Yang

    Abstract: LiDAR-based 3D detection has made great progress in recent years. However, the performance of 3D detectors is considerably limited when deployed in unseen environments, owing to the severe domain gap problem. Existing domain adaptive 3D detection methods do not adequately consider the problem of the distributional discrepancy in feature space, thereby hindering generalization of detectors across d…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  34. Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

    Authors: Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

    Abstract: Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ranking-distilled token pruning method ToP, which selectively removes unnecessary tokens as input sequence passes through layers, allowing the model t…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: KDD 2023

  35. arXiv:2306.11925  [pdf, other]

    cs.CV

    LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

    Authors: Duy M. H. Nguyen, Hoang Nguyen, Nghiem T. Diep, Tan N. Pham, Tri Cao, Binh T. Nguyen, Paul Swoboda, Nhat Ho, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag, Mathias Niepert

    Abstract: Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and me…

    Submitted 18 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  36. arXiv:2306.04640  [pdf, other]

    cs.CL cs.AI cs.LG

    ModuleFormer: Modularity Emerges from Mixture-of-Experts

    Authors: Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

    Abstract: Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models.…

    Submitted 11 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  37. arXiv:2305.19982  [pdf, other]

    cs.LG cs.AI

    Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

    Authors: Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

    Abstract: Running out of GPU memory has become a main bottleneck for large-scale DNN training. How to reduce the memory footprint during training has received intensive research attention. We find that previous gradient accumulation reduces activation memory but fails to be compatible with gradient memory reduction due to a contradiction between preserving gradients and releasing gradients. To address this…

    Submitted 31 May, 2023; originally announced May 2023.

  38. arXiv:2305.19549  [pdf, other]

    cs.CL

    Accurate and Structured Pruning for Efficient Automatic Speech Recognition

    Authors: Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

    Abstract: Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy on resource-limited devices. In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the m…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023

  39. arXiv:2305.12356  [pdf, other]

    cs.LG cs.AI

    Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

    Authors: Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

    Abstract: Efficient deployment of large language models (LLMs) necessitates low-bit quantization to minimize model size and inference cost. While low-bit integer formats (e.g., INT8/INT4) have been the conventional choice, emerging low-bit floating-point formats (e.g., FP8/FP4) offer a compelling alternative and are gaining support from cutting-edge hardware, such as NVIDIA's H100 GPU. However, the superior…

    Submitted 21 May, 2023; originally announced May 2023.

  40. arXiv:2304.11080  [pdf, other]

    eess.SP cs.LG

    Multimodal contrastive learning for diagnosing cardiovascular diseases from electrocardiography (ECG) signals and patient metadata

    Authors: Tue M. Cao, Nhat H. Tran, Phi Le Nguyen, Hieu Pham

    Abstract: This work discusses the use of contrastive learning and deep learning for diagnosing cardiovascular diseases from electrocardiography (ECG) signals. While the ECG signals usually contain 12 leads (channels), many healthcare facilities and devices lack access to all these 12 leads. This raises the problem of how to use only fewer ECG leads to produce meaningful diagnoses with high performance. We i…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted for presentation at the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, USA

  41. arXiv:2303.14898  [pdf, other]

    cs.LG cs.AI cs.CL cs.SI

    Mutually-paced Knowledge Distillation for Cross-lingual Temporal Knowledge Graph Reasoning

    Authors: Ruijie Wang, Zheng Li, Jingfeng Yang, Tianyu Cao, Chao Zhang, Bing Yin, Tarek Abdelzaher

    Abstract: This paper investigates cross-lingual temporal knowledge graph reasoning problem, which aims to facilitate reasoning on Temporal Knowledge Graphs (TKGs) in low-resource languages by transferring knowledge from TKGs in high-resource ones. The cross-lingual distillation ability across TKGs becomes increasingly crucial, in light of the unsatisfying performance of existing reasoning methods on those se…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by The Web Conference 2023

  42. arXiv:2303.13845  [pdf, other]

    cs.CV

    Anomaly Detection under Distribution Shift

    Authors: Tri Cao, Jiawen Zhu, Guansong Pang

    Abstract: Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data. Most existing AD studies assume that the training and test data are drawn from the same data distribution, but the test data can have large distribution shifts arising in many real-world applications due to different natural variatio…

    Submitted 1 September, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted at ICCV 2023

  43. arXiv:2303.11219  [pdf, other]

    cs.CV cs.AI

    NeTO:Neural Reconstruction of Transparent Objects with Self-Occlusion Aware Refraction-Tracing

    Authors: Zongcheng Li, Xiaoxiao Long, Yusen Wang, Tuo Cao, Wenping Wang, Fei Luo, Chunxia Xiao

    Abstract: We present a novel method, called NeTO, for capturing 3D geometry of solid transparent objects from 2D images via volume rendering. Reconstructing transparent objects is a very challenging task, which is ill-suited for general-purpose reconstruction techniques due to the specular light transport phenomena. Although existing refraction-tracing based methods, designed specially for this task, achiev…

    Submitted 8 September, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  44. arXiv:2303.09730  [pdf, other]

    cs.CV

    ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices

    Authors: Chen Tang, Li Lyna Zhang, Huiqiang Jiang, Jiahang Xu, Ting Cao, Quanlu Zhang, Yuqing Yang, Zhi Wang, Mao Yang

    Abstract: Neural Architecture Search (NAS) has shown promising performance in the automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing lightweight and low-latency ViT models for diverse mobile devices remains a big challenge. In this work, we propose ElasticViT, a two-stage NAS approach that trains a high-quality ViT supernet over a very large search space that supports a wid…

    Submitted 21 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  45. arXiv:2303.08365  [pdf, other]

    cs.DC

    Gamify Stencil Dwarf on Cloud for Democratizing Scientific Computing

    Authors: Kun Li, Zhichun Li, Yuetao Chen, Zixuan Wang, Yiwei Zhang, Liang Yuan, Haipeng Jia, Yunquan Zhang, Ting Cao, Mao Yang

    Abstract: Stencil computation is one of the most important kernels in scientific computing. Nowadays, most Stencil-driven scientific computing still relies heavily on supercomputers, suffering from expensive access, poor scalability, and duplicated optimizations. This paper proposes Tetris, the first system for high-performance Stencil on heterogeneous CPU+GPU, towards democratizing Stencil-driven…

    Submitted 15 March, 2023; originally announced March 2023.

  46. arXiv:2303.08308  [pdf, other]

    cs.CV

    SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

    Authors: Li Lyna Zhang, Xudong Wang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang

    Abstract: The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency is due to the quantization-unfriendly issue: t…

    Submitted 14 March, 2023; originally announced March 2023.

  47. arXiv:2302.03213  [pdf, other]

    cs.LG cs.NE

    LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup

    Authors: Xiaohu Tang, Yang Wang, Ting Cao, Li Lyna Zhang, Qi Chen, Deng Cai, Yunxin Liu, Mao Yang

    Abstract: On-device Deep Neural Network (DNN) inference consumes significant computing resources and development efforts. To alleviate that, we propose LUT-NN, the first system to empower inference by table lookup, to reduce inference cost. LUT-NN learns the typical features for each operator, named centroids, and precomputes the results for these centroids to save in lookup tables. During inference, the resu…

    Submitted 6 September, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Journal ref: MobiCom 2023: Proceedings of the 29th Annual International Conference on Mobile Computing And Networking

  48. arXiv:2301.06237  [pdf, other]

    cs.LO cs.CL

    A separation logic for sequences in pointer programs and its decidability

    Authors: Tianyue Cao, Bowen Zhang, Zhao Jin, Yongzhi Cao, Hanpin Wang

    Abstract: Separation logic and its variants can describe various properties of pointer programs. However, properties of sequences can be hard to formalize. To deal with properties of variable-length sequences and multilevel data structures, we propose sequence-heap separation logic, which integrates sequences into logical reasoning on heap-manipulated programs. Quantifiers over seq…

    Submitted 15 January, 2023; originally announced January 2023.

  49. arXiv:2212.01893  [pdf, other]

    cs.CV

    Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering

    Authors: Duy M. H. Nguyen, Hoang Nguyen, Mai T. N. Truong, Tri Cao, Binh T. Nguyen, Nhat Ho, Paul Swoboda, Shadi Albarqouni, Pengtao Xie, Daniel Sonntag

    Abstract: Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Accepted at AAAI 2023

  50. arXiv:2211.08316  [pdf, other]

    cs.CL

    FolkScope: Intention Knowledge Graph Construction for E-commerce Commonsense Discovery

    Authors: Changlong Yu, Weiqi Wang, Xin Liu, Jiaxin Bai, Yangqiu Song, Zheng Li, Yifan Gao, Tianyu Cao, Bing Yin

    Abstract: Understanding users' intentions in e-commerce platforms requires commonsense knowledge. In this paper, we present FolkScope, an intention knowledge graph construction framework to reveal the structure of humans' minds about purchasing items. As commonsense knowledge is usually ineffable and not expressed explicitly, it is challenging to perform information extraction. Thus, we propose a new approa…

    Submitted 11 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: ACL Findings 2023