
Showing 1–50 of 213 results for author: Cheng, K

Searching in archive cs.
  1. arXiv:2406.18995  [pdf, other]

    cs.LG cs.AI

    FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

    Authors: Zhaobin Sun, Nannan Wu, Junjie Shi, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI 2024

  2. arXiv:2406.13511  [pdf, other]

    cs.DC

    Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang

    Abstract: Large language models (LLMs) iteratively generate text token by token, with memory usage increasing with the length of generated token sequences. The unpredictability of generation lengths makes it difficult to estimate the time and memory needed to process requests, posing a challenge for effective request scheduling. Conventional sequence-level scheduling (SLS) serves requests in a first-come fi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 22 figures

  3. arXiv:2406.12801  [pdf, other]

    cs.HC

    "A Lot of Moving Parts": A Case Study of Open-Source Hardware Design Collaboration in the Thingiverse Community

    Authors: Kathy Cheng, Shurui Zhou, Alison Olechowski

    Abstract: Open-source is a decentralized and collaborative method of development that encourages open contribution from an extensive and undefined network of individuals. Although commonly associated with software development (OSS), the open-source model extends to hardware development, forming the basis of open-source hardware development (OSH). Compared to OSS, OSH is relatively nascent, lacking adequate…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages, 6 figures, to be published in Proceedings of the ACM on Human-Computer Interaction 2024

  4. arXiv:2406.11736  [pdf, other]

    cs.CL cs.AI

    Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

    Authors: Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu

    Abstract: One of the primary driving forces contributing to the superior performance of Large Language Models (LLMs) is the extensive availability of human-annotated natural language data, which is used for alignment fine-tuning. This inspired researchers to investigate self-training methods to mitigate the extensive reliance on human annotations. However, the current success of self-training has been prima…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  5. arXiv:2406.08343  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

    Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  6. arXiv:2406.04785  [pdf, other]

    cs.DC

    Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang

    Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. Fir…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 12 pages, 14 figures

  7. arXiv:2405.19671  [pdf, other]

    cs.CV

    GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction

    Authors: Haodong Xiang, Xinghui Li, Xiansong Lai, Wanting Zhang, Zhichao Liao, Kai Cheng, Xue** Liu

    Abstract: Recently, 3D Gaussian Splatting(3DGS) has revolutionized neural rendering with its high-quality rendering and real-time speed. However, when it comes to indoor scenes with a significant number of textureless areas, 3DGS yields incomplete and noisy reconstruction results due to the poor initialization of the point cloud and under-constrained optimization. Inspired by the continuity of signed distan…

    Submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2405.17705  [pdf, other]

    cs.CV

    DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

    Authors: Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu

    Abstract: We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across vari…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures; project page: https://linhanwang.github.io/dcgaussian/

  9. arXiv:2405.15452  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

    Authors: Keyuan Cheng, Muhammad Asif Ali, Shu Yang, Gang Lin, Yuxuan Zhai, Haoyang Fei, Ke Xu, Lu Yu, Lijie Hu, Di Wang

    Abstract: Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan and solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard to decompose questions, and it does not explicitly cater to correlated…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  10. arXiv:2405.14156  [pdf, other]

    cs.CV

    Unveiling the Tapestry of Consistency in Large Vision-Language Models

    Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Abstract: Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs an…

    Submitted 7 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This project is available at https://github.com/foundation-multimodal-models/ConBench

  11. arXiv:2405.11956  [pdf, other]

    cs.NI

    PET: Multi-agent Independent PPO-based Automatic ECN Tuning for High-Speed Data Center Networks

    Authors: Kai Cheng, Ting Wang, Xiao Du, Shuyi Du, Haibin Cai

    Abstract: Explicit Congestion Notification (ECN)-based congestion control schemes have been widely adopted in high-speed data center networks (DCNs), where the ECN marking threshold plays a determinant role in guaranteeing a packet lossless DCN. However, existing approaches either employ static settings with immutable thresholds that cannot be dynamically self-adjusted to adapt to network dynamics, or fail…

    Submitted 20 May, 2024; originally announced May 2024.

  12. arXiv:2404.11613  [pdf, other]

    cs.CV

    InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

    Authors: Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

    Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant proper…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Project page: https://johanan528.github.io/Infusion

  13. arXiv:2404.09613  [pdf, other]

    cs.ET cs.AI cs.AR

    Efficient and accurate neural field reconstruction using resistive memory

    Authors: Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods…

    Submitted 15 April, 2024; originally announced April 2024.

  14. arXiv:2404.07479  [pdf, other]

    cs.HC

    RASSAR: Room Accessibility and Safety Scanning in Augmented Reality

    Authors: Xia Su, Han Zhang, Kaiming Cheng, Jaewook Lee, Qiaochu Liu, Wyatt Olson, Jon Froehlich

    Abstract: The safety and accessibility of our homes is critical to quality of life and evolves as we age, become ill, host guests, or experience life events such as having children. Researchers and health professionals have created assessment instruments such as checklists that enable homeowners and trained experts to identify and mitigate safety and access issues. With advances in computer vision, augmente…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: To Appear in CHI 2024

  15. arXiv:2404.05648  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model

    Authors: Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated st…

    Submitted 8 April, 2024; originally announced April 2024.

  16. arXiv:2404.04518  [pdf, other]

    cs.CV

    MedIAnomaly: A comparative study of anomaly detection in medical images

    Authors: Yu Cai, Weiwen Zhang, Hao Chen, Kwang-Ting Cheng

    Abstract: Anomaly detection (AD) aims at detecting abnormal samples that deviate from the expected normal patterns. Generally, it can be trained on merely normal data without the requirement for abnormal samples, and thereby plays an important role in the recognition of rare diseases and health screening in the medical domain. Despite numerous related studies, we observe a lack of a fair and comprehensive e…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Under submission

  17. arXiv:2404.00492  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-hop Question Answering under Temporal Knowledge Editing

    Authors: Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

    Abstract: Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 23 pages

  18. arXiv:2404.00489  [pdf, other]

    cs.CL cs.AI cs.LG

    PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression

    Authors: Muhammad Asif Ali, Zheng** Li, Shu Yang, Keyuan Cheng, Yang Cao, Tianhao Huang, Lijie Hu, Lu Yu, Di Wang

    Abstract: Large language models (LLMs) have shown exceptional abilities for multiple different natural language processing tasks. While prompting is a crucial tool for LLM inference, we observe that there is a significant cost associated with exceedingly lengthy prompts. Existing attempts to compress lengthy prompts lead to sub-standard results in terms of readability and interpretability of the compressed…

    Submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2404.00486  [pdf, other]

    cs.CL cs.AI

    Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs

    Authors: Shu Yang, Jiayuan Su, Han Jiang, Mengdi Li, Keyuan Cheng, Muhammad Asif Ali, Lijie Hu, Di Wang

    Abstract: With the rise of large language models (LLMs), ensuring they embody the principles of being helpful, honest, and harmless (3H), known as Human Alignment, becomes crucial. While existing alignment methods like RLHF, DPO, etc., effectively fine-tune LLMs to match preferences in the preference dataset, they often lead LLMs to highly receptive human input and external evidence, even when this informat…

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2403.19591  [pdf, other]

    cs.LG cs.AR cs.NE

    Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

    Authors: **cheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng

    Abstract: Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require unfriendly high-precision arithmetics such as FP/INT 32 and lack consideration of…

    Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 61st ACM/IEEE Design Automation Conference (DAC) 2024

  21. arXiv:2403.14734  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol…

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  22. arXiv:2403.13307  [pdf, other]

    cs.CV

    LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

    Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yu**g Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

    Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descript…

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  23. arXiv:2403.13258  [pdf, other]

    cs.CV

    SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts

    Authors: Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu

    Abstract: Segment anything model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, it has been proved that SAM would encounter severe performance degradation due to the lack of medical knowledge in training and local feature encoding. Though several SAM-based models have been proposed for…

    Submitted 19 March, 2024; originally announced March 2024.

  24. arXiv:2403.12537  [pdf, other]

    cs.CV

    Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification

    Authors: Yi Lin, Zhengjie Zhu, Kwang-Ting Cheng, Hao Chen

    Abstract: Multiple instance learning (MIL) has emerged as a popular method for classifying histopathology whole slide images (WSIs). Existing approaches typically rely on frozen pre-trained models to extract instance features, neglecting the substantial domain shift between pre-training natural and histopathological images. To address this issue, we propose PAMT, a novel Prompt-guided Adaptive Model Transfo…

    Submitted 19 March, 2024; originally announced March 2024.

  25. arXiv:2403.09303  [pdf, other]

    cs.LG cs.CV

    Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective

    Authors: Yu Cai, Hao Chen, Kwang-Ting Cheng

    Abstract: Medical anomaly detection aims to identify abnormal findings using only normal training data, playing a crucial role in health screening and recognizing rare diseases. Reconstruction-based methods, particularly those utilizing autoencoders (AEs), are dominant in this field. They work under the assumption that AEs trained on only normal data cannot reconstruct unseen abnormal regions well, thereby…

    Submitted 14 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  26. arXiv:2403.08407  [pdf, other]

    cs.CV

    Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification

    Authors: Shuhan Li, Yi Lin, Hao Chen, Kwang-Ting Cheng

    Abstract: Accurate and robust classification of diseases is important for proper diagnosis and treatment. However, medical datasets often face challenges related to limited sample sizes and inherent imbalanced distributions, due to difficulties in data collection and variations in disease prevalence across different types. In this paper, we introduce an Iterative Online Image Synthesis (IOIS) framework to a…

    Submitted 13 March, 2024; originally announced March 2024.

  27. arXiv:2402.14650  [pdf, other]

    cs.CV

    GaussianPro: 3D Gaussian Splatting with Progressive Propagation

    Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wen** Wang, Xue** Chen

    Abstract: The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always f…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: See the project page for code, data: https://kcheng1021.github.io/gaussianpro.github.io

  28. arXiv:2402.13415  [pdf, other]

    cs.CL

    Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text

    Authors: Kewei Cheng, Nesreen K. Ahmed, Theodore Willke, Yizhou Sun

    Abstract: Although Large Language Models (LLMs) excel at addressing straightforward reasoning tasks, they frequently struggle with difficulties when confronted by more complex multi-step reasoning due to a range of factors. Firstly, natural language often encompasses complex relationships among entities, making it challenging to maintain a clear reasoning chain over longer spans. Secondly, the abundance of…

    Submitted 20 February, 2024; originally announced February 2024.

  29. arXiv:2402.11826  [pdf, other]

    cs.CV

    Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

    Authors: Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji

    Abstract: Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution, lacking rich texture and semantics as delivered by the RGB image. Current methods foc…

    Submitted 18 February, 2024; originally announced February 2024.

  30. arXiv:2402.09353  [pdf, other]

    cs.CL cs.CV

    DoRA: Weight-Decomposed Low-Rank Adaptation

    Authors: Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

    Abstract: Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because of avoiding additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and…

    Submitted 3 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Code available at https://github.com/NVlabs/DoRA

  31. Smart Fitting Room: A One-stop Framework for Matching-aware Virtual Try-on

    Authors: Mingzhe Yu, Yunshan Ma, Lei Wu, Kai Cheng, Xue Li, Lei Meng, Tat-Seng Chua

    Abstract: The development of virtual try-on has revolutionized online shopping by allowing customers to visualize themselves in various fashion items, thus extending the in-store try-on experience to the cyber space. Although virtual try-on has attracted considerable research initiatives, existing systems only focus on the quality of image generation, overlooking whether the fashion item is a good match to…

    Submitted 20 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  32. arXiv:2401.16459  [pdf, other]

    cs.CV

    Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors

    Authors: Shiyin Dong, Mingrui Zhu, Kun Cheng, Nannan Wang, Xinbo Gao

    Abstract: The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge exists in lacking a unified approach to apply diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potenti…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 18 pages, 11 figures

  33. arXiv:2401.13174  [pdf, other]

    cs.CV

    Boundary and Relation Distillation for Semantic Segmentation

    Authors: Dong Zhang, **cheng Dong, Xinting Hu, Long Chen, Kwang-Ting Cheng

    Abstract: Recently, it has been revealed that small semantic segmentation (SS) models exhibit a tendency to make errors in maintaining boundary region completeness and preserving target region connectivity, despite their effective segmentation of the main object regions. To address these errors, we propose a targeted boundary and relation distillation (BRD) strategy using knowledge distillation from large t…

    Submitted 23 January, 2024; originally announced January 2024.

  34. arXiv:2401.10935  [pdf, other]

    cs.HC cs.AI

    SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

    Authors: Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

    Abstract: Graphical User Interface (GUI) agents are designed to automate complex tasks on digital devices, such as smartphones and desktops. Most existing GUI agents interact with the environment through extracted structured data, which can be notably lengthy (e.g., HTML) and occasionally inaccessible (e.g., on desktops). To alleviate this issue, we propose a novel visual GUI agent -- SeeClick, which only r…

    Submitted 22 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  35. arXiv:2401.07437  [pdf, other]

    cs.CV

    BoNuS: Boundary Mining for Nuclei Segmentation with Partial Point Labels

    Authors: Yi Lin, Zeyu Wang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

    Abstract: Nuclei segmentation is a fundamental prerequisite in the digital pathology workflow. The development of automated methods for nuclei segmentation enables quantitative analysis of the wide existence and large variances in nuclei morphometry in histopathology images. However, manual annotation of tens of thousands of nuclei is tedious and time-consuming, which requires significant amount of human ef…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Medical Imaging

  36. arXiv:2312.16087  [pdf, ps, other]

    cs.IT math.CO

    Improved decoding of expander codes: fundamental trade-off between expansion ratio and minimum distance of inner code

    Authors: Kuan Cheng, Minghui Ouyang, Chong Shangguan, Yuanting Shen

    Abstract: Tanner codes are graph-based linear codes whose parity-check matrices can be characterized by a bipartite graph $G$ together with an inner code $C_0$. Expander codes are Tanner codes whose defining bipartite graph $G$ has good expansion property. The landmark work of Sipser and Spielman showed that every bipartite expander $G$ with expansion ratio $\delta > 3/4$ together with a parity-check code defines…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 28 pages

  37. arXiv:2312.09262  [pdf, other]

    cs.LG cs.AR

    Random resistive memory-based deep extreme point learning machine for unified visual processing

    Authors: Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data rep…

    Submitted 14 December, 2023; originally announced December 2023.

  38. arXiv:2312.09066  [pdf, other]

    cs.CV cs.AI

    CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels

    Authors: Chi-hsuan Wu, Shih-yang Liu, Xijie Huang, Xingbo Wang, Rong Zhang, Luca Minciullo, Wong Kai Yiu, Kenny Kwan, Kwang-Ting Cheng

    Abstract: Online learning is a rapidly growing industry. However, a major doubt about online learning is whether students are as engaged as they are in face-to-face classes. An engagement recognition system can notify the instructors about the students condition and improve the learning experience. Current challenges in engagement detection involve poor label quality, extreme data imbalance, and intra-class…

    Submitted 3 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 11 pages

  39. arXiv:2312.08901  [pdf, other]

    cs.CL cs.AI

    Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning

    Authors: Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Fan Yang, Mao Yang

    Abstract: Large Language Models (LLMs) have shown impressive capabilities, yet they still struggle with math reasoning. In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning. Motivated by the observation that adding more concise CoT examples in the prompt can improve LLM reasoning performance, CoT-Inf…

    Submitted 15 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  40. arXiv:2312.07294   

    cs.MM

    Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description

    Authors: Mianzhi Pan, Jianfei Li, Mingyue Yu, Zheng Ma, Kanzhi Cheng, Jianbing Zhang, Jiajun Chen

    Abstract: Commonsense reasoning, the ability to make logical assumptions about daily scenes, is one core intelligence of human beings. In this work, we present a novel task and dataset for evaluating the ability of text-to-image generative models to conduct commonsense reasoning, which we call PAINTaboo. Given a description with few visual clues of one object, the goal is to generate images illustrating the…

    Submitted 22 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: It is an incomplete work

  41. arXiv:2312.06657  [pdf, other]

    cs.CV

    Learning Naturally Aggregated Appearance for Efficient 3D Editing

    Authors: Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

    Abstract: Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness. In view of such a deficiency, we propose to replace the color field with an explicit 2D appearance aggregation, also called canonical image, with which users can easily customize their 3D editing v…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project Webpage: https://felixcheng97.github.io/AGAP/, Code: https://github.com/felixcheng97/AGAP

  42. arXiv:2312.06053

    cs.CL cs.LG

    IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions

    Authors: Ziheng Zeng, Kellen Tan Cheng, Srihari Venkat Nanniyur, Jianing Zhou, Suma Bhat

    Abstract: Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because their meanings are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  43. arXiv:2311.16945

    cs.CV

    UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-view Cameras in Autonomous Driving

    Authors: Kai Cheng, Xiaoxiao Long, Wei Yin, ** Wang, Zhiqiang Wu, Yuexin Ma, Kaixuan Wang, Xiaozhi Chen, Xue** Chen

    Abstract: Multi-camera setups find widespread use across various applications, such as autonomous driving, as they greatly expand sensing capabilities. Despite the fast development of Neural radiance field (NeRF) techniques and their wide applications in both indoor and outdoor scenes, applying NeRF to multi-camera systems remains very challenging. This is primarily due to the inherent under-calibration iss…

    Submitted 10 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: See the project page for code, data: https://kcheng1021.github.io/ucnerf.github.io

  44. arXiv:2311.14395

    cs.LG cs.CV

    Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

    Authors: Ke Cheng, Xuecheng Hua, Hu Lu, Juanjuan Tu, Yuanquan Wang, Shitong Wang

    Abstract: The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID) task lies in how to extract discriminative features from different modalities for matching purposes. While existing works primarily focus on minimizing the modal discrepancies, the modality information cannot be thoroughly leveraged. To solve this problem, a Multi-scale Semantic Correlation Mining network (MSCM…

    Submitted 24 November, 2023; originally announced November 2023.

  45. arXiv:2311.12079

    cs.CV

    FreeKD: Knowledge Distillation via Semantic Frequency Prompt

    Authors: Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang

    Abstract: Knowledge distillation (KD) has been applied to various tasks successfully, and mainstream methods typically boost the student model via spatial imitation losses. However, the consecutive downsamplings induced in the spatial domain of the teacher model are a type of corruption, hindering the student from analyzing what specific information needs to be imitated, which results in accuracy degradation. To…

    Submitted 22 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  46. arXiv:2311.07164

    cs.ET cs.AI cs.AR

    Pruning random resistive memory for optimizing analogue AI

    Authors: Yi Li, Songqi Wang, Ya** Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The rapid advancement of artificial intelligence (AI) has been marked by the large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic device…

    Submitted 13 November, 2023; originally announced November 2023.

  47. arXiv:2310.16836

    cs.CL cs.AI cs.AR cs.CV

    LLM-FP4: 4-Bit Floating-Point Quantized Transformers

    Authors: Shih-yang Liu, Zechun Liu, Xijie Huang, **cheng Dong, Kwang-Ting Cheng

    Abstract: We propose LLM-FP4 for quantizing both weights and activations in large language models (LLMs) down to 4-bit floating-point values, in a post-training manner. Existing post-training quantization (PTQ) solutions are primarily integer-based and struggle with bit widths below 8 bits. Compared to integer quantization, floating-point (FP) quantization is more flexible and can better handle long-tail or…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  48. arXiv:2310.12982

    cs.CV

    Putting the Object Back into Video Object Segmentation

    Authors: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing

    Abstract: We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance in more challenging data. In con…

    Submitted 11 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: CVPR 2024 Highlight. Project page: https://hkchengrex.github.io/Cutie

  49. arXiv:2310.03443

    cs.CL cs.SD eess.AS

    The North System for Formosa Speech Recognition Challenge 2023

    Authors: Li-Wei Chen, Kai-Chen Cheng, Hung-Shin Lee

    Abstract: This report provides a concise overview of the proposed North system, which aims to achieve automatic word/syllable recognition for Taiwanese Hakka (Sixian). The report outlines three key components of the system: the acquisition, composition, and utilization of the training data; the architecture of the model; and the hardware specifications and operational statistics. The demonstration of the sy…

    Submitted 5 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  50. arXiv:2309.07510

    cs.RO cs.AI cs.CV

    Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions

    Authors: Kai Cheng, Ruihai Wu, Yan Shen, Chuanruo Ning, Guanqi Zhan, Hao Dong

    Abstract: Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agen…

    Submitted 20 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: In 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Website at https://chengkaiacademycity.github.io/EnvAwareAfford/