Showing 1–50 of 513 results for author: Sun, C

Searching in archive cs.
  1. arXiv:2407.01284  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  2. arXiv:2407.00431  [pdf, other]

    cs.CV

    Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones

    Authors: Qiangguo Jin, Jiapeng Huang, Changming Sun, Hui Cui, Ping Xuan, Ran Su, Leyi Wei, Yu-Jie Wu, Chia-An Wu, Henry B. L. Duh, Yueh-Hsun Lu

    Abstract: The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address this issue, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that…

    Submitted 29 June, 2024; originally announced July 2024.

    Journal ref: MICCAI 2024

  3. arXiv:2406.20066  [pdf, other]

    cs.CV

    ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

    Authors: Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun

    Abstract: NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, the performance in high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often results in oversmoothing. On the other hand, single-image super-resolution (SR) aims to enhance…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.18062  [pdf, other]

    cs.LG cs.AI

    Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

    Authors: Chung-En Sun, Sicun Gao, Tsui-Wei Weng

    Abstract: Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Published in ICML 2024

  5. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, ** Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general pu…

    Submitted 25 June, 2024; originally announced June 2024.

  6. Learning Autonomous Race Driving with Action Mapping Reinforcement Learning

    Authors: Yuanda Wang, Xin Yuan, Changyin Sun

    Abstract: Autonomous race driving poses a complex control challenge as vehicles must be operated at the edge of their handling limits to reduce lap times while respecting physical and safety constraints. This paper presents a novel reinforcement learning (RL)-based approach, incorporating the action mapping (AM) mechanism to manage state-dependent input constraints arising from limited tire-road friction. A…

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.08467  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    DafnyBench: A Benchmark for Formal Software Verification

    Authors: Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark

    Abstract: We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the Dafny formal verification engine to successfully verify over 750 programs with about 53,000 lines of code. The best model and prompting scheme achieved 68% succe…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code & dataset available at: https://github.com/sun-wendy/DafnyBench

  8. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  9. arXiv:2406.05397  [pdf, other]

    cs.SE

    Metamorphic Relation Generation: State of the Art and Visions for Future Research

    Authors: Rui Li, Huai Liu, Pak-Lok Poon, Dave Towey, Chang-Ai Sun, Zheng Zheng, Zhi Quan Zhou, Tsong Yueh Chen

    Abstract: Metamorphic testing has become one mainstream technique to address the notorious oracle problem in software testing, thanks to its great successes in revealing real-life bugs in a wide variety of software systems. Metamorphic relations, the core component of metamorphic testing, have continuously attracted research interests from both academia and industry. In the last decade, a rapidly increasing…

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by International Workshop on Software Engineering in 2030

  10. arXiv:2406.01126  [pdf, other]

    cs.CL cs.AI

    TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine

    Authors: Wenjing Yue, Xiaoling Wang, Wei Zhu, Ming Guan, Huanran Zheng, Pengfei Wang, Changzhi Sun, Xin Ma

    Abstract: Large language models (LLMs) have performed remarkably well in various natural language processing tasks by benchmarking, including in the Western medical domain. However, the professional evaluation benchmarks for LLMs have yet to be covered in the traditional Chinese medicine (TCM) domain, which has a profound history and vast influence. To address this research gap, we introduce TCM-Bench, an co…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 20 pages, 15 figures

  11. arXiv:2405.17830  [pdf, other]

    cs.CL

    More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs

    Authors: Chengyuan Liu, Shihang Wang, Yangyang Kang, Lizhi Qing, Fubang Zhao, Changlong Sun, Kun Kuang, Fei Wu

    Abstract: The performance on general tasks decreases after Large Language Models (LLMs) are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for real application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both the general capabilities and…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.15208  [pdf, other]

    cs.CL cs.AI

    Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

    Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

    Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at LREC-COLING 2024

  13. arXiv:2405.11106  [pdf, other]

    cs.MA cs.AI cs.CL cs.LG cs.RO

    LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

    Authors: Chuanneng Sun, Songjun Huang, Dario Pompili

    Abstract: In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLM can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent System (MAS) is not trivial, as many aspects…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 8 pages, 1 figure, 1 table, submitted to IEEE RA-L

  14. arXiv:2405.03892  [pdf, other]

    cs.LG cs.AI

    Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows

    Authors: Minjae Cho, Jonathan P. How, Chuangchuang Sun

    Abstract: Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, this offline learning introduces a new challenge known as distributional shift, degrading the performance whe…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Submitted for review at IEEE: Neural Networks and Learning Systems

  15. arXiv:2405.03248  [pdf, other]

    cs.LG cs.AI

    Communication-Efficient Federated Learning with Adaptive Compression under Dynamic Bandwidth

    Authors: Ying Zhuansun, Dandan Li, Xiaohong Huang, Caijun Sun

    Abstract: Federated learning can train models without directly providing local data to the server. However, the frequent updating of the local model brings the problem of large communication overhead. Recently, scholars have achieved the communication efficiency of federated learning mainly by model compression. But they ignore two problems: 1) network state of each client changes dynamically; 2) network st…

    Submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.02559  [pdf]

    cs.CL cs.AI

    A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare

    Authors: Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V Stolyar, Katelyn Polanska, Karleigh R McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang

    Abstract: As generative artificial intelligence (AI), particularly Large Language Models (LLMs), continues to permeate healthcare, it remains crucial to supplement traditional automated evaluations with human expert evaluation. Understanding and evaluating the generated texts is vital for ensuring safety, reliability, and effectiveness. However, the cumbersome, time-consuming, and non-standardized nature of…

    Submitted 4 May, 2024; originally announced May 2024.

  17. arXiv:2405.01085  [pdf, other]

    cs.CV

    Single Image Super-Resolution Based on Global-Local Information Synergy

    Authors: Nianzu Qiao, Lamei Di, Changyin Sun

    Abstract: Although several image super-resolution solutions exist, they still face many challenges. CNN-based algorithms, despite the reduction in computational complexity, still need to improve their accuracy. While Transformer-based algorithms have higher accuracy, their ultra-high computational complexity makes them difficult to be accepted in practical applications. To overcome the existing challenges,…

    Submitted 2 May, 2024; originally announced May 2024.

  18. arXiv:2405.01083  [pdf, other]

    cs.CV

    MCMS: Multi-Category Information and Multi-Scale Stripe Attention for Blind Motion Deblurring

    Authors: Nianzu Qiao, Lamei Di, Changyin Sun

    Abstract: Deep learning-based motion deblurring techniques have advanced significantly in recent years. This class of techniques, however, does not carefully examine the inherent flaws in blurry images. For instance, low edge and structural information are traits of blurry images. The high-frequency component of blurry images is edge information, and the low-frequency component is structure information. A b…

    Submitted 2 May, 2024; originally announced May 2024.

  19. arXiv:2405.00645  [pdf, other]

    cs.LG physics.ins-det

    Gradient-based Automatic Per-Weight Mixed Precision Quantization for Neural Networks On-Chip

    Authors: Chang Sun, Thea K. Årrestad, Vladimir Loncar, Jennifer Ngadiuba, Maria Spiropulu

    Abstract: Model size and inference speed at deployment time are major challenges in many deep learning applications. A promising strategy to overcome these challenges is quantization. However, a straightforward uniform quantization to very low precision can result in significant accuracy loss. Mixed-precision quantization, based on the idea that certain parts of the network can accommodate lower precision…

    Submitted 1 May, 2024; originally announced May 2024.

  20. arXiv:2404.19180  [pdf, other]

    cs.AR

    MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor

    Authors: Bingcai Sui, Junzhong Shen, Caixia Sun, Junhui Wang, Zhong Zheng, Wei Guo

    Abstract: General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper we propose MACO, a novel loosely-coupled mul…

    Submitted 29 April, 2024; originally announced April 2024.

  21. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jian** Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  22. arXiv:2404.16886  [pdf, other]

    cs.LG cs.AI

    Review of Data-centric Time Series Analysis from Sample, Feature, and Period

    Authors: Chenxi Sun, Hongyan Li, Yaliang Li, Shenda Hong

    Abstract: Data is essential to performing time series analysis utilizing machine learning approaches, whether for classic models or today's large language models. A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence, as well as task outcomes and costs. The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data q…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 9 pages, 1 figure

  23. arXiv:2404.14619  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

    Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

    Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Minor corrections

  24. arXiv:2404.13146  [pdf, other]

    cs.CR cs.CV

    DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection

    Authors: Yan Ju, Chengzhe Sun, Shan Jia, Shuwei Hou, Zhaofeng Si, Soumyya Kanti Datta, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu

    Abstract: Deepfakes, as AI-generated media, have increasingly threatened media integrity and personal privacy with realistic yet fake digital content. In this work, we introduce an open-source and user-friendly online platform, DeepFake-O-Meter v2.0, that integrates state-of-the-art methods for detecting Deepfake images, videos, and audio. Built upon DeepFake-O-Meter v1.0, we have made significant upgrades…

    Submitted 27 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  25. arXiv:2404.12652  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

    Authors: Yuan Zang, Tian Yun, Hao Tan, Trung Bui, Chen Sun

    Abstract: Do vision-language models (VLMs) pre-trained to caption an image of a "durian" learn visual concepts such as "brown" (color) and "spiky" (texture) at the same time? We aim to answer this question as visual concepts learned "for free" would enable wide applications such as neuro-symbolic reasoning or human-interpretable object classification. We assume that the visual concepts, if captured by pre-t…

    Submitted 19 April, 2024; originally announced April 2024.

  26. arXiv:2404.12014  [pdf, other]

    cs.CL cs.CR

    Enhance Robustness of Language Models Against Variation Attack through Graph Integration

    Authors: Zi Xiong, Lizhi Qing, Yangyang Kang, Jiawei Liu, Hongsong Li, Changlong Sun, Xiaozhong Liu, Wei Lu

    Abstract: The widespread use of pre-trained language models (PLMs) in natural language processing (NLP) has greatly improved performance outcomes. However, these models' vulnerability to adversarial attacks (e.g., camouflaged hints from drug dealers), particularly in the Chinese language with its rich character diversity/variation and complex structures, hatches vital apprehension. In this study, we propose…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures, accepted by COLING 2024

  27. arXiv:2404.11027  [pdf, other]

    cs.AI

    Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

    Authors: Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

    Abstract: While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policie…

    Submitted 16 April, 2024; originally announced April 2024.

  28. arXiv:2404.07425  [pdf, ps, other]

    eess.SP cs.IT

    Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization

    Authors: Rui Sun, Li You, An-An Lu, Chen Sun, Xiqi Gao, Xiang-Gen Xia

    Abstract: In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By prov…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures, journal

  29. arXiv:2404.05183  [pdf, other]

    cs.CV cs.LG

    Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Traditional defect classification approaches face two barriers. (1) Insufficient training data and unstable data quality. Collecting sufficient defective samples is expensive and time-consuming, consequently leading to dataset variance. This introduces difficulty in recognition and learning. (2) Over-dependence on visual modality. When the image pattern and texture are monotonic for all d…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: MULA 2024

  30. arXiv:2404.00923  [pdf, other]

    cs.CV cs.AI cs.RO

    MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

    Authors: Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. Humphreys, Ufuk Topcu

    Abstract: Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Project Webpage: https://vita-group.github.io/MM3DGS-SLAM

  31. arXiv:2403.20091  [pdf, other]

    cs.IT eess.SP

    A Signature Based Approach Towards Global Channel Charting with Ultra Low Complexity

    Authors: Longhai Zhao, Yunchuan Yang, Qi Xiong, He Wang, Bin Yu, Feifei Sun, Chengjun Sun

    Abstract: Channel charting, an unsupervised learning method that learns a low-dimensional representation from channel information to preserve the geometrical properties of the physical space of user equipments (UEs), has drawn much attention from both academic and industrial communities, because it can facilitate many downstream tasks, such as indoor localization, UE handover, beam management, and so on. However, ma…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE ICC 2024 Workshops

  32. arXiv:2403.18843  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition

    Authors: Chang Sun, Hong Yang, Bo Qin

    Abstract: Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledge distillation approach using a Joint-Embedding Predictive Architecture (JEPA), named JEP-KD, design…

    Submitted 3 March, 2024; originally announced March 2024.

  33. arXiv:2403.15004  [pdf]

    cs.CV cs.LG

    ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

    Authors: Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Jun-Wei Hsieh, Hui-Kai Su, Wen-Kai Kuo

    Abstract: This work presents ParFormer as an enhanced transformer architecture that allows the incorporation of different token mixers into a single stage, hence improving feature extraction capabilities. Integrating both local and global data allows for precise representation of short- and long-range spatial relationships without the need for computationally intensive methods such as shifting windows. Alon…

    Submitted 22 March, 2024; originally announced March 2024.

  34. arXiv:2403.13820  [pdf, other]

    cs.LG cs.CR eess.SP

    Identity information based on human magnetocardiography signals

    Authors: Pengju Zhang, Chenxi Sun, Jianwei Zhang, Hong Guo

    Abstract: We have developed an individual identification system based on magnetocardiography (MCG) signals captured using optically pumped magnetometers (OPMs). Our system utilizes pattern recognition to analyze the signals obtained at different positions on the body, by scanning the matrices composed of MCG signals with a 2*2 window. In order to make use of the spatial information of MCG signals, we transf…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures. Author manuscript accepted for AAAI 2024 Spring Symposium on Clinical Foundation Models

  35. arXiv:2403.13111  [pdf, other]

    cs.LG cs.AI

    Deep learning with noisy labels in medical prediction problems: a scoping review

    Authors: Yishu Wei, Yu Deng, Cong Sun, Mingquan Lin, Hongmei Jiang, Yifan Peng

    Abstract: Objectives: Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels. Despite this, the adoption of label noise management remains limited, and label noise is largely ignored. To this end, there is a critical need to conduct a scoping review focusing on the problem space. This scoping review aims to comprehensi…

    Submitted 19 March, 2024; originally announced March 2024.

  36. Inter- and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Yang Song, Jiangbin Zheng, Leilei Cao, Leyi Wei, Ran Su

    Abstract: Acquiring pixel-level annotations is often limited in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertaint…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Expert Systems with Applications, 2024, 238: 122093

  37. arXiv:2403.11536  [pdf, other]

    cs.CV cs.AI cs.LG

    OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Chun-Hung Sun, Kuang-Ming Wu

    Abstract: Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These incl…

    Submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2403.06414  [pdf, other]

    cs.CL

    Evolving Knowledge Distillation with Large Language Models and Active Learning

    Authors: Chengyuan Liu, Yangyang Kang, Fubang Zhao, Kun Kuang, Zhuoren Jiang, Changlong Sun, Fei Wu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. However, their computational costs are prohibitively high. To address this issue, previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data. Nonetheless, these works have mainly focused on the direct use of LLMs for text generation and labeling, w…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by COLING 2024

  39. arXiv:2403.05810  [pdf, other]

    cs.CV cs.AI

    Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction

    Authors: Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun

    Abstract: Pedestrian trajectory prediction is a crucial component in computer vision and robotics, but remains challenging due to the domain shift problem. Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model. However, such domain adaptation methods are impractical in real-world scenarios, as it is infeasible to collect t…

    Submitted 9 March, 2024; originally announced March 2024.

  40. arXiv:2403.04283  [pdf, other]

    cs.CL cs.AI cs.LG

    Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

    Authors: Yu Zhu, Chuxiong Sun, Wenfei Yang, Wenqiang Wei, Bo Tang, Tianzhu Zhang, Zhiyu Li, Shifeng Zhang, Feiyu Xiong, Jie Hu, Mingchuan Yang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to ensure Large Language Models (LLMs) align with human values. However, existing RLHF methods require a high computational cost, one main reason being that RLHF assigns both the generation and alignment tasks to the LLM simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the generation and alignment p…

    Submitted 7 March, 2024; originally announced March 2024.

  41. arXiv:2403.03768  [pdf, other]

    cs.AI cs.LG q-bio.QM

    DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation

    Authors: Yushuai Wu, Ting Zhang, Hao Zhou, Hainan Wu, Hanwen Sunchu, Lei Hu, Xiaofang Chen, Suyuan Zhao, Gaochao Liu, Chao Sun, Jiahuan Zhang, Yizhen Luo, Peng Liu, Zaiqing Nie, Yushuai Wu

    Abstract: The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models b…

    Submitted 18 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  42. arXiv:2402.18420  [pdf, other]

    cs.RO

    CafkNet: GNN-Empowered Forward Kinematic Modeling for Cable-Driven Parallel Robots

    Authors: Zeqing Zhang, Linhan Yang, Cong Sun, Weiwei Shang, Jia Pan

    Abstract: The Cable-Driven Parallel Robots (CDPRs) have gained significant attention due to their high payload capacity and large workspace. When deploying CDPRs in practice, one of the challenges is kinematic modeling. Unlike serial mechanisms, CDPRs have a simple inverse kinematics problem but a complex forward kinematics (FK) issue. Therefore, the development of accurate and efficient FK solvers has been…

    Submitted 5 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: To the best of authors' knowledge, it is the first study to employ the GNN for the FK problem of CDPRs. First two authors have equal contribution. Videos and codes are available at https://sites.google.com/view/cafknet/site

  43. arXiv:2402.11060  [pdf, other]

    cs.CL cs.AI cs.IR

    Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement

    Authors: Chenkai Sun, Ke Yang, Revanth Gangi Reddy, Yi R. Fung, Hou Pong Chan, ChengXiang Zhai, Heng Ji

    Abstract: The increasing demand for personalized interactions with large language models (LLMs) calls for the development of methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on e…

    Submitted 16 February, 2024; originally announced February 2024.

  44. arXiv:2402.09424  [pdf, other]

    eess.SP cs.CV cs.LG cs.NE

    Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer

    Authors: Qinyu Chen, Congyi Sun, Chang Gao, Shih-Chii Liu

    Abstract: Epilepsy is a common disease of the nervous system. Timely prediction of seizures and intervention treatment can significantly reduce the accidental injury of patients and protect the life and health of patients. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from scalped long-term electroencephalogram…

    Submitted 21 January, 2024; originally announced February 2024.

    Comments: To be published at the 2024 IEEE International Symposium on Circuits and Systems (ISCAS), Singapore

  45. arXiv:2402.09369  [pdf, other]

    cs.CL

    Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking

    Authors: Yi Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji

    Abstract: Pretrained large language models have revolutionized many applications but still face challenges related to cultural bias and a lack of cultural commonsense knowledge crucial for guiding cross-culture communication and interactions. Recognizing the shortcomings of existing methods in capturing the diverse and rich cultures across the world, this paper introduces a novel approach for massively mult…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: preprint

  46. arXiv:2402.07216  [pdf, other]

    cs.CV

    A novel spatial-frequency domain network for zero-shot incremental learning

    Authors: Jie Ren, Yang Zhao, Weichuan Zhang, Changming Sun

    Abstract: Zero-shot incremental learning aims to enable the model to generalize to new classes without forgetting previously learned classes. However, the semantic gap between old and new sample classes can lead to catastrophic forgetting. Additionally, existing algorithms lack capturing significant information from each sample image domain, impairing models' classification performance. Therefore, this pape…

    Submitted 11 February, 2024; originally announced February 2024.

  47. arXiv:2402.07108  [pdf, other]

    cs.LG math.OC

    Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

    Authors: Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye

    Abstract: Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed b…

    Submitted 28 May, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  48. arXiv:2402.07087  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Self-Correcting Self-Consuming Loops for Generative Model Training

    Authors: Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun

    Abstract: As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the successful stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless…

    Submitted 10 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: Camera ready version (ICML 2024). Code at https://nategillman.com/sc-sc.html

  49. arXiv:2402.06459  [pdf, other]

    cs.GT cs.CE cs.CR cs.CY econ.GN

    Maximizing NFT Incentives: References Make You Rich

    Authors: Guangsheng Yu, Qin Wang, Caijun Sun, Lam Duc Nguyen, H. M. N. Dilum Bandara, Shiping Chen

    Abstract: In this paper, we study how to optimize existing Non-Fungible Token (NFT) incentives. Upon exploring a large number of NFT-related standards and real-world projects, we come across an unexpected finding. That is, the current NFT incentive mechanisms, often organized in an isolated and one-time-use fashion, tend to overlook their potential for scalable organizational structures. We propose, analy…

    Submitted 9 February, 2024; originally announced February 2024.

  50. arXiv:2402.02800  [pdf, other]

    cs.CV

    Extreme Two-View Geometry From Object Poses with Diffusion Models

    Authors: Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu

    Abstract: Human has an incredible ability to effortlessly perceive the viewpoint difference between two images containing the same object, even when the viewpoint change is astonishingly vast with no co-visible regions in the images. This remarkable skill, however, has proven to be a challenge for existing camera pose estimation methods, which often fail when faced with large viewpoint differences due to th…

    Submitted 5 February, 2024; originally announced February 2024.