Skip to main content

Showing 1–50 of 580 results for author: Chen, F

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00928  [pdf, other

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models(LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and found that the outputs of mos… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.19791  [pdf, other

    cs.RO

    Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding

    Authors: Yifan Tang, Cong Tai, Fangxing Chen, Wanting Zhang, Tao Zhang, Xue** Liu, Yong** Liu, Long Zeng

    Abstract: Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating their dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed,… ▽ More

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This version has been accepted by ICRA2024 and the dataset has been published, where the link can be found in the paper

    Journal ref: IEEE International Conference on Robotics & Automation,2024

  3. arXiv:2406.18579  [pdf, other

    cs.CV cs.IR

    Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

    Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

    Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-ob… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 22pages, 5 Figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

  4. arXiv:2406.17274  [pdf, other

    cs.CL cs.LG

    Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

    Authors: Jianfeng He, Runing Yang, Linlin Yu, Changbin Li, Ruoxi Jia, Feng Chen, Ming **, Chang-Tien Lu

    Abstract: Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncerta… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 63 pages, 41 figures, 11 tables

  5. arXiv:2406.17109  [pdf, other

    cs.CV

    GMT: Guided Mask Transformer for Leaf Instance Segmentation

    Authors: Feng Chen, Sotirios A. Tsaftaris, Mario Valerio Giuffrida

    Abstract: Leaf instance segmentation is a challenging multi-instance segmentation task, aiming to separate and delineate each leaf in an image of a plant. The delineation of each leaf is a necessary prerequisite task for several biology-related applications such as the fine-grained monitoring of plant growth, and crop yield estimation. The task is challenging because self-similarity of instances is high (si… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16020  [pdf, other

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  7. arXiv:2406.12219  [pdf, other

    cs.CV

    PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

    Authors: Feng Chen, Ling Ding, Kanokphan Lertniphonphan, Jian Li, Kaer Huang, Zhepeng Wang

    Abstract: This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image provided for the task. This task is particularly challenging due to the subtle movements and occlusions. To handle the complexity of the task, we propose the… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.12211  [pdf, other

    cs.CV

    PCIE_LAM Solution for Ego4D Looking At Me Challenge

    Authors: Kanokphan Lertniphonphan, Jun Xie, Yaqing Meng, Shi**g Wang, Feng Chen, Zhepeng Wang

    Abstract: This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR2024. The main goal of the challenge is to accurately determine if a person in the scene is looking at the camera wearer, based on a video where the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. The InternVL… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11434  [pdf, other

    cs.DB

    DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

    Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

    Abstract: Large language models (LLMs) becomes the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.10484  [pdf, other

    cs.CV

    Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

    Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

    Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, \textit{e.g.}, users usually cut and add effects/modifications to the raw video before publishing it on s… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  11. arXiv:2406.07920  [pdf, ps, other

    cs.LG cs.AI cs.CC math.ST stat.ML

    Near-Optimal Learning and Planning in Separated Latent MDPs

    Authors: Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin

    Abstract: We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp *statist… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  12. arXiv:2406.07294  [pdf, other

    cs.RO cs.CV

    OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments

    Authors: Bo Zhou, Chuanzhao Lu, Yan Pan, Fu Chen

    Abstract: Autonomous exploration in complex and cluttered environments is essential for various applications. However, there are many challenges due to the lack of global heuristic information. Existing exploration methods suffer from the repeated paths and considerable computational resource requirement in large-scale environments. To address the above issues, this letter proposes an efficient exploration… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.06158  [pdf, other

    cs.LG cs.AI stat.ML

    Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

    Authors: Daniel Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli

    Abstract: While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between laz… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 40 pages, 12 figures

  14. arXiv:2406.03744  [pdf, other

    cs.CV cs.LG

    ReDistill: Residual Encoded Distillation for Peak Memory Reduction

    Authors: Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang

    Abstract: The expansion of neural network sizes and the enhancement of image resolution through modern camera sensors result in heightened memory and power demands for neural networks. Reducing peak memory, which is the maximum memory consumed during the execution of a neural network, is critical to deploy neural networks on edge devices with limited memory budget. A naive approach to reducing peak memory i… ▽ More

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  15. arXiv:2406.03345  [pdf, other

    cs.LG cs.AI

    Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

    Authors: Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen

    Abstract: Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that perha… ▽ More

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  16. arXiv:2406.02963  [pdf, other

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  17. arXiv:2405.20986  [pdf, other

    cs.LG cs.CV

    Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks

    Authors: Linlin Yu, Bowen Yang, Tianhao Wang, Kangshuo Li, Feng Chen

    Abstract: The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This pape… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  18. arXiv:2405.20562  [pdf, other

    cs.LG cs.AI

    Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study

    Authors: Haroon Miah, Dimitrios Kollias, Giacinto Luca Pedone, Drew Provan, Frederick Chen

    Abstract: Primary Immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients leading to low platelet counts and bleeding. The diagnosis and effective management of ITP is challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcom… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  19. arXiv:2405.19854  [pdf, other

    cs.CV

    RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection

    Authors: Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu, Marios Savvides

    Abstract: Open-vocabulary object detection (OVD) requires solid modeling of the region-semantic relationship, which could be learned from massive region-text pairs. However, such data is limited in practice due to significant annotation costs. In this work, we propose RTGen to generate scalable open-vocabulary region-text pairs and demonstrate its capability to boost the performance of open-vocabulary objec… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Technical report

  20. arXiv:2405.19726  [pdf, other

    cs.CV

    Streaming Video Diffusion: Online Video Editing with Diffusion Models

    Authors: Feng Chen, Zhen Yang, Bohan Zhuang, Qi Wu

    Abstract: We present a novel task called online video editing, which is designed to edit \textbf{streaming} frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible, online video editing is tailored to real-life applications such as live streaming and online chat, requiring (1) fast continual step inference, (2) long-term tem… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  21. Learn to be Fair without Labels: a Distribution-based Learning Framework for Fair Ranking

    Authors: Fumian Chen, Hui Fang

    Abstract: Ranking algorithms as an essential component of retrieval systems have been constantly improved in previous studies, especially regarding relevance-based utilities. In recent years, more and more research attempts have been proposed regarding fairness in rankings due to increasing concerns about potential discrimination and the issue of echo chamber. These attempts include traditional score-based… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: ICTIR'23

  22. arXiv:2405.17047  [pdf, other

    cs.RO cs.LG

    Interpretable Robotic Manipulation from Language

    Authors: Boyuan Zheng, Jianlong Zhou, Fang Chen

    Abstract: Humans naturally employ linguistic instructions to convey knowledge, a process that proves significantly more complex for machines, especially within the context of multitask robotic manipulation environments. Natural language, moreover, serves as the primary medium through which humans acquire new knowledge, presenting a potentially intuitive bridge for translating concepts understandable by huma… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.15329  [pdf, other

    cs.CL

    Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev… ▽ More

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  24. arXiv:2405.09787  [pdf, other

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie , et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  25. arXiv:2405.05949  [pdf, other

    cs.CV

    CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

    Authors: Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

    Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Exp… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  26. arXiv:2405.03138  [pdf, other

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meantime, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages

  27. arXiv:2405.03121  [pdf, other

    cs.CV cs.AI

    AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

    Authors: Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu

    Abstract: The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  28. arXiv:2404.18826  [pdf, other

    cs.SI

    Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization

    Authors: Qi Zhang, Lance M. Kaplan, Audun Jøsang, Dong Hyun. Jeong, Feng Chen, **-Hee Cho

    Abstract: Competitive Influence Maximization (CIM) involves entities competing to maximize influence in online social networks (OSNs). Current Deep Reinforcement Learning (DRL) methods in CIM rely on simplistic binary opinion models (i.e., an opinion is represented by either 0 or 1) and often overlook the complexity of users' behavioral characteristics and their prior knowledge. We propose a novel DRL-based… ▽ More

    Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, submitted to ASONAM 2024

  29. arXiv:2404.17735  [pdf, other

    cs.LG cs.AI stat.ME

    Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models

    Authors: Aneesh Komanduri, Chen Zhao, Feng Chen, Xintao Wu

    Abstract: Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-controlled generation using diffusion models. Specifically, causal mode… ▽ More

    Submitted 8 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Short version accepted to CVPR 2024 Workshop on Generative Models for Computer Vision

  30. 3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

    Authors: Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

    Abstract: In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval. 3SHNet highlights the salient identification of prominent objects and their spatial locations within the visual modality, thus allowing the integration of visual semantics-spatial interactions and maintaining indep… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted Information Processing and Management (IP&M), 10 pages, 9 figures and 8 tables

    Journal ref: Information Processing & Management, Volume 61, Issue 4, July 2024, 103716

  31. arXiv:2404.17199  [pdf, other

    cs.CV

    Few-shot Calligraphy Style Learning

    Authors: Fangda Chen, Jiacheng Nie, Lichuan Jiang, Zhuoer Zeng

    Abstract: We introduced "Presidifussion," a novel approach to learning and replicating the unique style of calligraphy of President Xu, using a pretrained diffusion model adapted through a two-stage training process. Initially, our model is pretrained on a diverse dataset containing works from various calligraphers. This is followed by fine-tuning on a smaller, specialized dataset of President Xu's calligra… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  32. arXiv:2404.11932  [pdf, other

    cs.CL cs.AI

    CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

    Authors: Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

    Abstract: Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages. To address this p… ▽ More

    Submitted 12 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages

  33. Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems

    Authors: Dayu Yang, Fumian Chen, Hui Fang

    Abstract: Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry.This behavior discrepancy can lead t… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)

  34. arXiv:2404.10980  [pdf, other

    cs.CV cs.LG

    Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty

    Authors: Changbin Li, Kangshuo Li, Yuzhe Ou, Lance M. Kaplan, Audun Jøsang, **-Hee Cho, Dong Hyun Jeong, Feng Chen

    Abstract: Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explic… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: In Proceedings of The Twelfth International Conference on Learning Representations, ICLR 2024

  35. arXiv:2404.10407  [pdf

    cs.CV

    Comprehensive Survey of Model Compression and Speed up for Vision Transformers

    Authors: Feiyang Chen, Ziqian Luo, Lisang Zhou, Xueting Pan, Ying Jiang

    Abstract: Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks. However, their practical deployment is hampered by high computational and memory demands. This study addresses the challenge by evaluating four primary model compression techniques: quantization, low-rank approximation, knowledge distillation, and pruning. We metho… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Journal ref: Journal of Information, Technology and Policy (2024): 1-12

  36. arXiv:2404.10322  [pdf, other

    cs.CV

    Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

    Authors: Jiapeng Su, Qi Fan, Guangming Lu, Fanglin Chen, Wenjie Pei

    Abstract: Few-shot semantic segmentation (FSS) has achieved great success on segmenting objects of novel classes, supported by only a few annotated samples. However, existing FSS methods often underperform in the presence of domain shifts, especially when encountering new domain styles that are unseen during training. It is suboptimal to directly adapt or generalize the entire model to new domains in the fe… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  37. arXiv:2404.10209  [pdf, other

    cs.AI cs.LG

    Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

    Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhi** Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interact… ▽ More

    Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.09754  [pdf, other

    cs.CL

    Resilience of Large Language Models for Noisy Instructions

    Authors: Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

    Abstract: As the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs to handle text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 12 pages

  39. arXiv:2404.06762  [pdf, other

    cs.CL cs.HC

    Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

    Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

    Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantl… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  40. arXiv:2404.03429  [pdf, other

    cs.CL

    Scaffolding Language Learning via Multi-modal Tutoring Systems with Pedagogical Instructions

    Authors: Zhengyuan Liu, Stella Xin Yin, Carolyn Lee, Nancy F. Chen

    Abstract: Intelligent tutoring systems (ITSs) that imitate human tutors and aim to provide immediate and customized instructions or feedback to learners have shown their effectiveness in education. With the emergence of generative artificial intelligence, large language models (LLMs) further entitle the systems to complex and coherent conversational interactions. These systems would be of great help in lang… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  41. arXiv:2404.02790  [pdf, other

    cs.CV

    MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

    Authors: Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Fei Chen, Steven McDonagh, Gerasimos Lampouras, Ignacio Iacobacci, Sarah Parisot

    Abstract: Text-to-image generation has achieved astonishing results, yet precise spatial controllability and prompt fidelity remain highly challenging. This limitation is typically addressed through cumbersome prompt engineering, scene layout conditioning, or image editing techniques which often require hand drawn masks. Nonetheless, pre-existing works struggle to take advantage of the natural instance-leve… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - Project page: https://MuLAn-dataset.github.io/

  42. arXiv:2404.00095  [pdf, other

    cs.CV

    GDA: Generalized Diffusion for Robust Test-time Adaptation

    Authors: Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

    Abstract: Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the mod… ▽ More

    Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

  43. Fusion Dynamical Systems with Machine Learning in Imitation Learning: A Comprehensive Overview

    Authors: Yingbai Hu, Fares J. Abu-Dakka, Fei Chen, Xiao Luo, Zheng Li, Alois Knoll, Wei** Ding

    Abstract: Imitation Learning (IL), also referred to as Learning from Demonstration (LfD), holds significant promise for capturing expert motor skills through efficient imitation, facilitating adept navigation of complex scenarios. A persistent challenge in IL lies in extending generalization from historical demonstrations, enabling the acquisition of new skills without re-teaching. Dynamical system-based IL… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  44. arXiv:2403.16048  [pdf, other

    cs.CV

    Edit3K: Universal Representation Learning for Video Editing Components

    Authors: Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu

    Abstract: This paper focuses on understanding the predominant video creation pipeline, i.e., compositional video editing with six main types of editing components, including video effects, animation, transition, filter, sticker, and text. In contrast to existing visual representation learning of visual materials (i.e., images/videos), we aim to learn visual representations of editing actions/components that… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  45. arXiv:2403.15933  [pdf, ps, other

    cs.AI cs.LG

    Understanding Domain-Size Generalization in Markov Logic Networks

    Authors: Florian Chen, Felix Weitkämper, Sagar Malhotra

    Abstract: We study the generalization behavior of Markov Logic Networks (MLNs) across relational structures of different sizes. Multiple works have noticed that MLNs learned on a given domain generalize poorly across domains of different sizes. This behavior emerges from a lack of internal consistency within an MLN when used across different domain sizes. In this paper, we quantify this inconsistency and bo… ▽ More

    Submitted 3 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: To Appear in Proceedings of ECML 2024-Research Track

  46. arXiv:2403.13639  [pdf, other

    cs.MA

    Multi-agent Reinforcement Traffic Signal Control based on Interpretable Influence Mechanism and Biased ReLU Approximation

    Authors: Zhiyue Luo, Jun Xu, Fanglin Chen

    Abstract: Traffic signal control is important in intelligent transportation system, of which cooperative control is difficult to realize but yet vital. Many methods model multi-intersection traffic networks as grids and address the problem using multi-agent reinforcement learning (RL). Despite these existing studies, there is an opportunity to further enhance our understanding of the connectivity and global… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  47. arXiv:2403.13351  [pdf, other

    cs.CV

    OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

    Authors: Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang

    Abstract: Redundancy is a persistent challenge in Capsule Networks (CapsNet),leading to high computational costs and parameter counts. Although previous works have introduced pruning after the initial capsule layer, dynamic routing's fully connected nature and non-orthogonal weight matrices reintroduce redundancy in deeper layers. Besides, dynamic routing requires iterating to converge, further increasing c… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages

  48. arXiv:2403.12698  [pdf, other

    cs.AR cs.CY

    System Support for Environmentally Sustainable Computing in Data Centers

    Authors: Fan Chen

    Abstract: Modern data centers suffer from a growing carbon footprint due to insufficient support for environmental sustainability. While hardware accelerators and renewable energy have been utilized to enhance sustainability, addressing Quality of Service (QoS) degradation caused by renewable energy supply and hardware recycling remains challenging: (1) prior accelerators exhibit significant carbon footprin… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  49. arXiv:2403.11123  [pdf, other

    cs.CL

    Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking

    Authors: Taha Aksu, Nancy F. Chen

    Abstract: Current metrics for evaluating Dialogue State Tracking (DST) systems exhibit three primary limitations. They: i) erroneously presume a uniform distribution of slots throughout the dialog, ii) neglect to assign partial scores for individual turns, iii) frequently overestimate or underestimate performance by repeatedly counting the models' successful or failed predictions. To address these shortcomi… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to COLING 2024

  50. arXiv:2403.11048  [pdf, other

    quant-ph cs.CY cs.LG

    JustQ: Automated Deployment of Fair and Accurate Quantum Neural Networks

    Authors: Ruhan Wang, Fahiz Baba-Yara, Fan Chen

    Abstract: Despite the success of Quantum Neural Networks (QNNs) in decision-making systems, their fairness remains unexplored, as the focus primarily lies on accuracy. This work conducts a design space exploration, unveiling QNN unfairness, and highlighting the significant influence of QNN deployment and quantum noise on accuracy and fairness. To effectively navigate the vast QNN deployment design space, we… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

    Journal ref: published at ASP-DAC 2024