Showing 1–50 of 324 results for author: Yu, G

Searching in archive cs.
  1. arXiv:2407.00125  [pdf, other]

    cs.SE cs.AI cs.DC

    A Survey on Failure Analysis and Fault Injection in AI Systems

    Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

    Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ens…

    Submitted 27 June, 2024; originally announced July 2024.

  2. arXiv:2406.19280  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

    Authors: Junying Chen, Ruyi Ouyang, Anningzhe Gao, Shunian Chen, Guiming Hardy Chen, Xidong Wang, Ruifei Zhang, Zhenyang Cai, Ke Ji, Guangjun Yu, Xiang Wan, Benyou Wang

    Abstract: The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed's large-scale, de-i…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.15819  [pdf, other]

    cs.LG cs.IT cs.NI eess.SP

    Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning

    Authors: Qiushuo Hou, Matteo Zecchin, Sangwoo Park, Yunlong Cai, Guanding Yu, Kaushik Chowdhury, Osvaldo Simeone

    Abstract: In modern wireless network architectures, such as O-RAN, artificial intelligence (AI)-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control. The AI "apps" are selected on the basis of contextual information such as network conditions, topology, traffic statistics, and design goals. The mapping between context and AI model parameter…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: submitted for a journal publication

  4. arXiv:2406.11689  [pdf, other]

    cs.CV

    Lightweight Model Pre-training via Language Guided Knowledge Distillation

    Authors: Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen

    Abstract: This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods on this problem transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches a…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10163  [pdf, other]

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  6. arXiv:2406.09931  [pdf, other]

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, ** Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  7. arXiv:2406.06051  [pdf, other]

    cs.AI cs.HC cs.LG

    On the Utility of Accounting for Human Beliefs about AI Behavior in Human-AI Collaboration

    Authors: Guanghui Yu, Robert Kasumba, Chien-Ju Ho, William Yeoh

    Abstract: To enable effective human-AI collaboration, merely optimizing AI performance while ignoring humans is not sufficient. Recent research has demonstrated that designing AI agents to account for human behavior leads to improved performance in human-AI collaboration. However, a limitation of most existing approaches is their assumption that human behavior is static, irrespective of AI behavior. In real…

    Submitted 10 June, 2024; originally announced June 2024.

  8. arXiv:2406.05216  [pdf, other]

    cs.LG

    TabPFGen -- Tabular Data Generation with TabPFN

    Authors: Junwei Ma, Apoorv Dankar, George Stein, Guangwei Yu, Anthony Caterini

    Abstract: Advances in deep generative modelling have not translated well to tabular data. We argue that this is caused by a mismatch in structure between popular generative models and discriminative models of tabular data. We thus devise a technique to turn TabPFN -- a highly performant transformer initially designed for in-context discriminative tabular tasks -- into an energy-based generative model, which…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.05207  [pdf, other]

    cs.LG

    Retrieval & Fine-Tuning for In-Context Tabular Models

    Authors: Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony Caterini

    Abstract: Tabular data is a pervasive modality spanning a wide range of domains, and the inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and…

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.00947  [pdf, other]

    cs.CV

    Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

    Authors: Fei Gao, Siwen Wang, Churan Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Gang Yu, Yizhou Yu

    Abstract: Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset b…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI 2024

  11. arXiv:2405.20853  [pdf, other]

    cs.CV

    MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

    Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

    Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.10481  [pdf, other]

    cs.LG cs.AI

    Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

    Authors: Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu

    Abstract: Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 12 pages

  13. arXiv:2404.15506  [pdf, other]

    cs.CV

    Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

    Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

    Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recov…

    Submitted 21 March, 2024; originally announced April 2024.

    Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. arXiv admin note: substantial text overlap with arXiv:2307.10984

  14. arXiv:2404.14037  [pdf, other]

    cs.CV cs.MM

    GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

    Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

    Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method…

    Submitted 28 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: https://yuhongyun777.github.io/GaussianTalker/

  15. arXiv:2404.08886  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

    Authors: Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

    Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024 Industry Track

  16. arXiv:2404.08826  [pdf, other]

    cs.PF math.PR

    Strongly Tail-Optimal Scheduling in the Light-Tailed M/G/1

    Authors: George Yu, Ziv Scully

    Abstract: We study the problem of scheduling jobs in a queueing system, specifically an M/G/1 with light-tailed job sizes, to asymptotically optimize the response time tail. This means scheduling to make $\mathbf{P}[T > t]$, the chance a job's response time exceeds $t$, decay as quickly as possible in the $t \to \infty$ limit. For some time, the best known policy was First-Come First-Served (FCFS), which ha…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 33 pages, 8 figures. To appear in SIGMETRICS 2024

  17. arXiv:2404.08681  [pdf, other]

    cs.CL

    EFSA: Towards Event-Level Financial Sentiment Analysis

    Authors: Tianyu Chen, Yiming Zhang, Guoxin Yu, Dapeng Zhang, Li Zeng, Qing He, Xiang Ao

    Abstract: In this paper, we extend financial sentiment analysis (FSA) to event-level since events usually serve as the subject of the sentiment in financial text. Though extracting events from the financial text may be conducive to accurate sentiment predictions, it has specialized challenges due to the length and discontinuity of events in a financial text. To this end, we reconceptualize the event extrac…

    Submitted 8 April, 2024; originally announced April 2024.

  18. arXiv:2404.06077  [pdf, other]

    cs.CR cs.AI cs.CY

    Is Your AI Truly Yours? Leveraging Blockchain for Copyrights, Provenance, and Lineage

    Authors: Yilin Sai, Qin Wang, Guangsheng Yu, H. M. N. Dilum Bandara, Shiping Chen

    Abstract: As Artificial Intelligence (AI) integrates into diverse areas, particularly in content generation, ensuring rightful ownership and ethical use becomes paramount. AI service providers are expected to prioritize responsibly sourcing training data and obtaining licenses from data owners. However, existing studies primarily center on safeguarding static copyrights, which simply treats metadata/dataset…

    Submitted 9 April, 2024; originally announced April 2024.

  19. arXiv:2404.03054  [pdf, other]

    cs.AI cs.LG

    Data-Driven Goal Recognition Design for General Behavioral Agents

    Authors: Robert Kasumba, Guanghui Yu, Chien-Ju Ho, Sarah Keren, William Yeoh

    Abstract: Goal recognition design aims to make limited modifications to decision-making environments with the goal of making it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address…

    Submitted 11 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  20. arXiv:2404.01700  [pdf, other]

    cs.CV

    MotionChain: Conversational Motion Controllers via Multimodal Prompts

    Authors: Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan

    Abstract: Recent advancements in language models have demonstrated their adeptness in conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations in controlling continuous virtual human movements, generative human motion models can…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  21. arXiv:2404.00964  [pdf, other]

    cs.CV

    S2RC-GCN: A Spatial-Spectral Reliable Contrastive Graph Convolutional Network for Complex Land Cover Classification Using Hyperspectral Images

    Authors: Renxiang Guan, Zihao Li, Chujia Song, Guo Yu, Xianju Li, Ruyi Feng

    Abstract: Spatial correlations between different ground objects are an important feature of mining land cover research. Graph Convolutional Networks (GCNs) can effectively capture such spatial feature representations and have demonstrated promising results in performing hyperspectral imagery (HSI) classification tasks of complex land. However, the existing GCN-based HSI classification methods are prone to i…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

  22. arXiv:2403.19895  [pdf, ps, other]

    cs.IT cs.LG

    An Information-Theoretic Framework for Out-of-Distribution Generalization

    Authors: Wenliang Liu, Guanding Yu, Lele Wang, Renjie Liao

    Abstract: We study the Out-of-Distribution (OOD) generalization in machine learning and propose a general framework that provides information-theoretic generalization bounds. Our framework interpolates freely between Integral Probability Metric (IPM) and $f$-divergence, which naturally recovers some known results (including Wasserstein- and KL-bounds), as well as yields new generalization bounds. Moreover,…

    Submitted 28 March, 2024; originally announced March 2024.

  23. arXiv:2403.15010  [pdf, other]

    cs.CV cs.CR

    Clean-image Backdoor Attacks

    Authors: Dazhong Rong, Guoyao Yu, Shuheng Shen, Xinyi Fu, Peng Qian, Jianhai Chen, Qinming He, Xing Fu, Weiqiang Wang

    Abstract: To gather a significant quantity of annotated training data for high-performance image classification models, numerous companies opt to enlist third-party providers to label their unlabeled data. This practice is widely regarded as secure, even in cases where some annotated errors occur, as the impact of these minor inaccuracies on the final performance of the models is negligible and existing bac…

    Submitted 26 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  24. arXiv:2403.11469  [pdf, other]

    cs.CV cs.GR

    Generative Motion Stylization within Canonical Motion Space

    Authors: Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu

    Abstract: Stylized motion breathes life into characters. However, the fixed skeleton structure and style representation hinder existing data-driven motion synthesis methods from generating stylized motion for various characters. In this work, we propose a generative motion stylization pipeline, named MotionS, for synthesizing diverse and stylized motion on cross-structure characters using cross-modality sty…

    Submitted 18 March, 2024; originally announced March 2024.

  25. arXiv:2403.05135  [pdf, other]

    cs.CV

    ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

    Authors: Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

    Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Ad…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project Page: https://ella-diffusion.github.io/

  26. arXiv:2403.01422  [pdf, other]

    cs.CV

    MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

    Authors: Zhende Song, Chenchen Wang, Jiamu Sheng, Chi Zhang, Gang Yu, Jiayuan Fan, Tao Chen

    Abstract: Development of multimodal models has marked a significant step forward in how machines understand videos. These models have shown promise in analyzing short video clips. However, when it comes to longer formats like movies, they often fall short. The main hurdles are the lack of high-quality, diverse video data and the intensive work required to collect or annotate such data. In face of these chal…

    Submitted 24 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  27. arXiv:2402.19160  [pdf, other]

    cs.CV

    Effective Message Hiding with Order-Preserving Mechanisms

    Authors: Gao Yu, Qiu Xuchong, Ye Zihan

    Abstract: Message hiding, a technique that conceals secret message bits within a cover image, aims to achieve an optimal balance among message capacity, recovery accuracy, and imperceptibility. While convolutional neural networks have notably improved message capacity and imperceptibility, achieving high recovery accuracy remains challenging. This challenge arises because convolutional operations struggle t…

    Submitted 17 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 7 pages

  28. arXiv:2402.16294  [pdf, other]

    cs.CR cs.AI

    Decentralized Federated Unlearning on Blockchain

    Authors: Xiao Liu, Mingyuan Li, Xu Wang, Guangsheng Yu, Wei Ni, Lixiang Li, Haipeng Peng, Renping Liu

    Abstract: Blockchained Federated Learning (FL) has been gaining traction for ensuring the integrity and traceability of FL processes. Blockchained FL involves participants training models locally with their data and subsequently publishing the models on the blockchain, forming a Directed Acyclic Graph (DAG)-like inheritance structure that represents the model relationship. However, this particular DAG-based…

    Submitted 25 February, 2024; originally announced February 2024.

  29. arXiv:2402.16058  [pdf, other]

    cs.CL

    Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

    Authors: Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yukun Yan, Shuo Wang, Ge Yu

    Abstract: Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions, a process that incurs extra costs during inference. In this paper, we propose the Gist COnditioned deCOding (Gist-COCO) model, introducing a novel method for compressing prompts which also can assist the prompt interpretation and engineering. Gist-COCO employs an encoder-decode…

    Submitted 25 February, 2024; originally announced February 2024.

  30. arXiv:2402.14652  [pdf, other]

    cs.CL

    Cleaner Pretraining Corpus Curation with Neural Web Scraping

    Authors: Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong

    Abstract: The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans. Through meticulous data collection, preprocessing, and curation, webpages can be used as a fundamental data resource for language model pretraining. However, when confronted with the progressively revolutionized and intricate nature of webpages, rule-based/feature-based web scrapers…

    Submitted 14 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  31. arXiv:2402.13547  [pdf, other]

    cs.CL

    ActiveRAG: Revealing the Treasures of Knowledge via Active Learning

    Authors: Zhipeng Xu, Zhenghao Liu, Yibin Liu, Chenyan Xiong, Yukun Yan, Shuo Wang, Shi Yu, Zhiyuan Liu, Ge Yu

    Abstract: Retrieval Augmented Generation (RAG) has introduced a new paradigm for Large Language Models (LLMs), aiding in the resolution of knowledge-intensive tasks. However, current RAG models position LLMs as passive knowledge receptors, thereby restricting their capacity for learning and comprehending external knowledge. In this paper, we present ActiveRAG, an innovative RAG framework that shifts from pa…

    Submitted 21 February, 2024; originally announced February 2024.

  32. arXiv:2402.12694  [pdf, other]

    cs.LG

    Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

    Authors: Guoqi Yu, Jing Zou, Xiaowei Hu, Angelica I. Aviles-Rivero, Jing Qin, Shujun Wang

    Abstract: Predicting multivariate time series is crucial, demanding precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Distinctive trend characteristics in each time series pose challenges, and existing methods, relying on basic moving average kernels, may struggle with the non-linear structure and complex trends in real-world data. Given that, we introd…

    Submitted 1 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  33. arXiv:2402.06971  [pdf, other]

    cs.LG

    In-Context Data Distillation with TabPFN

    Authors: Junwei Ma, Valentin Thomas, Guangwei Yu, Anthony Caterini

    Abstract: Foundation models have revolutionized tasks in computer vision and natural language processing. However, in the realm of tabular data, tree-based models like XGBoost continue to dominate. TabPFN, a transformer model tailored for tabular data, mirrors recent foundation models in its exceptional in-context learning capability, being competitive with XGBoost's performance without the need for task-sp…

    Submitted 10 February, 2024; originally announced February 2024.

  34. arXiv:2402.06459  [pdf, other]

    cs.GT cs.CE cs.CR cs.CY econ.GN

    Maximizing NFT Incentives: References Make You Rich

    Authors: Guangsheng Yu, Qin Wang, Caijun Sun, Lam Duc Nguyen, H. M. N. Dilum Bandara, Shiping Chen

    Abstract: In this paper, we study how to optimize existing Non-Fungible Token (NFT) incentives. Upon exploring a large number of NFT-related standards and real-world projects, we come across an unexpected finding. That is, the current NFT incentive mechanisms, often organized in an isolated and one-time-use fashion, tend to overlook their potential for scalable organizational structures. We propose, analy…

    Submitted 9 February, 2024; originally announced February 2024.

  35. arXiv:2402.03804  [pdf, other]

    cs.LG cs.AI

    ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

    Authors: Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun

    Abstract: Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of sparse LLMs beyond zero activation values. We introduce a general method that defines neuron acti…

    Submitted 6 February, 2024; originally announced February 2024.

  36. arXiv:2402.01808  [pdf, other]

    cs.SD eess.AS

    KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

    Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

    Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024; Ranked 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

  37. arXiv:2401.17577  [pdf, other]

    cs.IT eess.SP

    Robustness in Wireless Distributed Learning: An Information-Theoretic Analysis

    Authors: Yangshuo He, Guanding Yu

    Abstract: In this paper, we take an information-theoretic approach to understand the robustness in wireless distributed learning. Upon measuring the difference in loss functions, we provide an upper bound of the performance deterioration due to imperfect wireless channels. Moreover, we characterize the transmission rate under task performance guarantees and propose the channel capacity gain resulting from t…

    Submitted 30 January, 2024; originally announced January 2024.

  38. arXiv:2401.16760  [pdf, other]

    cs.LG

    One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training

    Authors: Lianbo Ma, Yuee Zhou, Jianlun Ma, Guo Yu, Qing Li

    Abstract: Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient. However, we discover that the gradient error will lead to an unexpected zig-zagging-like issue in the gradient descent learning procedures,…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 9 pages, 13 figures, accepted by AAAI-24

  39. arXiv:2401.15371   

    cs.CL

    LegalDuet: Learning Effective Representations for Legal Judgment Prediction through a Dual-View Legal Clue Reasoning

    Authors: Pengjie Liu, Zhenghao Liu, Xiaoyuan Yi, Liner Yang, Shuo Wang, Yu Gu, Ge Yu, Xing Xie, Shuang-hua Yang

    Abstract: Most existing Legal Judgment Prediction (LJP) models focus on discovering the legal triggers in the criminal fact description. However, in real-world scenarios, a professional judge not only needs to assimilate the law case experience that thrives on past sentenced legal judgments but also depends on the professional legal grounded reasoning that learned from professional legal knowledge. In this…

    Submitted 27 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: We will update and revise this paper in the near future.

  40. arXiv:2401.11471  [pdf, other]

    cs.DC cs.AI

    LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

    Authors: Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu

    Abstract: In the last decade, Convolutional Neural Network with a multi-layer architecture has advanced rapidly. However, training its complex network is very space-consuming, since a lot of intermediate data are preserved across layers, especially when processing high-dimension inputs with a big batch size. That poses great challenges to the limited memory capacity of current accelerators (e.g., GPUs). Exi…

    Submitted 21 January, 2024; originally announced January 2024.

  41. arXiv:2401.11469  [pdf, other]

    cs.DC

    Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control

    Authors: Zhigang Wang, Xu Zhang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu

    Abstract: Transformer-based models are becoming deeper and larger recently. For better scalability, an underlying training solution in industry is to split billions of parameters (tensors) into many tasks and then run them across homogeneous accelerators (e.g., GPUs). However, such dedicated compute cluster is prohibitively expensive in academia and moderate companies. An economic replacement is to aggregat…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 13 pages

  42. arXiv:2401.10274  [pdf, ps, other]

    cs.NE cs.AI

    Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

    Authors: Wanting Zhang, Wei Du, Guo Yu, Renchu He, Wenli Du, Yaochu Jin

    Abstract: With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to be optimized by traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs…

    Submitted 9 January, 2024; originally announced January 2024.

  43. arXiv:2401.00632  [pdf, other]

    cs.CR

    TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT

    Authors: Zixu Zhang, Guangsheng Yu, Caijun Sun, Xu Wang, Ying Wang, Ming Zhang, Wei Ni, Ren Ping Liu, Andrew Reeves, Nektarios Georgalas

    Abstract: Integrating sharded blockchain with IoT presents a solution for trust issues and optimized data flow. Sharding boosts blockchain scalability by dividing its nodes into parallel shards, yet it's vulnerable to the 1% attacks where dishonest nodes target a shard to corrupt the entire blockchain. Balancing security with scalability is pivotal for such systems. Deep Reinforcement Learning (DRL) adep…

    Submitted 31 December, 2023; originally announced January 2024.

  44. arXiv:2312.13913  [pdf, other]

    cs.CV

    Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

    Authors: Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

    Abstract: This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within mod…

    Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Website: https://github.com/OpenTexture/Paint3D

  45. arXiv:2312.13771  [pdf, other]

    cs.CV

    AppAgent: Multimodal Agents as Smartphone Users

    Authors: Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

    Abstract: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping…

    Submitted 21 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Page is https://appagent-official.github.io/

  46. arXiv:2312.13722  [pdf, other]

    cs.SD eess.AS

    BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

    Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

    Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE research primarily focuses on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024

  47. arXiv:2312.10763  [pdf, other]

    cs.CV

    M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

    Authors: Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

    Abstract: Recently, 3D understanding has become popular to facilitate autonomous agents to perform further decision-making. However, existing 3D datasets and methods are often limited to specific tasks. On the other hand, recent progress in Large Language Models (LLMs) and Multimodal Language Models (MLMs) have demonstrated exceptional general language and imagery tasking performance. Therefore, it is intere…

    Submitted 17 December, 2023; originally announced December 2023.

  48. arXiv:2312.10144  [pdf, other]

    cs.LG cs.AI cs.CV

    Data-Efficient Multimodal Fusion on a Single GPU

    Authors: Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, Maksims Volkovs

    Abstract: The goal of multimodal alignment is to learn a single latent space that is shared between multimodal inputs. The most powerful models in this space have been trained using massive datasets of paired inputs and large-scale computational resources, making them prohibitively expensive to train in many practical scenarios. We surmise that existing unimodal encoders pre-trained on large amounts of unim…

    Submitted 10 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 (Highlight)

  49. arXiv:2312.08727  [pdf, other]

    cs.IR

    Calibration-compatible Listwise Distillation of Privileged Features for CTR Prediction

    Authors: Xiaoqiang Gui, Yueyao Cheng, Xiang-Rong Sheng, Yunfeng Zhao, Guoxian Yu, Shuguang Han, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: In machine learning systems, privileged features refer to the features that are available during offline training but inaccessible for online serving. Previous studies have recognized the importance of privileged features and explored ways to tackle online-offline discrepancies. A typical practice is privileged features distillation (PFD): train a teacher model using all features (including privil…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by WSDM'24

  50. arXiv:2312.05551  [pdf, other]

    cs.LG

    Multi-dimensional Fair Federated Learning

    Authors: Cong Su, Guoxian Yu, Jun Wang, Hui Li, Qingzhong Li, Han Yu

    Abstract: Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitab…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI2024)