Showing 1–50 of 252 results for author: Yao, X

  1. arXiv:2407.01511  [pdf, other]

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00063  [pdf, other]

    cs.IR cs.AI cs.LG

    An Interpretable Alternative to Neural Representation Learning for Rating Prediction -- Transparent Latent Class Modeling of User Reviews

    Authors: Giuseppe Serra, Peter Tino, Zhao Xu, Xin Yao

    Abstract: Nowadays, neural network (NN) and deep learning (DL) techniques are widely adopted in many applications, including recommender systems. Given the sparse and stochastic nature of collaborative filtering (CF) data, recent works have critically analyzed the effective improvement of neural-based approaches compared to simpler and often transparent algorithms for recommendation. Previous results showed…

    Submitted 17 June, 2024; originally announced July 2024.

  3. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  4. arXiv:2406.15373  [pdf, other]

    cs.CY cs.AI econ.GN

    Occupation Life Cycle

    Authors: Lan Chen, Yufei Ji, Xichen Yao, Hengshu Zhu

    Abstract: This paper explores the evolution of occupations within the context of industry and technology life cycles, highlighting the critical yet underexplored intersection between occupational trends and broader economic dynamics. Introducing the Occupation Life Cycle (OLC) model, we delineate five stages (i.e., growth, peak, fluctuation, maturity, and decline) to systematically explore the trajectory of…

    Submitted 14 April, 2024; originally announced June 2024.

  5. arXiv:2406.14977  [pdf, other]

    cs.AI eess.IV

    Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

    Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

    Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of brain. Furthermore, while striving to integrate complementary information between modalities,…

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.03768  [pdf, other]

    cs.LG cs.AI

    Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

    Authors: Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu

    Abstract: Pre-trained large language models (LLMs) based on Transformer have demonstrated striking in-context learning (ICL) abilities. With a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance, and more surprising, pruning weights in deep la…

    Submitted 6 June, 2024; originally announced June 2024.

  8. arXiv:2405.18884  [pdf]

    cs.NE

    Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization

    Authors: Shengcai Liu, Zhiyuan Wang, Yew-Soon Ong, Xin Yao, Ke Tang

    Abstract: Real-world applications involve various discrete optimization problems. Designing a specialized optimizer for each of these problems is challenging, typically requiring significant domain knowledge and human efforts. Hence, developing general-purpose optimizers as an off-the-shelf tool for a wide range of problems has been a long-standing research target. This article introduces MEGO, a novel gene…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 34 pages, 6 figures

  9. arXiv:2405.16283  [pdf, other]

    cs.DC

    TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload

    Authors: Zhimin Ding, Jiawen Yao, Brianna Barrow, Tania Lorido Botran, Christopher Jermaine, Yuxin Tang, Jiehui Li, Xinyu Yao, Sleem Mahmoud Abdelghafar, Daniel Bourgeois

    Abstract: An obvious way to alleviate memory difficulties in GPU-based AI computing is via CPU offload, where data are moved between GPU and CPU RAM, so inexpensive CPU RAM is used to increase the amount of storage available. While CPU offload is an obvious idea, it can greatly slow down a computation, due to the relatively slow transfer rate between CPU RAM and GPU RAM. Thus, any system for CPU offload nee…

    Submitted 27 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  10. arXiv:2405.13255  [pdf, other]

    cs.IT

    Low-Complexity PSCL Decoding of Polar Codes

    Authors: Xinyuanmeng Yao, Xiao Ma

    Abstract: Successive cancellation list (SCL) decoding enables polar codes and their generalizations to deliver satisfactory performance in finite-length scenarios but it comes with high latency and complexity. To reduce latency, a partitioned SCL (PSCL) decoding algorithm, implemented over a PSCL decoding tree, can be utilized. In this work, we aim to lower down the complexity of the PSCL decoding, resultin…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 11 pages, 19 figures

  11. arXiv:2405.03998  [pdf, other]

    cs.HC cs.CL

    Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches

    Authors: Chen Zhu-Tian, Zeyu Xiong, Xiaoshuo Yao, Elena Glassman

    Abstract: Crafting effective prompts for code generation or editing with Large Language Models (LLMs) is not an easy task. Particularly, the absence of immediate, stable feedback during prompt crafting hinders effective interaction, as users are left to mentally imagine possible outcomes until the code is generated. In response, we introduce Language-Oriented Code Sketching, an interactive approach that pro…

    Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 4 pages

  12. arXiv:2405.02801  [pdf, other]

    cs.SD cs.AI eess.AS

    Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

    Authors: Tianze Xu, Jiajun Li, Xuesong Chen, Xinrui Yao, Shuchang Liu

    Abstract: In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, researches on general multi-modal music generation model remain scarce. To fill this gap, we propose a multi-modal music generation framework Mozart's Touch. It could generate aligned music with the c…

    Submitted 7 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures, submitted to ACM MM 2024

  13. arXiv:2404.16067  [pdf, other]

    cs.HC cs.AI

    Layout2Rendering: AI-aided Greenspace design

    Authors: Ran Chen, Zeke Lian, Yueheng He, Xiao Ling, Fuyu Yang, Xueqi Yao, Xingjian Yi, Jing Zhao

    Abstract: In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 figures

  14. arXiv:2404.15899  [pdf, other]

    cs.LG cs.AI

    ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction

    Authors: Zhiqi Shao, Xusheng Yao, Ze Wang, Junbin Gao

    Abstract: Accurate traffic flow prediction is crucial for optimizing traffic management, enhancing road safety, and reducing environmental impacts. Existing models face challenges with long sequence data, requiring substantial memory and computational resources, and often suffer from slow inference times due to the lack of a unified summary state. This paper introduces ST-MambaSync, an innovative traffic fl…

    Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257

    MSC Class: 53A45 ACM Class: I.2.0

  15. arXiv:2404.14763  [pdf, other]

    cs.NE cs.AI

    Evolutionary Reinforcement Learning via Cooperative Coevolution

    Authors: Chengpeng Hu, Jialin Liu, Xin Yao

    Abstract: Recently, evolutionary reinforcement learning has obtained much attention in various domains. Maintaining a population of actors, evolutionary reinforcement learning utilises the collected experiences to improve the behaviour policy through efficient exploration. However, the poor scalability of genetic operators limits the efficiency of optimising high-dimensional neural networks. To address this…

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  16. arXiv:2404.01877  [pdf, other]

    cs.LG

    Procedural Fairness in Machine Learning

    Authors: Ziming Wang, Changwu Huang, Xin Yao

    Abstract: Fairness in machine learning (ML) has received much attention. However, existing studies have mainly focused on the distributive fairness of ML models. The other dimension of fairness, i.e., procedural fairness, has been neglected. In this paper, we first define the procedural fairness of ML models, and then give formal definitions of individual and group procedural fairness. We propose a novel me…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 14 pages

  17. arXiv:2404.00399  [pdf, other]

    cs.CL cs.AI cs.LG

    Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

    Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, et al. (20 additional authors not shown)

    Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, where…

    Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Preprint

  18. arXiv:2403.17753  [pdf, other]

    cs.LG

    CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model

    Authors: Zhiqi Shao, Michael G. H. Bell, Ze Wang, D. Glenn Geers, Xusheng Yao, Junbin Gao

    Abstract: Accurate, and effective traffic forecasting is vital for smart traffic systems, crucial in urban traffic planning and management. Current Spatio-Temporal Transformer models, despite their prediction capabilities, struggle with balancing computational efficiency and accuracy, favoring global over local information, and handling spatial and temporal data separately, limiting insight into complex int…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 18 pages

    ACM Class: I.2.0

  19. arXiv:2403.15434  [pdf, other]

    cs.CL cs.AI

    ChatPattern: Layout Pattern Customization via Natural Language

    Authors: Zixiao Wang, Yunheng Shen, Xufeng Yao, Wenqian Zhao, Yang Bai, Farzan Farnia, Bei Yu

    Abstract: Existing works focus on fixed-size layout pattern generation, while the more practical free-size pattern generation receives limited attention. In this paper, we propose ChatPattern, a novel Large-Language-Model (LLM) powered framework for flexible pattern customization. ChatPattern utilizes a two-part system featuring an expert LLM agent and a highly controllable layout pattern generator. The LLM…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by DAC24

  20. arXiv:2403.13349  [pdf, other]

    cs.LG cs.CV

    Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection

    Authors: Xincheng Yao, Ruoqi Li, Zefeng Qian, Lu Wang, Chongyang Zhang

    Abstract: Unified anomaly detection (AD) is one of the most challenges for anomaly detection, where one unified model is trained with normal samples from multiple classes with the objective to detect anomalies in these classes. For such a challenging task, popular normalizing flow (NF) based AD methods may fall into a "homogeneous mapping" issue, where the NF-based AD models are biased to generate similar la…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 15 pages

  21. arXiv:2403.11788  [pdf, other]

    cs.RO

    Locomotion Generation for a Rat Robot based on Environmental Changes via Reinforcement Learning

    Authors: Xinhui Shan, Yuhong Huang, Zhenshan Bing, Zitao Zhang, Xiangtong Yao, Kai Huang, Alois Knoll

    Abstract: This research focuses on developing reinforcement learning approaches for the locomotion generation of small-size quadruped robots. The rat robot NeRmo is employed as the experimental platform. Due to the constrained volume, small-size quadruped robots typically possess fewer and weaker sensors, resulting in difficulty in accurately perceiving and responding to environmental changes. In this conte…

    Submitted 14 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  22. arXiv:2403.11671  [pdf, other]

    cs.AR cs.AI cs.CE cs.LG cs.SE

    HDLdebugger: Streamlining HDL debugging with Large Language Models

    Authors: Xufeng Yao, Haoyang Li, Tsz Ho Chan, Wenyi Xiao, Mingxuan Yuan, Yu Huang, Lei Chen, Bei Yu

    Abstract: In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role. However, due to the complex syntax of HDLs and the limited availability of online resources, debugging HDL codes remains a difficult and time-intensive task, even for seasoned engineers. Consequently, there is a pressing need to develop automated HDL code debugging models, which can alleviate the burden on har…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 13 pages, 5 figures

  23. Channel Estimation for Stacked Intelligent Metasurface-Assisted Wireless Networks

    Authors: Xianghao Yao, Jiancheng An, Lu Gan, Marco Di Renzo, Chau Yuen

    Abstract: Emerging technologies, such as holographic multiple-input multiple-output (HMIMO) and stacked intelligent metasurface (SIM), are driving the development of wireless communication systems. Specifically, the SIM is physically constructed by stacking multiple layers of metasurfaces and has an architecture similar to an artificial neural network (ANN), which can flexibly manipulate the electromagnetic…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 13 pages, 3 figures, accepted by IEEE WCL

  24. arXiv:2403.03172  [pdf, other]

    cs.AI cs.LG

    Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination

    Authors: Liangzhou Wang, Kaiwen Zhu, Fengming Zhu, Xinghu Yao, Shujie Zhang, Deheng Ye, Haobo Fu, Qiang Fu, Wei Yang

    Abstract: Reaching consensus is key to multi-agent coordination. To accomplish a cooperative task, agents need to coherently select optimal joint actions to maximize the team reward. However, current cooperative multi-agent reinforcement learning (MARL) methods usually do not explicitly take consensus into consideration, which may cause miscoordination problem. In this paper, we propose a model-based consen…

    Submitted 5 March, 2024; originally announced March 2024.

  25. arXiv:2403.01851  [pdf, other]

    cs.CL cs.AI

    Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral

    Authors: Yiming Cui, Xin Yao

    Abstract: Mixtral, a representative sparse mixture of experts (SMoE) language model, has received significant attention due to its unique model design and superior performance. Based on Mixtral-8x7B-v0.1, in this paper, we propose Chinese-Mixtral and Chinese-Mixtral-Instruct with improved Chinese language abilities by adopting further pre-training and instruction fine-tuning. Experimental results show that…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 13 pages

  26. arXiv:2402.18946  [pdf, other]

    cs.LG eess.SY

    Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

    Authors: Yu Zhang, Long Wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, Wei He, Alois Knoll

    Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Sub…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  27. arXiv:2402.16449  [pdf, other]

    cs.RO cs.AI

    Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments

    Authors: Yu Zhang, Guangyao Tian, Long Wen, Xiangtong Yao, Liding Zhang, Zhenshan Bing, Wei He, Alois Knoll

    Abstract: This paper proposes a LiDAR-based goal-seeking and exploration framework, addressing the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles. This framework addresses two significant challenges associated with traditional dynamic control barrier functions (D-CBFs): their online construction and the diminished real-time performance caused…

    Submitted 26 February, 2024; originally announced February 2024.

  28. arXiv:2402.15731  [pdf, ps, other]

    cs.LG cs.NE

    Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes

    Authors: Danial Yazdani, Juergen Branke, Mohammad Sadegh Khorshidi, Mohammad Nabi Omidvar, Xiaodong Li, Amir H. Gandomi, Xin Yao

    Abstract: Clustering in dynamic environments is of increasing importance, with broad applications ranging from real-time data analysis and online unsupervised learning to dynamic facility location problems. While meta-heuristics have shown promising effectiveness in static clustering tasks, their application for tracking optimal clustering solutions or robust clustering over time in dynamic environments rem…

    Submitted 9 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  29. arXiv:2402.13145  [pdf, other]

    cs.CL cs.AI

    CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

    Authors: Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu

    Abstract: Metaphor is a prominent linguistic device in human language and literature, as they add color, imagery, and emphasis to enhance effective communication. This paper introduces a large-scale high quality annotated Chinese Metaphor Corpus, which comprises around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the accuracy and con…

    Submitted 20 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  30. GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?

    Authors: Yu Sun, Gaojian Xiong, Xianxun Yao, Kailang Ma, Jian Cui

    Abstract: Deep gradient inversion attacks expose a serious threat to Federated Learning (FL) by accurately recovering private data from shared gradients. However, the state-of-the-art heavily relies on impractical assumptions to access excessive auxiliary data, which violates the basic data partitioning principle of FL. In this paper, a novel method, Gradient Inversion Attack using Practical Image Prior (GI…

    Submitted 1 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  31. arXiv:2312.17642  [pdf]

    cs.AI cs.CL cs.CV cs.SI

    Research on the Laws of Multimodal Perception and Cognition from a Cross-cultural Perspective -- Taking Overseas Chinese Gardens as an Example

    Authors: Ran Chen, Xueqi Yao, Jing Zhao, Shuhan Xu, Sirui Zhang, Yijun Mao

    Abstract: This study aims to explore the complex relationship between perceptual and cognitive interactions in multimodal data analysis, with a specific emphasis on spatial experience design in overseas Chinese gardens. It is found that evaluation content and images on social media can reflect individuals' concerns and sentiment responses, providing a rich data base for cognitive research that contains both…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 16 figures, 1 table

  32. arXiv:2312.10807  [pdf, other]

    cs.RO

    Language-conditioned Learning for Robotic Manipulation: A Survey

    Authors: Hongkuan Zhou, Xiangtong Yao, Yuan Meng, Siming Sun, Zhenshan Bing, Kai Huang, Alois Knoll

    Abstract: Language-conditioned robotic manipulation represents a cutting-edge area of research, enabling seamless communication and cooperation between humans and robotic agents. This field focuses on teaching robotic systems to comprehend and execute instructions conveyed in natural language. To achieve this, the development of robust language understanding models capable of extracting actionable insights…

    Submitted 3 February, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

  33. arXiv:2312.10674  [pdf]

    cs.CV

    A Framework of Full-Process Generation Design for Park Green Spaces Based on Remote Sensing Segmentation-GAN-Diffusion

    Authors: Ran Chen, Xingjian Yi, Jing Zhao, Yueheng He, Bainian Chen, Xueqi Yao, Fangjun Liu, Haoran Li, Zeke Lian

    Abstract: The development of generative design driven by artificial intelligence algorithms is speedy. There are two research gaps in the current research: 1) Most studies only focus on the relationship between design elements and pay little attention to the external information of the site; 2) GAN and other traditional generative algorithms generate results with low resolution and insufficient details. To…

    Submitted 17 December, 2023; originally announced December 2023.

  34. arXiv:2312.10613  [pdf, other]

    cs.CV

    p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

    Authors: Haoyuan Wu, Xinyun Zhang, Peng Xu, Peiyu Liao, Xufeng Yao, Bei Yu

    Abstract: Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-traine…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24. The first two authors contributed equally to this paper

  35. arXiv:2312.05215  [pdf, other]

    cs.DC cs.LG

    DeltaZip: Multi-Tenant Language Model Serving via Delta Compression

    Authors: Xiaozhe Yao, Ana Klimovic

    Abstract: Fine-tuning large language models (LLMs) for downstream tasks can greatly improve model quality, however serving many different fine-tuned LLMs concurrently for users in multi-tenant environments is challenging. Dedicating GPU memory for each model is prohibitively expensive and naively swapping large model weights in and out of GPU memory is slow. Our key insight is that fine-tuned models can be…

    Submitted 8 December, 2023; originally announced December 2023.

  36. arXiv:2312.03292  [pdf, other]

    cs.LG cs.MA q-bio.QM

    Enhancing Molecular Property Prediction via Mixture of Collaborative Experts

    Authors: Xu Yao, Shuang Liang, Songqiao Han, Hailiang Huang

    Abstract: Molecular Property Prediction (MPP) task involves predicting biochemical properties based on molecular features, such as molecular graph structures, contributing to the discovery of lead compounds in drug development. To address data scarcity and imbalance in MPP, some studies have adopted Graph Neural Networks (GNN) as an encoder to extract commonalities from molecular graphs. However, these appr…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 11 pages, 8 figures

  37. arXiv:2311.13209  [pdf, other]

    cs.CV

    Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

    Authors: Junyu Gao, Xuan Yao, Changsheng Xu

    Abstract: The ability to accurately comprehend natural language instructions and navigate to the target location is essential for an embodied agent. Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation. However, for online Vision-and-Language Navigation (VLN), due to the intrinsic na…

    Submitted 19 May, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  38. arXiv:2311.13028  [pdf, other]

    cs.LG cs.AI cs.DC eess.SP

    DMLR: Data-centric Machine Learning Research -- Past, Present and Future

    Authors: Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, et al. (13 additional authors not shown)

    Abstract: Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods tow…

    Submitted 1 June, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Published in the Journal of Data-centric Machine Learning Research (DMLR) at https://data.mlr.press/assets/pdf/v01-5.pdf

  39. arXiv:2311.11514  [pdf, other]

    cs.DC

    HexGen: Generative Inference of Large Language Model over Heterogeneous Environment

    Authors: Youhe Jiang, Ran Yan, Xiaozhe Yao, Yang Zhou, Beidi Chen, Binhang Yuan

    Abstract: Serving generative inference of the large language model is a crucial component of contemporary AI applications. This paper focuses on deploying such services in a heterogeneous and cross-datacenter setting to mitigate the substantial inference costs typically associated with a single centralized datacenter. Towards this end, we propose HexGen, a flexible distributed inference engine that uniquely…

    Submitted 27 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

    Comments: Accepted by ICML 2024

  40. arXiv:2311.10502  [pdf, other]

    cs.NE

    Fast Estimations of Hitting Time of Elitist Evolutionary Algorithms from Fitness Levels

    Authors: Jun He, Siang Yew Chong, Xin Yao

    Abstract: The fitness level method is an easy-to-use tool for estimating the hitting time of elitist evolutionary algorithms. Recently, linear lower and upper bounds by fitness levels have been constructed. But these bounds require recursive computation, which makes them difficult to use in practice. We address this shortcoming with a new directed graph (digraph) method that does not require recursive compu…

    Submitted 16 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  41. arXiv:2311.01252  [pdf, other]

    cs.LG

    Sanitized Clustering against Confounding Bias

    Authors: Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao

    Abstract: Real-world datasets inevitably contain biases that arise from different sources or conditions during data collection. Consequently, such inconsistency itself acts as a confounding factor that disturbs the cluster analysis. Existing methods eliminate the biases by projecting data onto the orthogonal complement of the subspace expanded by the confounding factor before clustering. Therein, the intere…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Machine Learning, in press

  42. arXiv:2310.05692  [pdf, other]

    cs.AI

    Based on What We Can Control Artificial Neural Networks

    Authors: Cheng Kang, Xujing Yao

    Abstract: How can the stability and efficiency of Artificial Neural Networks (ANNs) be ensured through a systematic analysis method? This paper seeks to address that query. While numerous factors can influence the learning process of ANNs, utilizing knowledge from control systems allows us to analyze its system function and simulate system responses. Although the complexity of most ANNs is extremely high, w…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 23 pages,

  43. arXiv:2309.16338  [pdf, other]

    cs.LG

    EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect

    Authors: Jiashi Gao, Changwu Huang, Ming Tang, Shin Hwei Tan, Xin Yao, Xuetao Wei

    Abstract: Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accur…

    Submitted 28 September, 2023; originally announced September 2023.

  44. arXiv:2309.15697  [pdf, other]

    cs.CV eess.IV

    Physics Inspired Hybrid Attention for SAR Target Recognition

    Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han

    Abstract: There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly dep…

    Submitted 27 September, 2023; originally announced September 2023.

  45. arXiv:2309.11160  [pdf, other]

    cs.CV

    Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

    Authors: Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan

    Abstract: Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained tem…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  46. arXiv:2309.10987  [pdf, other]

    cs.NE cs.AI cs.CV

    SpikingNeRF: Making Bio-inspired Neural Networks See through the Real World

    Authors: Xingting Yao, Qinghao Hu, Tielong Liu, Zitao Mo, Zeyu Zhu, Zhengyang Zhuge, Jian Cheng

    Abstract: Spiking neural networks (SNNs) have been thriving on numerous tasks to leverage their promising energy efficiency and exploit their potentialities as biologically plausible intelligence. Meanwhile, the Neural Radiance Fields (NeRF) render high-quality 3D scenes with massive energy consumption, but few works delve into the energy-saving solution with a bio-inspired approach. In this paper, we propo…

    Submitted 13 November, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  47. arXiv:2309.01286  [pdf, other]

    cs.CV

    MAP: Domain Generalization via Meta-Learning on Anatomy-Consistent Pseudo-Modalities

    Authors: Dewei Hu, Hao Li, Han Liu, Xing Yao, Jiacheng Wang, Ipek Oguz

    Abstract: Deep models suffer from limited generalization capability to unseen domains, which has severely hindered their clinical applicability. Specifically for the retinal vessel segmentation task, although the model is supposed to learn the anatomy of the target, it can be distracted by confounding factors like intensity and contrast. We propose Meta learning on Anatomy-consistent Pseudo-modalities (MAP)…

    Submitted 3 September, 2023; originally announced September 2023.

  48. arXiv:2308.13906  [pdf, other]

    eess.SP cs.LG

    A Two-Dimensional Deep Network for RF-based Drone Detection and Identification Towards Secure Coverage Extension

    Authors: Zixiao Zhao, Qinghe Du, Xiang Yao, Lei Lu, Shijiao Zhang

    Abstract: As drones become increasingly prevalent in human life, they also raise security concerns such as unauthorized access and control, as well as collisions and interference with manned aircraft. Therefore, ensuring the ability to accurately detect and identify between different drones holds significant implications for coverage extension. Assisted by machine learning, radio frequency (RF) detection c…

    Submitted 26 August, 2023; originally announced August 2023.

  49. arXiv:2308.12644  [pdf, other]

    cs.NE cs.MS

    Evolutionary Dynamic Optimization Laboratory: A MATLAB Optimization Platform for Education and Experimentation in Dynamic Environments

    Authors: Mai Peng, Zeneng She, Delaram Yazdani, Danial Yazdani, Wenjian Luo, Changhe Li, Juergen Branke, Trung Thanh Nguyen, Amir H. Gandomi, Yaochu Jin, Xin Yao

    Abstract: Many real-world optimization problems possess dynamic characteristics. Evolutionary dynamic optimization algorithms (EDOAs) aim to tackle the challenges associated with dynamic optimization problems. Looking at the existing works, the results reported for a given EDOA can sometimes be considerably different. This issue occurs because the source codes of many EDOAs, which are usually very complex a…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: This work was submitted to ACM Transactions on Mathematical Software on December 7, 2022

  50. arXiv:2308.11774  [pdf]

    cs.CV

    SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)

    Authors: Ange Lou, Yamin Li, Xing Yao, Yike Zhang, Jack Noble

    Abstract: The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D positio…

    Submitted 5 February, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted by SPIE 2024