
Showing 1–29 of 29 results for author: Sha, K

Searching in archive cs.
  1. arXiv:2406.19741  [pdf, other]

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  2. arXiv:2406.19614  [pdf, other]

    cs.LG cs.AI

    A Survey on Data Quality Dimensions and Tools for Machine Learning

    Authors: Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, Haihua Chen

    Abstract: Machine learning (ML) technologies have become substantial in practically all aspects of our society, and data quality (DQ) is critical for the performance, fairness, robustness, safety, and scalability of ML models. With the large and complex data in data-centric AI, traditional methods like exploratory data analysis (EDA) and cross-validation (CV) face challenges, highlighting the importance of…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by The 6th IEEE International Conference on Artificial Intelligence Testing (IEEE AITest 2024) as an invited paper

  3. arXiv:2406.16968  [pdf, other]

    cs.LG cs.AI

    Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

    Authors: Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

    Abstract: Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal…

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  4. arXiv:2405.18849  [pdf, other]

    cs.CV

    SFANet: Spatial-Frequency Attention Network for Weather Forecasting

    Authors: Jiaze Wang, Hao Chen, Hongcan Xu, Jinpeng Li, Bowen Wang, Kun Shao, Furui Liu, Huaxi Chen, Guangyong Chen, Pheng-Ann Heng

    Abstract: Weather forecasting plays a critical role in various sectors, driving decision-making and risk management. However, traditional methods often struggle to capture the complex dynamics of meteorological systems, particularly in the presence of high-resolution data. In this paper, we propose the Spatial-Frequency Attention Network (SFANet), a novel deep learning framework designed to address these ch…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2404.11116  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

    Authors: Keren Shao, Ke Chen, Shlomo Dubnov

    Abstract: In this challenge, we disentangle the deep filters from the original DeepfilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies at its potential in better handling temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Dis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

  6. arXiv:2402.06570  [pdf, other]

    cs.LG cs.RO

    Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

    Authors: Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

    Abstract: Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good per…

    Submitted 3 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  7. arXiv:2312.14878  [pdf, other]

    cs.AI cs.LG

    Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

    Authors: Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

    Abstract: A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: paper and appendix, 27 pages

  8. arXiv:2312.11063  [pdf, ps, other]

    cs.GT cs.AI cs.DS cs.LG econ.TH

    A survey on algorithms for Nash equilibria in finite normal-form games

    Authors: Hanyu Li, Wenhan Huang, Zhijian Duan, David Henry Mguni, Kun Shao, Jun Wang, Xiaotie Deng

    Abstract: Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand on Nash equilibrium computation, especially for Internet economics and multi-agent learning. This paper reviews various algorithms computing the Nash equilibrium and its approximation solutions in finite normal-form ga…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The published version is in Computer Science Review

  9. arXiv:2311.16082  [pdf, other]

    quant-ph cs.AI cs.AR cs.ET cs.LG

    Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers

    Authors: Hanrui Wang, Pengyu Liu, Kevin Shao, Dantong Li, Jiaqi Gu, David Z. Pan, Yongshan Ding, Song Han

    Abstract: Quantum computing has the potential to solve problems that are intractable for classical systems, yet the high error rates in contemporary quantum devices often exceed tolerable limits for useful algorithm execution. Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their stat…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to ICCAD 2023, FAST ML for Science Workshop; 7 pages, 8 figures

  10. arXiv:2308.02723  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

    Authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonic…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

  11. arXiv:2306.09200  [pdf, other]

    cs.LG cs.AI

    ChessGPT: Bridging Policy Learning and Language Modeling

    Authors: Xidong Feng, Yicheng Luo, Ziyan Wang, Hongrui Tang, Mengyue Yang, Kun Shao, David Mguni, Yali Du, Jun Wang

    Abstract: When solving decision-making tasks, humans typically depend on information from two key sources: (1) Historical policy data, which provides interaction replay from the environment, and (2) Analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: they either use his…

    Submitted 21 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Published as a conference article in NeurIPS 2023

  12. DropDim: A Regularization Method for Transformer Networks

    Authors: Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

    Abstract: We introduce DropDim, a structured dropout method designed for regularizing the self-attention mechanism, which is a key component of the transformer. In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions. In this way, the semantic information can be completely discarded. Thus, the excessive coadapting between different embedding dim…

    Submitted 20 April, 2023; originally announced April 2023.

    Journal ref: IEEE SIGNAL PROCESSING LETTERS, VOL. 29, 2022

  13. arXiv:2303.06697  [pdf, other]

    cs.CV

    Traj-MAE: Masked Autoencoders for Trajectory Prediction

    Authors: Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

    Abstract: Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environm…

    Submitted 12 March, 2023; originally announced March 2023.

  14. arXiv:2212.07648  [pdf, other]

    cs.CV cs.AI

    Relightable Neural Human Assets from Multi-view Gradient Illuminations

    Authors: Taotao Zhou, Kai He, Di Wu, Teng Xu, Qixuan Zhang, Kuixiang Shao, Wenzheng Chen, Lan Xu, Jingyi Yu

    Abstract: Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can largely facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they are not readily used in relighting problems. To promote research in both fields, in…

    Submitted 23 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Project page: https://miaoing.github.io/RNHA

  15. arXiv:2211.05543  [pdf, other]

    cs.SD cs.LG eess.AS

    Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation

    Authors: Runbang Zhang, Yixiao Zhang, Kai Shao, Ying Shan, Gus Xia

    Abstract: In this study, we explore the representation mapping from the domain of visual arts to the domain of music, with which we can use visual arts as an effective handle to control music generation. Unlike most studies in multimodal representation learning that are purely data-driven, we adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies. Such an…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023. GitHub repo: https://github.com/ldzhangyx/vis2mus

  16. arXiv:2209.01054  [pdf, other]

    cs.MA cs.LG

    Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

    Authors: Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, Zipeng Dai, Jianhong Wang, Matthew Taylor, Kun Shao, Jun Wang, David Mguni

    Abstract: Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly rep…

    Submitted 22 June, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

  17. arXiv:2207.09074  [pdf, other]

    cs.CV cs.LG

    Incremental Task Learning with Incremental Rank Updates

    Authors: Rakib Hyder, Ken Shao, Boyu Hou, Panos Markopoulos, Ashley Prater-Bennette, M. Salman Asif

    Abstract: Incremental Task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where training data for each task is only available during the training of that task. Neural networks tend to forget older tasks when they are trained for the newer tasks; this property is often known as catastrophic forgetting. To address this issue, ITL…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Code will be available at https://github.com/CSIPlab/task-increment-rank-update.git

    Journal ref: ECCV 2022

  18. arXiv:2205.15953  [pdf, other]

    cs.LG

    Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

    Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng Dai, Kun Shao, Jun Wang

    Abstract: Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs being common examples. In these settings, performing actions at each time step quickly accumulates costs leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and ultimately, damage. Determining when to act is crucial for achieving su…

    Submitted 4 June, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

  19. arXiv:2204.14096  [pdf, other]

    stat.ML cs.LG q-bio.QM stat.AP

    Bayesian Information Criterion for Event-based Multi-trial Ensemble data

    Authors: Kaidi Shao, Nikos K. Logothetis, Michel Besserve

    Abstract: Transient recurring phenomena are ubiquitous in many scientific fields like neuroscience and meteorology. Time inhomogeneous Vector Autoregressive Models (VAR) may be used to characterize peri-event system dynamics associated with such phenomena, and can be learned by exploiting multi-dimensional data gathering samples of the evolution of the system in multiple time windows comprising, each associa…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures

  20. PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution

    Authors: Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, Song Han

    Abstract: 3D neural networks are widely used in real-world applications (e.g., AR/VR headsets, self-driving cars). They are required to be fast and accurate; however, limited hardware resources on edge devices make these requirements rather challenging. Previous work processes 3D data using either voxel-based or point-based neural networks, but both types of 3D models are not hardware-efficient due to the l…

    Submitted 25 April, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Journal extension of arXiv:1907.03739 and arXiv:2007.16100 (IEEE TPAMI, 2021). The first two authors contributed equally to this work

  21. arXiv:2201.06837  [pdf, other]

    cs.LG physics.data-an physics.geo-ph

    Landslide Susceptibility Modeling by Interpretable Neural Network

    Authors: Khaled Youssef, Kevin Shao, Seulgi Moon, Louis-Serge Bouchard

    Abstract: Landslides are notoriously difficult to predict because numerous spatially and temporally varying factors contribute to slope stability. Artificial neural networks (ANN) have been shown to improve prediction accuracy but are largely uninterpretable. Here we introduce an additive ANN optimization framework to assess landslide susceptibility, as well as dataset division and outcome interpretation te…

    Submitted 12 March, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: 79 pages (including SI section); 8 main figures; 12 supplementary figures; 9 supplementary tables

  22. arXiv:2106.00517  [pdf, other]

    cs.AI

    Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

    Authors: Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

    Abstract: Extending transfer learning to cooperative multi-agent reinforcement learning (MARL) has recently received much attention. In contrast to the single-agent setting, the coordination indispensable in cooperative MARL constrains each agent's policy. However, existing transfer methods focus exclusively on agent policy and ignore coordination knowledge. We propose a new architecture that realizes robu…

    Submitted 3 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: 12 pages, 9 figures

  23. FocusNetv2: Imbalanced Large and Small Organ Segmentation with Adversarial Shape Constraint for Head and Neck CT Images

    Authors: Yunhe Gao, Rui Huang, Yiwei Yang, Jie Zhang, Kainan Shao, Changjuan Tao, Yuanyuan Chen, Dimitris N. Metaxas, Hongsheng Li, Ming Chen

    Abstract: Radiotherapy is a treatment where radiation is used to eliminate cancer cells. The delineation of organs-at-risk (OARs) is a vital step in radiotherapy treatment planning to avoid damage to healthy organs. For nasopharyngeal cancer, more than 20 OARs are needed to be precisely segmented in advance. The challenge of this task lies in complex anatomical structure, low-contrast organ contours, and th…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Accepted by Medical Image Analysis

  24. arXiv:2010.09776  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG eess.SY

    SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

    Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, et al. (12 additional authors not shown)

    Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse a…

    Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

  25. arXiv:2006.01482  [pdf, other]

    cs.LG cs.MA

    Multi-Agent Determinantal Q-Learning

    Authors: Yaodong Yang, Ying Wen, Liheng Chen, Jun Wang, Kun Shao, David Mguni, Weinan Zhang

    Abstract: Centralized training with decentralized execution has become an important paradigm in multi-agent learning. Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution. In this paper, we eliminate this restriction by proposing multi-agent determinantal Q-learning. Our method is established on Q-DPP, an extension of deter…

    Submitted 9 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  26. arXiv:2002.03939  [pdf, other]

    cs.MA

    Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

    Authors: Yaodong Yang, Jianye Hao, Ben Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang

    Abstract: In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value…

    Submitted 9 June, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  27. arXiv:2002.03585  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

    Authors: Che Wang, Shuhan Yuan, Kai Shao, Keith Ross

    Abstract: A simple and natural algorithm for reinforcement learning (RL) is Monte Carlo Exploring Starts (MCES), where the Q-function is estimated by averaging the Monte Carlo returns, and the policy is improved by choosing actions that maximize the current estimate of the Q-function. Exploration is performed by "exploring starts", that is, each episode begins with a randomly chosen state and action, and th…

    Submitted 5 August, 2022; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: The Tenth International Conference on Learning Representations. ICLR 2022. 33 pages

  28. arXiv:1912.10944  [pdf, other]

    cs.MA cs.AI cs.LG

    A Survey of Deep Reinforcement Learning in Video Games

    Authors: Kun Shao, Zhentao Tang, Yuanheng Zhu, Nannan Li, Dongbin Zhao

    Abstract: Deep reinforcement learning (DRL) has made great achievements since proposed. Generally, DRL agents receive high-dimensional inputs at each step, and make actions according to deep-neural-network-based policies. This learning mechanism updates the policy to maximize the return with an end-to-end method. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, a…

    Submitted 26 December, 2019; v1 submitted 23 December, 2019; originally announced December 2019.

    Comments: 13 pages, 3 figures

  29. arXiv:1804.00810  [pdf, other]

    cs.AI cs.LG cs.MA

    StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning

    Authors: Kun Shao, Yuanheng Zhu, Dongbin Zhao

    Abstract: Real-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. Then a parameter shar…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: 12 pages, 14 figures, accepted to IEEE Transactions on Emerging Topics in Computational Intelligence