Showing 1–50 of 85 results for author: Shu, Y

Searching in archive cs.
  1. arXiv:2406.14473  [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.07438  [pdf, other]

    cs.LG

    DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting

    Authors: Yuxuan Shu, Vasileios Lampos

    Abstract: In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. I…

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/ClaudiaShu/DeformTime

  3. arXiv:2406.04264  [pdf, other]

    cs.CV cs.AI cs.CL

    MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

    Authors: Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Shitao Xiao, Xi Yang, Bo Zhang, Tiejun Huang, Zheng Liu

    Abstract: The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To addres…

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.02309  [pdf, other]

    cs.LG

    Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing

    Authors: Youwei Shu, Xi Xiao, Derui Wang, Yuxin Cao, Siji Chen, Jason Xue, Linyi Li, Bo Li

    Abstract: Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of tw…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024 Poster

  5. arXiv:2405.19131  [pdf, other]

    cs.DC

    Learning Interpretable Scheduling Algorithms for Data Processing Clusters

    Authors: Zhibo Hu, Chen Wang, Helen Paik, Yanfeng Shu, Liming Zhu

    Abstract: Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like Decima) have been attempted to…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, 18 figures

    MSC Class: 68M20 ACM Class: I.2.8; D.4.1

  6. arXiv:2405.17478  [pdf, other]

    cs.LG stat.ML

    ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning

    Authors: Yihang Wang, Yuying Qiu, Peng Chen, Kai Zhao, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

    Abstract: With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to…

    Submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2405.16122  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

    Authors: Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar s…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 23 pages, 1 figure, 23 tables

  8. arXiv:2405.15273  [pdf, other]

    cs.LG

    Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

    Authors: Qichao Shentu, Beibu Li, Kai Zhao, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

    Abstract: Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. Aiming at this problem, we propose constructing a general time series anomal…

    Submitted 2 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.14831  [pdf, other]

    cs.CL cs.AI

    HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

    Authors: Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su

    Abstract: In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integra…

    Submitted 23 May, 2024; originally announced May 2024.

  10. arXiv:2405.05733  [pdf, other]

    stat.ML cs.LG

    Batched Stochastic Bandit for Nondegenerate Functions

    Authors: Yu Liu, Yunlu Shu, Tianyu Wang

    Abstract: This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs…

    Submitted 9 May, 2024; originally announced May 2024.

  11. arXiv:2405.00244  [pdf, other]

    cs.CV

    Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

    Authors: Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

    Abstract: As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruc…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This paper has been accepted by CVPR 2024

  12. arXiv:2403.20198  [pdf, other]

    cs.IT eess.SY

    Minimizing End-to-End Latency for Joint Source-Channel Coding Systems

    Authors: Kaiyi Chi, Qianqian Yang, Yuanchao Shu, Zhaohui Yang, Zhiguo Shi

    Abstract: While existing studies have highlighted the advantages of deep learning (DL)-based joint source-channel coding (JSCC) schemes in enhancing transmission efficiency, they often overlook the crucial aspect of resource management during the deployment phase. In this paper, we propose an approach to minimize the transmission latency in an uplink JSCC-based system. We first analyze the correlation betwe…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 7 Pages, 5 Figures, accepted by 2024 IEEE ICC Workshop

  13. arXiv:2403.13677  [pdf, other]

    cs.CV

    Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers

    Authors: Yuyang Shu, Michael E. Bain

    Abstract: Humans see low and high spatial frequency components at the same time, and combine the information from both to form a visual scene. Drawing on this neuroscientific inspiration, we propose an altered Vision Transformer architecture where patches from scaled down versions of the input image are added to the input of the first Transformer Encoder layer. We name this model Retina Vision Transformer (…

    Submitted 20 March, 2024; originally announced March 2024.

  14. arXiv:2403.07591  [pdf, other]

    cs.LG

    Robustifying and Boosting Training-Free Neural Architecture Search

    Authors: Zhenfeng He, Yao Shu, Zhongxiang Dai, Bryan Kian Hsiang Low

    Abstract: Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics ty…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024. Code available at https://github.com/hzf1174/RoBoT

  15. arXiv:2403.02993  [pdf, other]

    cs.AI

    Localized Zeroth-Order Prompt Optimization

    Authors: Wenyang Hu, Yao Shu, Zongmin Yu, Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: The efficacy of large language models (LLMs) in understanding and generating natural language has aroused a wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This thus motivates us to re-think the necessity of fin…

    Submitted 5 March, 2024; originally announced March 2024.

  16. GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features

    Authors: Yunzhuo Sun, Yifang Xu, Zien Xie, Yukun Shu, Sidan Du

    Abstract: Moment retrieval (MR) and highlight detection (HD) aim to identify relevant moments and highlights in video from corresponding natural language query. Large language models (LLMs) have demonstrated proficiency in various computer vision tasks. However, existing methods for MR&HD have not yet been integrated with LLMs. In this letter, we propose a novel two-stage model that takes the output of LLM…

    Submitted 10 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  17. arXiv:2402.14672  [pdf, other]

    cs.CL cs.AI

    Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

    Authors: Yu Gu, Yiheng Shu, Hao Yu, Xiao Liu, Yuxiao Dong, Jie Tang, Jayanth Srinivasa, Hugo Latapie, Yu Su

    Abstract: The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist language agents capable of operating within complex real-world environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 16 pages, 8 figures, 4 tables

    ACM Class: I.2.7

  18. arXiv:2402.11427  [pdf, other]

    cs.LG cs.AI stat.ML

    OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations

    Authors: Yao Shu, Jiongfeng Fang, Ying Tiffany He, Fei Richard Yu

    Abstract: First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately paralle…

    Submitted 17 February, 2024; originally announced February 2024.

  19. arXiv:2402.07179  [pdf, other]

    cs.CL cs.IR

    Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models

    Authors: Zhibo Hu, Chen Wang, Yanfeng Shu, Helen Paik, Liming Zhu

    Abstract: The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the inser…

    Submitted 20 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: 12 pages, 9 figures

    ACM Class: I.2.7; H.3.3

  20. arXiv:2402.05956  [pdf, other]

    cs.LG

    Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

    Authors: Peng Chen, Yingying Zhang, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, Chenjuan Guo

    Abstract: Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different…

    Submitted 6 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by the 12th International Conference on Learning Representations (ICLR 2024)

  21. arXiv:2402.03082  [pdf, other]

    cs.CV cs.LG

    Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing

    Authors: Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou

    Abstract: Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that dist…

    Submitted 5 February, 2024; originally announced February 2024.

  22. arXiv:2402.01157  [pdf, other]

    cs.CV

    Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale

    Authors: Yangyang Shu, Xiaofeng Cao, Qi Chen, Bowen Zhang, Ziqin Zhou, Anton van den Hengel, Lingqiao Liu

    Abstract: Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data. The primary difficulty in this task is that the model's predictions may be inaccurate, and using these inaccurate predictions for model adaptation can lead to misleading results. To address this issue, this paper pr…

    Submitted 2 February, 2024; originally announced February 2024.

  23. arXiv:2401.07213  [pdf, ps, other]

    cs.CV

    Depth-agnostic Single Image Dehazing

    Authors: Honglei Xu, Yan Shu, Shaohui Liu

    Abstract: Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. To overcome the problem, we propose a simple yet novel syn…

    Submitted 14 January, 2024; originally announced January 2024.

  24. arXiv:2401.02594  [pdf, other]

    cs.CL

    Unsupervised hard Negative Augmentation for contrastive learning

    Authors: Yuxuan Shu, Vasileios Lampos

    Abstract: We present Unsupervised hard Negative Augmentation (UNA), a method that generates synthetic negative instances based on the term frequency-inverse document frequency (TF-IDF) retrieval model. UNA uses TF-IDF scores to ascertain the perceived importance of terms in a sentence and then produces negative samples by replacing terms with respect to that. Our experiments demonstrate that models trained…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: The code and pre-trained models are available at https://github.com/ClaudiaShu/UNA

  25. arXiv:2312.05927  [pdf, other]

    cs.DL cs.SI physics.soc-ph

    The survival of scientific stylization

    Authors: Yuanyuan Shu, Tianxing Pan

    Abstract: This study elaborates a text-based metric to quantify the unique position of stylized scientific research, characterized by its innovative integration of diverse knowledge components and potential to pivot established scientific paradigms. Our analysis reveals a concerning decline in stylized research, highlighted by its comparative undervaluation in terms of citation counts and protracted peer-re…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 55 pages (23 main text, 32 SI)

  26. arXiv:2312.00411  [pdf]

    cs.LG

    A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering

    Authors: Yeshuo Shu, Gangcheng Zhang, Keyi Liu, Jintong Tang, Liyan Xu

    Abstract: Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. This study proposes a progressive feature extraction strategy that mines high-order mobility features from users' moving trajectory records…

    Submitted 1 December, 2023; originally announced December 2023.

  27. arXiv:2311.13381  [pdf, other]

    cs.LG cs.AI cs.DC

    Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

    Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

    Abstract: Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 6 pages, 7 figures; Submitted to HotMobile 2024

  28. arXiv:2311.11572  [pdf, other]

    cs.ET

    Cryogenic quasi-static embedded DRAM for energy-efficient compute-in-memory applications

    Authors: Yuhao Shu, Hongtu Zhang, Hao Sun, Mengru Zhang, Wenfeng Zhao, Qi Deng, Zhidong Tang, Yumeng Yuan, Yongqi Hu, Yu Gu, Xufeng Kou, Yajun Ha

    Abstract: Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed fou…

    Submitted 20 November, 2023; originally announced November 2023.

  29. arXiv:2311.07090  [pdf, other]

    cs.CV

    CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings

    Authors: Yachun Mi, Yu Li, Yan Shu, Chen Hui, Puchao Zhou, Shaohui Liu

    Abstract: Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose…

    Submitted 13 November, 2023; originally announced November 2023.

  30. arXiv:2311.05827  [pdf, other]

    cs.LG

    AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

    Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Zhiguo Shi, Jiming Chen

    Abstract: It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communicat…

    Submitted 9 November, 2023; originally announced November 2023.

  31. arXiv:2311.02715  [pdf, other]

    cs.LG stat.ML

    Exploiting Correlated Auxiliary Feedback in Parameterized Bandits

    Authors: Arun Verma, Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low

    Abstract: We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward. The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect addit…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  32. arXiv:2310.13473  [pdf, other]

    cs.CV

    Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

    Authors: Mingwei Zhu, Leigang Sha, Yu Shu, Kangjia Zhao, Tiancheng Zhao, Jianwei Yin

    Abstract: Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three important domains: abstract pattern reasoning, human ac…

    Submitted 20 October, 2023; originally announced October 2023.

  33. arXiv:2310.05373  [pdf, other]

    cs.LG cs.AI

    Quantum Bayesian Optimization

    Authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number $T$ of iterations, and a regret lower bound of $\Omega(\sqrt{T})$ has been derived which represents the unavoidable regrets f…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  34. arXiv:2310.02905  [pdf, other]

    cs.LG cs.AI cs.CL

    Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

    Authors: Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimi…

    Submitted 23 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted to ICML 2024

  35. arXiv:2309.17288  [pdf, other]

    cs.AI

    AutoAgents: A Framework for Automatic Agent Generation

    Authors: Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi

    Abstract: Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates…

    Submitted 29 April, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: IJCAI 2024

  36. arXiv:2309.08345  [pdf, other]

    cs.CL cs.AI

    Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases

    Authors: Yiheng Shu, Zhiwei Yu

    Abstract: Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and indulging in "hallucinated" information. This paper is an experiment…

    Submitted 9 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

  37. arXiv:2308.15930  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    LLaSM: Large Language and Speech Model

    Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

    Abstract: Multi-modal large language models have garnered significant interest recently. However, most of the works focus on vision-language multi-modal models providing strong capabilities in following vision-and-language instructions. We claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to f…

    Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  38. arXiv:2308.11155  [pdf, other]

    cs.LG cs.AI physics.chem-ph quant-ph

    Beyond MD17: the reactive xxMD dataset

    Authors: Zihan Pengmei, Junyu Liu, Yinan Shu

    Abstract: System specific neural force fields (NFFs) have gained popularity in computational chemistry. One of the most popular datasets as a benchmark to develop NFFs models is the MD17 dataset and its subsequent extension. These datasets comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampled from direct adiabatic dynamics. However, many chemical re…

    Submitted 5 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: 19 pages, many figures. Data available at https://github.com/zpengmei/xxMD

    Journal ref: Sci Data 11, 222 (2024)

  39. arXiv:2308.09904  [pdf, other]

    cs.IR cs.AI

    RAH! RecSys-Assistant-Human: A Human-Centered Recommendation Framework with LLM Agents

    Authors: Yubo Shu, Haonan Zhang, Hansu Gu, Peng Zhang, Tun Lu, Dongsheng Li, Ning Gu

    Abstract: The rapid evolution of the web has led to an exponential growth in content. Recommender systems play a crucial role in Human-Computer Interaction (HCI) by tailoring content based on individual preferences. Despite their importance, challenges persist in balancing recommendation accuracy with user satisfaction, addressing biases while preserving user privacy, and solving cold-start problems in cros…

    Submitted 17 October, 2023; v1 submitted 19 August, 2023; originally announced August 2023.

  40. arXiv:2308.04077  [pdf, other]

    cs.LG cs.AI

    Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients

    Authors: Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low

    Abstract: Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization,…

    Submitted 8 August, 2023; originally announced August 2023.

  41. arXiv:2307.01090  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.IM cs.CV cs.LG

    Streamlined Lensed Quasar Identification in Multiband Images via Ensemble Networks

    Authors: Irham Taufik Andika, Sherry H. Suyu, Raoul Cañameras, Alejandra Melo, Stefan Schuldt, Yiping Shu, Anna-Christina Eilers, Anton Timur Jaelani, Minghao Yue

    Abstract: Quasars experiencing strong lensing offer unique viewpoints on subjects related to the cosmic expansion rate, the dark matter profile within the foreground deflectors, and the quasar host galaxies. Unfortunately, identifying them in astronomical images is challenging since they are overwhelmed by the abundance of non-lenses. To address this, we have developed a novel approach by ensembling cutting…

    Submitted 18 August, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in the Astronomy & Astrophysics journal. 28 pages, 11 figures, and 3 tables. We welcome comments from the reader

    Journal ref: A&A 678, A103 (2023)

  42. arXiv:2306.07597  [pdf, other]

    cs.CL

    Question Decomposition Tree for Answering Complex Questions over Knowledge Bases

    Authors: Xiang Huang, Sitao Cheng, Yiheng Shu, Yuheng Bao, Yuzhong Qu

    Abstract: Knowledge base question answering (KBQA) has attracted a lot of interest in recent years, especially for complex questions which require multiple facts to answer. Question decomposition is a promising way to answer complex questions. Existing decomposition methods split the question into sub-questions according to a single compositionality type, which is not sufficient for questions involving mult…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted by AAAI2023

  43. arXiv:2305.12450  [pdf, other]

    eess.AS cs.SD

    Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction

    Authors: Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

    Abstract: For speech interaction, voice activity detection (VAD) is often used as a front-end. However, traditional VAD algorithms usually need to wait for a continuous tail silence to reach a preset maximum duration before segmentation, resulting in a large latency that affects user experience. In this paper, we propose a novel semantic VAD for low-latency segmentation. Different from existing methods, a f…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech2023

  44. arXiv:2304.07987  [pdf, other]

    cs.CL cs.AI

    Chinese Open Instruction Generalist: A Preliminary Release

    Authors: Ge Zhang, Yemin Shi, Ruibo Liu, Ruibin Yuan, Yizhi Li, Siwei Dong, Yu Shu, Zhaoqun Li, Zekun Wang, Chenghua Lin, Wenhao Huang, Jie Fu

    Abstract: Instruction tuning is widely recognized as a key technique for building generalist language models, which has attracted the attention of researchers and the public with the release of InstructGPT and ChatGPT (https://chat.openai.com/). Despite impressive progress in English-oriented large-scale language models (LLMs), it is still under-explored whether Engl…

    Submitted 24 April, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

  45. arXiv:2303.01669  [pdf, other]

    cs.CV cs.LG

    Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

    Authors: Yangyang Shu, Anton van den Hengel, Lingqiao Liu

    Abstract: Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR) since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differen…

    Submitted 27 July, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  46. Read Pointer Meters in complex environments based on a Human-like Alignment and Recognition Algorithm

    Authors: Yan Shu, Shaohui Liu, Honglei Xu, Feng Jiang

    Abstract: Recently, developing an automatic reading system for analog measuring instruments has gained increased attention, as it enables the collection of numerous state of equipment. Nonetheless, two major obstacles still obstruct its deployment to real-world applications. The first issue is that they rarely take the entire pipeline's speed into account. The second is that they are incapable of dealing wi…

    Submitted 30 July, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

  47. arXiv:2302.00864  [pdf, other]

    cs.LG cs.CV

    CLIPood: Generalizing CLIP to Out-of-Distributions

    Authors: Yang Shu, Xingzhuo Guo, Jialong Wu, Ximei Wang, Jianmin Wang, Mingsheng Long

    Abstract: Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances. This paper aims at generalizing CLIP to out-of-distribution…

    Submitted 13 July, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted by ICML 2023

  48. arXiv:2210.12925  [pdf, other]

    cs.CL cs.AI

    TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Bases

    Authors: Yiheng Shu, Zhiwei Yu, Yuhan Li, Börje F. Karlsson, Tingting Ma, Yuzhong Qu, Chin-Yew Lin

    Abstract: Pre-trained language models (PLMs) have shown their effectiveness in multiple scenarios. However, KBQA remains challenging, especially regarding coverage and generalization settings. This is due to two main factors: i) understanding the semantics of both questions and relevant knowledge from the KB; ii) generating executable logical forms with both semantic and syntactic correctness. In this paper…

    Submitted 23 October, 2022; originally announced October 2022.

  49. arXiv:2210.06850  [pdf, other]

    cs.LG cs.AI

    Sample-Then-Optimize Batch Neural Thompson Sampling

    Authors: Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and der…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Extended version with proofs and additional experimental details and results, 30 pages

  50. arXiv:2210.03853  [pdf, other]

    cs.CV

    Revisiting Self-Supervised Contrastive Learning for Facial Expression Recognition

    Authors: Yuxuan Shu, Xiao Gu, Guang-Zhong Yang, Benny Lo

    Abstract: The success of most advanced facial expression recognition works relies heavily on large-scale annotated datasets. However, it poses great challenges in acquiring clean and consistent annotations for facial expression datasets. On the other hand, self-supervised contrastive learning has gained great popularity due to its simple yet effective instance discrimination training strategy, which can pot…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC 2022