Showing 1–50 of 389 results for author: Zhao, R

Searching in archive cs.
  1. arXiv:2407.01726  [pdf]

    cs.CV

    Grouped Discrete Representation Guides Object-Centric Learning

    Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

    Abstract: Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating featu…

    Submitted 1 July, 2024; originally announced July 2024.

    ACM Class: I.4.6

  2. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.17580  [pdf, other]

    cs.DC

    Experimental Evaluation of Distributed k-Core Decomposition

    Authors: Bin Guo, Runze Zhao

    Abstract: Given an undirected graph, the $k$-core is a subgraph in which each node has at least $k$ connections, which is widely used in graph analytics to identify core subgraphs within a larger graph. The sequential $k$-core decomposition algorithm faces limitations due to memory constraints and data graphs can be inherently distributed. A distributed approach is proposed to overcome limitations by allowi…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16605  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    CLEAR: Can Language Models Really Understand Causal Graphs?

    Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

    Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16006  [pdf, other]

    cs.LG cs.AI

    Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

    Authors: Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

    Abstract: In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: To appear: Reinforcement Learning Conference (RLC), 2024

  6. arXiv:2406.10174  [pdf, other]

    cs.CL

    Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation

    Authors: Mohamad Elzohbi, Richard Zhao

    Abstract: The intersection between poetry and music provides an interesting case for computational creativity, yet remains relatively unexplored. This paper explores the integration of poetry and music through the lens of beat patterns, investigating whether a byte-based language model can generate words that fit specific beat patterns within the context of poetry. Drawing on earlier studies, we developed a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, accepted for the 15th International Conference on Computational Creativity, ICCC'24

  7. arXiv:2406.01638  [pdf, other]

    cs.LG cs.AI cs.CL

    TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment

    Authors: Chenxi Liu, Qianxiong Xu, Hao Miao, Sun Yang, Lingzheng Zhang, Cheng Long, Ziyue Li, Rui Zhao

    Abstract: The widespread adoption of scalable mobile sensing has led to large amounts of time series data for real-world applications. A fundamental application is multivariate time series forecasting (MTSF), which aims to predict future time series values based on historical observations. Existing MTSF methods suffer from limited parameterization and small-scale training data. Recently, Large language mode…

    Submitted 13 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  8. arXiv:2406.00023  [pdf, other]

    cs.CL

    LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

    Authors: Jing Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversel…

    Submitted 23 May, 2024; originally announced June 2024.

  9. arXiv:2405.20693  [pdf, other]

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p…

    Submitted 31 May, 2024; originally announced May 2024.

  10. arXiv:2405.20267  [pdf, other]

    cs.CL

    Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

    Authors: Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing

    Abstract: As LLMs evolve on a daily basis, there is an urgent need for a trustworthy evaluation method that can provide robust evaluation results in a timely fashion. Currently, as static benchmarks are prone to contamination concerns, users tend to trust human voting platforms, such as Chatbot Arena. However, human annotations require extensive manual efforts. To provide an automatic, robust, and trustwort…

    Submitted 12 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.18688  [pdf, other]

    cs.LG cs.AI cs.CL

    Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

    Authors: Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

    Abstract: Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the…

    Submitted 28 May, 2024; originally announced May 2024.

  12. A Vlogger-augmented Graph Neural Network Model for Micro-video Recommendation

    Authors: Weijiang Lai, Beihong Jin, Beibei Li, Yiyuan Zheng, Rui Zhao

    Abstract: Existing micro-video recommendation models exploit the interactions between users and micro-videos and/or multi-modal information of micro-videos to predict the next micro-video a user will watch, ignoring the information related to vloggers, i.e., the producers of micro-videos. However, in micro-video scenarios, vloggers play a significant role in user-video interactions, since vloggers generally…

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: (2023) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (pp. 684-699). Cham: Springer Nature Switzerland

  13. arXiv:2405.17152  [pdf, other]

    cs.MA cs.AI

    CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

    Authors: Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

    Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator sel…

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  14. arXiv:2405.13532  [pdf, other]

    cs.CV

    What Makes Good Few-shot Examples for Vision-Language Models?

    Authors: Zhaojun Guo, Jinghui Lu, Xuejing Liu, Rui Zhao, ZhenXing Qian, Fei Tan

    Abstract: Despite the notable advancements achieved by leveraging pre-trained vision-language (VL) models through few-shot tuning for downstream tasks, our detailed empirical study highlights a significant dependence of few-shot learning outcomes on the careful selection of training examples - a facet that has been previously overlooked in research. In this study, we delve into devising more effective strat…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  15. arXiv:2405.09593  [pdf, other]

    cs.DB cs.AI

    SQL-to-Schema Enhances Schema Linking in Text-to-SQL

    Authors: Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

    Abstract: In sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language models attention to relevant tables and columns with schema-linking, to reduce…

    Submitted 15 May, 2024; originally announced May 2024.

  16. arXiv:2405.09582  [pdf]

    cs.CV eess.IV

    AD-Aligning: Emulating Human-like Generalization for Cognitive Domain Adaptation in Deep Learning

    Authors: Zhuoying Li, Bohua Wan, Cong Mu, Ruzhang Zhao, Shushan Qiu, Chao Yan

    Abstract: Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Align…

    Submitted 21 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 5th International Conference on Electronic Communication and Artificial Intelligence

  17. arXiv:2405.05714  [pdf, other]

    cs.CV cs.LG

    Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

    Authors: Rui Zhao, Bin Shi, Jianfei Ruan, Tianze Pan, Bo Dong

    Abstract: In noisy label learning, estimating noisy class posteriors plays a fundamental role for developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize…

    Submitted 2 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  18. arXiv:2405.02686  [pdf, other]

    cs.CV cs.AI

    Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

    Authors: Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

    Abstract: Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages

  19. arXiv:2405.01439  [pdf, other]

    cs.CV

    Improving Domain Generalization on Gaze Estimation via Branch-out Auxiliary Regularization

    Authors: Ruijie Zhao, Pinyan Tang, Sihui Luo

    Abstract: Despite remarkable advancements, mainstream gaze estimation techniques, particularly appearance-based methods, often suffer from performance degradation in uncontrolled environments due to variations in illumination and individual facial attributes. Existing domain adaptation strategies, limited by their need for target domain samples, may fall short in real-world applications. This letter introdu…

    Submitted 2 May, 2024; originally announced May 2024.

  20. arXiv:2405.00622  [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  21. arXiv:2404.17662  [pdf, other]

    cs.CL

    PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games

    Authors: Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He

    Abstract: We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping…

    Submitted 17 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  22. arXiv:2404.17378  [pdf]

    quant-ph cs.AI

    Quantum Adjoint Convolutional Layers for Effective Data Representation

    Authors: Ren-Xin Zhao, Shi Wang, Yaonan Wang

    Abstract: Quantum Convolutional Layer (QCL) is considered as one of the core of Quantum Convolutional Neural Networks (QCNNs) due to its efficient data feature extraction capability. However, the current principle of QCL is not as mathematically understandable as Classical Convolutional Layer (CCL) due to its black-box structure. Moreover, classical data mapping in many QCLs is inefficient. To this end, fir…

    Submitted 26 April, 2024; originally announced April 2024.

  23. arXiv:2404.16280  [pdf, ps, other]

    cs.NE cs.AI cs.LG

    An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems

    Authors: Sichen Tao, Ruihan Zhao, Kaiyu Wang, Shangce Gao

    Abstract: Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research ba…

    Submitted 24 April, 2024; originally announced April 2024.

  24. arXiv:2404.13767  [pdf, other]

    cs.RO cs.AI cs.CV

    Autonomous Robot for Disaster Mapping and Victim Localization

    Authors: Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng

    Abstract: In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtlebot3 with Robotic Operating System (ROS) Noetic. Upon deployment in closed, initially unknown environments, the system aims to generate a comprehensive map and identify any present 'victims' using Apr…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Class final project for Northeastern University EECE 5550 Mobile Robotics Course

  25. arXiv:2404.12090  [pdf, other]

    cs.AI

    X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

    Authors: Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao

    Abstract: The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model…

    Submitted 17 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  26. arXiv:2404.11945  [pdf, other]

    cs.RO

    Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion

    Authors: Ruoqi Zhao, Xingbang Yan, Yubo Fan

    Abstract: Powered hip exoskeletons have shown the ability for locomotion assistance during treadmill walking. However, providing suitable assistance in real-world walking scenarios which involve changing terrain remains challenging. Recent research suggests that forecasting the lower limb joint's angles could provide target trajectories for exoskeletons and prostheses, and the performance could be improved…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages, submitted to IEEE RA-L, under review. This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2404.11895  [pdf, other]

    cs.CV

    FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

    Authors: Wei Wu, Qingnan Fan, Shuai Qin, Hong Gu, Ruoyu Zhao, Antoni B. Chan

    Abstract: Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended precise editing target regions and the broader area impacted by the guidance in practice. Despite excellent methods leveraging attention mechanisms that have…

    Submitted 18 April, 2024; originally announced April 2024.

  28. arXiv:2404.10306  [pdf, other]

    cs.CL

    Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

    Authors: Hengyuan Zhang, Yanru Wu, Dawei Li, Sak Yang, Rui Zhao, Yong Jiang, Fei Tan

    Abstract: Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's perfo…

    Submitted 3 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 43 pages, 10 figures, accepted by ACL 2024 Findings

  29. arXiv:2404.08001  [pdf, other]

    hep-ph cs.AI cs.CL cs.LG hep-ex physics.comp-ph

    Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

    Authors: Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan

    Abstract: Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) model frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named as Xiwu has been developed, allowi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

    ACM Class: I.2.7

  30. arXiv:2404.06913  [pdf, other]

    cs.CV

    Sparse Global Matching for Video Frame Interpolation with Large Motion

    Authors: Chunxu Liu, Guozhen Zhang, Rui Zhao, Limin Wang

    Abstract: Large motion poses a critical challenge in Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. In this paper, we introduce a new pipeline for VFI, which can effectively integrate global-level information to alleviate issues associated with large motion. Specifically…

    Submitted 15 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://sgm-vfi.github.io/. Fixed some typos in the supplementary material

  31. arXiv:2404.02507  [pdf, other]

    cs.CL

    Lifelong Event Detection with Embedding Space Separation and Compaction

    Authors: Chengwei Qin, Ruirui Chen, Ruochen Zhao, Wenhan Xia, Shafiq Joty

    Abstract: To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data during the learning of a new task. However, the simple combination of memory data and new-task samples can still result in substantial forgetting of previously acquired knowledge, which may occur due to the potential overlap between the feature distribution of new…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 main conference

  32. arXiv:2404.00987  [pdf, other]

    cs.CV

    FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

    Authors: Ruowen Zhao, Zhengyi Wang, Yikai Wang, Zihan Zhou, Jun Zhu

    Abstract: 3D content generation has wide applications in various fields. One of its dominant paradigms is by sparse-view reconstruction using multi-view images generated by diffusion models. However, since directly reconstructing triangle meshes from multi-view images is challenging, most methodologies opt to an implicit representation (such as NeRF) during the sparse-view reconstruction and acquire the tar…

    Submitted 27 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project page: https://flexidreamer.github.io

  33. arXiv:2404.00699  [pdf, other]

    cs.CL

    How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pressure on model int…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables

  34. arXiv:2403.18660  [pdf, other]

    cs.GR cs.CV

    InstructBrush: Learning Attention-based Instruction Optimization for Image Editing

    Authors: Ruoyu Zhao, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Wei Wu, Pengcheng Xu, Mingrui Zhu, Nannan Wang, Xinbo Gao

    Abstract: In recent years, instruction-based image editing methods have garnered significant attention in image editing. However, despite encompassing a wide range of editing priors, these methods are helpless when handling editing tasks that are challenging to accurately describe through language. We propose InstructBrush, an inversion method for instruction-based image editing methods to bridge this gap.…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Project Page: https://royzhao926.github.io/InstructBrush/

  35. arXiv:2403.15027  [pdf, other]

    cs.LG cs.AI

    Grey-informed neural network for time-series forecasting

    Authors: Wanli Xie, Ruibin Zhao, Zhenguo Xu, Tingting Liang

    Abstract: Neural network models have shown outstanding performance and successful resolutions to complex problems in various fields. However, the majority of these models are viewed as black-box, requiring a significant amount of data for development. Consequently, in situations with limited data, constructing appropriate models becomes challenging due to the lack of transparency and scarcity of data. To ta…

    Submitted 3 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  36. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that th…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by TASLP 2024

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

  37. arXiv:2403.12667  [pdf, other]

    cs.MM cs.HC

    ICE: Interactive 3D Game Character Editing via Dialogue

    Authors: Haoqian Wu, Yunjie Wu, Zhipeng Hu, Lincheng Li, Weijie Chen, Rui Zhao, Changjie Fan, Xin Yu

    Abstract: Text-driven in-game 3D character auto-customization systems eliminate the complicated process of manipulating intricate character control parameters. However, current methods are limited by their single-round generation, incapable of further editing and fine-grained modification. In this paper, we propose an Interactive Character Editing framework (ICE) to achieve a multi-round dialogue-based refi…

    Submitted 19 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  38. arXiv:2403.10833  [pdf, other]

    cs.RO

    Deep Reinforcement Learning-based Large-scale Robot Exploration

    Authors: Yuhong Cao, Rui Zhao, Yizhuo Wang, Bairan Xiang, Guillaume Sartoretti

    Abstract: In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale Lidar-based autonomous robot exploration problems in 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this e…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  39. arXiv:2403.10408  [pdf, other]

    cs.CR cs.CY cs.IR cs.LG cs.SI

    SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores

    Authors: Vidminas Vizgirda, Rui Zhao, Naman Goel

    Abstract: We present SocialGenPod, a decentralised and privacy-friendly way of deploying generative AI Web applications. Unlike centralised Web and data architectures that keep user data tied to application and service providers, we show how one can use Solid -- a decentralised Web specification -- to decouple user data from generative AI applications. We demonstrate SocialGenPod using a prototype that allo…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Demo paper accepted in Companion Proceedings of the ACM Web Conference 2024

    ACM Class: H.3.4; H.3.5; C.2.4; I.2.1; K.8.1

  40. arXiv:2403.09732  [pdf, other]

    cs.CL cs.AI

    PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency

    Authors: Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

    Abstract: Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first in…

    Submitted 1 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  41. arXiv:2403.07587  [pdf, other]

    cs.AI cs.CY cs.LO

    Perennial Semantic Data Terms of Use for Decentralized Web

    Authors: Rui Zhao, Jun Zhao

    Abstract: In today's digital landscape, the Web has become increasingly centralized, raising concerns about user privacy violations. Decentralized Web architectures, such as Solid, offer a promising solution by empowering users with better control over their data in their personal `Pods'. However, a significant challenge remains: users must navigate numerous applications to decide which application can be t…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by International World Wide Web Conference 2024 (WWW 2024 / The Web Conf 2024)

  42. arXiv:2403.07420  [pdf, other]

    cs.CV

    DragAnything: Motion Control for Anything using Entity Representation

    Authors: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

    Abstract: We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation. Compared to existing motion control methods, DragAnything offers several advantages. Firstly, trajectory-based is more user-friendly for interaction, when acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive. Users only need to draw…

    Submitted 15 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: The project website is at: https://weijiawu.github.io/draganything_page/ . The code is at: https://github.com/showlab/DragAnything

  43. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  44. arXiv:2403.02990  [pdf, other]

    cs.CL cs.AI

    Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

    Authors: Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

    Abstract: In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural…

    Submitted 2 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  45. arXiv:2403.02951  [pdf, other]

    cs.CL cs.AI

    Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

    Authors: Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, Hangyu Mao

    Abstract: Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL pr…

    Submitted 6 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures, 14 tables

  46. arXiv:2403.00331  [pdf, other]

    cs.DC

    WindGP: Efficient Graph Partitioning on Heterogenous Machines

    Authors: Li Zeng, Haohan Huang, Binfan Zheng, Kang Yang, Shengcheng Shao, **hua Zhou, Jun Xie, Rongqian Zhao, Xin Chen

    Abstract: Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider repli…

    Submitted 6 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 19 pages, 15 figures, 18 tables

  47. arXiv:2402.17718  [pdf]

    cs.LG eess.SP

    Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

    Authors: Vispi Karkaria, Anthony Goeckner, Ru**g Zha, Jie Chen, Jian**g Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

    Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

  48. arXiv:2402.17411   

    cs.CL

    Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective

    Authors: Fufangchen Zhao, Guoqiang **, Jiaheng Huang, Rui Zhao, Fei Tan

    Abstract: Nowadays both commercial and open-source academic LLMs have become the mainstream models of NLP. However, there is still a lack of research on LLM consistency, meaning that throughout the various stages of LLM research and deployment, its internal parameters and capabilities should remain unchanged. This issue exists in both the industrial and academic sectors. The solution to this problem is often…

    Submitted 2 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: This paper is not ready

  49. arXiv:2402.13473  [pdf, other]

    cs.RO

    Learning Highly Dynamic Behaviors for Quadrupedal Robots

    Authors: Chong Zhang, Jiapeng Sheng, Tingguang Li, He Zhang, Cheng Zhou, Qingxu Zhu, Rui Zhao, Yizheng Zhang, Lei Han

    Abstract: Learning highly dynamic behaviors for robots has been a longstanding challenge. Traditional approaches have demonstrated robust locomotion, but the exhibited behaviors lack diversity and agility. They employ approximate models, which lead to compromises in performance. Data-driven approaches have been shown to reproduce agile behaviors of animals, but typically have not been able to learn highly d…

    Submitted 20 February, 2024; originally announced February 2024.

  50. arXiv:2402.11496  [pdf, other]

    cs.RO

    Point-Wise Vibration Pattern Production via a Sparse Actuator Array for Surface Tactile Feedback

    Authors: Xiaosa Li, Runze Zhao, Chengyue Lu, Xiao Xiao, Wenbo Ding

    Abstract: Surface vibration tactile feedback is capable of conveying various semantic information to humans via handheld electronic devices such as smartphones, touch panels, and game controllers. However, covering the whole device contacting surface with dense actuator arrangement can affect its normal use, how to produce desired vibration patterns at any contact point with only several sparse actuators depl…

    Submitted 18 February, 2024; originally announced February 2024.