Showing 1–50 of 262 results for author: Xiao, B

  1. arXiv:2407.00625

    cs.LO

    Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets

    Authors: Hao Wu, Jie Wang, Bican Xia, Xiakun Li, Naijun Zhan, Ting Gan

    Abstract: Interpolation-based techniques have become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essenti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages (with appendix); accepted by the 26th International Symposium on Formal Methods (FM2024)

  2. arXiv:2406.19963

    cs.RO cs.AI cs.LG

    Text2Robot: Evolutionary Robot Design from Text Descriptions

    Authors: Ryan P. Ringel, Zachary S. Charlick, Jiaxun Liu, Boxi Xia, Boyuan Chen

    Abstract: Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D mo…

    Submitted 1 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/Text2Robot

  3. arXiv:2406.14732

    cs.CL cs.IR

    TTQA-RS- A break-down prompting approach for Multi-hop Table-Text Question Answering with Reasoning and Summarization

    Authors: Jayetri Bardhan, Bushi Xiao, Daisy Zhe Wang

    Abstract: Question answering (QA) over tables and text has gained much popularity over the years. Multi-hop table-text QA requires multiple hops between the table and text, making it a challenging QA task. Although several works have attempted to solve the table-text QA task, most involve training the models and requiring labeled data. In this paper, we have proposed a model - TTQA-RS: A break-down promptin…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.10556

    cs.IT cs.AI

    Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels

    Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Bin Xia, Wenjun Zhang

    Abstract: Degraded broadcast channels (DBC) are a typical multiuser communication scenario. Semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuse…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: accepted by China Communications

  5. arXiv:2406.10382

    cs.AI cs.CL

    Efficient Prompting for LLM-based Generative Internet of Things

    Authors: Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani

    Abstract: Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network s…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  6. arXiv:2406.09612

    cs.AI cs.LG physics.chem-ph

    Automated Molecular Concept Generation and Labeling with Large Language Models

    Authors: Shichang Zhang, Botao Xia, Zimin Zhang, Qianli Wu, Fang Sun, Ziniu Hu, Yizhou Sun

    Abstract: Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are not as common compared to black-box models like G…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.03143

    cs.CV cs.CR

    ZeroPur: Succinct Training-Free Adversarial Purification

    Authors: Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao

    Abstract: Adversarial purification is a kind of defense technique that can defend against various unseen adversarial attacks without modifying the victim classifier. Existing methods often depend on external generative models or cooperation between auxiliary functions and victim classifiers. However, retraining generative models, auxiliary functions, or victim classifiers relies on the domain of the fine-tuned data…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, under review

  8. arXiv:2406.03102

    cs.LG cs.AI

    DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

    Authors: Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang

    Abstract: Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions using state augmentation. However, these black-box approaches often involve incomprehensible processes and redund…

    Submitted 5 June, 2024; originally announced June 2024.

  9. arXiv:2405.17233

    cs.LG

    CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

    Authors: Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently in reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantizatio…

    Submitted 2 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.17191

    cs.CV math.PR

    MCGAN: Enhancing GAN Training with Regression-Based Generator Loss

    Authors: Baoren Xiao, Hao Ni, Weixin Yang

    Abstract: Generative adversarial networks (GANs) have emerged as a powerful tool for generating high-fidelity data. However, the main bottleneck of existing approaches is the lack of supervision on the generator training, which often results in undamped oscillation and unsatisfactory performance. To address this issue, we propose an algorithm called Monte Carlo GAN (MCGAN). This approach, utilizing an innov…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.12954

    cs.LG cs.AI

    A Method on Searching Better Activation Functions

    Authors: Haoyuan Sun, Zihao Wu, Bo Xia, Pu Chang, Zibin Dong, Yifu Yuan, Yongzhe Chang, Xueqian Wang

    Abstract: The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, which introduces non-linearity into the network and enables it to model sophisticated relationships in data. However, the search for activation functions has largely relied on empirical knowledge in the past, lacking theoretical guidance, which has hindered the identification of more effe…

    Submitted 22 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  12. arXiv:2405.12462   

    cs.LG cs.AI

    Boosting X-formers with Structured Matrix for Long Sequence Time Series Forecasting

    Authors: Zhicheng Zhang, Yong Wang, Shaoqi Tan, Bowei Xia, Yujie Luo

    Abstract: Transformer-based models for long sequence time series forecasting (LSTF) problems have gained significant attention due to their exceptional forecasting precision. As the cornerstone of these models, the self-attention mechanism poses a challenge to efficient training and inference due to its quadratic time complexity. In this article, we propose a novel architectural design for Transformer-based…

    Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: We believe this work is premature and requires further study

  13. arXiv:2405.12229

    physics.chem-ph cond-mat.mtrl-sci cs.AI cs.CE physics.comp-ph

    Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy

    Authors: Hao Tang, Brian Xiao, Wenhao He, Pero Subasic, Avetik R. Harutyunyan, Yao Wang, Fang Liu, Haowei Xu, Ju Li

    Abstract: Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules. However, most existing ML models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, and their prediction accuracy cannot surpass that of DFT. In this work, we developed a unified ML method f…

    Submitted 24 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  14. arXiv:2405.09508

    cs.CL cs.LG

    Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming

    Authors: Bushi Xiao, Chao Gao, Demi Zhang

    Abstract: This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer in replicating cross-language structural priming: a key indicator of abstract grammatical representations in human language processing. Focusing on Chinese-English priming, which involves two typologically distinct languages, we examine how these models handle the robust phenomenon of structural priming, where e…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures

  15. arXiv:2405.00574

    cs.CV cs.MM

    EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

    Authors: Deng Li, Xin Liu, Bohao Xing, Baiqiang Xia, Yuan Zong, Bihan Wen, Heikki Kälviäinen

    Abstract: Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain to be solved: 1) Previous studies have been more focused on short sequential video emotion analysis while overlooking long sequential video. However, the emotions in short sequential videos only reflect instantaneous emotions, which may be deliber…

    Submitted 1 May, 2024; originally announced May 2024.

  16. arXiv:2405.00263

    cs.CL cs.AI cs.LG

    Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

    Authors: Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui

    Abstract: Large language models (LLMs) suffer from low efficiency owing to the mismatch between the requirement of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its ti…

    Submitted 30 April, 2024; originally announced May 2024.

  17. arXiv:2404.14811

    eess.SP cs.LG

    FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks

    Authors: Bingnan Xiao, Jingjing Zhang, Wei Ni, Xin Wang

    Abstract: Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and l…

    Submitted 23 April, 2024; originally announced April 2024.

  18. arXiv:2404.14219

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  19. arXiv:2404.13235

    cs.LG

    TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction

    Authors: Ling Yue, Jonathan Li, Md Zabirul Islam, Bolun Xia, Tianfan Fu, Jintai Chen

    Abstract: The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a…

    Submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.05388

    cs.SE cs.AI cs.CY cs.LG

    An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping

    Authors: Boming Xia, Qinghua Lu, Liming Zhu, Zhenchang Xing

    Abstract: The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems (of which models are only a part) and environmental affordances (e.g., access to tools), obstruct…

    Submitted 15 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 1st ACM International Conference on AI-powered Software (AIware)

  21. arXiv:2403.19963

    cs.CV

    Efficient Modulation for Vision Networks

    Authors: Xu Ma, Xiyang Dai, Jianwei Yang, Bin Xiao, Yinpeng Chen, Yun Fu, Lu Yuan

    Abstract: In this work, we present efficient modulation, a novel design for efficient vision networks. We revisit the modulation mechanism, which operates input through convolutional context modeling and feature projection layers, and fuses features via element-wise multiplication and an MLP block. We demonstrate that the modulation mechanism is particularly well suited for efficient networks and further ta…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024. Code is publicly available at https://github.com/ma-xu/EfficientMod

  22. arXiv:2403.14430

    cs.CV

    Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

    Authors: Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question.…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  23. arXiv:2403.13457

    cs.SC cs.LO

    OSVAuto: semi-automatic verifier for functional specifications of operating systems

    Authors: Yulun Wu, Bohua Zhan, Bican Xia

    Abstract: We present the design and implementation of a tool for semi-automatic verification of functional specifications of operating system modules. Such verification tasks are traditionally done in interactive theorem provers, where the functionalities of the module are specified at abstract and concrete levels using data such as structures, algebraic datatypes, arrays, maps and so on. In this work, we p…

    Submitted 20 March, 2024; originally announced March 2024.

  24. arXiv:2403.12382

    eess.IV cs.CV cs.LG

    Low-Trace Adaptation of Zero-shot Self-supervised Blind Image Denoising

    Authors: Jintong Hu, Bin Xia, Bingchen Li, Wenming Yang

    Abstract: Deep learning-based denoisers have been the focus of recent development in image denoising. In the past few years, there has been increasing interest in developing self-supervised denoising networks that only require noisy images, without the need for clean ground truth for training. However, a performance gap remains between current self-supervised methods and their supervised counterparts. Additio…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  25. arXiv:2403.12052

    cs.CV

    A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

    Authors: Rui Ma, Qiang Zhou, Yizhu Jin, Daquan Zhou, Bangjun Xiao, Xiuyu Li, Yi Qu, Aishani Singh, Kurt Keutzer, Jingtong Hu, Xiaodong Xie, Zhen Dong, Shanghang Zhang, Shiji Zhou

    Abstract: Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies enable the unauthorized learning and replication of copyrighted content, artistic creations, and likenesses, leading to the proliferation of unregu…

    Submitted 21 June, 2024; v1 submitted 4 January, 2024; originally announced March 2024.

    Comments: 20 pages, 7 figures, 3 tables

  26. arXiv:2403.11423

    cs.CV

    VmambaIR: Visual State Space Model for Image Restoration

    Authors: Yuan Shi, Bin Xia, Xiaoyu Jin, Xing Wang, Tianyu Zhao, Xin Xia, Xuefeng Xiao, Wenming Yang

    Abstract: Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range depend…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 23 pages

  27. arXiv:2403.10828

    cs.CR

    Data Availability and Decentralization: New Techniques for zk-Rollups in Layer 2 Blockchain Networks

    Authors: Chengpeng Huang, Rui Song, Shang Gao, Yu Guo, Bin Xiao

    Abstract: The scalability limitations of public blockchains have hindered their widespread adoption in real-world applications. While the Ethereum community is pushing forward in zk-rollup (zero-knowledge rollup) solutions, such as introducing the "blob transaction" in EIP-4844, Layer 2 networks encounter a data availability problem: storing transactions completely off-chain poses a risk of data loss, par…

    Submitted 16 March, 2024; originally announced March 2024.

  28. arXiv:2403.10413

    cs.CV

    Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

    Authors: Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

    Abstract: Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require lots of trials by human experts. In this paper, we address the challenge of integrating m…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures, submitted to IROS 2024

  29. arXiv:2401.11401

    cs.CV

    LLMRA: Multi-modal Large Language Model based Restoration Assistant

    Authors: Xiaoyu Jin, Yuan Shi, Bin Xia, Wenming Yang

    Abstract: Multi-modal Large Language Models (MLLMs) have a significant impact on various tasks, due to their extensive knowledge and powerful perception and generation capabilities. However, applying MLLMs to low-level vision tasks remains an open research problem. In this paper, we present a simple MLLM-based Image Restoration framework to address this gap, namely Multi-modal Large Language Mod…

    Submitted 20 January, 2024; originally announced January 2024.

  30. arXiv:2401.08433

    cs.RO

    Autonomous Multiple-Trolley Collection System with Nonholonomic Robots: Design, Control, and Implementation

    Authors: Peijia Xie, Bingyi Xia, Anjun Hu, Ziqi Zhao, Lingxiao Meng, Zhirui Sun, Xuheng Gao, Jiankun Wang, Max Q. -H. Meng

    Abstract: The intricate and multi-stage task in dynamic public spaces like luggage trolley collection in airports presents both a promising opportunity and an ongoing challenge for automated service robots. Previous research has primarily focused on handling a single trolley or individual functional components, creating a gap in providing cost-effective and efficient solutions for practical scenarios. In th…

    Submitted 16 January, 2024; originally announced January 2024.

  31. arXiv:2312.08592

    cs.CV

    Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis

    Authors: Frank P. -W. Lo, Jianing Qiu, Zeyu Wang, Junhong Chen, Bo Xiao, Wu Yuan, Stamatia Giannarou, Gary Frost, Benny Lo

    Abstract: Conventional approaches to dietary assessment are primarily grounded in self-reporting methods or structured interviews conducted under the supervision of dietitians. These methods, however, are often subjective, potentially inaccurate, and time-intensive. Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, these prior AI methodologie…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 10 pages

  32. arXiv:2312.07025

    cs.AI

    Noise Distribution Decomposition based Multi-Agent Distributional Reinforcement Learning

    Authors: Wei Geng, Baidi Xiao, Rongpeng Li, Ning Wei, Dong Wang, Zhifeng Zhao

    Abstract: Generally, a Reinforcement Learning (RL) agent updates its policy by repetitively interacting with the environment, contingent on the received rewards to observed states and undertaken actions. However, the environmental disturbance, commonly leading to noisy observations (e.g., rewards and states), could significantly shape the performance of the agent. Furthermore, the learning performance of Multi-Ag…

    Submitted 12 December, 2023; originally announced December 2023.

  33. arXiv:2312.00699

    cs.CV cs.IR

    Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images

    Authors: Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

    Abstract: Table Structure Recognition (TSR) is a widely discussed task aiming at transforming unstructured table images into structured formats, such as HTML sequences, so that text-only models, such as ChatGPT, can further process these tables. One type of solution is using detection models to detect table components, such as columns and rows, then applying a rule-based post-processing method to conve…

    Submitted 10 January, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: under review

  34. arXiv:2312.00250

    cs.CV

    Advancements and Trends in Ultra-High-Resolution Image Processing: An Overview

    Authors: Zhuoran Zheng, Boxue Xiao

    Abstract: Currently, to further improve visual enjoyment, Ultra-High-Definition (UHD) images are catching wide attention. Here, UHD images are usually referred to as having a resolution greater than or equal to $3840 \times 2160$. However, since the imaging equipment is subject to environmental noise or equipment jitter, UHD images are prone to contrast degradation, blurring, low dynamic range, etc. To addr…

    Submitted 30 November, 2023; originally announced December 2023.

  35. arXiv:2311.18252

    cs.SE cs.AI cs.CY cs.LG

    Navigating Privacy and Copyright Challenges Across the Data Lifecycle of Generative AI

    Authors: Dawen Zhang, Boming Xia, Yue Liu, Xiwei Xu, Thong Hoang, Zhenchang Xing, Mark Staples, Qinghua Lu, Liming Zhu

    Abstract: The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential p…

    Submitted 10 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted by 2024 IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN)

  36. arXiv:2311.16500

    cs.CV

    LLMGA: Multimodal Large Language Model based Generation Assistant

    Authors: Bin Xia, Shiyin Wang, Yingfan Tao, Yitong Wang, Jiaya Jia

    Abstract: In this paper, we introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA), leveraging the vast reservoir of knowledge and proficiency in reasoning, comprehension, and response inherent in Large Language Models (LLMs) to assist users in image generation and editing. Diverging from existing approaches where Multimodal Large Language Models (MLLMs) generate fixed-size embedding…

    Submitted 11 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  37. arXiv:2311.13158

    cs.SE

    Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability

    Authors: Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing

    Abstract: Artificial Intelligence (AI), particularly through the advent of large-scale generative AI (GenAI) models such as Large Language Models (LLMs), has become a transformative element in contemporary technology. While these models have unlocked new possibilities, they simultaneously present significant challenges, such as concerns over data privacy and the propensity to generate misleading or fabricat…

    Submitted 17 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  38. arXiv:2311.09919

    cs.CV cs.AI

    DSR-Diff: Depth Map Super-Resolution with Diffusion Model

    Authors: Yuan Shi, Bin Xia, Rui Zhu, Qingmin Liao, Wenming Yang

    Abstract: Color-guided depth map super-resolution (CDSR) improves the spatial resolution of a low-quality depth map with the corresponding high-quality color map, benefiting various applications such as 3D reconstruction, virtual reality, and augmented reality. While conventional CDSR methods typically rely on convolutional neural networks or transformers, diffusion models (DMs) have demonstrated notable eff…

    Submitted 16 November, 2023; originally announced November 2023.

  39. arXiv:2311.06957

    cs.CY

    Simulating Public Administration Crisis: A Novel Generative Agent-Based Simulation System to Lower Technology Barriers in Social Science Research

    Authors: Bushi Xiao, Ziyuan Yin, Zixuan Shan

    Abstract: This article proposes a social simulation paradigm based on the GPT-3.5 large language model. It involves constructing Generative Agents that emulate human cognition, memory, and decision-making frameworks, along with establishing a virtual social system capable of stable operation and an insertion mechanism for standardized public events. The project focuses on simulating a township water polluti…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 12 Pages, 14 figures. This paper was submitted to IEEE TCSS on November 12, 2023

  40. arXiv:2311.06242

    cs.CV

    Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    Authors: Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

    Abstract: We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchy and semantic granularity.…

    Submitted 10 November, 2023; originally announced November 2023.

  41. arXiv:2311.00962

    cs.CV

    Detecting Generated Images by Real Images Only

    Authors: Xiuli Bi, Bo Liu, Fan Yang, Bin Xiao, Weisheng Li, Gao Huang, Pamela C. Cosman

    Abstract: As deep learning technology continues to evolve, the images yielded by generative models are becoming more and more realistic, triggering people to question the authenticity of images. Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training. This learning paradigm will result in…

    Submitted 1 November, 2023; originally announced November 2023.

  42. arXiv:2310.13367

    cs.LG cs.AI cs.DC

    VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models

    Authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao

    Abstract: Vertical federated learning has garnered significant attention as it allows clients to train machine learning models collaboratively without sharing local data, which protects the client's local private data. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this chall…

    Submitted 8 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  43. arXiv:2310.09499  [pdf, other]

    cs.CL cs.AI

    One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

    Authors: Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

    Abstract: Various Large Language Models (LLMs) from the Generative Pretrained Transformer (GPT) family have achieved outstanding performances in a wide range of text generation tasks. However, the enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiencies of LLMs through quantization, pruning, and other means has been…

    Submitted 23 April, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP2024

  44. arXiv:2310.07915  [pdf, other]

    cs.NI cs.CY cs.SI

    Tag Your Fish in the Broken Net: A Responsible Web Framework for Protecting Online Privacy and Copyright

    Authors: Dawen Zhang, Boming Xia, Yue Liu, Xiwei Xu, Thong Hoang, Zhenchang Xing, Mark Staples, Qinghua Lu, Liming Zhu

    Abstract: The World Wide Web, a ubiquitous source of information, serves as a primary resource for countless individuals, amassing a vast amount of data from global internet users. However, this online data, when scraped, indexed, and utilized for activities like web crawling, search engine indexing, and, notably, AI model training, often diverges from the original intent of its contributors. The ascent of…

    Submitted 5 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: added some information on how to deal with CDN in the design section; minor fixes on writing

  45. arXiv:2310.05370  [pdf, other]

    cs.CV

    SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

    Authors: Conghao Wong, Beihao Xia, Ziqian Zou, Yulong Wang, Xinge You

    Abstract: Analyzing and forecasting trajectories of agents like pedestrians and cars in complex scenes has become more and more significant in many intelligent systems and applications. The diversity and uncertainty in socially interactive behaviors among a rich variety of agents make this task more challenging than other deterministic computer vision tasks. Researchers have made a lot of efforts to quantif…

    Submitted 26 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: CVPR 2024 accepted

  46. arXiv:2310.03518  [pdf, other]

    cs.CL cs.AI cs.DS

    Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations

    Authors: Jiachi Liu, Liwen Wang, Guanting Dong, Xiaoshuai Song, Zechen Wang, Zhengyang Wang, Shanglin Lei, Jinzheng Zhao, Keqing He, Bo Xiao, Weiran Xu

    Abstract: In real dialogue scenarios, as there are unknown input noises in the utterances, existing supervised slot filling models often perform poorly in practical applications. Even though there are some studies on noise-robust models, these works are only evaluated on rule-based synthetic datasets, which is limiting, making it difficult to promote the research of noise-robust methods. In this paper, we i…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Working in progress

  47. arXiv:2310.00633  [pdf, other]

    cs.LG cs.AI

    A Survey of Robustness and Safety of 2D and 3D Deep Learning Models Against Adversarial Attacks

    Authors: Yanjie Li, Bin Xie, Songtao Guo, Yuanyuan Yang, Bin Xiao

    Abstract: Benefiting from the rapid development of deep learning, 2D and 3D computer vision applications are deployed in many safe-critical systems, such as autopilot and identity authentication. However, deep learning models are not trustworthy enough because of their limited robustness against adversarial attacks. The physically realizable adversarial attacks further pose fatal threats to the application…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: Submitted to CSUR

  48. arXiv:2309.12660  [pdf, ps, other]

    cs.RO eess.SY

    Disturbance Rejection Control for Autonomous Trolley Collection Robots with Prescribed Performance

    Authors: Rui-Dong Xi, Liang Lu, Xue Zhang, Xiao Xiao, Bingyi Xia, Jiankun Wang, Max Q. -H. Meng

    Abstract: Trajectory tracking control of autonomous trolley collection robots (ATCR) is an ambitious work due to the complex environment, serious noise and external disturbances. This work investigates a control scheme for ATCR subjecting to severe environmental interference. A kinematics model based adaptive sliding mode disturbance observer with fast convergence is first proposed to estimate the lumped di…

    Submitted 22 September, 2023; originally announced September 2023.

  49. arXiv:2309.12314  [pdf, other]

    cs.CV

    TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

    Authors: Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi, Chen, Xinggang Wang, Hongyang Chao, Han Hu

    Abstract: In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models. The method introduces two core techniques: affinity mimicking and weight inheritance. Affinity mimicking explores the interaction between modalities during distillation, enabling student models to mimic teachers' behavior of learning cross-modal feature alignment i…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted By ICCV 2023

  50. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2