Showing 1–50 of 754 results for author: Zhao, L

  1. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  2. arXiv:2406.18583  [pdf, other]

    cs.CV cs.LG

    Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

    Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  3. arXiv:2406.15677  [pdf, other]

    cs.RO

    Open-vocabulary Pick and Place via Patch-level Semantic Maps

    Authors: Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex

    Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.14885  [pdf, other]

    cs.HC

    Ink and Algorithm: Exploring Temporal Dynamics in Human-AI Collaborative Writing

    Authors: Kaixun Yang, Yixin Cheng, Linxuan Zhao, Mladen Raković, Zachari Swiecki, Dragan Gašević, Guanliang Chen

    Abstract: The advent of Generative Artificial Intelligence (GAI) has revolutionized the field of writing, marking a shift towards human-AI collaborative writing in education. However, the dynamics of human-AI interaction in the collaborative writing process are not well understood, and thus it remains largely unknown how human learning can be effectively supported with such cutting-edge GAI technologies. In…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.14862  [pdf, other]

    cs.LG cs.CL cs.CV

    LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

    Authors: Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao

    Abstract: Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framewo…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.13215  [pdf, other]

    cs.CV cs.AI

    Neural Residual Diffusion Models for Deep Scalable Vision Generation

    Authors: Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

    Abstract: The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models similar to large language models (LLMs). However, progressively deeper stacked networks will intuitively cause numerical propagation errors and reduce noisy prediction capabilities on generative data, w…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.13105  [pdf, other]

    cs.CV

    A transformer boosted UNet for smoke segmentation in complex backgrounds in multispectral LandSat imagery

    Authors: Jixue Liu, Jiuyong Li, Stefan Peters, Liang Zhao

    Abstract: Many studies have been done to detect smoke from satellite imagery. However, these prior methods are still not effective in detecting various kinds of smoke in complex backgrounds. Smoke presents challenges in detection due to variations in density, color, lighting, and backgrounds such as clouds, haze, and/or mist, as well as the contextual nature of thin smoke. This paper addresses these challenges by…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.12479  [pdf, other]

    cs.CV cs.AI

    RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding

    Authors: Linrui Xu, Ling Zhao, Wang Guo, Qiujun Li, Kewang Long, Kaiqi Zou, Yuhan Wang, Haifeng Li

    Abstract: The remote sensing image intelligence understanding model is undergoing a new profound paradigm shift which has been promoted by multi-modal large language model (MLLM), i.e. from the paradigm learning a domain model (LaDM) shifts to paradigm learning a pre-trained general foundation model followed by an adaptive domain model (LaGD). Under the new LaGD paradigm, the old datasets, which have led to…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures, 4 tables

  10. arXiv:2406.12182  [pdf, other]

    cs.CL cs.AI

    Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models

    Authors: Lulu Zhao, Weihao Zeng, Xiaofeng Shi, Hua Zhou, Donglin Hao, Yonghua Lin

    Abstract: Recently, both closed-source LLMs and open-source communities have made significant strides, outperforming humans in various general domains. However, their performance in specific professional fields such as medicine, especially within the open-source community, remains suboptimal due to the complexity of medical knowledge. We propose Aquila-Med, a bilingual medical LLM based on Aquila, addressin…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.10948  [pdf]

    cs.LG cs.AI

    Incorporating uncertainty quantification into travel mode choice modeling: a Bayesian neural network (BNN) approach and an uncertainty-guided active survey framework

    Authors: Shuwen Zheng, Zhou Fang, Liang Zhao

    Abstract: Existing deep learning approaches for travel mode choice modeling fail to inform modelers about their prediction uncertainty. Even when facing scenarios that are out of the distribution of training data, which implies high prediction uncertainty, these approaches still provide deterministic answers, potentially leading to misguidance. To address this limitation, this study introduces the concept o…

    Submitted 16 June, 2024; originally announced June 2024.

  12. arXiv:2406.10910  [pdf, ps, other]

    cs.IT eess.SP

    Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

    Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

    Abstract: This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-…

    Submitted 16 June, 2024; originally announced June 2024.

  13. arXiv:2406.10310  [pdf, other]

    cs.CL cs.AI

    TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

    Authors: Zhuofeng Li, Zixing Gou, Xiangnan Zhang, Zhongyuan Liu, Sirui Li, Yuntong Hu, Chen Ling, Zheng Zhang, Liang Zhao

    Abstract: Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotat…

    Submitted 14 June, 2024; originally announced June 2024.

  14. arXiv:2406.06563  [pdf, other]

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  15. arXiv:2406.06086  [pdf, other]

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2406.04858  [pdf, other]

    cs.RO eess.SY

    Auto-Multilift: Distributed Learning and Control for Cooperative Load Transportation With Quadrotors

    Authors: Bingheng Wang, Kuankuan Sima, Rui Huang, Lin Zhao

    Abstract: Designing motion control and planning algorithms for multilift systems remains challenging due to the complexities of dynamics, collision avoidance, actuator limits, and scalability. Existing methods that use optimization and distributed techniques effectively address these constraints and scalability issues. However, they often require substantial manual tuning, leading to suboptimal performance.…

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2406.04647  [pdf, other]

    cs.CV

    UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

    Authors: Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun

    Abstract: With the advancement of collaborative perception, the role of aerial-ground collaborative perception, a crucial component, is becoming increasingly important. The demand for collaborative perception across different perspectives to construct more comprehensive perceptual information is growing. However, challenges arise due to the disparities in the field of view between cross-domain agents and th…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2406.04336  [pdf, other]

    cs.LG cs.DM cs.DS math.CO math.SP

    On the Expressive Power of Spectral Invariant Graph Neural Networks

    Authors: Bohang Zhang, Lingxiao Zhao, Haggai Maron

    Abstract: Incorporating spectral information to enhance Graph Neural Networks (GNNs) has shown promising results but raises a fundamental challenge due to the inherent ambiguity of eigenvectors. Various architectures have been proposed to address this ambiguity, referred to as spectral invariant architectures. Notable examples include GNNs and Graph Transformers that use spectral distances, spectral project…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 31 pages; 3 figures; to appear in ICML 2024

  19. arXiv:2406.00605  [pdf, other]

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  20. arXiv:2405.18860  [pdf, other]

    cs.RO

    Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

    Authors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong He

    Abstract: The advancements in embodied AI are increasingly enabling robots to tackle complex real-world tasks, such as household manipulation. However, the deployment of robots in these environments remains constrained by the lack of comprehensive bimanual-mobile robot manipulation data that can be learned. Existing datasets predominantly focus on single-arm manipulation tasks, while the few dual-arm datase…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  21. arXiv:2405.18765  [pdf, other]

    cs.LG

    Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

    Authors: Wei-Bang Jiang, Li-Ming Zhao, Bao-Liang Lu

    Abstract: The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabi…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: The Twelfth International Conference on Learning Representations

    Journal ref: The Twelfth International Conference on Learning Representations, 2024

  22. arXiv:2405.18729  [pdf, other]

    cs.LG cs.AI

    Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

    Authors: Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

    Abstract: Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach opt…

    Submitted 28 May, 2024; originally announced May 2024.

  23. arXiv:2405.17916  [pdf, other]

    cs.CV

    Boosting General Trimap-free Matting in the Real-World Image

    Authors: Leo Shan, Wenzhang Zhou, Grace Zhao

    Abstract: Image matting aims to obtain an alpha matte that separates foreground objects from the background accurately. Recently, trimap-free matting has been well studied because it requires only the original image without any extra input. Such methods usually extract a rough foreground by itself to take place trimap as further guidance. However, the definition of 'foreground' lacks a unified standard and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures

  24. arXiv:2405.16800  [pdf, other]

    cs.LG cs.AI

    TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

    Authors: Zheng Zhang, Yuntong Hu, Bo Pan, Chen Ling, Liang Zhao

    Abstract: Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios. Despite the potential for deeper insights, existing TAG representation learning primarily relies on supervised methods, necessitating extensive labeled data and limiting applicability across dive…

    Submitted 26 May, 2024; originally announced May 2024.

  25. arXiv:2405.16765  [pdf, ps, other]

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  26. arXiv:2405.16606  [pdf, other]

    cs.SI

    Link Prediction on Textual Edge Graphs

    Authors: Chen Ling, Zhuofeng Li, Yuntong Hu, Zheng Zhang, Zhongyuan Liu, Shuang Zheng, Liang Zhao

    Abstract: Textual-edge Graphs (TEGs), characterized by rich text annotations on edges, are increasingly significant in network science due to their ability to capture rich contextual information among entities. Existing works have proposed various edge-aware graph neural networks (GNNs) or let language models directly make predictions. However, they often fall short of fully capturing the contextualized sem…

    Submitted 26 May, 2024; originally announced May 2024.

  27. arXiv:2405.16506  [pdf, other]

    cs.LG

    GRAG: Graph Retrieval-Augmented Generation

    Authors: Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao

    Abstract: While Retrieval-Augmented Generation (RAG) enhances the accuracy and relevance of responses by generative language models, it falls short in graph-based contexts where both textual and topological information are important. Naive RAG approaches inherently neglect the structural intricacies of textual graphs, resulting in a critical gap in the generation process. To address this challenge, we intro…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 14 pages, 4 figures

  28. arXiv:2405.16409  [pdf, other]

    cs.AI cs.LG

    Network Interdiction Goes Neural

    Authors: Lei Zhang, Zhiqian Chen, Chang-Tien Lu, Liang Zhao

    Abstract: Network interdiction problems are combinatorial optimization problems involving two players: one aims to solve an optimization problem on a network, while the other seeks to modify the network to thwart the first player's objectives. Such problems typically emerge in an attacker-defender context, encompassing areas such as military operations, disease spread analysis, and communication network man…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.16219  [pdf, other]

    cs.LG stat.ML

    Deep Causal Generative Models with Property Control

    Authors: Qilong Zhao, Shiyu Wang, Guangji Bai, Bo Pan, Zhaohui Qin, Liang Zhao

    Abstract: Generating data with properties of interest by external users while following the right causation among its intrinsic factors is important yet has not been well addressed jointly. This is due to the long-lasting challenge of jointly identifying key latent variables, their causal relations, and their correlation with properties of interest, as well as how to leverage their discoveries toward causal…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  30. arXiv:2405.16181  [pdf, other]

    cs.CV

    Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling

    Authors: Chunlin Qiu, Yiheng Duan, Lingchen Zhao, Qian Wang

    Abstract: Transfer-based attacks craft adversarial examples utilizing a white-box surrogate model to compromise various black-box target models, posing significant threats to many real-world applications. However, existing transfer attacks suffer from either weak transferability or expensive computation. To bridge the gap, we propose a novel sample-based attack, named neighborhood conditional sampling (NCS)…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Under review

  31. arXiv:2405.16075  [pdf, other]

    cs.LG cs.AI

    Continuous Temporal Domain Generalization

    Authors: Zekun Cai, Guangji Bai, Renhe Jiang, Xuan Song, Liang Zhao

    Abstract: Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuous-evolving and irregularly-observed temporal domains. To overcome this, this work…

    Submitted 25 May, 2024; originally announced May 2024.

  32. arXiv:2405.14295  [pdf, other]

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a…

    Submitted 23 May, 2024; originally announced May 2024.

  33. arXiv:2405.12476  [pdf, other]

    cs.CV

    Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding

    Authors: Weizhen Liu, Jiayu Tan, Guangyu Lan, Ao Li, Dongye Li, Le Zhao, Xiaohui Yuan, Nanqing Dong

    Abstract: Accurate phenotypic analysis in aquaculture breeding necessitates the quantification of subtle morphological phenotypes. Existing datasets suffer from limitations such as small scale, limited species coverage, and inadequate annotation of keypoints for measuring refined and complex morphological phenotypes of fish body parts. To address this gap, we introduce FishPhenoKey, a comprehensive dataset…

    Submitted 31 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/Fish-Phenotype-Detect

  34. arXiv:2405.11493  [pdf, other]

    cs.CV cs.IT eess.SP

    Point Cloud Compression with Implicit Neural Representations: A Unified Framework

    Authors: Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Dusit Niyato

    Abstract: Point clouds have become increasingly vital across various applications thanks to their ability to realistically depict 3D objects and scenes. Nevertheless, effectively compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we present a pioneering point cloud compression framework capable of handling both geometry and attribute components. Unlike…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 6 Pages, 6 Figures, submitted to IEEE ICCC

  35. Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy Map

    Authors: Liang Zhao, Yingyu Wang, Shoudong Huang

    Abstract: In this paper, we propose an optimization-based SLAM approach to simultaneously optimize the robot trajectory and the occupancy map using 2D laser scans (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which is significantly different from existing occupancy mapping strategies where the robot poses need to be obtained first before th…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by Robotics: Science and Systems 2022

    Journal ref: Robotics: Science and Systems 2022

  36. arXiv:2405.09133  [pdf, other]

    cs.LG

    Overcoming Domain Drift in Online Continual Learning

    Authors: Fan Lyu, Daofeng Liu, Linglan Zhao, Zhang Zhang, Fanhua Shang, Fuyuan Hu, Wei Feng, Liang Wang

    Abstract: Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential lea…

    Submitted 15 May, 2024; originally announced May 2024.

  37. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  38. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  39. arXiv:2405.03724  [pdf, other]

    cs.LG cs.SI

    GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets

    Authors: Junxiang Wang, Liang Zhao

    Abstract: We present GraphSL, a novel library designed for investigating the graph source localization problem. Our library facilitates the exploration of various graph diffusion models for simulating information spread and enables the evaluation of cutting-edge source localization approaches on established benchmark datasets. The source code of GraphSL is made available at https://github.com/xianggebe…

    Submitted 6 May, 2024; originally announced May 2024.

  40. arXiv:2405.02196  [pdf, other]

    cs.AR

    GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

    Authors: Chenyang Ai, Lechuan Zhao, Zhijie Huang, Cangyuan Li, Xinan Wang, Ying Wang

    Abstract: Recently, tensor algebra has witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and CGRA, support tensor operators with low energy and area efficiency. This paper conducts an in-depth exploration of general accelerator for tensor processing.…

    Submitted 3 May, 2024; originally announced May 2024.

  41. arXiv:2404.19403  [pdf, other]

    cs.RO cs.AI

    Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making

    Authors: Lei Zhuang, Jingdong Zhao, Yuntao Li, Zichun Xu, Liangliang Zhao, Hong Liu

    Abstract: Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often results in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  42. arXiv:2404.18922  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    DPO Meets PPO: Reinforced Token Optimization for RLHF

    Authors: Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang

    Abstract: In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still larg…

    Submitted 29 April, 2024; originally announced April 2024.

  43. arXiv:2404.18411  [pdf, other]

    cs.RO cs.CV

    Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles

    Authors: Mingi Jeong, Arihant Chadda, Ziang Ren, Luyang Zhao, Haowen Liu, Monika Roznere, Aiwei Zhang, Yitao Jiang, Sabriel Achong, Samuel Lensgraf, Alberto Quattrini Li

    Abstract: This paper introduces the first publicly accessible multi-modal perception dataset for autonomous maritime navigation, focusing on in-water obstacles within the aquatic environment to enhance situational awareness for Autonomous Surface Vehicles (ASVs). This dataset, consisting of diverse objects encountered under varying environmental conditions, aims to bridge the research gap in marine robotics…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

  44. arXiv:2404.14885  [pdf, other]

    cs.CV

    Domain adaptive pose estimation via multi-level alignment

    Authors: Yugan Chen, Lin Zhao, Yalong Xu, Honglei Zu, Xiaoqi An, Guangyu Li

    Abstract: Domain adaptive pose estimation aims to enable deep models trained on source domain (synthesized) datasets to produce similar results on the target domain (real-world) datasets. The existing methods have made significant progress by conducting image-level or feature-level alignment. However, only aligning at a single level is not sufficient to fully bridge the domain gap and achieve excellent domain…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted to ICME 2024

  45. arXiv:2404.14668  [pdf, other]

    cs.SI

    Source Localization for Cross Network Information Diffusion

    Authors: Chen Ling, Tanmoy Chowdhury, Jie Ji, Sirui Li, Andreas Züfle, Liang Zhao

    Abstract: Source localization aims to locate information diffusion sources given only the diffusion observation, which has attracted extensive attention in the past few years. Existing methods are mostly tailored for single networks and may not generalize to more complex networks like cross-networks. A cross-network is defined as two interconnected networks, where one network's functionality depend…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Code and data are available at: https://github.com/tanmoysr/CNSL/

  46. arXiv:2404.13584  [pdf, other]

    cs.CV cs.LG

    Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

    Authors: Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo

    Abstract: Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the q…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by CVIU

  47. arXiv:2404.11474  [pdf, other]

    cs.CV

    Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt

    Authors: Zhanjie Zhang, Quanwei Zhang, Huaizhong Lin, Wei Xing, Juncheng Mo, Shuaicheng Huang, Jinheng Xie, Guangyuan Li, Junsheng Luan, Lei Zhao, Dalong Zhang, Lixia Chen

    Abstract: Artistic style transfer aims to transfer the learned artistic style onto an arbitrary content image, generating artistic stylized images. Existing generative adversarial network-based methods fail to generate highly realistic stylized images and always introduce obvious artifacts and disharmonious patterns. Recently, large-scale pre-trained diffusion models opened up a new way for generating highl…

    Submitted 29 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024

  48. arXiv:2404.11027  [pdf, other]

    cs.AI

    Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

    Authors: Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

    Abstract: While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policie…

    Submitted 16 April, 2024; originally announced April 2024.

  49. arXiv:2404.10501  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Self-Supervised Visual Preference Alignment

    Authors: Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang

    Abstract: This paper makes the first attempt towards unsupervised preference alignment in Vision-Language Models (VLMs). We generate chosen and rejected responses with regard to the original and augmented image pairs, and conduct preference alignment with direct preference optimization. It is based on a core idea: properly designed augmentation to the image input will induce the VLM to generate false but hard n…

    Submitted 16 April, 2024; originally announced April 2024.

  50. arXiv:2404.09987  [pdf, other]

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables