Showing 51–100 of 348 results for author: Ouyang, W

  1. arXiv:2402.00377  [pdf, other]

    math.OC

    Kurdyka-Łojasiewicz exponent via Hadamard parametrization

    Authors: Wenqing Ouyang, Yuncheng Liu, Ting Kei Pong, Hao Wang

    Abstract: We consider a class of $\ell_1$-regularized optimization problems and the associated smooth "over-parameterized" optimization problems built upon the Hadamard parametrization, or equivalently, the Hadamard difference parametrization (HDP). We characterize the set of second-order stationary points of the HDP-based model and show that they correspond to some stationary points of the corresponding…

    Submitted 17 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    MSC Class: 90C25; 90C26; 68Q25

  2. arXiv:2402.00059  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

    Authors: Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

    Abstract: Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Non…

    Submitted 28 January, 2024; originally announced February 2024.

    Comments: 19 pages

  3. arXiv:2401.15071  [pdf, other]

    cs.CV

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, et al. (11 additional authors not shown)

    Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance unde…

    Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  4. arXiv:2401.11960  [pdf, other]

    cs.CV eess.IV

    Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales…

    Submitted 22 January, 2024; originally announced January 2024.

  5. arXiv:2401.05607  [pdf, other]

    cond-mat.mtrl-sci

    Room-temperature Magnetic Thermal Switching by Suppressing Phonon-Magnon Scattering

    Authors: Fanghao Zhang, Lokanath Patra, Yubi Chen, Wenkai Ouyang, Paul Sarte, Shantal Adajian, Xiangying Zuo, Runqing Yang, Tengfei Luo, Bolin Liao

    Abstract: Thermal switching materials, whose thermal conductivity can be controlled externally, show great potential in contemporary thermal management. Manipulating thermal transport properties through magnetic fields has been accomplished in materials that exhibit a high magnetoresistance. However, it is generally understood that the lattice thermal conductivity attributed to phonons is not significantly…

    Submitted 10 January, 2024; originally announced January 2024.

  6. arXiv:2401.03407  [pdf, other]

    cs.CV

    Bilateral Reference for High-Resolution Dichotomous Image Segmentation

    Authors: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

    Abstract: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction proce…

    Submitted 25 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Version 5, with updated DIS performance, accuracy-efficiency comparison, and 3rd-party applications

  7. arXiv:2312.16240  [pdf, other]

    cs.CV cs.AI

    Merging Vision Transformers from Different Tasks and Domains

    Authors: Peng Ye, Chenyu Huang, Mingzhu Shen, Tao Chen, Yongqi Huang, Yuning Zhang, Wanli Ouyang

    Abstract: This work targets to merge various Vision Transformers (ViTs) trained on different tasks (i.e., datasets with different object categories) or domains (i.e., datasets with the same categories but different environments) into one unified model, yielding still good performance on each task or domain. Previous model merging works focus on either CNNs or NLP models, leaving the ViTs merging research un…

    Submitted 25 December, 2023; originally announced December 2023.

  8. arXiv:2312.15681  [pdf, other]

    cs.CV cs.AI

    Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

    Authors: Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

    Abstract: Fine-tuning pre-trained foundation models has gained significant popularity in various research fields. Existing methods for fine-tuning can be roughly divided into two categories, namely Parameter-Efficient Fine-Tuning and High-Performance Fine-Tuning. The former aims at improving efficiency, while the latter focuses on enhancing performance. Beyond these methods, we demonstrate that Partial Fine…

    Submitted 25 December, 2023; originally announced December 2023.

  9. arXiv:2312.14200  [pdf, other]

    cs.CV

    Efficient Architecture Search via Bi-level Data Pruning

    Authors: Chongjun Tu, Peng Ye, Weihao Lin, Hancheng Ye, Chong Yu, Tao Chen, Baopu Li, Wanli Ouyang

    Abstract: Improving the efficiency of Neural Architecture Search (NAS) is a challenging but significant task that has received much attention. Previous works mainly adopted the Differentiable Architecture Search (DARTS) and improved its search strategies or modules to enhance search efficiency. Recently, some methods have started considering data reduction for speedup, but they are not tightly coupled with…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 11 pages

    MSC Class: 68T05(Primary)

  10. arXiv:2312.12462  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Towards an end-to-end artificial intelligence driven global weather forecasting system

    Authors: Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Jing-Jia Luo, Hao Chen, Yi Xiao, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting. However, existing AI-based weather forecasting models rely on analysis or reanalysis products from traditional numerical weather prediction (NWP) systems as initial conditions for making predictions. Initial s…

    Submitted 8 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  11. arXiv:2312.12455  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation

    Authors: Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: Weather forecasting is a crucial yet highly challenging task. With the maturity of Artificial Intelligence (AI), the emergence of data-driven weather forecasting models has opened up a new paradigm for the development of weather forecasting systems. Despite the significant successes that have been achieved (e.g., surpassing advanced traditional physical models for global medium-range forecasting),…

    Submitted 19 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 15 pages, 8 figures

  12. arXiv:2312.11584  [pdf, other]

    q-bio.QM cs.AI cs.LG

    ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing

    Authors: Zhi Jin, Sheng Xu, Xiang Zhang, Tianze Ling, Nanqing Dong, Wanli Ouyang, Zhiqiang Gao, Cheng Chang, Siqi Sun

    Abstract: De novo peptide sequencing from mass spectrometry (MS) data is a critical task in proteomics research. Traditional de novo algorithms have encountered a bottleneck in accuracy due to the inherent complexity of proteomics data. While deep learning-based methods have shown progress, they reduce the problem to a translation task, potentially overlooking critical nuances between spectra and peptides.…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by AAAI 2024

  13. arXiv:2312.10429  [pdf, other]

    physics.geo-ph cs.AI

    ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

    Authors: Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Recent studies have shown that deep learning (DL) models can skillfully predict the El Niño-Southern Oscillation (ENSO) forecasts over 1.5 years ahead. However, concerns regarding the reliability of predictions made by DL methods persist, including potential overfitting issues and lack of interpretability. Here, we propose ResoNet, a DL model that combines convolutional neural network (CNN) and Tr…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 32 pages, 5 main figures and 12 supplementary figures

  14. arXiv:2312.10035  [pdf, other]

    cs.CV

    Point Transformer V3: Simpler, Faster, Stronger

    Authors: Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao

    Abstract: This paper is not motivated to seek innovation within the attention mechanism. Instead, it focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing, leveraging the power of scale. Drawing inspiration from recent advances in 3D large-scale representation learning, we recognize that model performance is more influenced by scale than b…

    Submitted 25 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: CVPR 2024, code available at Pointcept (https://github.com/Pointcept/PointTransformerV3)

  15. arXiv:2312.08754  [pdf, other]

    cs.CV

    UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

    Authors: Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang

    Abstract: Recent advancements in text-to-3D generation technology have significantly advanced the conversion of textual descriptions into imaginative well-geometrical and finely textured 3D objects. Despite these developments, a prevalent limitation arises from the use of RGB data in diffusion or reconstruction models, which often results in models with inherent lighting and shadows effects that detract fro…

    Submitted 14 December, 2023; originally announced December 2023.

  16. arXiv:2312.07685  [pdf, other]

    cs.LG cs.AI

    A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

    Authors: Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

    Abstract: Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of offline pretrained policy using only a few online samples. Built on offline RL algorithms, most O2O methods focus on the balance between RL objective and pessimism, or the utilization of offline and online samples. In this paper, from a novel perspective, we systematically study the challenges that remain in O2O R…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  17. arXiv:2312.01697  [pdf, other]

    cs.CV cs.AI

    Hulk: A Universal Knowledge Translator for Human-Centric Tasks

    Authors: Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

    Abstract: Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis. There is a recent surge to develop human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did no…

    Submitted 21 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 24 pages, 5 figures

  18. arXiv:2311.15732  [pdf, other]

    cs.CV

    GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

    Authors: Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

    Abstract: This paper does not present a novel method. Instead, it delves into an essential, yet must-know baseline in light of the latest advancements in Generative Artificial Intelligence (GenAI): the utilization of GPT-4 for visual understanding. Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its…

    Submitted 11 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Technical report. Retest GPT-4V and update results

  19. arXiv:2311.14960  [pdf, other]

    cs.CV

    Point Cloud Pre-training with Diffusion Models

    Authors: Xiao Zheng, Xiaoshui Huang, Guofeng Mei, Yuenan Hou, Zhaoyang Lyu, Bo Dai, Wanli Ouyang, Yongshun Gong

    Abstract: Pre-training a model and then fine-tuning it on downstream tasks has demonstrated significant success in the 2D image and NLP domains. However, due to the unordered and non-uniform density characteristics of point clouds, it is non-trivial to explore the prior knowledge of point clouds and pre-train a point cloud backbone. In this paper, we propose a novel pre-training method called Point cloud Di…

    Submitted 25 November, 2023; originally announced November 2023.

  20. arXiv:2311.07276  [pdf, other]

    math.OC

    Variational Properties of Decomposable Functions Part II: Strong Second-Order Theory

    Authors: Wenqing Ouyang, Andre Milzarek

    Abstract: Local superlinear convergence of the semismooth Newton method usually requires the uniform invertibility of the generalized Jacobian matrix, e.g. BD-regularity or CD-regularity. For several types of nonlinear programming and composite-type optimization problems -- for which the generalized Jacobian of the stationary equation can be calculated explicitly -- this is characterized by the strong secon…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 28 pages; preliminary draft

  21. arXiv:2311.07267  [pdf, ps, other]

    math.OC

    Variational Properties of Decomposable Functions. Part I: Strict Epi-Calculus and Applications

    Authors: Wenqing Ouyang, Andre Milzarek

    Abstract: We provide systematic studies of the variational properties of decomposable functions which are compositions of an outer support function and an inner smooth mapping under certain constraint qualifications. We put a particular focus on the strict twice epi-differentiability and the associated strict second subderivative of such functions. Calculus rules for the (strict) second subderivative and tw…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 28 pages; preliminary draft

  22. arXiv:2311.07049  [pdf]

    eess.SY eess.SP

    Clifford Algebra-Based Iterated Extended Kalman Filter with Application to Low-Cost INS/GNSS Navigation

    Authors: Wei Ouyang, Yutian Wang, Yuanxin Wu

    Abstract: The traditional GNSS-aided inertial navigation system (INS) usually exploits the extended Kalman filter (EKF) for state estimation, and the initial attitude accuracy is key to the filtering performance. To spare the reliance on the initial attitude, this work generalizes the previously proposed trident quaternion within the framework of Clifford algebra to represent the extended pose, IMU biases a…

    Submitted 14 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

  23. arXiv:2311.05650  [pdf, other]

    math.OC cs.LG

    Learning to Configure Separators in Branch-and-Cut

    Authors: Sirui Li, Wenbin Ouyang, Max B. Paulus, Cathy Wu

    Abstract: Cutting planes are crucial in solving mixed integer linear programs (MILP) as they facilitate bound improvements on the optimal solution. Modern MILP solvers rely on a variety of separators to generate a diverse set of cutting planes by invoking the separators frequently during the solving process. This work identifies that MILP solvers can be drastically accelerated by appropriately selecting sep…

    Submitted 7 November, 2023; originally announced November 2023.

  24. arXiv:2311.02684  [pdf, other]

    cs.CV cs.CL

    Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

    Authors: Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

    Abstract: Recent studies have demonstrated Large Language Models (LLMs) can extend their zero-shot generalization capabilities to multimodal learning through instruction tuning. As more modalities and downstream tasks are introduced, negative conflicts and interference may have a worse impact on performance. While this phenomenon has been overlooked in previous work, we propose a novel and extensible framew…

    Submitted 13 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 22 pages, 12 figures. Accepted in ICLR 2024

  25. arXiv:2311.00282  [pdf, other]

    cs.CV

    TLMCM Network for Medical Image Hierarchical Multi-Label Classification

    Authors: Meng Wu, Siyan Luo, Qiyu Wu, Wenbin Ouyang

    Abstract: Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, presenting two significant challenges: data imbalance and \textit{hierarchy constraint}. Existing solutions involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or effort in implementation. To address these limitations, this paper p…

    Submitted 11 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  26. BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging

    Authors: Wanlu Lei, Caterina Fuster-Barceló, Gabriel Reder, Arrate Muñoz-Barrutia, Wei Ouyang

    Abstract: We present the BioImage$.$IO Chatbot, an AI assistant powered by Large Language Models and supported by a community-driven knowledge base and toolset. This chatbot is designed to cater to a wide range of user needs through a flexible extension mechanism that spans from information retrieval to AI-enhanced analysis and microscopy control. Embracing open-source principles, the chatbot is designed to…

    Submitted 16 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  27. arXiv:2310.15624  [pdf, other]

    cs.CV cs.LG

    GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

    Authors: Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

    Abstract: Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and re…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 18 pages, 9 figures

  28. arXiv:2310.15568  [pdf, other]

    cs.CV

    I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

    Authors: Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li

    Abstract: Recent progresses on self-supervised 3D human action representation learning are largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, optimized with distinguishing self-augmented samples, models struggle with numerous similar positive instances in the case of lim…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: submitted to IJCV. arXiv admin note: substantial text overlap with arXiv:2208.12448

  29. arXiv:2310.11846  [pdf, other]

    cs.AI

    MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning

    Authors: Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang

    Abstract: Building a single generalist agent with strong zero-shot capability has recently sparked significant advancements. However, extending this capability to multi-agent decision making scenarios presents challenges. Most current works struggle with zero-shot transfer, due to two challenges particular to the multi-agent settings: (a) a mismatch between centralized training and decentralized execution;…

    Submitted 22 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 17 pages

  30. arXiv:2310.08586  [pdf, other]

    cs.CV

    PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

    Authors: Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

    Abstract: In contrast to numerous NLP and 2D vision foundational models, learning a 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and diversity of downstream tasks. In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway t…

    Submitted 27 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.00157

  31. arXiv:2310.08370  [pdf, other]

    cs.CV

    UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

    Authors: Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

    Abstract: In the context of autonomous driving, the significance of effective feature learning is widely acknowledged. While conventional 3D self-supervised pre-training methods have shown widespread success, most methods follow the ideas originally designed for 2D images. In this paper, we present UniPAD, a novel self-supervised learning paradigm applying 3D volumetric differentiable rendering. UniPAD impl…

    Submitted 7 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: CVPR2024

  32. arXiv:2310.07644  [pdf, other]

    cs.AI cs.CL cs.LG

    Rethinking the BERT-like Pretraining for DNA Sequences

    Authors: Chaoqi Liang, Weiqiang Bai, Lifeng Qiao, Yuchen Ren, Jianle Sun, Peng Ye, Hongliang Yan, Xinzhu Ma, Wangmeng Zuo, Wanli Ouyang

    Abstract: With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from N…

    Submitted 11 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  33. arXiv:2310.05447  [pdf, other]

    cs.CV

    Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection

    Authors: Xinzhu Ma, Yongtao Wang, Yinmin Zhang, Zhiyi Xia, Yuan Meng, Zhihui Wang, Haojie Li, Wanli Ouyang

    Abstract: In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. In particular, different from other highly mature tasks, e.g., 2D object detection, the community of image-based 3D object detection is still evolving, where methods often adopt different training recipes and tric…

    Submitted 11 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICCV23, code will be released soon

  34. arXiv:2310.03708  [pdf, other]

    cs.LG cs.AI

    Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

    Authors: Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

    Abstract: A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore opt for customization by collecting multi-dimensional feedback and creating distinct reward models (RMs) for each dimension (e.g., helpfulness, harmlessness, or honesty). Different LMs…

    Submitted 15 December, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Multi-Objective Direct Preference Optimization for LLMs

  35. arXiv:2310.01994  [pdf, other]

    cs.CV

    Understanding Masked Autoencoders From a Local Contrastive Perspective

    Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

    Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we fir…

    Submitted 8 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  36. arXiv:2310.00746  [pdf, other]

    cs.CL cs.AI

    RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

    Authors: Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Stephen W. Huang, Jie Fu, Junran Peng

    Abstract: The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-p…

    Submitted 18 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: 30 pages, repo at https://github.com/InteractiveNLP-Team/RoleLLM-public

  37. arXiv:2309.15718  [pdf, other]

    physics.chem-ph physics.comp-ph

    Geometry-enhanced Pre-training on Interatomic Potentials

    Authors: Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang

    Abstract: Machine learning interatomic potentials (MLIPs) enables molecular dynamics (MD) simulations with ab initio accuracy and has been applied to various fields of physical science. However, the performance and transferability of MLIPs are limited by insufficient labeled training data, which require expensive ab initio calculations to obtain the labels, especially for complex molecular systems. To addre…

    Submitted 12 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: Published in Nature Machine Intelligence 2024

  38. arXiv:2309.14616  [pdf, other]

    cs.CV

    NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

    Authors: Jiawei Yao, Chuming Li, Keqiang Sun, Yingjie Cai, Hao Li, Wanli Ouyang, Hongsheng Li

    Abstract: Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs. In this paper, we identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambigui…

    Submitted 11 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ICCV 2023. Project page: https://jiawei-yao0812.github.io/NDC-Scene/

  39. arXiv:2309.13326  [pdf]

    q-bio.GN

    SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities

    Authors: Viorel Munteanu, Michael Saldana, Dumitru Ciorba, Viorel Bostan, Justin Maine Su, Nadiia Kasianchuk, Nitesh Kumar Sharma, Sergey Knyazev, Victor Gordeev, Eva Aßmann, Andrei Lobiuc, Mihai Covasa, Keith A. Crandall, Wenhao O. Ouyang, Nicholas C. Wu, Christopher Mason, Braden T Tierney, Alexander G Lucaci, Alex Zelikovsky, Fatemeh Mohebbi, Pavel Skums, Cynthia Gibas, Jessica Schlueter, Piotr Rzymski, Helena Solo-Gabriele, et al. (3 additional authors not shown)

    Abstract: During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool that takes into account asymptomatic cases and can identify known and novel mutations and offers the opportunity to assign known virus lineages based on the detected mutations profiles. WWGS can also hint towards novel or cryptic lineages, but it is difficult to clearly iden…

    Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: V Munteanu and M Saldana contributed equally to this work. M Hölzer, A Smith and S Mangul jointly supervised this work. For correspondence: [email protected]

  40. The stability of unevenly spaced planetary systems

    Authors: Sheng Yang, Liangyu Wu, Zekai Zheng, Masahiro Ogihara, Kangrou Guo, Wenzhan Ouyang, Yaxing He

    Abstract: Studying the orbital stability of multi-planet systems is essential to understand planet formation, estimate the stable time of an observed planetary system, and advance population synthesis models. Although previous studies have primarily focused on ideal systems characterized by uniform orbital separations, in reality a diverse range of orbital separations exists among planets within the same sy…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 6 pages, 3 figures, accepted for publication in Icarus

    Journal ref: Icarus, Volume 406, 2023, 115757

  41. arXiv:2308.16487  [pdf, other]

    cond-mat.mtrl-sci

    Extraordinary Thermoelectric Properties of Topological Surface States in Quantum-Confined Cd3As2 Thin Films

    Authors: Wenkai Ouyang, Alexander C. Lygo, Yubi Chen, Huiyuan Zheng, Dung Vu, Brandi L. Wooten, Xichen Liang, Wang Yao, Joseph P. Heremans, Susanne Stemmer, Bolin Liao

    Abstract: Topological insulators and semimetals have been shown to possess intriguing thermoelectric properties promising for energy harvesting and cooling applications. However, thermoelectric transport associated with the Fermi arc topological surface states on topological Dirac semimetals remains less explored. In this work, we systematically examine thermoelectric transport in a series of topological Di…

    Submitted 31 August, 2023; originally announced August 2023.

  42. arXiv:2308.16376  [pdf, other]

    eess.IV cs.CV cs.DC

    Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

    Authors: Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

    Abstract: Accurately measuring the evolution of Multiple Sclerosis (MS) with magnetic resonance imaging (MRI) critically informs understanding of disease progression and helps to direct therapeutic strategy. Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. Obtaining sufficient data from a single clin…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 4 figures, journal submission

  43. arXiv:2308.15070  [pdf, other]

    cs.CV

    DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

    Authors: Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, Chao Dong

    Abstract: We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework. DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. Each stage is developed independently but they work seamlessly in a cascaded…

    Submitted 12 April, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  44. arXiv:2308.13772  [pdf, other]

    cs.CV

    Boosting Residual Networks with Group Knowledge

    Authors: Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

    Abstract: Recent research understands the residual networks from a new perspective of the implicit ensemble model. From this view, previous methods such as stochastic depth and stimulative training have further improved the performance of the residual network by sampling and training of its subnets. However, they both use the same supervision for all subnets of different capacities and neglect the valuable…

    Submitted 14 December, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: Accepted by AAAI2024

  45. arXiv:2308.10468  [pdf, other]

    cs.CV

    STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

    Authors: Tao Han, Lei Bai, Lingbo Liu, Wanli Ouyang

    Abstract: Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms. An important factor is that they typically involve cooperative learning across multi-resolutions, which could be suboptimal for learning the most discriminative features from each scale. In this paper, we propose a novel method termed STEERER (\textbf{S}elec\tex…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023, 9 pages

  46. arXiv:2308.07092  [pdf, other]

    cs.CV

    Masked Motion Predictors are Strong 3D Action Representation Learners

    Authors: Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, Houqiang Li

    Abstract: In 3D human action recognition, limited supervised data makes it challenging to fully tap into the modeling potential of powerful networks such as transformers. As a result, researchers have been actively investigating effective self-supervised pre-training strategies. In this work, we show that instead of following the prevalent pretext task to perform masked self-component reconstruction in huma…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: To appear in ICCV 2023

  47. arXiv:2308.06093  [pdf, other]

    cs.CV cs.LG

    Experts Weights Averaging: A New General Training Scheme for Vision Transformers

    Authors: Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

    Abstract: Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs), which achieves performance improvement without increasing inference cost. As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing infere…

    Submitted 25 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: 12 pages, 2 figures

  48. arXiv:2308.03005  [pdf, other]

    cs.CV

    MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

    Authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

    Abstract: This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Journal extension for MCTformer

  49. arXiv:2308.02818  [pdf]

    physics.app-ph cond-mat.soft

    Shape-dependent friction scaling laws in twisted layered material interfaces

    Authors: Weidong Yan, Xiang Gao, Wengen Ouyang, Ze Liu, Oded Hod, Michael Urbakh

    Abstract: Static friction induced by moiré superstructure in twisted incommensurate finite layered material interfaces reveals unique double periodicity and lack of scaling with contact size. The underlying mechanism involves compensation of incomplete moiré tiles at the rim of rigid polygonal graphene flakes sliding atop fixed graphene or h-BN substrates. The scaling of friction (or lack thereof) with cont…

    Submitted 5 August, 2023; originally announced August 2023.

  50. arXiv:2307.12933  [pdf, other]

    cs.AI

    Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

    Authors: Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

    Abstract: Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency. To save the computation cost of conducting planning online, recent practices tend to distill optimized action sequences into an RL policy during the training phase. Although the distillation can incorporate both the foresight of planning and the ex…

    Submitted 24 July, 2023; originally announced July 2023.