
Showing 1–50 of 123 results for author: Mei, J

Searching in archive cs.
  1. arXiv:2406.08478  [pdf, other]

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff…

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  2. arXiv:2406.07537  [pdf, other]

    cs.CV

    Autoregressive Pretraining with Mamba in Vision

    Authors: Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    Abstract: The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structur…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2406.05565  [pdf, other]

    cs.CV

    Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

    Authors: Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

    Abstract: This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework. Specifically, MVG employs an in-context generation strategy that standardizes the handling of inputs and outputs as images. By treati…

    Submitted 8 June, 2024; originally announced June 2024.

  4. arXiv:2406.04488  [pdf, other]

    cs.LG cs.IR

    Negative Feedback for Music Personalization

    Authors: M. Jeffrey Mei, Oliver Bembom, Andreas F. Ehmann

    Abstract: Next-item recommender systems are often trained using only positive feedback with randomly-sampled negative feedback. We show the benefits of using real negative feedback both as inputs into the user sequence and also as negative targets for training a next-song recommender system for internet radio. In particular, using explicit negative samples during training helps reduce training time by ~60%…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures, accepted to ACM UMAP 2024
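    As a hedged illustration of the negative-target idea in item 4 (not the authors' model or loss, which the truncated abstract does not specify), the sketch below scores a next-song prediction with binary cross-entropy. The played song is the positive target, and both real negative feedback (for example, skipped songs) and randomly sampled songs are negative targets. The score dictionary, the function names, and the equal weighting of every term are assumptions. The abstract's other use of negative feedback, as inputs into the user sequence, is not shown.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw model score."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def next_song_loss(scores, positive, explicit_negatives, sampled_negatives):
    """Mean BCE over one positive and all negative targets.

    scores: hypothetical dict song_id -> model logit for the next position.
    positive: the song the user actually listened to (target 1).
    explicit_negatives: songs with real negative feedback, e.g. skips (target 0).
    sampled_negatives: randomly drawn songs, the usual baseline (target 0).
    """
    terms = [bce_with_logits(scores[positive], 1.0)]
    for song in list(explicit_negatives) + list(sampled_negatives):
        terms.append(bce_with_logits(scores[song], 0.0))
    return sum(terms) / len(terms)


# A skipped song that the model still scores highly raises the loss.
print(next_song_loss({"a": 2.0, "b": 3.0, "c": -1.0}, "a", ["b"], ["c"]))
```

    Without the explicit negatives, a skipped song that the model ranks highly would contribute nothing to the loss unless it happened to be drawn as a random negative.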

  5. arXiv:2405.21043  [pdf, other]

    cs.LG cs.AI

    Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

    Authors: Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

    Abstract: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, 2024

  6. arXiv:2405.19320  [pdf, other]

    cs.LG cs.AI stat.ML

    Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

    Authors: Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF,…

    Submitted 4 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  8. arXiv:2405.14858  [pdf, other]

    cs.CV

    Mamba-R: Vision Mamba ALSO Needs Registers

    Authors: Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

    Abstract: Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba -- they exist prevalently even with the tiny-sized model and activate extensively across background regions. To mitigate this issue, we…

    Submitted 23 May, 2024; originally announced May 2024.

  9. "Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies

    Authors: Brennan Schaffner, Arjun Nitin Bhagoji, Siyuan Cheng, Jacqueline Mei, Jay L. Shen, Grace Wang, Marshini Chetty, Nick Feamster, Genevieve Lakier, Chenhao Tan

    Abstract: Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and t…

    Submitted 8 May, 2024; originally announced May 2024.

  10. arXiv:2404.08364  [pdf, other]

    cs.DC

    FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

    Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

    Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarras…

    Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.
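    The FlowWalker abstract notes that existing samplers demand a pre-processing buffer. One generic way to draw a weighted neighbor without such a buffer is single-pass weighted reservoir sampling with a reservoir of size 1, sketched below on a toy adjacency dict. This is an illustration of that general technique under stated assumptions, not FlowWalker's GPU implementation; the adjacency format and the function names are hypothetical.

```python
import random


def reservoir_pick(neighbors, weights, rng):
    """Weighted single-pass reservoir sampling with a reservoir of size 1.

    Neighbor i replaces the current pick with probability w_i / W_i, where
    W_i is the running weight sum, so the final pick is neighbor i with
    probability w_i / sum(w). No prefix-sum or alias table is built, so
    edges can change between calls. Returns None if all weights are zero.
    """
    chosen = None
    running = 0.0
    for v, w in zip(neighbors, weights):
        running += w
        # w > 0 guarantees running > 0 before dividing.
        if w > 0 and rng.random() < w / running:
            chosen = v
    return chosen


def random_walk(adj, start, length, seed=0):
    """Weighted random walk; adj maps vertex -> list of (neighbor, weight)."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        edges = adj.get(walk[-1], [])
        if not edges:
            break  # dead end: stop the walk early
        nbrs, ws = zip(*edges)
        nxt = reservoir_pick(nbrs, ws, rng)
        if nxt is None:
            break
        walk.append(nxt)
    return walk


adj = {0: [(1, 1.0), (2, 3.0)], 1: [(0, 1.0)], 2: [(0, 2.0)]}
print(random_walk(adj, 0, 10))
```

    Each step costs one pass over the current vertex's edge list and constant extra memory, which is the trade-off such buffer-free samplers make against precomputed tables.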

  11. arXiv:2404.06854  [pdf, other]

    cs.CL

    Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

    Authors: Jinghong Chen, Weizhe Lin, Jingbiao Mei, Bill Byrne

    Abstract: The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation. Two issues prevent its application to general Natural Language Generation (NLG) tasks: frequent Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity names. We introduce Control-DAG, a constrained decoding algorithm for our Directed Acyclic T5 (DA-T5)…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 11 pages. NAACL 2024

    ACM Class: I.2

  12. arXiv:2403.19758  [pdf, other]

    quant-ph cs.AI cs.CL

    Quantum Natural Language Processing

    Authors: Dominic Widdows, Willie Aboumrad, Dohun Kim, Sayonee Ray, Jonathan Mei

    Abstract: Language processing is at the heart of current developments in artificial intelligence, and quantum computers are becoming available at the same time. This has led to great interest in quantum natural language processing, and several early proposals and experiments. This paper surveys the state of this area, showing how NLP-related techniques have been used in quantum language processing. We exa…

    Submitted 26 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  13. arXiv:2403.15735  [pdf, other]

    eess.IV cs.CV

    3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

    Authors: Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou

    Abstract: Segmenting brain tumors is complex due to their diverse appearances and scales. Brain metastases, the most common type of brain tumor, are a frequent complication of cancer. Therefore, an effective segmentation model for brain metastases must adeptly capture local intricacies to delineate small tumor regions while also integrating global context to understand broader scan features. The TransUNet m…

    Submitted 23 March, 2024; originally announced March 2024.

  14. arXiv:2403.07197  [pdf, ps, other]

    quant-ph cs.LO

    Simulating Quantum Circuits by Model Counting

    Authors: Jingyi Mei, Marcello Bonsangue, Alfons Laarman

    Abstract: Quantum circuit compilation comprises many computationally hard reasoning tasks that nonetheless lie inside #$\mathbf{P}$ and its decision counterpart in $\mathbf{PP}$. The classical simulation of general quantum circuits is a core example. We show for the first time that a strong simulation of universal quantum circuits can be efficiently tackled through weighted model counting by providing a lin…

    Submitted 11 March, 2024; originally announced March 2024.

  15. arXiv:2402.18021  [pdf, other]

    cs.RO

    Online Time-Optimal Trajectory Generation for Two Quadrotors with Multi-Waypoints Constraints

    Authors: Fangguo Zhao, Jiahao Mei, Jin Zhou, Jiming Chen, Shuo Li

    Abstract: The autonomous quadrotor's flying speed has kept increasing in the past 5 years, especially in the field of autonomous drone racing. However, the majority of the research mainly focuses on the aggressive flight of a single quadrotor. In this letter, we propose a novel method called Pairwise Model Predictive Control (PMPC) that can guide two quadrotors online to fly through the waypoints with minim…

    Submitted 27 February, 2024; originally announced February 2024.

  16. arXiv:2402.17235  [pdf, other]

    cs.LG

    Stochastic Gradient Succeeds for Bandits

    Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

    Abstract: We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is an old algorithm known to be applicable to bandits. The new result is achieved by establishing two nove…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 39 pages; Correction for a previous version published at ICML 2023 conference
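    Item 16 concerns the stochastic gradient bandit algorithm with a constant step size. Below is a minimal sketch of one common textbook form of that algorithm: a softmax policy, one sampled arm per step, and no reward baseline. The Bernoulli reward environment, the arm means, the step size, and the step count are illustrative assumptions, and the paper's exact variant and analysis may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit(mean_rewards, steps=5000, step_size=0.1, seed=0):
    """Run a softmax gradient bandit with a constant step size.

    Each step samples one arm a_t from pi = softmax(theta), observes a
    reward r_t, and applies theta_a += step_size * r_t * (1[a == a_t] - pi_a)
    to every arm a. Returns the final action probabilities.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k
    for _ in range(steps):
        pi = softmax(theta)
        arm = rng.choices(range(k), weights=pi)[0]
        # Hypothetical environment: Bernoulli reward with the arm's mean.
        reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
        for a in range(k):
            indicator = 1.0 if a == arm else 0.0
            theta[a] += step_size * reward * (indicator - pi[a])
    return softmax(theta)


probs = gradient_bandit([0.2, 0.5, 0.8], steps=20000)
print([round(p, 3) for p in probs])
```

    The step size stays fixed for the whole run. The abstract's claim is that this constant-step version still reaches a globally optimal policy, here the arm with mean 0.8, at an $O(1/t)$ rate.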

  17. arXiv:2402.11570  [pdf, other]

    cs.RO

    Imitation Learning-Based Online Time-Optimal Control with Multiple-Waypoint Constraints for Quadrotors

    Authors: Jin Zhou, Jiahao Mei, Fangguo Zhao, Jiming Chen, Shuo Li

    Abstract: Over the past decade, there has been a remarkable surge in utilizing quadrotors for various purposes due to their simple structure and aggressive maneuverability, such as search and rescue, delivery and autonomous drone racing, etc. One of the key challenges preventing quadrotors from being widely used in these scenarios is online waypoint-constrained time-optimal trajectory generation and control…

    Submitted 18 February, 2024; originally announced February 2024.

  18. arXiv:2402.09315  [pdf, other]

    cs.CV

    Few-Shot Object Detection with Sparse Context Transformers

    Authors: Jie Mei, Mingyuan Jiu, Hichem Sahbi, Xiaoheng Jiang, Mingliang Xu

    Abstract: Few-shot detection is a major task in pattern recognition which seeks to localize objects using models trained with few labeled data. One of the mainstream few-shot methods is transfer learning which consists in pretraining a detection model in a source domain prior to its fine-tuning in a target domain. However, it is challenging for fine-tuned models to effectively identify new classes in the ta…

    Submitted 14 February, 2024; originally announced February 2024.

  19. arXiv:2402.08327  [pdf, other]

    cs.CL

    PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

    Authors: Weizhe Lin, Jingbiao Mei, Jinghong Chen, Bill Byrne

    Abstract: Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA) which involve the retrieval of relevant information from document collections to use in shaping answers to questions. We present an extensive training and evaluation framework, M2KR, for KB-VQA. M2KR contains a collection…

    Submitted 5 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: ACL 2024; Project page: https://preflmr.github.io/

  20. arXiv:2402.02698  [pdf, other]

    cs.LG cs.AI math.OC

    Beyond Expectations: Learning with Stochastic Dominance Made Practical

    Authors: Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: i) the original…

    Submitted 4 February, 2024; originally announced February 2024.

  21. arXiv:2401.11649  [pdf, other]

    cs.CV

    M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

    Authors: Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu

    Abstract: Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper…

    Submitted 21 January, 2024; originally announced January 2024.

    Journal ref: AAAI 2024

  22. arXiv:2401.04916  [pdf, other]

    cs.NI

    Digital Retina for IoV Towards 6G: Architecture, Opportunities, and Challenges

    Authors: Kan Zheng, Jie Mei, Haojun Yang, Lu Hou, Siwei Ma

    Abstract: Vehicles are no longer isolated entities in traffic environments, thanks to the development of IoV powered by 5G networks and their evolution into 6G. However, it is not enough for vehicles in a highly dynamic and complex traffic environment to make reliable and efficient decisions. As a result, this paper proposes a cloud-edge-end computing system with multi-streams for IoV, referred to as Vehicu…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures

  23. arXiv:2401.02931  [pdf, other]

    cs.CV

    SPFormer: Enhancing Vision Transformer with Superpixel Representation

    Authors: Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

    Abstract: In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details and ap…

    Submitted 5 January, 2024; originally announced January 2024.

  24. arXiv:2312.13764  [pdf, other]

    cs.CV cs.CL cs.LG

    A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

    Authors: Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

    Abstract: This paper introduces ProLab, a novel approach using property-level label space for creating strong interpretable segmentation models. Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models. It is based on two core designs. First, we employ Large Language Models (LLMs) and carefully craft…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Preprint. Code is available at https://github.com/lambert-x/ProLab

  25. arXiv:2312.11555  [pdf, other]

    cs.CV

    CR-SFP: Learning Consistent Representation for Soft Filter Pruning

    Authors: Jingyang Xiang, Zhuangzhi Chen, Jianbiao Mei, Siqi Li, Jun Chen, Yong Liu

    Abstract: Soft filter pruning (SFP) has emerged as an effective pruning technique for allowing pruned filters to update and the opportunity for them to regrow to the network. However, this pruning strategy applies training and pruning in an alternative manner, which inevitably causes inconsistent representations between the reconstructed network (R-NN) at the training and the pruned network (P-NN) at the in…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  26. arXiv:2312.11420  [pdf, other]

    cs.CL cs.AI cs.CV

    Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

    Authors: Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

    Abstract: This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreov…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The first two authors contributed equally

  27. arXiv:2312.05752  [pdf, other]

    cs.CV

    Camera-based 3D Semantic Scene Completion with Sparse Guidance Network

    Authors: Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Xiangrui Zhao, Jongwon Ra, Laijian Li, Yong Liu

    Abstract: Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D mod…

    Submitted 9 December, 2023; originally announced December 2023.

  28. arXiv:2312.01597  [pdf, other]

    cs.CV

    SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

    Authors: Feng Wang, Jieru Mei, Alan Yuille

    Abstract: Recent advances in contrastive language-image pretraining (CLIP) have demonstrated strong capabilities in zero-shot classification by aligning visual representations with target text embeddings in an image level. However, in dense prediction tasks, CLIP often struggles to localize visual features within an image and fails to give accurate pixel-level predictions, which prevents it from functioning…

    Submitted 2 January, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  29. arXiv:2311.18675  [pdf, other]

    cs.CV

    Cascaded Interaction with Eroded Deep Supervision for Salient Object Detection

    Authors: Hewen Xiao, Jie Mei, Guangfu Ma, Weiren Wu

    Abstract: Deep convolutional neural networks have been widely applied in salient object detection and have achieved remarkable results in this field. However, existing models suffer from information distortion caused by interpolation during up-sampling and down-sampling. In response to this drawback, this article starts from two directions in the network: feature and label. On the one hand, a novel cascaded…

    Submitted 30 November, 2023; originally announced November 2023.

  30. arXiv:2311.18373  [pdf, other]

    cs.CV

    A Survey on Deep Learning for Polyp Segmentation: Techniques, Challenges and Future Trends

    Authors: Jiaxin Mei, Tao Zhou, Kaiwen Huang, Yizhe Zhang, Yi Zhou, Ye Wu, Huazhu Fu

    Abstract: Early detection and assessment of polyps play a crucial role in the prevention and treatment of colorectal cancer (CRC). Polyp segmentation provides an effective solution to assist clinicians in accurately locating and segmenting polyp regions. In the past, people often relied on manually extracted lower-level features such as color, texture, and shape, which often had issues capturing global cont…

    Submitted 5 February, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: 17 pages, 7 figures

  31. arXiv:2311.14762  [pdf, other]

    cs.CV cs.AI

    The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

    Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, et al. (24 additional authors not shown)

    Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obst…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

  32. arXiv:2311.08110  [pdf, other]

    cs.CL cs.CV

    Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning

    Authors: Jingbiao Mei, Jinghong Chen, Weizhe Lin, Bill Byrne, Marcus Tomalin

    Abstract: Hateful memes have emerged as a significant concern on the Internet. Detecting hateful memes requires the system to jointly understand the visual and textual modalities. Our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. We propose constructing a hatefulness-aware…

    Submitted 4 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by ACL 2024 Main Conference. This is the camera-ready version. We added more experiments to address reviewers' comments

  33. arXiv:2311.03561  [pdf, other]

    cs.CV

    Sea You Later: Metadata-Guided Long-Term Re-Identification for UAV-Based Multi-Object Tracking

    Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Chung-I Huang, Jenq-Neng Hwang

    Abstract: Re-identification (ReID) in multi-object tracking (MOT) for UAVs in maritime computer vision has been challenging for several reasons. More specifically, short-term re-identification (ReID) is difficult due to the nature of the characteristics of small targets and the sudden movement of the drone's gimbal. Long-term ReID suffers from the lack of useful appearance diversity. In response to these ch…

    Submitted 22 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 1st place method (WACV Workshop Paper) of the UAV-based Multi-Object Tracking with Reidentification Challenge in MaCVi WACV 2024

  34. arXiv:2311.01617  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Look-Ahead Selective Plasticity for Continual Learning of Visual Tasks

    Authors: Rouzbeh Meshkinnejad, Jie Mei, Daniel Lizotte, Yalda Mohsenzadeh

    Abstract: Contrastive representation learning has emerged as a promising technique for continual learning as it can learn representations that are robust to catastrophic forgetting and generalize well to unseen future tasks. Previous work in continual learning has addressed forgetting by using previous task data and trained models. Inspired by event models created and updated in the brain, we propose a new…

    Submitted 2 November, 2023; originally announced November 2023.

  35. arXiv:2310.07781  [pdf, other]

    cs.CV

    3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

    Authors: Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou

    Abstract: Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning. The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks. However, U-Net's convolution-based operations inherently limit its ability to model long-range dependencies effectively. To address these limitati…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Code and models are available at https://github.com/Beckschen/3D-TransUNet

  36. arXiv:2310.04412  [pdf, other]

    cs.CV

    FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

    Authors: Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou

    Abstract: Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage. While recent studies posit that Vision Transformer (ViT) outperforms Convolutional Neural Networks (CNNs) in addressing data heterogeneity in FL, the specific architectural components that underpin this advantage…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 9 pages, 6 figures. Equal contribution by P. Xu and Z. Wang

  37. arXiv:2309.17133  [pdf, other]

    cs.CL cs.CV

    Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

    Authors: Weizhe Lin, Jinghong Chen, Jingbiao Mei, Alexandru Coca, Bill Byrne

    Abstract: Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to utilize knowledge from external knowledge bases to answer visually-grounded questions. Retrieval-Augmented Visual Question Answering (RA-VQA), a strong framework to tackle KB-VQA, first retrieves related documents with Dense Passage Retrieval (DPR) and then uses them to answer questions. This paper proposes Fine-grained Lat…

    Submitted 28 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: To appear at NeurIPS 2023. This is the camera-ready version. We fixed some numbers and added more experiments to address reviewers' comments

  38. arXiv:2309.16889  [pdf, other]

    cs.CV

    Superpixel Transformers for Efficient Semantic Segmentation

    Authors: Alex Zihao Zhu, Jieru Mei, Siyuan Qiao, Hang Yan, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar

    Abstract: Semantic segmentation, which aims to classify every pixel in an image, is a key task in machine perception, with many applications across robotics and autonomous driving. Due to the high dimensionality of this task, most existing approaches use local operations, such as convolutions, to generate per-pixel features. However, these methods are typically unable to effectively leverage global context…

    Submitted 2 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, 4 tables. Presented at IROS 2023. Equal contribution by A. Zhu and J. Mei

  39. arXiv:2308.10123  [pdf, other]

    cs.CV cs.AI

    3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation

    Authors: Yi Zhang, Pengliang Ji, Angtian Wang, Jieru Mei, Adam Kortylewski, Alan Yuille

    Abstract: Regression-based methods for 3D human pose estimation directly predict the 3D pose parameters from a 2D image using deep networks. While achieving state-of-the-art performance on standard benchmarks, their performance degrades under occlusion. In contrast, optimization-based methods fit a parametric body model to 2D features in an iterative manner. The localized reconstruction loss can potentially…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, project page: https://3dnbf.github.io/

  40. arXiv:2308.07622  [pdf, other]

    cs.MM

    EMID: An Emotional Aligned Dataset in Audio-Visual Modality

    Authors: Jialing Zou, Jiahao Mei, Guangze Ye, Tianyu Huai, Qiwei Shen, Daoguo Dong

    Abstract: In this paper, we propose Emotionally paired Music and Image Dataset (EMID), a novel dataset designed for the emotional matching of music and images, to facilitate auditory-visual cross-modal tasks such as generation and retrieval. Unlike existing approaches that primarily focus on semantic correlations or roughly divided emotional relations, EMID emphasizes the significance of emotional consisten…

    Submitted 15 August, 2023; originally announced August 2023.

  41. arXiv:2307.12591  [pdf, other]

    cs.CV

    SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

    Authors: Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

    Abstract: Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023; project page: https://github.com/UCSC-VLAA/SwinMM/

  42. arXiv:2306.15349  [pdf, other]

    cs.CV cs.AI

    SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion

    Authors: Jianbiao Mei, Yu Yang, Mengmeng Wang, Tianxin Huang, Xuemeng Yang, Yong Liu

    Abstract: Semantic scene completion (SSC) jointly predicts the semantics and geometry of the entire 3D scene, which plays an essential role in 3D scene understanding for autonomous driving systems. SSC has achieved rapid progress with the help of semantic context in segmentation. However, how to effectively exploit the relationships between the semantic context in semantic segmentation and geometric structu…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 8 pages, 5 figures, IROS 2023

  43. arXiv:2306.15348  [pdf, other]

    cs.CV cs.AI

    PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation

    Authors: Jianbiao Mei, Yu Yang, Mengmeng Wang, Xiaojun Hou, Laijian Li, Yong Liu

    Abstract: Reliable LiDAR panoptic segmentation (LPS), including both semantic and instance segmentation, is vital for many robotic applications, such as autonomous driving. This work proposes a new LPS framework named PANet to eliminate the dependency on the offset branch and improve the performance on large objects, which are always over-segmented by clustering algorithms. Firstly, we propose a non-learnin…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 8 pages, 3 figures, IROS 2023

  44. arXiv:2305.19947  [pdf, other]

    cs.CV cs.LG stat.ML

    A Geometric Perspective on Diffusion Models

    Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

    Abstract: Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models. A remarkable advancement is the use of stochastic differential equations (SDEs) and their marginal-preserving ordinary differential equations (ODEs) to describe data perturbation and generative modeling in a unified framework. In this paper, we carefully inspect the…

    Submitted 30 September, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 38 pages

  45. arXiv:2305.19416  [pdf, other]

    stat.ML cs.LG

    KrADagrad: Kronecker Approximation-Domination Gradient Preconditioned Stochastic Optimization

    Authors: Jonathan Mei, Alexander Moreno, Luke Walters

    Abstract: Second order stochastic optimizers allow parameter update step size and direction to adapt to loss curvature, but have traditionally required too much memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018] introduced a Kronecker factored preconditioner to reduce these requirements: it is used for large deep models [Anil et al., 2020] and in production [Anil et al., 2022]. How…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted in "Uncertainty in Artificial Intelligence" (2023)

  46. arXiv:2305.13185  [pdf, other]

    cs.LG

    Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

    Authors: Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

    Abstract: Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear fu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ICML 2023 accepted

  47. arXiv:2305.09028  [pdf, other]

    stat.ML cs.LG

    SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels

    Authors: Alexander Moreno, Jonathan Mei, Luke Walters

    Abstract: Toeplitz Neural Networks (TNNs) (Qin et al. 2023) are a recent sequence model with impressive results. They require O(n log n) computational complexity and O(n) relative positional encoder (RPE) multi-layer perceptron (MLP) and decay bias calls. We aim to reduce both. We first note that the RPE is a non-SPD (symmetric positive definite) kernel and the Toeplitz matrices are pseudo-Gram matrices. F…

    Submitted 9 July, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Submitted to NeurIPS 2023

  48. arXiv:2302.14772  [pdf, other]

    cs.CV cs.LG

    PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

    Authors: Shun Lu, Yu Hu, Longxing Yang, Zihao Sun, Jilin Mei, Jianchao Tan, Chengru Song

    Abstract: Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models, largely reducing the search cost. However, several works have pointed out that the shared weights suffer from different gradient descent directions during training. And we further find that large gradient variance occurs during supernet training, which degra…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: To appear in CVPR 2023; we will update the camera-ready version soon

  49. arXiv:2302.13069  [pdf]

    cs.CV cs.AI

    Medical visual question answering using joint self-supervised learning

    Authors: Yuan Zhou, Jing Mei, Yiqin Yu, Tanveer Syeda-Mahmood

    Abstract: Visual Question Answering (VQA) becomes one of the most active research problems in the medical imaging domain. A well-known VQA challenge is the intrinsic diversity between the image and text modalities, and in the medical VQA task, there is another critical problem relying on the limited size of labelled image-question-answer data. In this study we propose an encoder-decoder framework that lever…

    Submitted 25 February, 2023; originally announced February 2023.

  50. arXiv:2302.08785  [pdf, other]

    cs.RO cs.CV

    Few-shot 3D LiDAR Semantic Segmentation for Autonomous Driving

    Authors: Jilin Mei, Junbao Zhou, Yu Hu

    Abstract: In autonomous driving, the novel objects and lack of annotations challenge the traditional 3D LiDAR semantic segmentation based on deep learning. Few-shot learning is a feasible way to solve these issues. However, currently few-shot semantic segmentation methods focus on camera data, and most of them only predict the novel classes without considering the base classes. This setting cannot be direct…

    Submitted 3 March, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted by ICRA 2023