Showing 1–50 of 237 results for author: Shi, M

Searching in archive cs.
  1. arXiv:2406.16039  [pdf, other]

    cs.CV

    CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery

    Authors: Oluwatosin Alabi, Ko Ko Zayar Toe, Zijian Zhou, Charlie Budd, Nicholas Raison, Miaojing Shi, Tom Vercauteren

    Abstract: In laparoscopic and robotic surgery, precise tool instance segmentation is an essential technology for advanced computer-assisted interventions. Although publicly available procedures of routine surgeries exist, they often lack comprehensive annotations for tool instance segmentation. Additionally, the majority of standard datasets for tool segmentation are derived from porcine (pig) surgeries. To…

    Submitted 23 June, 2024; originally announced June 2024.

  2. CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories

    Authors: Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, Maurice Meijer, Wim Dehaene, Marian Verhelst

    Abstract: Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) to perform optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switc…

    Submitted 14 June, 2024; originally announced June 2024.

    Journal ref: 2023 24th International Symposium on Quality Electronic Design (ISQED)

  3. arXiv:2406.04595  [pdf, other]

    cs.SD cs.CL eess.AS

    Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

    Authors: Xintong Wang, Mingqian Shi, Ye Wang

    Abstract: Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. 2) The scarcity of Mandarin MDD datasets limits model training. In this paper, we introduce a stateless RNN-T model for Mandarin MDD,…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  4. arXiv:2406.00545  [pdf, ps, other]

    cs.CV cs.AI

    Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

    Authors: Xinyue Chen, Miaojing Shi

    Abstract: The performance of supervised semantic segmentation methods highly relies on the availability of large-scale training data. To alleviate this dependence, few-shot semantic segmentation (FSS) is introduced to leverage the model trained on base classes with sufficient data into the segmentation of novel classes with few data. FSS methods face the challenge of model generalization on novel classes du…

    Submitted 9 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE International Conference on Multimedia and Expo (ICME) 2024 as an oral presentation

  5. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  6. arXiv:2405.11690  [pdf, other]

    cs.CV

    InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios

    Authors: Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura

    Abstract: We address the problem of accurate capture and expressive modelling of interactive behaviors happening between two persons in daily scenarios. Different from previous works which either only consider one person or focus on conversational gestures, we propose to simultaneously model the activities of two persons, and target objective-driven, dynamic, and coherent interactions which often span long…

    Submitted 27 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  7. arXiv:2405.02580  [pdf, other]

    cs.SE cs.AI

    PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

    Authors: Ye Liu, Yue Xue, Daoyuan Wu, Yuqiang Sun, Yi Li, Miaolei Shi, Yang Liu

    Abstract: With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for…

    Submitted 4 May, 2024; originally announced May 2024.

  8. arXiv:2405.01533  [pdf, other]

    cs.CV

    OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

    Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

    Abstract: The advances in multimodal large language models (MLLMs) have led to growing interests in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work propos…

    Submitted 2 May, 2024; originally announced May 2024.

  9. arXiv:2404.17528  [pdf, other]

    cs.CV

    Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

    Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

    Abstract: Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://gefucvpr24.github.io

  10. arXiv:2404.15602  [pdf, other]

    cs.RO

    Decentralized Multi-Agent Trajectory Planning in Dynamic Environments with Spatiotemporal Occupancy Grid Maps

    Authors: Siyuan Wu, Gang Chen, Moji Shi, Javier Alonso-Mora

    Abstract: This paper proposes a decentralized trajectory planning framework for the collision avoidance problem of multiple micro aerial vehicles (MAVs) in environments with static and dynamic obstacles. The framework utilizes spatiotemporal occupancy grid maps (SOGM), which forecast the occupancy status of neighboring space in the near future, as the environment representation. Based on this representation…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages, 6 figures, accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024)

  11. arXiv:2404.15121  [pdf, other]

    cs.GR cs.AI cs.CV

    Taming Diffusion Probabilistic Models for Character Control

    Authors: Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

    Abstract: We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's his…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGGRAPH 2024 (Conference Track). Project page and source codes: https://aiganimation.github.io/CAMDM/

  12. arXiv:2404.14848  [pdf, other]

    cs.RO

    Evaluating Dynamic Environment Difficulty for Obstacle Avoidance Benchmarking

    Authors: Moji Shi, Gang Chen, Álvaro Serra Gómez, Siyuan Wu, Javier Alonso-Mora

    Abstract: Dynamic obstacle avoidance is a popular research topic for autonomous systems, such as micro aerial vehicles and service robots. Accurately evaluating the performance of dynamic obstacle avoidance methods necessitates the establishment of a metric to quantify the environment's difficulty, a crucial aspect that remains unexplored. In this paper, we propose four metrics to measure the difficulty of…

    Submitted 23 April, 2024; originally announced April 2024.

  13. arXiv:2404.07612  [pdf, ps, other]

    cs.CY

    Measuring Geographic Diversity of Foundation Models with a Natural Language–based Geo-guessing Experiment on GPT-4

    Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi

    Abstract: Generative AI based on foundation models provides a first glimpse into the world represented by machines trained on vast amounts of multimodal data ingested by these models during training. If we consider the resulting models as knowledge bases in their own right, this may open up new avenues for understanding places through the lens of machines. In this work, we adopt this thinking and select GPT…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Short paper accepted by AGILE 2024 conference (https://agile-gi.eu/conference-2024)

  14. SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers

    Authors: Weile Li, Muqing Shi, Zhonghua Hong

    Abstract: Traditional deep learning-based object detection networks often resize images during the data preprocessing stage to achieve a uniform size and scale in the feature map. Resizing is done to facilitate model propagation and fully connected classification. However, resizing inevitably leads to object deformation and loss of valuable information in the images. This drawback becomes particularly prono…

    Submitted 5 April, 2024; originally announced April 2024.

  15. arXiv:2404.02471  [pdf, other]

    cs.IT

    Some bounds on the cardinality of the $b$-symbol weight spectrum of codes

    Authors: Hongwei Zhu, Shitao Li, Minjia Shi, Shu-Tao Xia, Patrick Sole

    Abstract: The size of the Hamming distance spectrum of a code has received great attention in recent research. The main objective of this paper is to extend these significant theories to the $b$-symbol distance spectrum. We examine this question for various types of codes, including unrestricted codes, additive codes, linear codes, and cyclic codes, successively. For the first three cases, we determine the…

    Submitted 3 April, 2024; originally announced April 2024.

  16. arXiv:2404.01727  [pdf, other]

    cs.RO cs.CV

    Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

    Authors: Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang

    Abstract: We focus on the generalization ability of the 6-DoF grasp detection method in this paper. While learning-based grasp detection methods can predict grasp poses for unseen objects using the grasp distribution learned from the training set, they often exhibit a significant performance drop when encountering objects with diverse shapes and structures. To enhance the grasp detection methods' generaliza…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  17. arXiv:2403.19949  [pdf, other]

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  18. arXiv:2403.11511  [pdf, other]

    cs.RO cs.CV

    Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation

    Authors: Haoxiang Ma, Ran Qin, Modi Shi, Boyang Gao, Di Huang

    Abstract: This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024

  19. arXiv:2403.06728  [pdf, other]

    cs.CV

    Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning

    Authors: Zijian Zhou, Miaojing Shi, Meng Wei, Oluwatosin Alabi, Zijie Yue, Tom Vercauteren

    Abstract: Radiology report generation (RRG) has attracted significant attention due to its potential to reduce the workload of radiologists. Current RRG approaches are still unsatisfactory against clinical standards. This paper introduces a novel RRG method, LM-RRG, that integrates large models (LMs) with clinical quality reinforcement learning to generate accurate and comprehensive chest X-ray rad…

    Submitted 11 March, 2024; originally announced March 2024.

  20. arXiv:2403.02234  [pdf, other]

    cs.CV

    3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

    Authors: Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

    Abstract: We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The sec…

    Submitted 6 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/3DTopia/3DTopia

  21. arXiv:2402.11410  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration

    Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi

    Abstract: Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that unlike expected calibration error (ECE) is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They…

    Submitted 17 February, 2024; originally announced February 2024.

  22. arXiv:2402.11073  [pdf, other]

    cs.CL cs.AI

    AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators

    Authors: Jingwei Ni, Minjing Shi, Dominik Stammbach, Mrinmaya Sachan, Elliott Ash, Markus Leippold

    Abstract: With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming more and more important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To addre…

    Submitted 2 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL2024 Main Conference

  23. arXiv:2402.08753  [pdf, ps, other]

    cs.GT cs.LG

    Forecasting for Swap Regret for All Downstream Agents

    Authors: Aaron Roth, Mirah Shi

    Abstract: We study the problem of making predictions so that downstream agents who best respond to them will be guaranteed diminishing swap regret, no matter what their utility functions are. It has been known since Foster and Vohra (1997) that agents who best-respond to calibrated forecasts have no swap regret. Unfortunately, the best known algorithms for guaranteeing calibrated forecasts in sequential adv…

    Submitted 15 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  24. AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image

    Authors: Hamed Amini Amirkolaee, Miaojing Shi, Lianghua He, Mark Mulligan

    Abstract: The process of estimating and counting tree density using only a single aerial or satellite image is a difficult task in the fields of photogrammetry and remote sensing. However, it plays a crucial role in the management of forests. The huge variety of trees in varied topography severely hinders tree counting models from performing well. The purpose of this paper is to propose a framework that is learn…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted in ISPRS Journal of Photogrammetry and Remote Sensing

  25. arXiv:2401.16185  [pdf, other]

    cs.CR cs.AI cs.SE

    LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning

    Authors: Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Wei Ma, Lyuye Zhang, Miaolei Shi, Yang Liu

    Abstract: Large language models (LLMs) have demonstrated significant potential for many downstream tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent attempts to use LLMs for vulnerability detection are still preliminary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability -- whether it originates from the mode…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: This is a technical report by Nanyang Technological University

  26. arXiv:2401.08256  [pdf, other]

    cs.CV

    Multitask Learning in Minimally Invasive Surgical Vision: A Review

    Authors: Oluwatosin Alabi, Tom Vercauteren, Miaojing Shi

    Abstract: Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury. However, MIS poses additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy. Recent advancements in machine learning and computer visio…

    Submitted 16 January, 2024; originally announced January 2024.

  27. arXiv:2312.17264  [pdf, other]

    cs.CL cs.IR

    ESGReveal: An LLM-based approach for extracting structured data from ESG reports

    Authors: Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan Zeng, Shiming Yang, HongXiang Tong, Lei Xiao, Wenwen Zhou

    Abstract: ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports, catering to the critical need for reliable ESG information retrieval. This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques. The ESGReveal system includes an ESG metadata module for ta…

    Submitted 25 December, 2023; originally announced December 2023.

  28. arXiv:2312.09482  [pdf, ps, other]

    cs.IT

    An open problem and a conjecture on binary linear complementary pairs of codes

    Authors: Shitao Li, Minjia Shi, San Ling

    Abstract: The existence of $q$-ary linear complementary pairs (LCPs) of codes with $q> 2$ has been completely characterized so far. This paper gives a characterization for the existence of binary LCPs of codes. As a result, we solve an open problem proposed by Carlet et al. (IEEE Trans. Inf. Theory 65(3): 1694-1704, 2019) and a conjecture proposed by Choi et al. (Cryptogr. Commun. 15(2): 469-486, 2023).

    Submitted 14 December, 2023; originally announced December 2023.

  29. arXiv:2312.01220  [pdf, other]

    cs.CV

    Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

    Authors: Zhipeng Du, Miaojing Shi, Jiankang Deng

    Abstract: Detecting objects in low-light scenarios presents a persistent challenge, as detectors trained on well-lit data exhibit significant performance degradation on low-light data due to low visibility. Previous methods mitigate this issue by exploring image enhancement or object detection techniques with real low-light image datasets. However, the progress is impeded by the inherent difficulties about…

    Submitted 27 March, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  30. arXiv:2312.01151  [pdf]

    cs.CY cs.CL cs.SC

    Here Is Not There: Measuring Entailment-Based Trajectory Similarity for Location-Privacy Protection and Beyond

    Authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi, Jinmeng Rao, Song Gao, Ling Cai, Anita Graser

    Abstract: While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic realit…

    Submitted 2 December, 2023; originally announced December 2023.

  31. arXiv:2311.16964  [pdf, other]

    cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

    Machine learning force-field models for metallic spin glass

    Authors: Menglin Shi, Sheng Zhang, Gia-Wei Chern

    Abstract: Metallic spin glass systems, such as dilute magnetic alloys, are characterized by randomly distributed local moments coupled to each other through a long-range electron-mediated effective interaction. We present a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses. A Behler-Parrinello type neural-network model, based on the principle of locality, is develop…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures

  32. arXiv:2311.16492  [pdf, other]

    cs.CV

    VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation

    Authors: Zijian Zhou, Miaojing Shi, Holger Caesar

    Abstract: Panoptic Scene Graph Generation (PSG) aims at achieving a comprehensive image understanding by simultaneously segmenting objects and predicting relations among objects. However, the long-tail problem among relations leads to unsatisfactory results in real-world applications. Prior methods predominantly rely on vision information or utilize limited language information, such as object or relation n…

    Submitted 19 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 22 pages, 9 figures

  33. arXiv:2311.02189  [pdf, other]

    cs.CV

    FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling

    Authors: Yu Tian, Min Shi, Yan Luo, Ava Kouhana, Tobias Elze, Mengyu Wang

    Abstract: Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in the area of medicine, as fairness in medical models is critical to people's well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are a…

    Submitted 30 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: ICLR 2024; Codes available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairSeg

  34. arXiv:2311.00354  [pdf, ps, other]

    cs.CR

    Butson Hadamard matrices, bent sequences, and spherical codes

    Authors: Minjia Shi, Danni Lu, Andrés Armario, Ronan Egan, Ferruh Ozbudak, Patrick Solé

    Abstract: We explore a notion of bent sequence attached to the data consisting of an Hadamard matrix of order $n$ defined over the complex $q^{th}$ roots of unity, an eigenvalue of that matrix, and a Galois automorphism from the cyclotomic field of order $q.$ In particular we construct self-dual bent sequences for various $q\le 60$ and lengths $n\le 21.$ Computational construction methods comprise the resol…

    Submitted 1 November, 2023; originally announced November 2023.

  35. arXiv:2310.12511  [pdf, ps, other]

    cs.IT

    The weight enumerator polynomials of the lifted codes of the projective Solomon-Stiffler codes

    Authors: Minjia Shi, Shitao Li, Tor Helleseth

    Abstract: Determining the weight distribution of a code is an old and fundamental topic in coding theory that has been thoroughly studied. In 1977, Helleseth, Kløve, and Mykkeltveit presented a weight enumerator polynomial of the lifted code over $\mathbb{F}_{q^\ell}$ of a $q$-ary linear code with significant combinatorial properties, which can determine the support weight distribution of this linear code.…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: This manuscript was first submitted on September 9, 2022

  36. arXiv:2310.09183  [pdf, other]

    cs.LG cs.AI cs.DC

    PRIOR: Personalized Prior for Reactivating the Information Overlooked in Federated Learning

    Authors: Mingjia Shi, Yuhao Zhou, Kai Wang, Huaizheng Zhang, Shudong Huang, Qing Ye, Jiangcheng Lv

    Abstract: Classical federated learning (FL) enables training machine learning models without sharing data for privacy preservation, but heterogeneous data characteristic degrades the performance of the localized model. Personalized FL (PFL) addresses this by synthesizing personalized models from a global model via training on local data. Such a global model may overlook the specific information that the cli…

    Submitted 10 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

    MSC Class: 68T07 ACM Class: I.2.11

  37. arXiv:2310.07355  [pdf, other]

    cs.CV cs.LG

    IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training

    Authors: Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and…

    Submitted 1 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under Review

  38. arXiv:2310.04863  [pdf, other]

    cs.SD eess.AS

    SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

    Authors: Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

    Abstract: Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to obtain state-of-the-art (SOTA) performance, most of the studies are based on an autoregressive (AR) decoder which generates tokens one-by-one and results in a large real-time factor (RTF). To speed up inference, we intro…

    Submitted 7 October, 2023; originally announced October 2023.

  39. arXiv:2310.02492  [pdf, other]

    cs.CV

    FairVision: Equitable Deep Learning for Eye Disease Screening via Fair Identity Scaling

    Authors: Yan Luo, Muhammad Osama Khan, Yu Tian, Min Shi, Zehao Dou, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Equity in AI for healthcare is crucial due to its direct impact on human well-being. Despite advancements in 2D medical imaging fairness, the fairness of 3D models remains underexplored, hindered by the small sizes of 3D fairness datasets. Since 3D imaging surpasses 2D imaging in SOTA clinical care, it is critical to understand the fairness of these 3D models. To address this research gap, we cond…

    Submitted 12 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  40. arXiv:2309.17218  [pdf, other]

    cs.CV

    When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo

    Authors: Tianqi Liu, Xinyi Ye, Weiyue Zhao, Zhiyu Pan, Min Shi, Zhiguo Cao

    Abstract: Learning-based multi-view stereo (MVS) method heavily relies on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS. Each pixel densely attends to the whole image. In contrast, we propose to constrain non-local f…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV2023

  41. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" at typical meeting scenario. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  42. arXiv:2309.12003  [pdf, ps, other]

    cs.IT cs.CR

    A quaternary analogue of Tang-Ding codes

    Authors: Minjia Shi, Sihui Tao, Jon-Lark Kim, Patrick Sole

    Abstract: In a recent paper, Tang and Ding introduced a class of binary cyclic codes of rate close to one half with a designed lower bound on their minimum distance. The definition involves the base $2$ expansion of the integers in their defining set. In this paper we propose an analogue for quaternary codes. In addition, the performances of the subfield subcode and of the trace code (two binary cyclic code…

    Submitted 21 September, 2023; originally announced September 2023.

  43. arXiv:2309.06497  [pdf, other]

    cs.LG cs.DC cs.MS math.OC

    A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

    Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

    Abstract: Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the perform…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 38 pages, 8 figures, 5 tables

  44. arXiv:2309.05683  [pdf, other]

    cs.LG cs.AI cs.RO

    EANet: Expert Attention Network for Online Trajectory Prediction

    Authors: Pengfei Yao, Tianlu Mao, Min Shi, Jingkai Sun, Zhaoqi Wang

    Abstract: Trajectory prediction plays a crucial role in autonomous driving. Existing mainstream research and continual learning-based methods all require training on complete datasets, leading to poor prediction accuracy when sudden changes in scenarios occur and failing to promptly respond and update the model. Whether these methods can make a prediction in real-time and use data instances to update the…

    Submitted 11 September, 2023; originally announced September 2023.

  45. arXiv:2309.01515  [pdf, other]

    cs.DC cs.LG

    Federated cINN Clustering for Accurate Clustered Federated Learning

    Authors: Yuhao Zhou, Minjia Shi, Yuxin Tian, Yuanxi Li, Qing Ye, Jiancheng Lv

    Abstract: Federated Learning (FL) presents an innovative approach to privacy-preserving distributed machine learning and enables efficient crowd intelligence on a large scale. However, a significant challenge arises when coordinating FL with crowd intelligence in which diverse client groups possess disparate objectives due to data heterogeneity or distinct tasks. To address this challenge, we propose the Feder…

    Submitted 4 September, 2023; originally announced September 2023.

  46. arXiv:2308.13415  [pdf, other]

    eess.IV cs.CV cs.LG

    An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation

    Authors: Tiarna Lee, Esther Puyol-Antón, Bram Ruijsink, Keana Aitcheson, Miaojing Shi, Andrew P. King

    Abstract: In medical imaging, artificial intelligence (AI) is increasingly being used to automate routine tasks. However, these algorithms can exhibit and exacerbate biases which lead to disparate performances between protected groups. We investigate the impact of model choice on how imbalances in subject sex and race in training datasets affect AI-based cine cardiac magnetic resonance image segmentation. W…

    Submitted 25 August, 2023; originally announced August 2023.

  47. arXiv:2308.13411  [pdf, other]

    cs.CV

    Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning

    Authors: Yan Luo, Min Shi, Yu Tian, Tobias Elze, Mengyu Wang

    Abstract: Glaucoma is the number one cause of irreversible blindness globally. A major challenge for accurate glaucoma detection and progression forecasting is the bottleneck of limited labeled patients with the state-of-the-art (SOTA) 3D retinal imaging data of optical coherence tomography (OCT). To address the data scarcity issue, this paper proposes two solutions. First, we develop a novel generalization…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  48. arXiv:2308.05232  [pdf, other]

    cs.CV cs.LG

    SegMatch: A semi-supervised learning method for surgical instrument segmentation

    Authors: Meng Wei, Charlie Budd, Luis C. Garcia-Peraza-Herrera, Reuben Dorent, Miaojing Shi, Tom Vercauteren

    Abstract: Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer-assisted interventions. In this work, we propose SegMatch, a semi-supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi-supervised classification pipeline combining…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: preprint under review, 12 pages, 7 figures

  49. arXiv:2308.02916  [pdf, other]

    cs.LG cs.AI

    Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket

    Authors: Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, Mingli Song

    Abstract: Graph Lottery Ticket (GLT), a combination of core subgraph and sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned inf…

    Submitted 10 August, 2023; v1 submitted 5 August, 2023; originally announced August 2023.

    Comments: 17 pages, 10 figures, Accepted by ECAI2023

  50. arXiv:2308.01907  [pdf, other]

    cs.CV

    The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

    Authors: Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao

    Abstract: We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset (AS-1B) with over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers a wide range of 3.5 mi…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Technical Report