Showing 1–50 of 71 results for author: Shi, H

Searching in archive eess.
  1. arXiv:2406.14696  [pdf, other]

    eess.SY cs.AI

    Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach

    Authors: Kexin Tian, Haotian Shi, Yang Zhou, Sixu Li

    Abstract: Given the complexity and nonlinearity inherent in traffic dynamics within vehicular platoons, there exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability. Currently, there are two predominant approaches: the physics model-based approach and the Artificial Intelligence (AI)--based approach. Knowing the facts that the physical-based…

    Submitted 20 June, 2024; originally announced June 2024.

  2. Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations

    Authors: Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu

    Abstract: The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentia…

    Submitted 16 June, 2024; originally announced June 2024.

    Journal ref: Advanced Engineering Informatics, 2024

  3. arXiv:2406.08750  [pdf]

    eess.SY

    The expressway network design problem for multiple urban subregions based on the macroscopic fundamental diagram

    Authors: Yunran Di, Weihua Zhang, Haotian Shi, Heng Ding, Jinbiao Huo, Bin Ran

    Abstract: As urbanization advances, cities are expanding, leading to a more decentralized urban structure and longer average commuting durations. The construction of an urban expressway system emerges as a critical strategy to tackle this challenge. However, the traditional link-level network design method faces modeling and solution challenges when dealing with the large-scale expressway network design pro…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2405.17028  [pdf, other]

    cs.SD eess.AS

    RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

    Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

    Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

  5. arXiv:2405.06125  [pdf]

    eess.SY

    Cooperative Route Guidance and Flow Control for Mixed Road Networks Comprising Expressway and Arterial Network

    Authors: Yunran Di, Haotian Shi, Weihua Zhang, Heng Ding, Xiaoyan Zheng, Bin Ran

    Abstract: Facing the congestion challenges of mixed road networks comprising expressways and arterial road networks, traditional control solutions fall short. To effectively alleviate traffic congestion in mixed road networks, it is crucial to clear the interaction between expressways and arterial networks and achieve orderly coordination between them. This study employs the multi-class cell transmission mo…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2405.05518  [pdf, other]

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  7. arXiv:2404.12794  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/Terminal-K/MambaMOS

  8. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  9. arXiv:2403.20058  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks

    Authors: Luoyu Wang, Yitian Tao, Qing Yang, Yan Liang, Siwei Liu, Hongcheng Shi, Dinggang Shen, Han Zhang

    Abstract: Simultaneous functional PET/MR (sf-PET/MR) presents a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Albeit high scientific/clinical values, short in hardware accessibility of P…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 11 pages

  10. arXiv:2403.14773  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM eess.IV

    StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

    Authors: Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

    Abstract: Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content. However, existing approaches mostly focus on high-quality short video generation (typically 16 or 24 frames), ending up with hard-cuts when naively extended to the case of long video synthesis. To overcome these limitations, we introduc…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: https://github.com/Picsart-AI-Research/StreamingT2V

  11. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  12. arXiv:2403.08504  [pdf, other]

    cs.CV cs.RO eess.IV

    OccFiner: Offboard Occupancy Refinement with Hybrid Propagation

    Authors: Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Zhijian Zhao, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

    Abstract: Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the acc…

    Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  13. arXiv:2403.08164  [pdf, other]

    cs.SD cs.LG eess.AS

    EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

    Authors: Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

    Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model which includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence synthesis techn…

    Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948

  14. arXiv:2403.05912  [pdf, other]

    eess.IV cs.CV

    Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

    Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

    Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent st…

    Submitted 9 March, 2024; originally announced March 2024.

  15. arXiv:2402.18275  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploration of Adapter for Noise Robust Automatic Speech Recognition

    Authors: Hao Shi, Tatsuya Kawahara

    Abstract: Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial. Integrating adapters into neural networks has emerged as a potent technique for transfer learning. This study thoroughly investigates adapter-based ASR adaptation in noisy environments. We conducted experiments using the CHiME--4 dataset. The results show that inserting the adapter in the shallow layer y…

    Submitted 4 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  16. arXiv:2401.16700  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

    Authors: Jianbin Jiao, Xina Cheng, Weijie Chen, Xiaoting Yin, Hao Shi, Kailun Yang

    Abstract: 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are pr…

    Submitted 25 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to IJCNN 2024. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose

  17. arXiv:2312.10716  [pdf, other]

    eess.IV cs.AR

    A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network

    Authors: Siyu Zhang, Wendong Mao, Huihong Shi, Zhongfeng Wang

    Abstract: Video compression is widely used in digital television, surveillance systems, and virtual reality. Real-time video decoding is crucial in practical scenarios. Recently, neural video compression (NVC) combines traditional coding with deep learning, achieving impressive compression efficiency. Nevertheless, the NVC models involve high computational costs and complex memory access patterns, challengi…

    Submitted 18 December, 2023; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by DATE 2024

  18. arXiv:2312.03376  [pdf, other]

    eess.SY

    Beacon-enabled TDMA Ultraviolet Communication Network System Design and Realization

    Authors: Yuchen Pan, Fei Long, ** Li, Haotian Shi, Jiazhao Shi, Hanlin Xiao, Chen Gong, Zhengyuan Xu

    Abstract: Nonline of sight (NLOS) ultraviolet (UV) scattering communication can serve as a good candidate for outdoor optical wireless communication (OWC) in the cases of non-perfect transmitter-receiver alignment and radio silence. We design and demonstrate a NLOS UV scattering communication network system in this paper, where a beacon-enabled time division multiple access (TDMA) scheme is adopted. In our…

    Submitted 15 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  19. arXiv:2311.04591  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    Rethinking Event-based Human Pose Estimation with 3D Event Representations

    Authors: Xiaoting Yin, Hao Shi, Jiaan Chen, Ze Wang, Yaozu Ye, Huajian Ni, Kailun Yang, Kaiwei Wang

    Abstract: Human pose estimation is a fundamental and appealing task in computer vision. Traditional frame-based cameras and videos are commonly applied, yet, they become less reliable in scenarios under high dynamic range or heavy motion blur. In contrast, event cameras offer a robust solution for navigating these challenging contexts. Predominant methodologies incorporate event cameras into learning framew…

    Submitted 1 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Extended version of arXiv:2206.04511. The code and dataset are available at https://github.com/MasterHow/EventPointPose

  20. arXiv:2310.03085  [pdf, other]

    cs.LG cs.CV cs.IT eess.IV

    Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising

    Authors: Hui Shi, Yann Traonmilin, J-F Aujol

    Abstract: We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such regularizer can be systematically linked with the distribution of the data. With deep neural n…

    Submitted 2 October, 2023; originally announced October 2023.

  21. arXiv:2310.02815  [pdf, other]

    cs.CV cs.RO eess.IV

    CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

    Authors: Hao Shi, Chengshan Pang, Jiaming Zhang, Kailun Yang, Yuhao Wu, Huajian Ni, Yining Lin, Rainer Stiefelhagen, Kaiwei Wang

    Abstract: Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses pre…

    Submitted 17 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: The source code will be made publicly available at https://github.com/MasterHow/CoBEV

  22. arXiv:2310.00240  [pdf, other]

    cs.CV eess.IV

    Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

    Authors: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi

    Abstract: Recently, pre-trained vision-language models have been increasingly used to tackle the challenging zero-shot segmentation task. Typical solutions follow the paradigm of first generating mask proposals and then adopting CLIP to classify them. To maintain the CLIP's zero-shot transferability, previous practices favour to freeze CLIP during training. However, in the paper, we reveal that CLIP is inse…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  23. arXiv:2309.16128  [pdf, other]

    cs.CV eess.IV

    Joint Correcting and Refinement for Balanced Low-Light Image Enhancement

    Authors: Nana Yu, Hong Shi, Yahong Han

    Abstract: Low-light image enhancement tasks demand an appropriate balance among brightness, color, and illumination. While existing methods often focus on one aspect of the image without considering how to pay attention to this balance, which will cause problems of color distortion and overexposure etc. This seriously affects both human visual perception and the performance of high-level visual models. In t…

    Submitted 19 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

  24. arXiv:2308.08179  [pdf]

    eess.SY

    A Robust Integrated Multi-Strategy Bus Control System via Deep Reinforcement Learning

    Authors: Qinghui Nie, Jishun Ou, Haiyang Zhang, Jiawei Lu, Shen Li, Haotian Shi

    Abstract: An efficient urban bus control system has the potential to significantly reduce travel delays and streamline the allocation of transportation resources, thereby offering enhanced and user-friendly transit services to passengers. However, bus operation efficiency can be impacted by bus bunching. This problem is notably exacerbated when the bus system operates along a signalized corridor with unpred…

    Submitted 16 August, 2023; originally announced August 2023.

  25. arXiv:2308.07104  [pdf, other]

    cs.CV cs.RO eess.IV

    FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving

    Authors: Zhonghua Yi, Hao Shi, Kailun Yang, Qi Jiang, Yaozu Ye, Ze Wang, Huajian Ni, Kaiwei Wang

    Abstract: Key-point-based scene understanding is fundamental for autonomous driving applications. At the same time, optical flow plays an important role in many vision tasks. However, due to the implicit bias of equal attention on all points, classic data-driven optical flow estimation methods yield less satisfactory performance on key points, limiting their implementations in key-point-critical safety-rele…

    Submitted 22 September, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The source code of FocusFlow will be available at https://github.com/ZhonghuaYi/FocusFlow_official

  26. arXiv:2308.05975  [pdf]

    eess.IV

    A Self-supervised SAR Image Despeckling Strategy Based on Parameter-sharing Convolutional Neural Networks

    Authors: Liang Chen, Yifei Yin, Hao Shi, Qingqing Sheng, Wei Li

    Abstract: Speckle noise is generated due to the SAR imaging mechanism, which brings difficulties in SAR image interpretation. Hence, despeckling is a helpful step in SAR pre-processing. Nowadays, deep learning has been proved to be a progressive method for SAR image despeckling. Most deep learning methods for despeckling are based on supervised learning, which needs original SAR images and speckle-free SAR…

    Submitted 11 August, 2023; originally announced August 2023.

  27. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  28. arXiv:2307.05033  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Anytime Optical Flow Estimation with Event Cameras

    Authors: Yaozu Ye, Hao Shi, Kailun Yang, Ze Wang, Xiaoting Yin, Yining Lin, Mao Liu, Yaonan Wang, Kaiwei Wang

    Abstract: Optical flow estimation is a fundamental task in the field of autonomous driving. Event cameras are capable of responding to log-brightness changes in microseconds. Its characteristic of producing responses only to the changing region is particularly suitable for optical flow estimation. In contrast to the super low-latency response speed of event cameras, existing datasets collected via event cam…

    Submitted 19 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Code will be available at https://github.com/Yaozhuwa/EVA-Flow

  29. arXiv:2307.04513  [pdf, other]

    eess.IV cs.CV

    CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation

    Authors: Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, Winston Chong, Jianfei Cai

    Abstract: New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying large-scale deep learning models. Since single-time-point samples with all-lesion labels are relatively easy to collect, exploiting them to train deep m…

    Submitted 14 September, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: Accepted by MICCAI 2023 (Early Acceptance)

  30. arXiv:2306.12992  [pdf, other]

    cs.CV eess.IV physics.optics

    Minimalist and High-Quality Panoramic Imaging with PSF-aware Transformers

    Authors: Qi Jiang, Shaohua Gao, Yao Gao, Kailun Yang, Zhonghua Yi, Hao Shi, Lei Sun, Kaiwei Wang

    Abstract: High-quality panoramic images with a Field of View (FoV) of 360-degree are essential for contemporary panoramic computer vision tasks. However, conventional imaging systems come with sophisticated lens designs and heavy optical components. This disqualifies their usage in many mobile and wearable applications where thin and portable, minimalist imaging systems are desired. In this paper, we propos…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: The dataset and code will be available at https://github.com/zju-jiangqi/PCIE-PART

  31. arXiv:2306.06663  [pdf, other]

    cs.CV cs.RO eess.IV

    LF-PGVIO: A Visual-Inertial-Odometry Framework for Large Field-of-View Cameras using Points and Geodesic Segments

    Authors: Ze Wang, Kailun Yang, Hao Shi, Yufan Zhang, Zhijie Xu, Fei Gao, Kaiwei Wang

    Abstract: In this paper, we propose LF-PGVIO, a Visual-Inertial-Odometry (VIO) framework for large Field-of-View (FoV) cameras with a negative plane using points and geodesic segments. The purpose of our research is to unleash the potential of point-line odometry with large-FoV omnidirectional cameras, even for cameras with negative-plane FoV. To achieve this, we propose an Omnidirectional Curve Segment Det…

    Submitted 11 March, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The source code will be made publicly available at https://github.com/flysoaryun/LF-PGVIO

  32. arXiv:2305.10734  [pdf, other]

    cs.SD cs.CL eess.AS

    Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

    Authors: Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

    Abstract: Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the pipeline structure currently does not consider for a combined use of generative and predictive decoders. The predictive decoder allows us…

    Submitted 28 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  33. arXiv:2305.08585  [pdf, other]

    cs.CV eess.IV

    Toward Moiré-Free and Detail-Preserving Demosaicking

    Authors: Xuanchen Li, Yan Niu, Bo Zhao, Haoyuan Shi, Zitong An

    Abstract: 3D convolutions are commonly employed by demosaicking neural models, in the same way as solving other image restoration problems. Counter-intuitively, we show that 3D convolutions implicitly impede the RGB color spectra from exchanging complementary information, resulting in spectral-inconsistent inference of the local spatial high frequency components. As a consequence, shallow 3D convolution net…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 11 pages, 5 figures, 5 tables

  34. arXiv:2305.06701  [pdf, ps, other]

    cs.SD eess.AS

    Extending Audio Masked Autoencoders Toward Audio Restoration

    Authors: Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Audio classification and restoration are among major downstream tasks in audio signal processing. However, restoration derives less of a benefit from pretrained models compared to the overwhelming success of pretrained models in classification tasks. Due to such unbalanced benefits, there has been rising interest in how to improve the performance of pretrained models for restoration tasks, e.g., s…

    Submitted 17 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: WASPAA 2023. Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  35. arXiv:2305.04205  [pdf, other]

    cs.CV cs.RO eess.IV

    Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving

    Authors: Siyu Li, Kailun Yang, Hao Shi, Jiaming Zhang, Jiacheng Lin, Zhifeng Teng, Zhiyong Li

    Abstract: A semantic map of the road scene, covering fundamental road elements, is an essential ingredient in autonomous driving systems. It provides important perception foundations for positioning and planning when rendered in the Bird's-Eye-View (BEV). Currently, the prior knowledge of hypothetical depth can guide the learning of translating front perspective views into BEV directly with the help of cali…

    Submitted 6 September, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Robotics and Automation Letters (RA-L). The source code is publicly available at https://github.com/lynn-yu/Bi-Mapper

  36. arXiv:2303.14593  [pdf, other]

    cs.SD eess.AS

    Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

    Authors: Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara

    Abstract: Time-domain speech enhancement (SE) has recently been intensively investigated. Among recent works, DEMUCS introduces multi-resolution STFT loss to enhance performance. However, some resolutions used for STFT contain non-stationary signals, and it is challenging to learn multi-resolution frequency losses simultaneously with only one output. For better use of multi-resolution frequency information,…

    Submitted 25 March, 2023; originally announced March 2023.

  37. arXiv:2303.14095  [pdf, other]

    cs.CV cs.RO eess.IV

    PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View

    Authors: Ze Shi, Hao Shi, Kailun Yang, Zhe Yin, Yining Lin, Kaiwei Wang

    Abstract: Visual place recognition has gained significant attention in recent years as a crucial technology in autonomous driving and robotics. Currently, the two main approaches are the perspective view retrieval (P2P) paradigm and the equirectangular image retrieval (E2E) paradigm. However, it is practical and natural to assume that users only have consumer-grade pinhole cameras to obtain query perspectiv…

    Submitted 28 July, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to ITSC 2023. Code and datasets will be made available at https://github.com/zafirshi/PanoVPR

  38. arXiv:2303.13842  [pdf, other]

    cs.CV cs.RO eess.IV

    FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation

    Authors: Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have larger FoV than ordinary pinhole cameras, yet its unique special imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applic…

    Submitted 20 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR OmniCV 2023. Code and datasets will be available at https://github.com/MasterHow/FishDreamer

  39. arXiv:2211.11293  [pdf, other]

    cs.CV eess.IV

    Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer

    Authors: Hao Shi, Qi Jiang, Kailun Yang, Xiaoting Yin, Ze Wang, Kaiwei Wang

    Abstract: Vision sensors are widely applied in vehicles, robots, and roadside infrastructure. However, due to limitations in hardware cost and system size, camera Field-of-View (FoV) is often restricted and may not provide sufficient coverage. Nevertheless, from a spatiotemporal perspective, it is possible to obtain information beyond the camera's physical FoV from past video streams. In this paper, we prop…

    Submitted 22 June, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The source code and dataset are made publicly available at https://github.com/MasterHow/FlowLens

  40. arXiv:2211.11257  [pdf, other]

    cs.CV eess.IV physics.optics

    Computational Imaging for Machine Perception: Transferring Semantic Segmentation beyond Aberrations

    Authors: Qi Jiang, Hao Shi, Shaohua Gao, Jiaming Zhang, Kailun Yang, Lei Sun, Huajian Ni, Kaiwei Wang

    Abstract: Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through the Computational Imaging (CI) technique, ignoring the feasibility of advancing semantic segmentation. In this paper, we…

    Submitted 14 March, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Transactions on Computational Imaging (TCI). The project page is at https://github.com/zju-jiangqi/CIADA

  41. arXiv:2211.01046  [pdf, other]

    eess.AS cs.CL cs.SD

    Monolingual Recognizers Fusion for Code-switching Speech Recognition

    Authors: Tongtong Song, Qiang Xu, Haoyu Lu, Longbiao Wang, Hao Shi, Yuqin Lin, Yanbing Yang, Jianwu Dang

    Abstract: The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the structures of two monolingual ASR models (MAMs) should be the same and only use the encoder of MAMs. This leads to the problem that pre-trained MAMs cannot be timely and fully used for CS ASR. In this paper, we propose a monolingual recogn…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  42. arXiv:2207.11860  [pdf, other]

    cs.CV cs.RO eess.IV

    Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation

    Authors: Jiaming Zhang, Kailun Yang, Hao Shi, Simon Reiß, Kunyu Peng, Chaoxiang Ma, Haodong Fu, Philip H. S. Torr, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: In this paper, we address panoramic semantic segmentation which is under-explored due to two critical challenges: (1) image distortions and object deformations on panoramas; (2) lack of semantic annotations in the 360° imagery. To tackle these problems, first, we propose the upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE)…

    Submitted 31 May, 2024; v1 submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Extended version of CVPR 2022 paper arXiv:2203.01452. Code is available at https://github.com/jamycheung/Trans4PASS

  43. arXiv:2207.04211  [pdf, other]

    cs.AI cs.CV cs.IR cs.LG eess.IV

    BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

    Authors: Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text, which potentially impacts a wide variety of real-world applications, such as internet search and fashion retrieval. In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding la…

    Submitted 9 July, 2022; originally announced July 2022.

  44. arXiv:2207.03904  [pdf, other]

    eess.SY

    Privacy Preservation by Local Design in Cooperative Networked Control Systems

    Authors: Chao Yang, Wen Yang, Hongbo Shi

    Abstract: In this paper, we study the privacy preservation problem in a cooperative networked control system working for the task of LQG control. The system consists of a user and a server: the user owns the state process, the server provides computation capability, and the user employs the server to compute control inputs for it. To enable the server's computation, the user needs to provide the measurement…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 13 pages, 7 figures, submitted to Transactions on Automatic Control

  45. arXiv:2206.14580  [pdf, other]

    cs.CL eess.AS

    Language-specific Characteristic Assistance for Code-switching Speech Recognition

    Authors: Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang

    Abstract: Dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition. Because LSEs are initialized by two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit sufficient monolingual data and capture the individual language attributes. However, most existing methods have no language constraints on LSEs and underutili…

    Submitted 11 July, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  46. arXiv:2206.06070  [pdf, other]

    eess.IV cs.CV physics.optics

    Annular Computational Imaging: Capture Clear Panoramic Images through Simple Lens

    Authors: Qi Jiang, Hao Shi, Lei Sun, Shaohua Gao, Kailun Yang, Kaiwei Wang

    Abstract: Panoramic Annular Lens (PAL) composed of few lenses has great potential in panoramic surrounding sensing tasks for mobile and wearable devices because of its tiny size and large Field of View (FoV). However, the image quality of tiny-volume PAL confines to optical limit due to the lack of lenses for aberration correction. In this paper, we propose an Annular Computational Imaging (ACI) framework t…

    Submitted 29 December, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted to IEEE Transactions on Computational Imaging (TCI). Code and datasets are publicly available at https://github.com/zju-jiangqi/ACI-PI2RNet

  47. arXiv:2206.04647  [pdf, other]

    eess.IV cs.CV cs.LG

    VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

    Authors: Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang

    Abstract: Videos typically record the streaming and continuous visual data as discrete consecutive frames. Since the storage cost is expensive for videos of high fidelity, most of them are stored in a relatively low resolution and frame rate. Recent works of Space-Time Video Super-Resolution (STVSR) are developed to incorporate temporal interpolation and spatial super-resolution in a unified framework. Howe…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR 2022. Project page: http://zeyuan-chen.com/VideoINR/

  48. arXiv:2205.05570  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Review on Panoramic Imaging and Its Applications in Scene Understanding

    Authors: Shaohua Gao, Kailun Yang, Hao Shi, Kaiwei Wang, Jian Bai

    Abstract: With the rapid development of high-speed communication and artificial intelligence technologies, human perception of real-world scenes is no longer limited to the use of small Field of View (FoV) and low-dimensional scene detection devices. Panoramic imaging emerges as the next generation of innovative intelligent instruments for environmental perception and measurement. However, while satisfying…

    Submitted 14 October, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Transactions on Instrumentation and Measurement. 34 pages, 15 figures, 420 references

  49. arXiv:2203.03140  [pdf, other]

    eess.SP

    An Improved Automatic Modulation Classification Scheme Based on Adaptive Fusion Network

    Authors: Hao Shi, Qi Peng, Yiqi Zhuang

    Abstract: Due to the over-fitting problem caused by imbalance samples, there is still room to improve the performance of data-driven automatic modulation classification (AMC) in noisy scenarios. By fully considering the signal characteristics, an AMC scheme based on adaptive fusion network (AFNet) is proposed in this work. The AFNet can extract and aggregate multi-scale spatial features of in-phase and quad…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 5 pages, 6 figures, Accepted to IEEE VTC 2022-Spring

  50. arXiv:2202.13388  [pdf, other]

    cs.CV cs.RO eess.IV

    PanoFlow: Learning 360° Optical Flow for Surrounding Temporal Understanding

    Authors: Hao Shi, Yifan Zhou, Kailun Yang, Xiaoting Yin, Ze Wang, Yaozu Ye, Zhe Yin, Shi Meng, Peng Li, Kaiwei Wang

    Abstract: Optical flow estimation is a basic task in self-driving and robotics systems, which enables to temporally interpret traffic scenes. Autonomous vehicles clearly benefit from the ultra-wide Field of View (FoV) offered by 360° panoramic sensors. However, due to the unique imaging process of panoramic cameras, models designed for pinhole images do not directly generalize satisfactorily to 360° panoram…

    Submitted 29 November, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

    Comments: Code and dataset are publicly available at https://github.com/MasterHow/PanoFlow