
Showing 1–17 of 17 results for author: Lei, Z

Searching in archive eess.
  1. arXiv:2406.03274 [pdf, other]

    eess.AS cs.AI cs.SD

    Enhancing CTC-based speech recognition with diverse modeling units

    Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

    Abstract: In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like the transformer. On top of E2E systems, researchers have achieved substantial accuracy improvements by rescoring an E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvem…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2404.16484 [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  3. arXiv:2404.07932 [pdf, other]

    cs.CV eess.IV

    FusionMamba: Efficient Image Fusion with State Space Model

    Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng

    Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fiel…

    Submitted 10 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2401.16067 [pdf, other]

    eess.IV cs.MM

    Encoding Time and Energy Model for SVT-AV1 based on Video Complexity

    Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, Christian Herglotz, André Kaup

    Abstract: The share of online video traffic in global carbon dioxide emissions is growing steadily. To comply with the demand for video media, dedicated compression techniques are continuously optimized, but at the expense of increasingly higher computational demands and thus rising energy consumption at the video encoder side. In order to find the best trade-off between compression and energy consumption,…

    Submitted 30 January, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 5 pages, 1 figure, accepted for IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024

  5. arXiv:2310.07062 [pdf, other]

    cs.SD cs.LG eess.AS

    Acoustic Model Fusion for End-to-end Speech Recognition

    Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

    Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr…

    Submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2309.13404 [pdf, other]

    eess.IV cs.CV

    Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos

    Authors: Rongfeng Wei, Jinlin Wu, Xuexue Bai, Ming Feng, Zhen Lei, Hongbin Liu, Zhen Chen

    Abstract: In minimally invasive surgery, surgical instrument localization is a crucial task for endoscopic videos, which enables various applications for improving surgical outcomes. However, annotating the instrument localization in endoscopic videos is tedious and labor-intensive. In contrast, obtaining the category information is easy and efficient in real-world applications. To fully utilize the categor…

    Submitted 20 June, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted by ICRA 2024 Workshop on C4 Surgical Robotic Systems in the Embodied AI Era; Surgical Tool Localization in Endoscopic Videos Challenge of MICCAI2023

  7. arXiv:2307.05208 [pdf, other]

    eess.IV

    Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching

    Authors: Lena Eichermüller, Gaurang Chaudhari, Ioannis Katsavounidis, Zhijun Lei, Hassene Tmar, André Kaup, Christian Herglotz

    Abstract: Current developments in video encoding technology lead to continuously improving compression performance but at the expense of increasingly higher computational demands. Regarding the online video traffic increases during the last years and the concomitant need for video encoding, encoder complexity control mechanisms are required to restrict the processing time to a sufficient extent in order to…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, accepted for IEEE International Conference on Image Processing (ICIP) 2023

  8. arXiv:2307.05087 [pdf, other]

    cs.CV eess.IV

    SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation

    Authors: Zhengxin Lei, Feng Xu, Jiangtao Wei, Feng Cai, Feng Wang, Ya-Qiu Jin

    Abstract: SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms w…

    Submitted 11 July, 2023; originally announced July 2023.

  9. arXiv:2211.16433   

    eess.SY

    On Robust Observer Design for System Motion on SE(3) Using Onboard Visual Sensors

    Authors: Tong Zhang, Ying Tan, Xiang Chen, Zike Lei

    Abstract: Onboard visual sensing has been widely used in the unmanned ground vehicle (UGV) and/or unmanned aerial vehicle (UAV), which can be modeled as dynamic systems on SE(3). The onboard sensing outputs of the dynamic system can usually be applied to derive the relative position between the feature marks and the system, but bearing with explicit geometrical constraint. Such a visual geometrical constrai…

    Submitted 21 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Need Further Improvement

  10. arXiv:2203.14485 [pdf, other]

    cs.CV eess.SY

    Optimization of Directional Landmark Deployment for Visual Observer on SE(3)

    Authors: Zike Lei, Xi Chen, Ying Tan, Xiang Chen, Li Chai

    Abstract: An optimization method is proposed in this paper for the novel deployment of a given number of directional landmarks (location and pose) within a given region in the 3-D task space. This new deployment technique is built on the geometric models of both landmarks and the monocular camera. In particular, a new concept of Multiple Coverage Probability (MCP) is defined to characterize the probability of at…

    Submitted 28 March, 2022; originally announced March 2022.

  11. arXiv:2110.09748 [pdf, other]

    eess.SY

    User Based Design and Evaluation Pipeline for Indoor Airships

    Authors: Zhaoliang Zheng, Jiahao Li, Parth Agrawal, Zhao Lei, Aaron John-Sabu, Ankur Mehta

    Abstract: Designing a controllable airship for non-expert users or preemptively evaluating the performance of desired airships has always been a very challenging problem. This paper explores the blimp design parameter space from the aspect of the user by considering various distributions of thrust, combinations of propulsive mechanisms, and balloon shapes. We provide open-source modular hardware and reconfi…

    Submitted 23 November, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Submitting to ICRA 2022

  12. arXiv:2109.07112   

    cs.RO eess.SY

    Learning Friction Model for Magnet-actuated Tethered Capsule Robot

    Authors: Yi Wang, Yuyang Tu, Yuchen He, Xutian Deng, Ziwei Lei, Jianwei Zhang, Miao Li

    Abstract: The potential diagnostic applications of magnet-actuated capsules have greatly increased in recent years. For most of these potential applications, accurate position control of the capsule has been highly demanding. However, the friction between the robot and the environment as well as the drag force from the tether play a significant role during the motion control of the capsule. Moreover,…

    Submitted 1 October, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Because it overlaps with the previous article arXiv:2108.07151, we have requested its withdrawal. Thank you.

  13. arXiv:2105.08629 [pdf, other]

    eess.IV cs.CV cs.LG

    Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

    Authors: Andrey Ignatov, Kim Byeoung-su, Radu Timofte, Angeline Pouget, Fenglong Song, Cheng Li, Shuai Xiao, Zhongqian Fu, Matteo Maggioni, Yibin Huang, Shen Cheng, Xin Lu, Yifeng Zhou, Liangyu Chen, Donghao Liu, Xiangyu Zhang, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Bin Huang, et al. (7 additional authors not shown)

    Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solut…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.07809, arXiv:2105.07825

  14. arXiv:2008.08352 [pdf, other]

    eess.IV cs.LG

    Deep Controllable Backlight Dimming

    Authors: Lvyin Duan, Demetris Marnerides, Alan Chalmers, Zhichun Lei, Kurt Debattista

    Abstract: Dual-panel displays require local dimming algorithms in order to reproduce content with high fidelity and high dynamic range. In this work, a novel deep learning based local dimming method is proposed for rendering HDR images on dual-panel HDR displays. The method uses a Convolutional Neural Network to predict backlight values, using as input the HDR image that is to be displayed. The model is des…

    Submitted 19 August, 2020; originally announced August 2020.

  15. arXiv:2007.02096 [pdf]

    eess.IV cs.CV cs.LG

    Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

    Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, ** Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys, et al. (8 additional authors not shown)

    Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of the major limitations is that the learning-based methods may suffer from the multi-site i…

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

  16. arXiv:2004.00787 [pdf, other]

    eess.IV eess.SY

    Radial Coverage Strength for Optimization of Multi-Camera Deployment

    Authors: Zike Lei, Xi Chen, Xiang Chen, Li Chai

    Abstract: In this paper, a new concept, radial coverage strength, is first proposed to characterize the visual sensing performance when the orientation of the target pose is considered. In particular, the elevation angle of the optical pose of the visual sensor is taken to decompose the visual coverage strength into effective and ineffective components, motivated by the imaging intuition. An optimization pr…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: 11 pages, 14 figures

  17. Eye in the Sky: Drone-Based Object Tracking and 3D Localization

    Authors: Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang

    Abstract: Drones, or UAVs in general, equipped with a single camera have been widely deployed to a broad range of applications, such as aerial photography, fast goods delivery and most importantly, surveillance. Despite the great progress achieved in computer vision algorithms, these algorithms are not usually optimized for dealing with images or video sequences acquired by drones, due to various challenges su…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: Accepted to ACMMM2019