Showing 1–8 of 8 results for author: Zhuo, L

Searching in archive eess.
  1. arXiv:2308.02915

    cs.GR cs.CV cs.SD eess.AS

    DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

    Authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

    Abstract: When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM MM 2023

  2. arXiv:2306.17103

    cs.CL cs.SD eess.AS

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    Authors: Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi LI, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

    Abstract: We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language mo…

    Submitted 21 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 2 figures, 5 tables, accepted by ISMIR 2023

  3. arXiv:2306.10548

    cs.SD cs.AI cs.LG eess.AS

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

    Abstract: In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue…

    Submitted 23 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: camera-ready version for NeurIPS 2023

  4. arXiv:2211.11248

    cs.CV cs.MM cs.SD eess.AS

    Video Background Music Generation: Dataset, Method and Evaluation

    Authors: Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

    Abstract: Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a comp…

    Submitted 4 August, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by ICCV2023

  5. arXiv:2207.05049

    cs.CV eess.IV

    Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

    Authors: Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

    Abstract: Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps. However, this pipeline suffers from high computational cost and long inference latency, which largely depends on two essential factors: 1) network architecture parameters, 2) sequential data stream. Recently, the parameters of image-based generative models have…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, Project Page: https://fast-vid2vid.github.io/ , Code: https://github.com/fast-vid2vid/fast-vid2vid

  6. arXiv:2008.08526

    eess.IV cs.CV

    Blur-Attention: A boosting mechanism for non-uniform blurred image restoration

    Authors: Xiaoguang Li, Feifan Yang, Kin Man Lam, Li Zhuo, Jiafeng Li

    Abstract: Dynamic scene deblurring is a challenging problem in computer vision. It is difficult to accurately estimate the spatially varying blur kernel by traditional methods. Data-driven-based methods usually employ kernel-free end-to-end mapping schemes, which are apt to overlook the kernel estimation. To address this issue, we propose a blur-attention module to dynamically capture the spatially varying…

    Submitted 19 August, 2020; originally announced August 2020.

  7. arXiv:2006.15588

    eess.IV cs.CV cs.LG

    A lateral semicircular canal segmentation based geometric calibration for human temporal bone CT Image

    Authors: Xiaoguang Li, Peng Fu, Hongxia Yin, ZhenChang Wang, Li Zhuo, Hui Zhang

    Abstract: Computed Tomography (CT) of the temporal bone has become an important method for diagnosing ear diseases. Due to the different posture of the subject and the settings of CT scanners, the CT image of the human temporal bone should be geometrically calibrated to ensure the symmetry of the bilateral anatomical structure. Manual calibration is a time-consuming task for radiologists and an important pr…

    Submitted 28 June, 2020; originally announced June 2020.

  8. arXiv:1903.09294

    eess.SP cs.NI

    Hybrid Precoder and Combiner for Imperfect Beam Alignment in mmWave MIMO Systems

    Authors: Chandan Pradhan, Ang Li, Li Zhuo, Yonghui Li, Branka Vucetic

    Abstract: In this letter, we aim to design a robust hybrid precoder and combiner against beam misalignment in millimeter-wave (mmWave) communication systems. We consider the inclusion of the 'error statistics' into the precoder and combiner design, where the array response that incorporates the distribution of the misalignment error is first derived. An iterative algorithm is then proposed to design the rob…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: 4 pages