Showing 1–7 of 7 results for author: Wang, Y F

Searching in archive eess.
  1. arXiv:2406.18871  [pdf, other]

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2402.16321  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

    Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

    Abstract: Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variatio…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  3. arXiv:2305.17343  [pdf, other]

    cs.CV cs.SD eess.AS

    Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

    Authors: Yung-Hsuan Lai, Yen-Chun Chen, Yu-Chiang Frank Wang

    Abstract: Audio-visual learning has been a major pillar of multi-modal machine learning, where the community has mostly focused on the modality-aligned setting, i.e., the audio and visual modalities are both assumed to signal the prediction target. With the Look, Listen, and Parse dataset (LLP), we investigate the under-explored unaligned setting, where the goal is to recognize audio and visual events in a video…

    Submitted 2 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  4. arXiv:2105.00708  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

    Authors: Yan-Bo Lin, Yu-Chiang Frank Wang

    Abstract: Humans perceive a rich auditory experience, with distinct sounds heard by each ear. Videos recorded with binaural audio in particular simulate how humans receive ambient sound. However, a large number of videos have monaural audio only, which degrades the user experience due to the lack of ambient information. To address this issue, we propose an audio spatialization framework to convert a monaural…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: AAAI'21

  5. arXiv:2007.09163  [pdf, other]

    cs.CV cs.LG cs.NE eess.IV

    Wavelet Channel Attention Module with a Fusion Network for Single Image Deraining

    Authors: Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

    Abstract: Single image deraining is a crucial problem because rain severely degrades the visibility of images and affects the performance of computer vision tasks such as outdoor surveillance systems and intelligent vehicles. In this paper, we propose a new convolutional neural network (CNN) called the wavelet channel attention module with a fusion network. Wavelet transform and the inverse wavelet transf…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted to IEEE ICIP 2020

    Journal ref: 2020 IEEE International Conference on Image Processing (ICIP)

  6. arXiv:1806.09250  [pdf]

    physics.ins-det eess.SP

    Electronics of Time-of-flight Measurement for Back-n at CSNS

    Authors: T. Yu, P. Cao, X. Y. Ji, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. **g, L. Kang, et al. (46 additional authors not shown)

    Abstract: Back-n is a white neutron experimental facility at the China Spallation Neutron Source (CSNS). The time structure of the primary proton beam makes it well suited to the TOF (time-of-flight) method for measuring neutron energy. We implement the electronics of TOF measurement on the general-purpose readout electronics designed for all seven detectors in Back-n. The electronics is based on PXI…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 4 pages, 13 figures, 21st IEEE Real Time Conference

  7. arXiv:1806.09249  [pdf]

    physics.ins-det eess.SP

    T0 Fan-out for Back-n White Neutron Facility at CSNS

    Authors: X. Y. Ji, P. Cao, T. Yu, L. K. Xie, X. R. Huang, Q. An, H. Y. Bai, J. Bao, Y. H. Chen, P. J. Cheng, Z. Q. Cui, R. R. Fan, C. Q. Feng, M. H. Gu, Z. J. Han, G. Z. He, Y. C. He, Y. F. He, H. X. Huang, W. L. Huang, X. L. Ji, H. Y. Jiang, W. Jiang, H. Y. **g, L. Kang, et al. (46 additional authors not shown)

    Abstract: The main physics goal of the Back-n white neutron facility at the China Spallation Neutron Source (CSNS) is to measure nuclear data. The energy of neutrons is one of the most important parameters for measuring nuclear data. The time-of-flight (TOF) method is used to obtain the energy of neutrons. The time when proton bunches hit the thick tungsten target is taken as the start point of TOF. T0 signal,…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: 3 pages, 6 figures, the 21st IEEE Real Time Conference