Showing 1–2 of 2 results for author: Xinyuan, H L

Searching in archive cs.
  1. arXiv:2306.11252  [pdf, other]

    cs.CL cs.LG

    HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

    Authors: Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur

    Abstract: We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-English translations, containing 600+ hours of Cantonese audio, its standard traditional Chinese transcript, and English translation, segmented and aligned at the sentence level. We describe the notable challenges in corpus preparation: segmentation, alignment of long audio recordings, and sentence-level alignment with non-verb…

    Submitted 19 June, 2023; originally announced June 2023.

  2. arXiv:2301.07209  [pdf, other]

    cs.CL cs.LG

    Learning a Formality-Aware Japanese Sentence Representation

    Authors: Henry Li Xinyuan, Ray Lee, Jerry Chen, Kelly Marchisio

    Abstract: While the way intermediate representations are generated in encoder-decoder sequence-to-sequence models typically allow them to preserve the semantics of the input sentence, input features such as formality might be left out. On the other hand, downstream tasks such as translation would benefit from working with a sentence representation that preserves formality in addition to semantics, so as to…

    Submitted 17 January, 2023; originally announced January 2023.