Showing 1–3 of 3 results for author: Chan, X

  1. arXiv:2406.20094  [pdf, other]

    cs.CL cs.LG

    Scaling Synthetic Data Creation with 1,000,000,000 Personas

    Authors: Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, Dong Yu

    Abstract: We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub -- a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world's total population), acting as distri…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Work in progress

  2. arXiv:2309.15294  [pdf]

    physics.flu-dyn cs.LG

    Multiple Case Physics-Informed Neural Network for Biomedical Tube Flows

    Authors: Hong Shen Wong, Wei Xuan Chan, Bing Huan Li, Choon Hwai Yap

    Abstract: Fluid dynamics computations for tube-like geometries are important for biomedical evaluation of vascular and airway fluid dynamics. Physics-Informed Neural Networks (PINNs) have recently emerged as a good alternative to traditional computational fluid dynamics (CFD) methods. The vanilla PINN, however, requires much longer training time than the traditional CFD methods for each specific flow scenar…

    Submitted 4 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 24 pages, 8 figures, 5 tables

  3. arXiv:2005.13211  [pdf, other]

    eess.AS cs.SD

    Insertion-Based Modeling for End-to-End Automatic Speech Recognition

    Authors: Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chan

    Abstract: End-to-end (E2E) models have gained attention in the research field of automatic speech recognition (ASR). Many E2E models proposed so far assume left-to-right autoregressive generation of an output token sequence except for connectionist temporal classification (CTC) and its variants. However, left-to-right decoding cannot consider the future output context, and it is not always optimal for ASR.…

    Submitted 15 November, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: INTERSPEECH 2020