Showing 1–7 of 7 results for author: Shih, K

Searching in archive eess.
  1. arXiv:2305.18211  [pdf]

    eess.SP cs.CV cs.LG

    WiFi-TCN: Temporal Convolution for Human Interaction Recognition based on WiFi signal

    Authors: Chih-Yang Lin, Chia-Yu Lin, Yu-Tso Liu, Timothy K. Shih

    Abstract: The utilization of Wi-Fi-based human activity recognition has gained considerable interest in recent times, primarily owing to its applications in various domains such as healthcare for monitoring breath and heart rate, security, and elderly care. These Wi-Fi-based methods exhibit several advantages over conventional state-of-the-art techniques that rely on cameras and sensors, including lower costs a…

    Submitted 11 January, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Paper is currently under review at IEEE Access

  2. arXiv:2303.07578  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

    Authors: Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

    Abstract: We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of ICASSP Signal Processing Grand Cha…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Presentation accepted at ICASSP 2023

  3. arXiv:2301.10335  [pdf, other]

    cs.SD cs.LG eess.AS

    Multilingual Multiaccented Multispeaker TTS with RADTTS

    Authors: Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

    Abstract: We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice. This is challenging to do because it is expensive to obtain bilingual training data in multiple languages, and the lack of such data results in strong correlations that entangle speaker, language, and accent, resulting in poor transfe…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 5 pages, submitted to ICASSP 2023

  4. arXiv:2203.01786  [pdf, other]

    cs.SD cs.LG eess.AS

    Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

    Authors: Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

    Abstract: Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2. Pitch information is not only low-dimensional, but also discontinuous, making it particularly difficult to model in a generative setting. Our work explores several techniques for ha…

    Submitted 27 June, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 22 pages, 11 figures, 3 tables

  5. arXiv:2108.10447  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    One TTS Alignment To Rule Them All

    Authors: Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

    Abstract: Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durati…

    Submitted 23 August, 2021; originally announced August 2021.

  6. arXiv:2005.05957  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

    Authors: Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

    Abstract: In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple a…

    Submitted 16 July, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: 10 pages, 7 pictures

  7. Light Field Synthesis by Training Deep Network in the Refocused Image Domain

    Authors: Chang-Le Liu, Kuang-Tsu Shih, Jiun-Woei Huang, Homer H. Chen

    Abstract: Light field imaging, which captures spatio-angular information of incident light on image sensor, enables many interesting applications like image refocusing and augmented reality. However, due to the limited sensor resolution, a trade-off exists between the spatial and angular resolution. To increase the angular resolution, view synthesis techniques have been adopted to generate new views from ex…

    Submitted 28 April, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted to IEEE Transactions on Image Processing