Showing 1–19 of 19 results for author: Sun, A

Searching in archive eess.
  1. arXiv:2406.09869  [pdf, ps, other]

    cs.SD eess.AS

    MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

    Authors: Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, Shinji Watanabe

    Abstract: Speech discrete representation has proven effective in various downstream applications due to its superior compression rate of the waveform, fast convergence during training, and compatibility with other modalities. Discrete units extracted from self-supervised learning (SSL) models have emerged as a prominent approach for obtaining speech discrete representation. However, while discrete units hav…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  2. arXiv:2401.07342  [pdf, other]

    eess.AS cs.LG

    Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms

    Authors: Anchen Sun, Juan J Londono, Batya Elbaum, Luis Estrada, Roberto Jose Lazo, Laura Vitale, Hugo Gonzalez Villasanti, Riccardo Fusaroli, Lynn K Perry, Daniel S Messinger

    Abstract: Young children spend substantial portions of their waking hours in noisy preschool classrooms. In these environments, children's vocal interactions with teachers are critical contributors to their language outcomes, but manually transcribing these interactions is prohibitive. Using audio from child- and teacher-worn recorders, we propose an automated framework that uses open source software both t…

    Submitted 10 April, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures, 3 tables, The paper has been accepted to 2024 IEEE International Conference on Development and Learning (ICDL) as a full oral presentation and will appear in the IEEE ICDL proceedings

  3. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2310.02720  [pdf, other]

    cs.SD eess.AS

    Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

    Authors: Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

    Abstract: Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech signals. In contrast, this paper aims to incorporate multi-resolution information into speech self-supervised representation learning. We introduce a SSL model that l…

    Submitted 30 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR2024 as spotlight

  5. arXiv:2309.08837  [pdf, other]

    cs.SD eess.AS

    FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

    Authors: Jianzong Wang, Xulong Zhang, Aolan Sun, Ning Cheng, Jing Xiao

    Abstract: This paper integrates graph-to-sequence into an end-to-end text-to-speech framework for syntax-aware modelling with syntactic information of input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract the syntactic hidden information, which is concatenated with phoneme embedding and i…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  6. arXiv:2308.03240  [pdf, other]

    math.OC eess.SY

    Carbon-Aware Optimal Power Flow

    Authors: Xin Chen, Andy Sun, Wenbo Shi, Na Li

    Abstract: To facilitate effective decarbonization of the electric power sector, this paper introduces the generic Carbon-aware Optimal Power Flow (C-OPF) method for power system decision-making that considers demand-side carbon accounting and emission management. Built upon the classic optimal power flow (OPF) model, the C-OPF method incorporates carbon emission flow equations and constraints, as well as ca…

    Submitted 6 August, 2023; originally announced August 2023.

  7. arXiv:2305.03101  [pdf, other]

    cs.CL cs.SD eess.AS

    Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

    Authors: Yun Tang, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden D. Tomasello, Juan Pino

    Abstract: Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They are designed for different purposes and each has its own benefits and drawbacks for speech-to-text tasks. In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. The new…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: ACL 2023 main conference

  8. arXiv:2304.11783  [pdf, other]

    eess.IV cs.GR cs.MM

    Rip Current Detection in Nearshore Areas through UAV Video Analysis with Almost Local-Isometric Embedding Techniques on Sphere

    Authors: Anchen Sun, Kaiqi Yang

    Abstract: Rip currents pose a significant danger to those who visit beaches, as they can swiftly pull swimmers away from shore. Detecting these currents currently relies on costly equipment and is challenging to implement on a larger scale. The advent of unmanned aerial vehicles (UAVs) and camera technology, however, has made monitoring near-shore regions more accessible and scalable. This paper proposes a…

    Submitted 20 February, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: 10 pages, 9 figures, 3 tables

  9. arXiv:2304.11547  [pdf, other]

    cs.SD eess.AS

    SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

    Authors: Jianzong Wang, Xulong Zhang, Haobin Tang, Aolan Sun, Ning Cheng, Jing Xiao

    Abstract: In recent Text-to-Speech (TTS) systems, a neural vocoder often generates speech samples by solely conditioning on acoustic features predicted from an acoustic model. However, there are always distortions existing in the predicted acoustic features, compared to those of the groundtruth, especially in the common case of poor acoustic modeling due to low-quality training data. To overcome such limits…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by IJCNN2023. 2023 International Joint Conference on Neural Networks (IJCNN2023)

  10. Accelerated partial separable model using dimension-reduced optimization technique for ultra-fast cardiac MRI

    Authors: Zhongsen Li, Aiqi Sun, Chuyu Liu, Haining Wei, Shuai Wang, Mingzhu Fu, Rui Li

    Abstract: Objective. Imaging dynamic object with high temporal resolution is challenging in magnetic resonance imaging (MRI). Partial separable (PS) model was proposed to improve the imaging quality by reducing the degrees of freedom of the inverse problem. However, PS model still suffers from long acquisition time and even longer reconstruction time. The main objective of this study is to accelerate the PS…

    Submitted 1 April, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 23 pages, 11 figures. Accepted as manuscript on Physics in Medicine & Biology

  11. arXiv:2208.08757  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

    Authors: SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

    Abstract: One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm and content is still mixed together. To perform one-shot VC effectively with further disentangling these speech components, we employ random resampling for pitch and content encoder and use the va…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: 5 pages,5 figures,INTERSPEECH 2022

  12. arXiv:2207.11309  [pdf, other]

    eess.SY

    Impacts of Dynamic Line Ratings on the ERCOT Transmission System

    Authors: Thomas Lee, Vineet Jagadeesan Nair, Andy Sun

    Abstract: Grid regulators and participants are paying increasing attention to Dynamic Line Ratings (DLR) as a new approach to address transmission system bottlenecks. In this paper, a thorough comparison of DLR, Ambient Adjusted Ratings (AAR), and the traditional Static Line Ratings (SLR) are conducted on a synthetic ERCOT grid. Estimates of DLR and AAR are calculated using an equation based on heat balance…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: 6 pages, 8 figures

  13. A Distributed Scheme for Stability Assessment in Large-Scale Structure-Preserving Models via Singular Perturbation

    Authors: Amin Gholami, Xu Andy Sun

    Abstract: Assessing small-signal stability of power systems composed of thousands of interacting generators is a computationally challenging task. To reduce the computational burden, this paper introduces a novel condition to assess and certify small-signal stability. Using this certificate, we can see the impact of network topology and system parameters (generators' damping and inertia) on the eigenvalues…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: https://hdl.handle.net/10125/71001

    Journal ref: Proceedings of the 54th Hawaii International Conference on System Sciences, 2021

  14. arXiv:2103.15308  [pdf, other]

    eess.SY math.DS

    Stability of Multi-Microgrids: New Certificates, Distributed Control, and Braess's Paradox

    Authors: Amin Gholami, Xu Andy Sun

    Abstract: This paper investigates the theory of resilience and stability in multi-microgrid networks. We derive new sufficient conditions to guarantee small-signal stability of multi-microgrids in both lossless and lossy networks. The new stability certificate for lossy networks only requires local information, thus leads to a fully distributed control scheme. Moreover, we study the impact of network topolo…

    Submitted 28 March, 2021; originally announced March 2021.

  15. arXiv:2012.02626  [pdf, other]

    eess.AS cs.CL cs.SD

    GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis

    Authors: Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao

    Abstract: This paper introduces a graphical representation approach of prosody boundary (GraphPB) in the task of Chinese speech synthesis, intending to parse the semantic and syntactic relationship of input sequences in a graphical domain for improving the prosody performance. The nodes of the graph embedding are formed by prosodic words, and the edges are formed by the other prosodic boundaries, namely pro…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted to SLT 2021

  16. arXiv:2010.06662  [pdf, other]

    math.DS eess.SY

    The Impact of Damping in Second-Order Dynamical Systems with Applications to Power Grid Stability

    Authors: Amin Gholami, X. Andy Sun

    Abstract: We consider a broad class of second-order dynamical systems and study the impact of damping as a system parameter on the stability, hyperbolicity, and bifurcation in such systems. We prove a monotonic effect of damping on the hyperbolicity of the equilibrium points of the corresponding first-order system. This provides a rigorous formulation and theoretical justification for the intuitive notion t…

    Submitted 19 July, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Journal ref: SIAM Journal on Applied Dynamical Systems 21 (2022) 405-437

  17. A Fast Certificate for Power System Small-Signal Stability

    Authors: Amin Gholami, Xu Andy Sun

    Abstract: Swing equations are an integral part of a large class of power system dynamical models used in rotor angle stability assessment. Despite intensive studies, some fundamental properties of lossy swing equations are still not fully understood. In this paper, we develop a sufficient condition for certifying the stability of equilibrium points (EPs) of these equations, and illustrate the effects of dam…

    Submitted 5 August, 2020; originally announced August 2020.

    Journal ref: 2020 59th IEEE Conference on Decision and Control (CDC)

  18. arXiv:2003.01924  [pdf, other]

    eess.AS cs.CL cs.SD

    GraphTTS: graph-to-sequence modelling in neural text-to-speech

    Authors: Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao

    Abstract: This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms. The graphical inputs consist of node and edge representations constructed from input texts. The encoding of these graphical inputs incorporates syntax information by a GNN encoder module. Besides, applying the encoder of GraphTTS as a graph au…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: Accepted to ICASSP 2020

  19. arXiv:1904.08855  [pdf, ps, other]

    eess.SY math.OC

    Solvability of Power Flow Equations Through Existence and Uniqueness of Complex Fixed Point

    Authors: Bai Cui, Xu Andy Sun

    Abstract: Variations of loading level and changes in system topological property may cause the operating point of an electric power systems to move gradually towards the verge of its transmission capability, which can lead to catastrophic outcomes such as voltage collapse blackout. From a modeling perspective, voltage collapse is closely related to the solvability of power flow equations. Determining condit…

    Submitted 18 April, 2019; originally announced April 2019.