Showing 1–11 of 11 results for author: Liao, M

Searching in archive eess.
  1. arXiv:2406.03872  [pdf, other]

    cs.CL cs.SD eess.AS

    BLSP-Emo: Towards Empathetic Large Speech-Language Models

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Junhong Wu, Chengqing Zong, Jiajun Zhang

    Abstract: The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we pr…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2405.19041  [pdf, other]

    cs.CL cs.SD eess.AS

    BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jiajun Zhang

    Abstract: Recent end-to-end approaches have shown promise in extending large language models (LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment quality and fail to achieve fine-grained alignment due to speech-text length mismatch. We introduce BLSP-KD, a novel approach for Bootstrapping Language-Speech Pretraining via Knowledge Distillation, which addresses these li…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2312.14518  [pdf, other]

    q-bio.NC cs.CV eess.IV

    Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

    Authors: Minghui Liao, Guojia Wan, Bo Du

    Abstract: Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, the efficiency of utilizing anatomical, physiological, or molecular characteristics of neurons is relatively low and costly. With the advancements in electron microscopy imaging and analysis techniques for brain tissue, we…

    Submitted 25 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  4. Transferred Thin Film Lithium Niobate as Millimeter Wave Acoustic Filter Platforms

    Authors: Omar Barrera, Sinwoo Cho, Kenny Huynh, Jack Kramer, Michael Liao, Vakhtang Chulukhadze, Lezli Matto, Mark S. Goorsky, Ruochen Lu

    Abstract: This paper reports the first high-performance acoustic filters toward millimeter wave (mmWave) bands using transferred single-crystal thin film lithium niobate (LiNbO3). By transferring LiNbO3 on the top of silicon (Si) and sapphire (Al2O3) substrates with an intermediate amorphous Si (aSi) bonding and sacrificial layer, we demonstrate compact acoustic filters with record-breaking performance beyo…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 4 pages, 8 figures, accepted by IEEE MEMS 2024

  5. arXiv:2309.00916  [pdf, other]

    cs.CL cs.SD eess.AS

    BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

    Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang

    Abstract: The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are use…

    Submitted 28 May, 2024; v1 submitted 2 September, 2023; originally announced September 2023.

  6. Thin-Film Lithium Niobate Acoustic Resonator with High Q of 237 and k2 of 5.1% at 50.74 GHz

    Authors: Jack Kramer, Vakhtang Chulukhadze, Kenny Huynh, Omar Barrera, Michael Liao, Sinwoo Cho, Lezli Matto, Mark S. Goorsky, Ruochen Lu

    Abstract: This work reports a 50.74 GHz lithium niobate (LiNbO3) acoustic resonator with a high quality factor (Q) of 237 and an electromechanical coupling (k2) of 5.17% resulting in a figure of merit (FoM, Q x k2) of 12.2. The LiNbO3 resonator employs a novel bilayer periodically poled piezoelectric film (P3F) 128 Y-cut LiNbO3 on amorphous silicon (a-Si) on sapphire stack to achieve low losses and high cou…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 4 pages, 5 figures, published in 2023 Joint Conference of the IEEE International Frequency Control Symposium & European Frequency and Time Forum (IEEE IFCS 2023)

    Journal ref: 2023 Joint Conference of the European Frequency and Time Forum and IEEE International Frequency Control Symposium (EFTF/IFCS), Toyama, Japan, 2023

  7. arXiv:2104.14631  [pdf, other]

    cs.CV eess.IV

    Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary

    Authors: Sibo Zhang, Jiahong Yuan, Miao Liao, Liangjun Zhang

    Abstract: With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesize video from the text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video g…

    Submitted 22 January, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: ICASSP 2022

  8. arXiv:2007.09198  [pdf, other]

    cs.CV cs.LG eess.AS

    Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses

    Authors: Miao Liao, Sibo Zhang, Peng Wang, Hao Zhu, Xinxin Zuo, Ruigang Yang

    Abstract: In this paper, we propose a novel approach to convert given speech audio to a photo-realistic speaking video of a specific person, where the output video has synchronized, realistic, and expressive rich body dynamics. We achieve this by first generating 3D skeleton movements from the audio sequence using a recurrent neural network (RNN), and then synthesizing the output video via a conditional gen…

    Submitted 8 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted by ACCV 2020

  9. arXiv:2007.08854  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    DVI: Depth Guided Video Inpainting for Autonomous Driving

    Authors: Miao Liao, Feixiang Lu, Dingfu Zhou, Sibo Zhang, Wei Li, Ruigang Yang

    Abstract: To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud. By building a dense 3D map from stitched point clouds, frames within a video are geometrically correlated via this common 3D map. In order to fill a target…

    Submitted 17 July, 2020; originally announced July 2020.

  10. arXiv:2004.10934  [pdf, other]

    cs.CV eess.IV

    YOLOv4: Optimal Speed and Accuracy of Object Detection

    Authors: Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao

    Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normal…

    Submitted 22 April, 2020; originally announced April 2020.

  11. arXiv:1608.00299  [pdf, other]

    eess.SY

    Optimization Algorithms for Catching Data Manipulators in Power System Estimation Loops

    Authors: Mang Liao, Aranya Chakrabortty

    Abstract: In this paper we develop a set of algorithms that can detect the identities of malicious data-manipulators in distributed optimization loops for estimating oscillation modes in large power system models. The estimation is posed in terms of a consensus problem among multiple local estimators that jointly solve for the characteristic polynomial of the network model. If any of these local estimates a…

    Submitted 28 March, 2017; v1 submitted 31 July, 2016; originally announced August 2016.

    Comments: 15 pages, 23 figures, and 2 tables