Search | arXiv e-print repository

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrect tones. To address the issue, we propose the ToneUnit framework, which leverages annotated data with tone labels as CTC supervision to learn tone-aware discrete speech units for Mandarin Chinese speech. Our findings indicate that the discrete units acquired through ToneUnit resolve the "tone shift" issue in synthesized Chinese speech and yield favorable results in English synthesis. Moreover, the experimental results suggest that finite scalar quantization enhances the effectiveness of ToneUnit. Notably, ToneUnit can work effectively even with minimal annotated data.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2404.06563 [pdf, other]

Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows

Authors: Lindsey Linxi Wei, Chung Yik Edward Yeung, Hongjian Yu, Jingchuan Zhou, Dong He, Magdalena Balazinska

Abstract: We demonstrate MaskSearch, a system designed to accelerate queries over databases of image masks generated by machine learning models. MaskSearch formalizes and accelerates a new category of queries for retrieving images and their corresponding masks based on mask properties, which support various applications, from identifying spurious correlations learned by models to exploring discrepancies between model saliency and human attention. This demonstration makes the following contributions: (1) the introduction of MaskSearch's graphical user interface (GUI), which enables interactive exploration of image databases through mask properties, (2) hands-on opportunities for users to explore MaskSearch's capabilities and constraints within machine learning workflows, and (3) an opportunity for conference attendees to understand how MaskSearch accelerates queries over image masks.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2312.02385 [pdf, other]

doi 10.1007/s00162-024-00695-0

Adaptive spectral proper orthogonal decomposition of tonal flows

Authors: Brandon C. Y. Yeung, Oliver T. Schmidt

Abstract: An adaptive algorithm for spectral proper orthogonal decomposition (SPOD) of mixed broadband-tonal turbulent flows is developed. Sharp peak resolution at tonal frequencies is achieved by locally minimizing the bias of the spectrum. Smooth spectrum estimates of broadband regions are achieved by locally reducing the variance of the spectrum. The method utilizes multitaper estimation with sine tapers. An iterative criterion based on modal convergence is introduced to enable the SPOD to adapt to spectral features. For tonal flows, the adaptivity is controlled by a single user input; for broadband flows, a constant number of sine tapers is recommended without adaptivity. The discrete version of Parseval's theorem for SPOD is stated. Proper normalization of the tapers ensures that Parseval's theorem is satisfied in expectation. Drastic savings in computational complexity and memory usage are facilitated by two aspects: (i) sine tapers, which permit post hoc windowing of a single Fourier transform; and (ii) time-domain lossless compression using a QR or eigenvalue decomposition. Sine-taper SPOD is demonstrated on time-resolved particle image velocimetry (TR-PIV) data from an open cavity flow and high-fidelity large-eddy simulation (LES) data from a round jet, with and without adaptivity. For the tonal cavity flow, the adaptive algorithm outperforms Slepian-based multitaper SPOD in terms of variance and local bias of the spectrum, mode convergence, and memory usage.

Submitted 21 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Journal ref: Theoretical and Computational Fluid Dynamics, 2024
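
The abstract above relies on sine tapers and notes that their normalization matters for Parseval's theorem. As a minimal sketch, assuming the standard textbook form of the sine tapers (the function names `sine_tapers` and `dot`, the taper length and the taper count are illustrative choices, not taken from the paper), the following builds a small family of tapers and forms their Gram matrix so that orthonormality can be checked:

```python
import math


def sine_tapers(n_samples, n_tapers):
    """Return n_tapers sine tapers of length n_samples.

    Assumed form: v_k[n] = sqrt(2 / (N + 1)) * sin(pi * k * n / (N + 1)),
    for n = 1..N and k = 1..K. The paper may use a different normalization.
    """
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [
        [scale * math.sin(math.pi * k * n / (n_samples + 1))
         for n in range(1, n_samples + 1)]
        for k in range(1, n_tapers + 1)
    ]


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative sizes only: 64 samples per block, 4 tapers.
tapers = sine_tapers(64, 4)

# Gram matrix of the tapers; with this scaling it should be close to identity.
gram = [[dot(u, v) for v in tapers] for u in tapers]
```

With this scaling each taper has unit energy and distinct tapers are mutually orthogonal, which is the property a multitaper estimate needs in order to average independent spectral estimates.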

arXiv:2310.05374 [pdf, other]

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Authors: Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

Abstract: Training a high-performance end-to-end (E2E) speech processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive to collect than textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. With fewer parameters, the results of LaSyn are competitive with published state-of-the-art works. The results demonstrate the quality of the augmented training data.

Submitted 24 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: 15 pages, 8 figures, 8 tables, Accepted to EMNLP 2023 Findings

arXiv:2309.11808 [pdf, other]

Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package

Authors: Marcin Rogowski, Brandon C. Y. Yeung, Oliver T. Schmidt, Romit Maulik, Lisandro Dalcin, Matteo Parsani, Gianmarco Mengaldo

Abstract: We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the spatial dimension of the dataset preserving time. This approach is adopted to preserve the non-distributed fast Fourier transform of the data in time, thereby avoiding the associated bottlenecks. The parallel SPOD algorithm is implemented in the PySPOD (https://github.com/MathEXLab/PySPOD) library and makes use of the standard message passing interface (MPI) library, implemented in Python via mpi4py (https://mpi4py.readthedocs.io/en/stable/). An extensive performance evaluation of the parallel package is provided, including strong and weak scalability analyses. The open-source library allows the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics that are extremely difficult (if not impossible) to achieve without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover new unexplored spatio-temporal patterns.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.03097 [pdf, other]

An Algorithm for Modelling Escalator Fixed Loss Energy for PHM and sustainable energy usage

Authors: Xuwen Hu, Jiaqi Qiu, Yu Lin, Inez Maria Zwetsloot, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

Abstract: Prognostic Health Management (PHM) is designed to assess and monitor the health status of systems, anticipate the onset of potential failure, and prevent unplanned downtime. In recent decades, collecting massive amounts of real-time sensor data enabled condition monitoring (CM) and consequently, detection of abnormalities to support maintenance decision-making. Additionally, the utilization of PHM techniques can support energy sustainability efforts by optimizing energy usage and identifying opportunities for energy-saving measures. Escalators are efficient machines for transporting people and goods, and measuring energy consumption in time can facilitate PHM of escalators. Fixed loss energy, or no-load energy, of escalators denotes the energy consumption by an unloaded escalator. Fixed loss energy varies over time indicating varying operating conditions. In this paper, we propose to use escalators' fixed loss energy for PHM. We propose an approach to compute daily fixed loss energy based on energy consumption sensor data. The proposed approach is validated using a set of experimental data. The advantages and disadvantages of each approach are also presented, and recommendations are given. Finally, to illustrate PHM, we set up an EWMA chart for monitoring the fixed loss over time and demonstrate the potential in reducing energy costs associated with escalator operation.

Submitted 6 September, 2023; originally announced September 2023.
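
The abstract above mentions an EWMA chart for monitoring fixed loss energy over time. A minimal sketch of a generic EWMA control chart follows, assuming a known in-control target and standard deviation; the function name `ewma_chart`, the smoothing weight, the limit width and the readings are all hypothetical and are not the paper's data or settings:

```python
import math


def ewma_chart(values, target, sigma, lam=0.2, width=3.0):
    """Return one (ewma, lower, upper, out_of_control) tuple per observation.

    lam is the smoothing weight and width the control-limit multiplier;
    both defaults are common textbook choices, assumed here for illustration.
    """
    rows = []
    z = target  # the EWMA statistic starts at the in-control target
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact time-varying half-width of the control limits at step t.
        half = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        rows.append((z, target - half, target + half, abs(z - target) > half))
    return rows


# Made-up daily fixed-loss readings: stable at first, then a sustained shift.
readings = [50.0] * 5 + [56.0] * 10
chart = ewma_chart(readings, target=50.0, sigma=2.0)

# Index of the first day flagged as out of control, or None if there is none.
first_alarm = next((i for i, row in enumerate(chart) if row[3]), None)
```

Because the EWMA statistic accumulates evidence across days, a sustained shift smaller than a single-day alarm threshold can still be flagged a few days after it begins.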

arXiv:2308.03822 [pdf, other]

Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effects of eccentricity. Here, we present observational results for a waveform-independent search sensitive to eccentric black hole coalescences, covering the third observing run (O3) of the LIGO and Virgo detectors. We identified no new high-significance candidates beyond those that were already identified with searches focusing on quasi-circular binaries. We determine the sensitivity of our search to high-mass (total mass $M>70$ $M_\odot$) binaries covering eccentricities up to 0.3 at 15 Hz orbital frequency, and use this to compare model predictions to search results. Assuming all detections are indeed quasi-circular, for our fiducial population model, we place an upper limit for the merger rate density of high-mass binaries with eccentricities $0 < e \leq 0.3$ at $0.33$ Gpc$^{-3}$ yr$^{-1}$ at 90% confidence level.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 24 pages, 5 figures

Report number: LIGO-P2300080

arXiv:2307.02739 [pdf]

Evaluating the predicted eruption times of geysers in Yellowstone National Park

Authors: Daniel J. Rhee, Ka Yee Yeung

Abstract: This study aims to evaluate the accuracy of predicted eruption times of popular geysers in the Yellowstone National Park. The Yellowstone National Park was the first national park in the United States and is known for its geothermal features consisting of many highly popular geysers such as the Old Faithful. Geysers are fascinating to national park visitors because their eruptions could range from small bubbles to jets of water that are hundreds of meters high, and their eruptions could last from seconds to hours. To help tourists plan their visits, the US National Park Service and other independent groups publish predicted eruption times of popular geysers. We hypothesized that the models developed by the US National Park Service are very accurate with little discrepancy from independent analysis, as park rangers monitor the geysers constantly and likely adjust their models over time according to changing conditions underground, and patterns observed. In addition, since researchers in the park likely rely on these predictions, the models would need to be fine-tuned to ensure that no unnecessary effort or resources are wasted in probing the geysers for variables such as temperature and acidity. In this study, we focused on the Old Faithful and Beehive Geyser by downloading actual eruption times, conducting statistical regression analyses, studying the patterns of eruption times, and evaluating the accuracy of different statistical models.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.02572 [pdf, other]

Conditional Karhunen-Loéve regression model with Basis Adaptation for high-dimensional problems: uncertainty quantification and inverse modeling

Authors: Yu-Hong Yeung, Ramakrishna Tipireddy, David A. Barajas-Solano, Alexandre M. Tartakovsky

Abstract: We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems as a function of the systems' spatially heterogeneous parameter fields with applications to uncertainty quantification and parameter estimation in high-dimensional problems. Practitioners often formulate finite-dimensional representations of spatially heterogeneous parameter fields using truncated unconditional Karhunen-Loéve expansions (KLEs) for a certain choice of unconditional covariance kernel and construct surrogate models of the observable response with respect to the random variables in the KLE. When direct measurements of the parameter fields are available, we propose improving the accuracy of these surrogate models by representing the parameter fields via conditional Karhunen-Loéve expansions (CKLEs). CKLEs are constructed by conditioning the covariance kernel of the unconditional expansion on the direct measurements via Gaussian process regression and then truncating the corresponding KLE. We apply the proposed methodology to constructing surrogate models via the Basis Adaptation (BA) method of the stationary hydraulic head response, measured at spatially discrete observation locations, of a groundwater flow model of the Hanford Site, as a function of the 1,000-dimensional representation of the model's log-transmissivity field. We find that BA surrogate models of the hydraulic head based on CKLEs are more accurate than BA surrogate models based on unconditional expansions for forward uncertainty quantification tasks. Furthermore, we find that inverse estimates of the hydraulic transmissivity field computed using CKLE-based BA surrogate models are more accurate than those computed using unconditional BA surrogate models.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 29 pages, 4 figures, 5 tables
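
The abstract above describes building a CKLE by conditioning a covariance kernel on direct measurements through Gaussian process regression and then truncating the resulting expansion. In generic textbook notation, this can be sketched as follows. All symbols are assumptions introduced here for illustration and may differ from the paper's: $y$ is the parameter field, $\mu$ and $C$ its unconditional mean and covariance kernel, $X_s$ the measurement locations, $y_s$ the noise-free measurements, and $N$ the truncation order.

```latex
% Gaussian process conditioning on noise-free measurements y_s at X_s
\mu^{c}(x) = \mu(x) + C(x, X_s)\, C(X_s, X_s)^{-1} \bigl( y_s - \mu(X_s) \bigr)

C^{c}(x, x') = C(x, x') - C(x, X_s)\, C(X_s, X_s)^{-1}\, C(X_s, x')

% Eigenpairs of the conditional kernel
\int C^{c}(x, x')\, \phi_i(x')\, \mathrm{d}x' = \lambda_i\, \phi_i(x)

% Truncated conditional Karhunen-Loeve expansion with independent standard normal \xi_i
y(x) \approx \mu^{c}(x) + \sum_{i=1}^{N} \sqrt{\lambda_i}\, \phi_i(x)\, \xi_i
```

Because the subtracted term in $C^{c}$ removes the variance already explained by the measurements, the conditional kernel has no more variance than the unconditional one, which is the usual argument for why fewer terms may be needed at a given accuracy.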

arXiv:2306.05436 [pdf, other]

Remaining Useful Life Modelling with an Escalator Health Condition Analytic System

Authors: Inez M. Zwetsloot, Yu Lin, Jiaqi Qiu, Lishuai Li, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

Abstract: The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition, which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic system for escalators to support refurbishment decisions. The analytic system consists of four parts: 1) online data gathering and processing; 2) a dashboard for condition monitoring; 3) a health index model; and 4) remaining useful life prediction. The results can be used for a) predicting the remaining useful life of the escalators, in order to support asset replacement planning and b) monitoring the real-time condition of escalators, including alerts when vibration exceeds the threshold and signal diagnosis, giving an indication of possible root cause (components) of the alert signal.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 14 pages, 12 figures, 7 tables

arXiv:2306.02203 [pdf, other]

Coalitions in International Litigation: A Network Perspective

Authors: R. Mastrandrea, G. Antuofermo, M. Ovadek, T. Y. -C. Yeung, A. Dyevre, G. Caldarelli

Abstract: We apply network science principles to analyze the coalitions formed by European Union (EU) nations and institutions during litigation proceedings at the European Court of Justice. By constructing Friends and Foes networks, we explore their characteristics and dynamics through the application of cluster detection, motif analysis, and duplex analysis. Our findings demonstrate that the Friends and Foes networks exhibit disassortative behavior, highlighting the inclination of nodes to connect with dissimilar nodes. Furthermore, there is a correlation among centrality measures, indicating that member states and institutions with a larger number of connections play a prominent role in bridging the network. An examination of the modularity of the networks reveals that coalitions tend to align along regional and institutional lines, rather than national government divisions. Additionally, an analysis of triadic binary motifs uncovers a greater level of reciprocity within the Foes network compared to the Friends network.

Submitted 3 June, 2023; originally announced June 2023.

Comments: 13 pages, 11 figures, style and bibtex files included

arXiv:2302.03676 [pdf, other]

doi 10.3847/1538-4365/acdc9f

Open data from the third observing run of LIGO, Virgo, KAGRA and GEO

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, A. Allocca, et al. (1719 additional authors not shown)

Abstract: The global network of gravitational-wave observatories now includes five detectors, namely LIGO Hanford, LIGO Livingston, Virgo, KAGRA, and GEO 600. These detectors collected data during their third observing run, O3, composed of three phases: O3a starting in April of 2019 and lasting six months, O3b starting in November of 2019 and lasting five months, and O3GK starting in April of 2020 and lasting 2 weeks. In this paper we describe these data and various other science products that can be freely accessed through the Gravitational Wave Open Science Center at https://gwosc.org. The main dataset, consisting of the gravitational-wave strain time series that contains the astrophysical signals, is released together with supporting data useful for their analysis and documentation, tutorials, as well as analysis software packages.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 27 pages, 3 figures

Report number: LIGO-P2200316

arXiv:2301.11279 [pdf, other]

Gaussian process regression and conditional Karhunen-Loéve models for data assimilation in inverse problems

Authors: Yu-Hong Yeung, David A. Barajas-Solano, Alexandre M. Tartakovsky

Abstract: We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models of physical systems with spatially heterogeneous parameter fields. These fields are approximated using low-dimensional conditional Karhunen-Loéve expansions, which are constructed using Gaussian process regression models of these fields trained on the parameters' measurements. We then assimilate measurements of the state of the system and compute the maximum a posteriori estimate of the CKLE coefficients by solving a nonlinear least-squares problem. When solving this optimization problem, we efficiently compute the Jacobian of the vector objective by exploiting the sparsity structure of the linear system of equations associated with the forward solution of the physics problem. The CKLEMAP method provides better scalability compared to the standard MAP method. In the MAP method, the number of unknowns to be estimated is equal to the number of elements in the numerical forward model. On the other hand, in CKLEMAP, the number of unknowns (CKLE coefficients) is controlled by the smoothness of the parameter field and the number of measurements, and is in general much smaller than the number of discretization nodes, which leads to a significant reduction of computational cost with respect to the standard MAP method. To show its advantage in scalability, we apply CKLEMAP to estimate the transmissivity field in a two-dimensional steady-state subsurface flow model of the Hanford Site by assimilating synthetic measurements of transmissivity and hydraulic head. We find that the execution time of CKLEMAP scales nearly linearly as $N^{1.33}$, where $N$ is the number of discretization nodes, while the execution time of standard MAP scales as $N^{2.91}$. The CKLEMAP method improved execution time without sacrificing accuracy when compared to the standard MAP.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 27 pages, 7 figures
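
The abstract above reports execution-time scalings of $N^{1.33}$ for CKLEMAP and $N^{2.91}$ for standard MAP. A minimal sketch of what those exponents imply is given below, assuming a pure power law with a constant prefactor (an idealization; the abstract gives only the exponents). The function name `cost_ratio` and the tenfold refinement factor are illustrative:

```python
def cost_ratio(exponent, refinement):
    """Growth factor in run time when N is multiplied by `refinement`,
    assuming run time is proportional to N ** exponent."""
    return refinement ** exponent


# Exponents quoted in the abstract; the 10x mesh refinement is hypothetical.
ckle_map = cost_ratio(1.33, 10.0)
standard_map = cost_ratio(2.91, 10.0)
```

Under this assumption, a tenfold increase in the number of discretization nodes multiplies the CKLEMAP run time by roughly 21 and the standard MAP run time by roughly 810, so the gap between the two methods widens quickly as the mesh is refined.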

arXiv:2301.07891 [pdf]

Simulation of environmental impacts on the synthesis of carbyne with more than 6000 atoms for emerging continuously tunable energy barriers in CNT-based transistors

Authors: Chi Ho Wong, Yan Ming Yeung, Xin Zhao, Wing Cheung Law, Chak-yin Tang, Chee Leung Mak, Chi Wah Leung, Lei Shi, Rolf Lortz

Abstract: Transistors made up of carbon nanotubes (CNTs) have demonstrated excellent current-voltage characteristics which outperform some high-grade silicon-based transistors. A continuously tunable energy barrier across semiconductor interfaces is desired to make the CNT-based transistors more robust. Although the direct band gap of carbyne inside a CNT can be widely tuned by strain, the size of carbyne cannot be controlled easily. The production of a monoatomic chain with more than 6000 carbon atoms is an enormous technological challenge. To predict the optimal chain length of a carbyne in different molecular environments, we have developed a Monte Carlo model in which a finite-length carbyne with a size of 4000-15000 atoms is encapsulated by a CNT at finite temperatures. Our simulation shows that the stability of the carbyne@nanotube is strongly influenced by the nature and porosity of the CNT, the external pressure, the temperature and the chain length. We have observed the initiation of a chain-breaking process in a compressed carbyne@nanotube. Our work provides much needed input for optimising the carbyne length to produce carbon chains much longer than 6000 atoms at ~300K. Design rules are proposed for synthesizing ~1% strained carbyne@(6,5)CNT as a component in CNT-based transistors to tune the energy barriers continuously.

Submitted 19 January, 2023; originally announced January 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2207.14558

arXiv:2212.05671 [pdf, other]

Saddle-Point Approach to Large-Time Volatility Smile

Authors: Chun Yat Yeung, Ali Hirsa

Abstract: We extend upon the saddle-point equation presented in [1] to derive large-time model-implied volatility smiles, providing its theoretical foundation and studying its applications in classical models. As long as the characteristic function fulfills a Lévy-type scaling behavior in large time, the approach allows us to study analytically the large-time smile behaviors under specific models, and moreover, to reach a very wide class of arbitrage-free model-inspired parametrizations, in the same manner as stochastic-volatility-inspired (SVI).

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2206.11250 [pdf, other]

Depth-aware Glass Surface Detection with Cross-modal Context Mining

Authors: Jiaying Lin, Yuen Hei Yeung, Rynson W. H. Lau

Abstract: Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as a prior. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which can offer additional insights to complement the RGB image features for glass surface detection. In this paper, we propose a novel framework for glass surface detection by incorporating RGB-D information, with two novel modules: (1) a cross-modal context mining (CCM) module to adaptively learn individual and mutual context features from RGB and depth information, and (2) a depth-missing aware attention (DAA) module to explicitly exploit spatial locations where missing depths occur to help detect the presence of glass surfaces. In addition, we propose a large-scale RGB-D glass surface detection dataset, called RGB-D GSD, for RGB-D glass surface detection. Our dataset comprises 3,009 real-world RGB-D glass surface images with precise annotations. Extensive experimental results show that our proposed model outperforms state-of-the-art methods.

Submitted 22 June, 2022; originally announced June 2022.

arXiv:2204.09125 [pdf]

Mobility Analysis Workflow (MAW): An accessible, interoperable, and reproducible container system for processing raw mobile data

Authors: Xiangyang Guan, Cynthia Chen, Ian Ren, Ka Yee Yeung, Ling-Hong Hung, Wes J. Lloyd

Abstract: Mobility analysis, or understanding and modeling of people's mobility patterns in terms of when, where, and how people move from one place to another, is fundamentally important as such information is the basis for large-scale investment decisions on the nation's multi-modal transportation infrastructure. The recent rise in the use of passively generated mobile data from mobile devices has raised questions about using such data to capture the mobility patterns of a population because: 1) there is a great variety of different kinds of mobile data and their respective properties are unknown; and 2) data pre-processing and analysis methods are often not explicitly reported. The high stakes involved in mobility analysis and the issues associated with passively generated mobile data call for mobility analysis (including data, methods and results) to be accessible to all, interoperable across different computing systems, and reproducible and reusable by others. In this study, a container system named Mobility Analysis Workflow (MAW), which integrates data, methods and results, is developed. Built upon containerization technology, MAW allows its users to easily create, configure, modify, execute and share their methods and results in the form of Docker containers. Tools for operationalizing MAW are also developed and made publicly available on GitHub. One use case of MAW is the comparative analysis of the impacts of different pre-processing and mobility analysis methods on inferred mobility patterns. This study finds that different pre-processing and analysis methods do have impacts on the resulting mobility patterns. The creation of MAW and a better understanding of the relationship between data, methods and resulting mobility patterns as facilitated by MAW represent an important first step toward promoting reproducibility and reusability in mobility analysis with passively generated data.

Submitted 19 April, 2022; originally announced April 2022.

MSC Class: 91C20; ACM Class: J.6

arXiv:2204.05460 [pdf, other]

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words or mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into a time-stamped symbol sequence, aligning the recognized symbol sequence with the target text to determine the locations and types of required edit operations, and generating the corrected speech. Experiments show that the quality and naturalness of the corrected speech depend on the performance of the speech recognition and alignment modules, as well as the granularity level of the editing operations. The proposed system is evaluated on two corpora: a manually perturbed version of VCTK and L2-ARCTIC. The results demonstrate that our system is able to correct mispronunciation and reduce accent in speech recordings. Audio samples are available online for demonstration at https://daxintan-cuhk.github.io/CorrectSpeech/ .

Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted by ISCSLP 2022

arXiv:2204.02731 [pdf, other]

doi 10.1103/PhysRevE.105.044406

Revealing directed effective connectivity of cortical neuronal networks from measurements

Authors: Chumin Sun, K. C. Lin, C. Y. Yeung, Emily S. C. Ching, Yu-Ting Huang, Pik-Yin Lai, C. K. Chan

Abstract: In the study of biological networks, one of the major challenges is to understand the relationships between network structure and dynamics. In this paper, we model in vitro cortical neuronal cultures as stochastic dynamical systems and apply a method that reconstructs directed networks from dynamics [Ching and Tam, Phys. Rev. E 95, 010301(R), 2017] to reveal directed effective connectivity, namely the directed links and synaptic weights, of the neuronal cultures from voltage measurements recorded by a multielectrode array. The effective connectivity so obtained reproduces several features of cortical regions in rats and monkeys and has network properties similar to those of the synaptic network of the nematode C. elegans, the only organism whose entire nervous system has been mapped out as of today. The distribution of the incoming degree is bimodal and the distributions of the average incoming and outgoing synaptic strength are non-Gaussian with long tails. The effective connectivity captures different information from the commonly studied functional connectivity, estimated using statistical correlation between spiking activities. The average synaptic strengths of excitatory incoming and outgoing links are found to increase with the spiking activity in the estimated effective connectivity but not in the functional connectivity estimated using the same sets of voltage measurements. These results thus demonstrate that the reconstructed effective connectivity can capture the general properties of synaptic connections and better reveal relationships between network structure and dynamics.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2201.12155 [pdf, other]

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng

Abstract: Code-switching deals with alternative languages in the communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to combat the increased multilingual context confusion due to the presence of more than one language. We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory. The linguistic theory requires that any monolingual fragment that occurs in the code-switching sentence must occur in one of the monolingual sentences. The theory establishes a bridge between monolingual data and code-switching data. We leverage this linguistic theory to design the code-switching E2E ASR model. The proposed model efficiently transfers language knowledge from rich monolingual data to improve the performance of the code-switching ASR model. We evaluate our model on the ASRU 2019 Mandarin-English code-switching challenge dataset. Compared to the baseline model, our proposed model achieves a 17.12% relative error reduction.

Submitted 29 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: arXiv admin note: text overlap with arXiv:2010.14798. The paper has been accepted by Interspeech 2022

arXiv:2201.10207 [pdf, other]

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

Authors: Wenyong Huang, Zhenhe Zhang, Yu Ting Yeung, Xin Jiang, Qun Liu

Abstract: We introduce a new approach for speech pre-training named SPIRAL which works by learning denoising representations of perturbed data in a teacher-student framework. Specifically, given a speech utterance, we first feed the utterance to a teacher network to obtain the corresponding representation. Then the same utterance is perturbed and fed to a student network. The student network is trained to output a representation resembling that of the teacher. At the same time, the teacher network is updated as a moving average of the student's weights over training steps. In order to prevent representation collapse, we apply an in-utterance contrastive loss as the pre-training objective and impose position randomization on the input to the teacher. SPIRAL achieves competitive or better results compared to the state-of-the-art speech pre-training method wav2vec 2.0, with a significant reduction of training cost (80% for BASE model, 65% for LARGE model). Furthermore, we address the problem of noise-robustness that is critical to real-world speech applications. We propose multi-condition pre-training by perturbing the student's input with various types of additive noise. We demonstrate that multi-condition pre-trained SPIRAL models are more robust to noisy speech (9.0% - 13.3% relative word error rate reduction on real noisy test data), compared to applying multi-condition training solely in the fine-tuning stage. Source code is available at https://github.com/huawei-noah/Speech-Backbones/tree/main/SPIRAL.
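Note: the teacher update described in this abstract can be illustrated with a minimal sketch. The abstract only states that the teacher is "a moving average" of the student's weights; the exponential form, the `decay` value, and the name `ema_update` below are assumptions for illustration and are not taken from the SPIRAL source code.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an exponential moving average.

    Assumption: an exponential moving average with a fixed decay;
    the abstract does not specify the exact averaging scheme.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights as flat lists of floats (hypothetical values).
teacher = [0.0, 1.0]
student = [1.0, 1.0]

# One update step: the teacher moves a small step toward the student.
teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

With a decay close to 1, the teacher changes slowly, so it acts as a smoothed copy of the student across training steps.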

Submitted 6 March, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: ICLR 2022

arXiv:2111.08191 [pdf, other]

CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

Authors: Nianzu Zheng, Liqun Deng, Wenyong Huang, Yu Ting Yeung, Baohua Xu, Yuanyuan Guo, Yasheng Wang, Xiao Chen, Xin Jiang, Qun Liu

Abstract: Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant time latency, especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD. We utilize a conv-transformer structure to encode input speech in a streaming manner. A coupled cross-attention (CoCA) mechanism is proposed to integrate frame-level acoustic features with encoded reference linguistic features. CoCA also enables our model to perform mispronunciation classification with whole utterances. The proposed model allows system fusion between the streaming output and the mispronunciation classification output for further performance enhancement. We evaluate CoCA-MDD on publicly available corpora. CoCA-MDD achieves F1 scores of 57.03% and 60.78% for streaming and fusion modes respectively on L2-ARCTIC. For phone-level pronunciation scoring, CoCA-MDD achieves a Pearson correlation coefficient (PCC) of 0.58 on SpeechOcean762.

Submitted 29 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022

arXiv:2108.00037 [pdf, other]

doi 10.1029/2021WR031023

Physics-Informed Machine Learning Method for Large-Scale Data Assimilation Problems

Authors: Yu-Hong Yeung, David A. Barajas-Solano, Alexandre M. Tartakovsky

Abstract: We develop a physics-informed machine learning approach for large-scale data assimilation and parameter estimation and apply it to estimate transmissivity and hydraulic head in the two-dimensional steady-state subsurface flow model of the Hanford Site given synthetic measurements of said variables. In our approach, we extend the physics-informed conditional Karhunen-Loève expansion (PICKLE) method for modeling subsurface flow with unknown flux (Neumann) and varying head (Dirichlet) boundary conditions. We demonstrate that the PICKLE method is comparable in accuracy with the standard maximum a posteriori (MAP) method, but is significantly faster than MAP for large-scale problems. Both methods use a mesh to discretize the computational domain. In MAP, the parameters and states are discretized on the mesh; therefore, the size of the MAP parameter estimation problem directly depends on the mesh size. In PICKLE, the mesh is used to evaluate the residuals of the governing equation, while the parameters and states are approximated by the truncated conditional Karhunen-Loève expansions with the number of parameters controlled by the smoothness of the parameter and state fields, and not by the mesh size. For a considered example, we demonstrate that the computational cost of PICKLE increases near linearly (as $N_{FV}^{1.15}$) with the number of grid points $N_{FV}$, while that of MAP increases much faster as $N_{FV}^{3.28}$. We demonstrated that once trained for one set of Dirichlet boundary conditions (i.e., one river stage), the PICKLE method provides accurate estimates of the hydraulic head for any value of the Dirichlet boundary conditions (i.e., for any river stage).
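Note: the scaling exponents quoted in this abstract (1.15 for PICKLE, 3.28 for MAP) can be turned into a quick back-of-the-envelope comparison. The sketch below assumes a pure power law, cost proportional to N_FV raised to the exponent, with no constant factors; the function name `cost_growth` is hypothetical, and the numbers say nothing about absolute run times.

```python
# Exponents reported in the abstract for the considered example.
PICKLE_EXPONENT = 1.15
MAP_EXPONENT = 3.28


def cost_growth(scale_factor, exponent):
    """Factor by which cost grows when the grid size is multiplied
    by scale_factor, assuming cost is proportional to N_FV ** exponent."""
    return scale_factor ** exponent


for factor in (2, 10):
    pickle_growth = cost_growth(factor, PICKLE_EXPONENT)
    map_growth = cost_growth(factor, MAP_EXPONENT)
    print(f"grid x{factor}: PICKLE cost x{pickle_growth:.2f}, "
          f"MAP cost x{map_growth:.2f}")
```

Under this assumption, doubling the number of grid points roughly doubles the PICKLE cost but increases the MAP cost by nearly an order of magnitude.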

Submitted 30 July, 2021; originally announced August 2021.

Comments: 28 pages, 9 figures; submitted to Water Resources Research

arXiv:2107.01554 [pdf, other]

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

Authors: Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

Abstract: This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness. The EditSpeech system is developed upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bidirectional fusion are proposed to effectively incorporate the contextual information related to the edited region and achieve smooth transitions at both left and right boundaries. Distortion introduced to the unmodified parts of the utterance is alleviated. The EditSpeech system is developed and evaluated on English and Chinese in multi-speaker scenarios. Objective and subjective evaluations demonstrate that EditSpeech outperforms a few baseline systems in terms of low spectral distortion and preferred speech quality. Audio samples are available online for demonstration at https://daxintan-cuhk.github.io/EditSpeech/ .

Submitted 7 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: Accepted by ASRU 2021

arXiv:2106.10132 [pdf, other]

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Abstract: One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ignores the correlation between different speech representations during training, which causes leakage of content information into the speaker representation and thus degrades VC performance. To alleviate this issue, we employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training, to achieve proper disentanglement of content, speaker and pitch representations, by reducing their inter-dependencies in an unsupervised manner. Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations for retaining source linguistic content and intonation variations, while capturing target speaker characteristics. In doing so, the proposed approach achieves higher speech naturalness and speaker similarity than current state-of-the-art one-shot VC systems. Our code, pre-trained models and demo are available at https://github.com/Wendison/VQMIVC.

Submitted 18 June, 2021; originally announced June 2021.

Comments: Accepted to Interspeech 2021. Code, pre-trained models and demo are available at https://github.com/Wendison/VQMIVC

arXiv:2106.10127 [pdf, other]

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Abstract: Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc. It is hard to acquire labelled data in the target domain, due to high costs of annotating sizeable datasets. This paper makes a first attempt to formulate cross-domain DSD as an unsupervised domain adaptation (UDA) problem. We use labelled source-domain data and unlabelled target-domain data, and propose a multi-task learning strategy, including dysarthria presence classification (DPC), domain adversarial training (DAT) and mutual information minimization (MIM), which aim to learn dysarthria-discriminative and domain-invariant biomarker embeddings. Specifically, DPC helps biomarker embeddings capture critical indicators of dysarthria; DAT forces biomarker embeddings to be indistinguishable in source and target domains; and MIM further reduces the correlation between biomarker embeddings and domain-related cues. By treating the UASPEECH and TORGO corpora respectively as the source and target domains, experiments show that the incorporation of UDA attains absolute increases of 22.2% and 20.0% respectively in utterance-level weighted average recall and speaker-level accuracy.

Submitted 18 June, 2021; originally announced June 2021.

Comments: Accepted to Interspeech 2021

arXiv:2012.12704 [pdf, ps, other]

Estimating The Effect Of Subscription based Streaming Services On The Demand For Game Consoles

Authors: Tung Yu Marco Chan, Yue Zhang, Tsun Yi Yeung

Abstract: In this paper, we attempt to estimate the effect of the implementation of subscription-based streaming services on the demand for the associated game consoles. We do this by applying the BLP demand estimation model proposed by Berry (1994). This results in a linear demand specification which can be identified using conventional identification methods such as instrumental variables estimation and fixed-effects models. We find that, given our dataset, the two-stage least squares (2SLS) regression provides us with convincing estimates that subscription-based streaming services do have a positive effect on the demand for game consoles, as proposed by the general principle of complementary goods.

Submitted 23 December, 2020; originally announced December 2020.

arXiv:2010.11657 [pdf, other]

The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

Authors: Renyu Wang, Ruilin Tong, Yu Ting Yeung, Xiao Chen

Abstract: This paper describes the system setup of our submission to the speaker diarisation track (Track 4) of the VoxCeleb Speaker Recognition Challenge 2020. Our diarisation system consists of a well-trained neural network based speech enhancement model as the pre-processing front-end of input speech signals. We replace conventional energy-based voice activity detection (VAD) with a neural network based VAD. The neural network based VAD provides more accurate annotation of speech segments containing only background music, noise, and other interference, which is crucial to diarisation performance. We apply agglomerative hierarchical clustering (AHC) of x-vectors and variational Bayesian hidden Markov model (VB-HMM) based iterative clustering for speaker clustering. Experimental results demonstrate that our proposed system achieves substantial improvements over the baseline system, yielding a diarisation error rate (DER) of 10.45% and a Jaccard error rate (JER) of 22.46% on the evaluation set.

Submitted 23 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 5 pages, 2 figures, A report about our diarisation system for VoxCeleb Challenge, Interspeech conference workshop

arXiv:2008.05750 [pdf, other]

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Authors: Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

Abstract: Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with encoder-decoder architecture, is only suitable for offline ASR. It relies on an attention mechanism to learn alignments, and encodes input audio bidirectionally. The high computation cost of Transformer decoding also limits its use in production streaming systems. To make Transformer suitable for streaming ASR, we explore the Transducer framework as a streamable way to learn alignments. For audio encoding, we apply a unidirectional Transformer with interleaved convolution layers. The interleaved convolution layers are used for modeling future context, which is important to performance. To reduce computation cost, we gradually downsample acoustic input, also with the interleaved convolution layers. Moreover, we limit the length of history context in self-attention to maintain constant computation cost for each decoding step. We show that this architecture, named Conv-Transformer Transducer, achieves competitive performance on the LibriSpeech dataset (3.6% WER on test-clean) without external language models. The performance is comparable to previously published streamable Transformer Transducer and strong hybrid streaming ASR systems, and is achieved with a smaller look-ahead window (140 ms), fewer parameters and a lower frame rate.

Submitted 13 August, 2020; originally announced August 2020.

Comments: Accepted by INTERSPEECH 2020

arXiv:2007.09028 [pdf, other]

Sequential Explanations with Mental Model-Based Policies

Authors: Arnold YS Yeung, Shalmali Joshi, Joseph Jay Williams, Frank Rudzicz

Abstract: The act of explaining across two parties is a feedback loop, where one provides information on what needs to be explained and the other provides an explanation relevant to this information. We apply a reinforcement learning framework which emulates this format by providing explanations based on the explainee's current mental model. We conduct novel online human experiments where explanations generated by various explanation methods are selected and presented to participants, using policies which observe participants' mental models, in order to optimize an interpretability proxy. Our results suggest that mental model-based policies (anchored in our proposed state representation) may increase interpretability over multiple sequential explanations, when compared to a random selection baseline. This work provides insight into how to select explanations which increase relevant information for users, and into conducting human-grounded experimentation to understand interpretability.

Submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted into ICML 2020 Workshop on Human Interpretability in Machine Learning (Spotlight)

arXiv:2005.11491 [pdf, other]

Container Profiler: Profiling Resource Utilization of Containerized Big Data Pipelines

Authors: Varik Hoang, Ling-Hong Hung, David Perez, Huazeng Deng, Raymond Schooley, Niharika Arumilli, Ka Yee Yeung, Wes Lloyd

Abstract: This paper presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks, collecting over fifty Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines. To investigate the utility of the Container Profiler, we profile the resource utilization requirements of a multi-stage bioinformatics analytical pipeline (RNA sequencing using unique molecular identifiers). We examine profiling metrics to assess patterns of CPU, disk, and network resource utilization across the different stages of the pipeline. We also quantify the profiling overhead of our Container Profiler tool to assess the impact of profiling a running pipeline with different levels of profiling granularity, verifying that impacts are negligible. The Container Profiler provides a useful tool that can be used to continuously monitor the resource consumption of long and complex containerized applications that run locally or on the cloud. This can help identify bottlenecks where more resources are needed to improve performance.

Submitted 7 February, 2023; v1 submitted 23 May, 2020; originally announced May 2020.

arXiv:1903.03493 [pdf]

Perfect Absorption Metasurfaces with Multiple Meta-Resonances

Authors: Suet To Tang, Joshua Chau, Ka Yan Au Yeung, Z. Yang

Abstract: We show that the hybrid resonances of a DMR backed by a cavity are meta-resonances, in that they can be made as perfect as possible by fine tuning the structural parameters but without the requirements of extreme materials properties, such as zero dissipation. Instead, dissipation in the DMR is essential for the realization of perfect meta-resonances. We experimentally demonstrate such perfection by tuning the structure of a HMR till its reflection is as low as 0.426%. Besides the primary meta-resonances that originate from the strong resonances of the DMR, weak hitchhiker resonances can also produce meta-resonances as perfect as the primary ones. The depth of the reflection dips is insensitive to the strength of the resonances involved, but critically depends on the degree of impedance match to air brought mostly by fine tuning the structure parameters, such as the cavity volume, the mass of the platelet, or the pre-tension in the membrane. Using the eccentricity of the platelet position in the DMR, a number of resonances and anti-resonances are generated, resulting in up to five meta-resonances within the range of 200 Hz to 1000 Hz, with the highest reflection being 7% and the lowest being 1.2%. Other means of introducing hitchhiker meta-resonances are also reported.

Submitted 16 March, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

arXiv:1812.10071 [pdf, other]

Coupled Recurrent Network (CRN)

Authors: Lin Sun, Kui Jia, Yuejia Shen, Silvio Savarese, Dit Yan Yeung, Bertram E. Shi

Abstract: Many semantic video analysis tasks can benefit from multiple, heterogeneous signals. For example, in addition to the original RGB input sequences, sequences of optical flow are usually used to boost the performance of human action recognition in videos. To learn from these heterogeneous input sources, existing methods rely on two-stream architectural designs that contain independent, parallel streams of Recurrent Neural Networks (RNNs). However, two-stream RNNs do not fully exploit the reciprocal information contained in the multiple signals, let alone exploit it in a recurrent manner. To this end, we propose in this paper a novel recurrent architecture, termed Coupled Recurrent Network (CRN), to deal with multiple input sources. In CRN, the parallel streams of RNNs are coupled together. The key design of CRN is a Recurrent Interpretation Block (RIB) that supports learning of reciprocal feature representations from multiple signals in a recurrent manner. Different from RNNs which stack the training loss at each time step or the last time step, we propose an effective and efficient training strategy for CRN. Experiments show the efficacy of the proposed CRN. In particular, we achieve the new state of the art on the benchmark datasets of human action recognition and multi-person pose estimation.

Submitted 25 March, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

arXiv:1811.08621 [pdf]

Super Damping of Mechanical Vibrations

Authors: Ka Yan Au Yeung, Brian Yang, Liang Sun, Kehang Bai, Z. Yang

Abstract: We report the phenomenon of coherent super decay (CSD), where a linear sum of several damped oscillators can collectively decay much faster than the individual ones in the first stage, followed by stagnating ones after more than 90 percent of the energy has already been dissipated. The parameters of the damped oscillators for CSD are determined by the process of response function decomposition, which is to use several slow decay response functions to approximate the response function of a fast decay reference resonator. Evidence established in experiments and in finite element simulations not only strongly supported the numerical investigations, but also uncovered an unexplored region of the tuned mass damper (TMD) parameter space where TMDs with total mass less than 0.2 percent of a primary free body can damp its first resonance up to a damping ratio of 4.6 percent. Our findings also shed light onto the intriguing underlying connections between complex functions with different singular points.

Submitted 3 December, 2018; v1 submitted 21 November, 2018; originally announced November 2018.

arXiv:1811.00328 [pdf, other]

AMPS: A Real-time Mesh Cutting Algorithm for Surgical Simulations

Authors: Yu-Hong Yeung, Alex Pothen, Jessica Crouch

Abstract: We present the AMPS algorithm, a finite element solution method that combines principal submatrix updates and Schur complement techniques, well-suited for interactive simulations of deformation and cutting of finite element meshes. Our approach features real-time solutions to the updated stiffness matrix systems to account for interactive changes in mesh connectivity and boundary conditions. Updates are accomplished by an augmented matrix formulation of the stiffness equations to maintain its consistency with changes to the underlying model without refactorization at each timestep. As changes accumulate over multiple simulation timesteps, the augmented solution algorithm enables tens or hundreds of updates per second. Acceleration schemes that exploit sparsity, memoization and parallelization lead to the updates being computed in real-time. The complexity analysis and experimental results for this method demonstrate that it scales linearly with the problem size. Results for cutting and deformation of 3D elastic models are reported for meshes with node counts up to 50,000, and involve models of astigmatism surgery and the brain.

Submitted 1 November, 2018; originally announced November 2018.

Comments: 20 pages, 9 figures, 3 tables

arXiv:1807.11659 [pdf]

Serverless computing provides on-demand high performance computing for biomedical research

Authors: Dimitar Kumanov, Ling-Hong Hung, Wes Lloyd, Ka Yee Yeung

Abstract: Cloud computing offers on-demand, scalable computing and storage, and has become an essential resource for the analyses of big biomedical data. The usual approach to cloud computing requires users to reserve and provision virtual servers. An emerging alternative is to have the provider allocate machine resources dynamically. This type of serverless computing has tremendous potential for biomedical research in terms of ease-of-use, instantaneous scalability and cost effectiveness. In our proof of concept example, we demonstrate how serverless computing provides low cost access to hundreds of CPUs, on demand, with little or no setup. In particular, we illustrate that the all-against-all pairwise comparison among all unique human proteins can be accomplished in approximately 2 minutes, at a cost of less than $1, using Amazon Web Services Lambda. This is a 250x speedup compared to running the same task on a typical laptop computer.

Submitted 31 July, 2018; originally announced July 2018.

arXiv:1708.03958 [pdf, other]

Lattice Long Short-Term Memory for Human Action Recognition

Authors: Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

Abstract: Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (LSTM), are able to learn temporal motion dynamics. However, naively applying RNNs to video sequences in a convolutional manner implicitly assumes that motions in videos are stationary across different spatial locations. This assumption is valid for short-term motions but invalid when the duration of the motion is long. In this work, we propose Lattice-LSTM (L2STM), which extends LSTM by learning independent hidden state transitions of memory cells for individual spatial locations. This method effectively enhances the ability to model dynamics across time and addresses the non-stationary issue of long-term motion dynamics without significantly increasing the model complexity. Additionally, we introduce a novel multi-modal training procedure for training our network. Unlike traditional two-stream architectures which use RGB and optical flow information as input, our two-stream model leverages both modalities to jointly train both input gates and both forget gates in the network rather than treating the two streams as separate entities with no information about the other.
We apply this end-to-end system to benchmark datasets (UCF-101 and HMDB-51) of human action recognition. Experiments show that on both datasets, our proposed method outperforms all existing ones that are based on LSTM and/or CNNs of similar model complexities.

Submitted 13 August, 2017; originally announced August 2017.

Comments: ICCV2017

arXiv:1706.03147 [pdf, ps, other]

AMPS: An Augmented Matrix Formulation for Principal Submatrix Updates with Application to Power Grids

Authors: Yu-Hong Yeung, Alex Pothen, Mahantesh Halappanavar, Zhenyu Huang

Abstract: We present AMPS, an augmented matrix approach to update the solution to a linear system of equations when the matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform N - k contingency analysis, i.e., determine the state of the system when exactly k links from N fail. Our algorithms augment the matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. We analyze the time complexity of both algorithms, and show that it is bounded by the number of nonzeros in a subset of the columns of the Cholesky factor that are selected by the nonzeros in the sparse right-hand-side vector. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy.
Our algorithms are capable of computing N - k contingency analysis on a 778 thousand bus grid, updating a solution with k = 20 elements in 16 milliseconds on an Intel Xeon processor.

Submitted 9 June, 2017; originally announced June 2017.

Comments: 19 pages, 4 figures, 2 tables, SIAM Journal on Scientific Computing

MSC Class: 65F50; 65F10; 65F05; 65Y20 ACM Class: G.1.3, G.1.10

arXiv:1603.04835 [pdf, other]

A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data

Authors: William Chad Young, Ka Yee Yeung, Adrian E. Raftery

Abstract: Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.

Submitted 15 March, 2016; originally announced March 2016.

arXiv:1602.06316 [pdf, other]

Model-based clustering with data correction for removing artifacts in gene expression data

Authors: William Chad Young, Ka Yee Yeung, Adrian E. Raftery

Abstract: The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.

Submitted 19 February, 2016; originally announced February 2016.

Comments: 28 pages

arXiv:1212.4170 [pdf]

doi 10.1063/1.4775668

Two-Path Solid-State Interferometry Using Ultra-Subwavelength 2D Plasmonic Waves

Authors: Kitty Y. M. Yeung, Hosang Yoon, William Andress, Ken West, Loren Pfeiffer, Donhee Ham

Abstract: We report an on-chip solid-state Mach-Zehnder interferometer operating on two-dimensional (2D) plasmonic waves at microwave frequencies. Two plasmonic paths are defined with GaAs/AlGaAs 2D electron gas 80 nm below a metallic gate. The gated 2D plasmonic waves achieve a velocity of ~c/300 (c: free-space light speed). Due to this ultra-subwavelength confinement, the resolution of the 2D plasmonic interferometer is two orders of magnitude higher than that of its electromagnetic counterpart at a given frequency. This GHz proof-of-concept at cryogenic temperatures can be scaled to the THz IR range for room temperature operation, while maintaining the benefits of the ultra-subwavelength confinement.

Submitted 17 December, 2012; originally announced December 2012.

Comments: 18 pages, 11 figures. The article has been submitted to Applied Physics Letters. After it is published, it will be found at http://apl.aip.org/

arXiv:1207.1112 [pdf, ps, other]

doi 10.1103/PhysRevB.85.245445

Measuring the mode volume of plasmonic nanocavities using coupled optical emitters

Authors: Kasey J. Russell, Kitty Y. M. Yeung, Evelyn Hu

Abstract: Metallic optical systems can confine light to deep sub-wavelength dimensions, but verifying the level of confinement at these length scales typically requires specialized techniques and equipment for probing the near-field of the structure. We experimentally measured the confinement of a metal-based optical cavity by using the cavity modes themselves as a sensitive probe of the cavity characteristics. By perturbing the cavity modes with conformal dielectric layers of sub-nm thickness using atomic layer deposition, we find the exponential decay length of the modes to be less than 5% of the free-space wavelength (λ) and the mode volume to be of order λ^3/1000. These results provide experimental confirmation of the deep sub-wavelength confinement capabilities of metal-based optical cavities.

Submitted 4 July, 2012; originally announced July 2012.

Comments: 11 pages, 4 figures

Journal ref: Physical Review B, v. 85, pp. 245445 (2012)

arXiv:0909.0927 [pdf, ps, other]

Minimizing atomic configurations of short range pair potentials in two dimensions: crystallization in the Wulff shape

Authors: Yuen Au Yeung, Gero Friesecke, Bernd Schmidt

Abstract: We investigate ground state configurations of atomic systems in two dimensions interacting via short range pair potentials. As the number of particles tends to infinity, we show that low-energy configurations converge to a macroscopic cluster of finite surface area and constant density, the latter being given by the density of atoms per unit volume in the triangular lattice. In the special case of the Heitmann-Radin sticky disc potential and exact ground states, we show that the macroscopic cluster has a (unique) Wulff shape. This is done by showing that the atomistic energy, after subtracting off a bulk part and re-scaling, Gamma-converges to a macroscopic anisotropic surface energy.

Submitted 4 September, 2009; originally announced September 2009.

MSC Class: 70C20; 49-XX; 82B24

Showing 1–43 of 43 results for author: Yeung, Y