-
Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Authors:
Md Asif Jalal,
Pablo Peso Parada,
George Pavlidis,
Vasileios Moschopoulos,
Karthikeyan Saravanan,
Chrysovalantis-Giorgos Kontoulis,
Jisi Zhang,
Anastasios Drosou,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work, we first analyse different sampling strategies to provide insights into the training of CB for ASR, using correlation plots between the bias embeddings at various training stages. Second, we introduce neighbourhood attention (NA), which localizes self-attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that the proposed approach provides, on average, a 25.84% relative WER improvement on the LibriSpeech sets and rare-word evaluation compared to the baseline.
Submitted 23 January, 2024;
originally announced January 2024.
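The neighbourhood attention described in the abstract above restricts self-attention to nearby frames. The following is a minimal sketch of that idea only, assuming identity query/key/value projections and a fixed window `radius` (both are illustrative assumptions, not the authors' architecture or hyperparameters):

```python
import numpy as np

def neighbourhood_attention(x, radius):
    """Self-attention over frames in which each frame attends only to
    frames at most `radius` positions away (illustrative sketch)."""
    num_frames, dim = x.shape
    # Scaled dot-product scores between every pair of frames.
    scores = x @ x.T / np.sqrt(dim)
    # Mask out pairs of frames that are more than `radius` apart.
    idx = np.arange(num_frames)
    outside = np.abs(idx[:, None] - idx[None, :]) > radius
    scores = np.where(outside, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row
    # always has at least one finite score.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

# Hypothetical input: 6 frames with 4-dimensional features.
frames = np.random.default_rng(0).standard_normal((6, 4))
refined, attn = neighbourhood_attention(frames, radius=1)
```

With `radius=1`, each frame mixes information only from itself and its immediate neighbours, which is the locality property the abstract refers to.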
-
Consistency Based Unsupervised Self-training For ASR Personalisation
Authors:
Jisi Zhang,
Vandana Rajan,
Haaris Mehmood,
David Tuckey,
Pablo Peso Parada,
Md Asif Jalal,
Karthikeyan Saravanan,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
On-device Automatic Speech Recognition (ASR) models trained on speech data from a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, which differ in the user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model robustness. The majority of ASR personalisation methods assume labelled user data for supervision. Personalisation without any labelled data is challenging due to the limited data size and poor quality of recorded audio samples. This work addresses unsupervised personalisation by developing a novel consistency-based training method via pseudo-labelling. Our method achieves a relative Word Error Rate Reduction (WERR) of 17.3% on unlabelled training data and 8.1% on held-out data compared to a pre-trained model, and outperforms the current state-of-the-art methods.
Submitted 22 January, 2024;
originally announced January 2024.
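The relative Word Error Rate Reduction (WERR) quoted in the abstract above is conventionally the drop in WER expressed as a percentage of the baseline WER. A small sketch of that arithmetic follows; the WER values used are hypothetical and chosen only for illustration, not taken from the paper:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: how much of the baseline
    error rate was removed by the new model."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers: a baseline WER of 10.0% reduced to 9.0%.
werr = relative_wer_reduction(10.0, 9.0)
```

Note that a relative reduction differs from an absolute one: in this hypothetical case the absolute change is 1.0 percentage point, while the relative reduction is 10%.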
-
Point Cloud Completion by Learning Shape Priors
Authors:
Xiaogang Wang,
Marcelo H Ang Jr,
Gim Hee Lee
Abstract:
In view of the difficulty in reconstructing object details in point cloud completion, we propose a shape prior learning method for object completion. The shape priors include geometric information from both the complete and the partial point clouds. We design a feature alignment strategy to learn the shape prior from complete points, and a coarse-to-fine strategy to incorporate the partial prior in the fine stage. To learn the complete-object prior, we first train a point cloud auto-encoder to extract the latent embeddings from complete points. Then we learn a mapping to transfer the point features from partial points to those of the complete points by optimizing feature alignment losses. The feature alignment losses consist of an L2 distance and an adversarial loss obtained by a Maximum Mean Discrepancy Generative Adversarial Network (MMD-GAN). The L2 distance optimizes the partial features towards the complete ones in the feature space, and MMD-GAN decreases the statistical distance between the two sets of point features in a Reproducing Kernel Hilbert Space. We achieve state-of-the-art performance on the point cloud completion task. Our code is available at https://github.com/xiaogangw/point-cloud-completion-shape-prior.
Submitted 15 July, 2021; v1 submitted 2 August, 2020;
originally announced August 2020.
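The two feature alignment terms named in the abstract above can be illustrated with a small sketch. It assumes a fixed Gaussian kernel and the biased estimator of squared MMD; the paper's MMD-GAN learns its kernel adversarially, so this is a simplified stand-in rather than the authors' loss, and all function names here are hypothetical:

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(a, b, bandwidth=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (gaussian_kernel(a, a, bandwidth).mean()
            + gaussian_kernel(b, b, bandwidth).mean()
            - 2.0 * gaussian_kernel(a, b, bandwidth).mean())

def l2_alignment(partial_feats, complete_feats):
    """Mean squared L2 distance between paired feature vectors."""
    return np.mean(np.sum((partial_feats - complete_feats) ** 2, axis=-1))

# Hypothetical features: 32 embeddings of dimension 8 per set.
rng = np.random.default_rng(0)
complete = rng.standard_normal((32, 8))
partial = complete + 0.5  # shifted copy standing in for misaligned features
loss_l2 = l2_alignment(partial, complete)
loss_mmd = mmd_squared(partial, complete)
```

The L2 term compares features pair by pair, while the MMD term compares the two sets as distributions, which is why the abstract describes them as complementary.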
-
Joint Orthogonal Band and Power Allocation for Energy Fairness in WPT System with Nonlinear Logarithmic Energy Harvesting Model
Authors:
Jaeseob Han,
Gyeong Ho Lee,
Sangdon Park,
Jun Kyun Choi
Abstract:
Wireless power transmission (WPT) is expected to play an important role in Internet of Things (IoT) services by enabling the perpetual operation of IoT sensors. However, prolonging the IoT network's lifetime requires an efficient resource allocation algorithm; in particular, energy fairness among IoT sensors has been a critical challenge for WPT systems. In this paper, defining energy fairness as the minimum received energy among all energy poverty IoT sensors (EPISs), we allocate orthogonal frequency bands to several EPISs and transfer RF power on each orthogonal band using energy beamforming. Based on the energy poverty, we propose an orthogonal frequency band assignment rule that grants priority to the EPISs with less received energy. We also formulate two transmission power allocation problems that incorporate the nonlinear logarithmic energy harvesting (EH) model. First, the total received power maximization (TRPM) problem is presented and solved by combining the well-known Karush-Kuhn-Tucker (KKT) conditions with a modified water-filling algorithm. Second, the common received power maximization (CRPM) problem is formulated, and the optimal solution is derived using the iterative bisection search method. To apply the bisection search method to this problem, this paper proposes a method for specifying the range of the solution for an objective function defined as a sum of monotonic functions. In the numerical results, the mobility of EPISs is modelled by a one-dimensional random walk, and the effect of this mobility on the minimum received energy of all EPISs is presented. Finally, the performance of the proposed resource allocation schemes is verified by comparison with other resource allocation schemes, such as round robin and equal power distribution.
Submitted 30 March, 2020;
originally announced March 2020.
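The abstract above mentions applying bisection search to an objective defined as a sum of monotonic functions. A generic sketch of that technique follows, assuming an increasing objective and a known bracketing interval; the logarithmic terms and the `gains` values are hypothetical stand-ins and do not reproduce the paper's energy harvesting model or its range-finding method:

```python
import math

def bisect_increasing(func, target, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with func(x) close to target, for increasing func."""
    if not (func(lo) <= target <= func(hi)):
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # Keep the half-interval that still brackets the target.
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def total_harvested(x, gains=(1.0, 2.0, 0.5)):
    """Hypothetical objective: a sum of increasing logarithmic terms."""
    return sum(math.log1p(g * x) for g in gains)

# Solve total_harvested(x) = 2.0 on the assumed bracket [0, 10].
solution = bisect_increasing(total_harvested, 2.0, 0.0, 10.0)
```

Because a sum of increasing functions is itself increasing, the root is unique once a valid bracket is known, which is what makes bisection applicable here.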