Locality enhanced dynamic biasing and sampling strategies for contextual ASR
Authors:
Md Asif Jalal,
Pablo Peso Parada,
George Pavlidis,
Vasileios Moschopoulos,
Karthikeyan Saravanan,
Chrysovalantis-Giorgos Kontoulis,
Jisi Zhang,
Anastasios Drosou,
Gil Ho Lee,
Jungin Lee,
Seokyeong Jung
Abstract:
Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t…
▽ More
Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the training of CB for ASR with correlation plots between the bias embeddings among various training stages. Secondly, we introduce a neighbourhood attention (NA) that localizes self attention (SA) to the nearest neighbouring frames to further refine the CB output. The results show that this proposed approach provides on average a 25.84% relative WER improvement on LibriSpeech sets and rare-word evaluation compared to the baseline.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
Deep learning approaches in food recognition
Authors:
Chairi Kiourt,
George Pavlidis,
Stella Markantonatou
Abstract:
Automatic image-based food recognition is a particularly challenging task. Traditional image analysis approaches have achieved low classification accuracy in the past, whereas deep learning approaches enabled the identification of food types and their ingredients. The contents of food dishes are typically deformable objects, usually including complex semantics, which makes the task of defining the…
▽ More
Automatic image-based food recognition is a particularly challenging task. Traditional image analysis approaches have achieved low classification accuracy in the past, whereas deep learning approaches enabled the identification of food types and their ingredients. The contents of food dishes are typically deformable objects, usually including complex semantics, which makes the task of defining their structure very difficult. Deep learning methods have already shown very promising results in such challenges, so this chapter focuses on the presentation of some popular approaches and techniques applied in image-based food recognition. The three main lines of solutions, namely the design from scratch, the transfer learning and the platform-based approaches, are outlined, particularly for the task at hand, and are tested and compared to reveal the inherent strengths and weaknesses. The chapter is complemented with basic background material, a section devoted to the relevant datasets that are crucial in light of the empirical approaches adopted, and some concluding remarks that underline the future directions.
△ Less
Submitted 8 April, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.