-
Depth-guided Free-space Segmentation for a Mobile Robot
Authors:
Christos Sevastopoulos,
Joey Hussain,
Stasinos Konstantopoulos,
Vangelis Karkaletsis,
Fillia Makedon
Abstract:
Accurate indoor free-space segmentation is a challenging task due to the complexity and the dynamic nature that indoor environments exhibit. We propose an indoors free-space segmentation method that associates large depth values with navigable regions. Our method leverages an unsupervised masking technique that, using positive instances, generates segmentation labels based on textural homogeneity…
▽ More
Accurate indoor free-space segmentation is a challenging task due to the complexity and the dynamic nature that indoor environments exhibit. We propose an indoors free-space segmentation method that associates large depth values with navigable regions. Our method leverages an unsupervised masking technique that, using positive instances, generates segmentation labels based on textural homogeneity and depth uniformity. Moreover, we generate superpixels corresponding to areas of higher depth and align them with features extracted from a Dense Prediction Transformer (DPT). Using the estimated free-space masks and the DPT feature representation, a SegFormer model is fine-tuned on our custom-collected indoor dataset. Our experiments demonstrate sufficient performance in intricate scenarios characterized by cluttered obstacles and challenging identification of free space.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
A Shift In Artistic Practices through Artificial Intelligence
Authors:
Kıvanç Tatar,
Petter Ericson,
Kelsey Cotton,
Paola Torres Núñez del Prado,
Roser Batlle-Roca,
Beatriz Cabrero-Daniel,
Sara Ljungblad,
Georgios Diapoulis,
Jabbar Hussain
Abstract:
The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globall…
▽ More
The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globally, how does this new paradigm shift challenge the status quo in artistic practices? What kind of changes will AI technology bring to music, arts, and new media?
△ Less
Submitted 10 April, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition
Authors:
Enes Yavuz Ugan,
Christian Huber,
Juan Hussain,
Alexander Waibel
Abstract:
Code-Switching (CS) is referred to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performances on the task of automatic speech recognition (ASR) it is commonly known that these systems are very data-intensive. However, there is only a few transcribed and aligned CS speech available. To overcome t…
▽ More
Code-Switching (CS) is referred to the phenomenon of alternately using words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performances on the task of automatic speech recognition (ASR) it is commonly known that these systems are very data-intensive. However, there is only a few transcribed and aligned CS speech available. To overcome this problem and train multilingual systems which can transcribe CS speech, we propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated. By using this training data, our E2E model improves on transcribing CS speech. It also surpasses monolingual models on monolingual tests. The results show that this augmentation technique can even improve the model's performance on inter-sentential language switches not seen during training by 5,03% WER.
△ Less
Submitted 3 July, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition
Authors:
Christian Huber,
Juan Hussain,
Sebastian Stüker,
Alexander Waibel
Abstract:
Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition (ASR). When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principal open vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, numbers or technical terms. To alleviate this probl…
▽ More
Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition (ASR). When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principal open vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, numbers or technical terms. To alleviate this problem we supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. After the training of the ASR system, and when it has already been deployed, a relevant word can be added or subtracted instantly without the need for further training. In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize compared to a strong baseline.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
A Practical Approach towards Causality Mining in Clinical Text using Active Transfer Learning
Authors:
Musarrat Hussain,
Fahad Ahmed Satti,
Jamil Hussain,
Taqdir Ali,
Syed Imran Ali,
Hafiz Syed Muhammad Bilal,
Gwang Hoon Park,
Sungyoung Lee
Abstract:
Objective: Causality mining is an active research area, which requires the application of state-of-the-art natural language processing techniques. In the healthcare domain, medical experts create clinical text to overcome the limitation of well-defined and schema driven information systems. The objective of this research work is to create a framework, which can convert clinical text into causal kn…
▽ More
Objective: Causality mining is an active research area, which requires the application of state-of-the-art natural language processing techniques. In the healthcare domain, medical experts create clinical text to overcome the limitation of well-defined and schema driven information systems. The objective of this research work is to create a framework, which can convert clinical text into causal knowledge. Methods: A practical approach based on term expansion, phrase generation, BERT based phrase embedding and semantic matching, semantic enrichment, expert verification, and model evolution has been used to construct a comprehensive causality mining framework. This active transfer learning based framework along with its supplementary services, is able to extract and enrich, causal relationships and their corresponding entities from clinical text. Results: The multi-model transfer learning technique when applied over multiple iterations, gains performance improvements in terms of its accuracy and recall while kee** the precision constant. We also present a comparative analysis of the presented techniques with their common alternatives, which demonstrate the correctness of our approach and its ability to capture most causal relationships. Conclusion: The presented framework has provided cutting-edge results in the healthcare domain. However, the framework can be tweaked to provide causality detection in other domains, as well. Significance: The presented framework is generic enough to be utilized in any domain, healthcare services can gain massive benefits due to the voluminous and various nature of its data. This causal knowledge extraction framework can be used to summarize clinical text, create personas, discover medical knowledge, and provide evidence to clinical decision making.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.
-
AI Driven Knowledge Extraction from Clinical Practice Guidelines: Turning Research into Practice
Authors:
Musarrat Hussain,
Jamil Hussain,
Taqdir Ali,
Fahad Ahmed Satti,
Sungyoung Lee
Abstract:
Background and Objectives: Clinical Practice Guidelines (CPGs) represent the foremost methodology for sharing state-of-the-art research findings in the healthcare domain with medical practitioners to limit practice variations, reduce clinical cost, improve the quality of care, and provide evidence based treatment. However, extracting relevant knowledge from the plethora of CPGs is not feasible for…
▽ More
Background and Objectives: Clinical Practice Guidelines (CPGs) represent the foremost methodology for sharing state-of-the-art research findings in the healthcare domain with medical practitioners to limit practice variations, reduce clinical cost, improve the quality of care, and provide evidence based treatment. However, extracting relevant knowledge from the plethora of CPGs is not feasible for already burdened healthcare professionals, leading to large gaps between clinical findings and real practices. It is therefore imperative that state-of-the-art Computing research, especially machine learning is used to provide artificial intelligence based solution for extracting the knowledge from CPGs and reducing the gap between healthcare research/guidelines and practice. Methods: This research presents a novel methodology for knowledge extraction from CPGs to reduce the gap and turn the latest research findings into clinical practice. First, our system classifies the CPG sentences into four classes such as condition-action, condition-consequences, action, and not-applicable based on the information presented in a sentence. We use deep learning with state-of-the-art word embedding, improved word vectors technique in classification process. Second, it identifies qualifier terms in the classified sentences, which assist in recognizing the condition and action phrases in a sentence. Finally, the condition and action phrase are processed and transformed into plain rule If Condition(s) Then Action format. Results: We evaluate the methodology on three different domains guidelines including Hypertension, Rhinosinusitis, and Asthma. The deep learning model classifies the CPG sentences with an accuracy of 95%. While rule extraction was validated by user-centric approach, which achieved a Jaccard coefficient of 0.6, 0.7, and 0.4 with three human experts extracted rules, respectively.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.