Search | arXiv e-print repository

Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review

Abstract: While supervised learning has achieved significant success in computer vision tasks, acquiring high-quality annotated data remains a bottleneck. This paper explores both scholarly and non-scholarly works in AI-assistive deep learning image annotation systems that provide textual suggestions, captions, or descriptions of the input image to the annotator. This potentially results in higher annotatio… ▽ More While supervised learning has achieved significant success in computer vision tasks, acquiring high-quality annotated data remains a bottleneck. This paper explores both scholarly and non-scholarly works in AI-assistive deep learning image annotation systems that provide textual suggestions, captions, or descriptions of the input image to the annotator. This potentially results in higher annotation efficiency and quality. Our exploration covers annotation for a range of computer vision tasks including image classification, object detection, regression, instance, semantic segmentation, and pose estimation. We review various datasets and how they contribute to the training and evaluation of AI-assistive annotation systems. We also examine methods leveraging neuro-symbolic learning, deep active learning, and self-supervised learning algorithms that enable semantic image understanding and generate free-text output. These include image captioning, visual question answering, and multi-modal reasoning. Despite the promising potential, there is limited publicly available work on AI-assistive image annotation with textual output capabilities. We conclude by suggesting future research directions to advance this field, emphasizing the need for more publicly accessible datasets and collaborative efforts between academia and industry. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted IEEE ETNCC 2024, 9 pages

arXiv:2403.10916 [pdf, other]

FishNet: Deep Neural Networks for Low-Cost Fish Stock Estimation

Authors: Moseli Mots'oehli, Anton Nikolaev, Wawan B. IGede, John Lynham, Peter J. Mous, Peter Sadowski

Abstract: Fish stock assessment often involves manual fish counting by taxonomy specialists, which is both time-consuming and costly. We propose FishNet, an automated computer vision system for both taxonomic classification and fish size estimation from images captured with a low-cost digital camera. The system first performs object detection and segmentation using a Mask R-CNN to identify individual fish f… ▽ More Fish stock assessment often involves manual fish counting by taxonomy specialists, which is both time-consuming and costly. We propose FishNet, an automated computer vision system for both taxonomic classification and fish size estimation from images captured with a low-cost digital camera. The system first performs object detection and segmentation using a Mask R-CNN to identify individual fish from images containing multiple fish, possibly consisting of different species. Then each fish species is classified and the length is predicted using separate machine learning models. To develop the model, we use a dataset of 300,000 hand-labeled images containing 1.2M fish of 163 different species and ranging in length from 10cm to 250cm, with additional annotations and quality control methods used to curate high-quality training data. On held-out test data sets, our system achieves a 92% intersection over union on the fish segmentation task, a 89% top-1 classification accuracy on single fish species classification, and a 2.3cm mean absolute error on the fish length estimation task. △ Less

Submitted 27 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

Comments: IEEE COINS 2024

arXiv:2310.09141 [pdf, ps, other]

PuoBERTa: Training and evaluation of a curated language model for Setswana

Authors: Vukosi Marivate, Moseli Mots'Oehli, Valencia Wagner, Richard Lastrucci, Isheanesu Dzingirai

Abstract: Natural language processing (NLP) has made significant progress for well-resourced languages such as English but lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpu… ▽ More Natural language processing (NLP) has made significant progress for well-resourced languages such as English but lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpus for PuoBERTa's training. Building upon previous efforts in creating monolingual resources for Setswana, we evaluated PuoBERTa across several NLP tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation. Additionally, we introduced a new Setswana news categorisation dataset and provided the initial benchmarks using PuoBERTa. Our work demonstrates the efficacy of PuoBERTa in fostering NLP capabilities for understudied languages like Setswana and paves the way for future research directions. △ Less

Submitted 24 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted for SACAIR 2023

arXiv:2302.11075 [pdf, other]

Deep Active Learning in the Presence of Label Noise: A Survey

Authors: Moseli Mots'oehli, Kyungim Baek

Abstract: Deep active learning has emerged as a powerful tool for training deep learning models within a predefined labeling budget. These models have achieved performances comparable to those trained in an offline setting. However, deep active learning faces substantial issues when dealing with classification datasets containing noisy labels. In this literature review, we discuss the current state of deep… ▽ More Deep active learning has emerged as a powerful tool for training deep learning models within a predefined labeling budget. These models have achieved performances comparable to those trained in an offline setting. However, deep active learning faces substantial issues when dealing with classification datasets containing noisy labels. In this literature review, we discuss the current state of deep active learning in the presence of label noise, highlighting unique approaches, their strengths, and weaknesses. With the recent success of vision transformers in image classification tasks, we provide a brief overview and consider how the transformer layers and attention mechanisms can be used to enhance diversity, importance, and uncertainty-based selection in queries sent to an oracle for labeling. We further propose exploring contrastive learning methods to derive good image representations that can aid in selecting high-value samples for labeling in an active learning setting. We also highlight the need for creating unified benchmarks and standardized datasets for deep active learning in the presence of label noise for image classification to promote the reproducibility of research. The review concludes by suggesting avenues for future research in this area. △ Less

Submitted 19 September, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 20 pages, PhD literature review

arXiv:2211.00731 [pdf, other]

Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models

Authors: Moseli Mots'oehli, Anna Sergeevna Bosman, Johan Pieter De Villiers

Abstract: Algorithmic music composition is a way of composing musical pieces with minimal to no human intervention. While recurrent neural networks are traditionally applied to many sequence-to-sequence prediction tasks, including successful implementations of music composition, their standard supervised learning approach based on input-to-output map** leads to a lack of note variety. These models can the… ▽ More Algorithmic music composition is a way of composing musical pieces with minimal to no human intervention. While recurrent neural networks are traditionally applied to many sequence-to-sequence prediction tasks, including successful implementations of music composition, their standard supervised learning approach based on input-to-output map** leads to a lack of note variety. These models can therefore be seen as potentially unsuitable for tasks such as music generation. Generative adversarial networks learn the generative distribution of data and lead to varied samples. This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data. The resulting music samples are evaluated by human listeners, their preferences recorded. The evaluation indicates that adversarial training produces more aesthetically pleasing music. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: Submitted to a 2023 conference, 20 pages, 13 figures

arXiv:2209.00213 [pdf, other]

Public Parking Spot Detection And Geo-localization Using Transfer Learning

Authors: Moseli Mots'oehli, Yao Chao Yang

Abstract: In cities around the world, locating public parking lots with vacant parking spots is a major problem, costing commuters time and adding to traffic congestion. This work illustrates how a dataset of Geo-tagged images from a mobile phone camera, can be used in navigating to the most convenient public parking lot in Johannesburg with an available parking space, detected by a neural network powered-p… ▽ More In cities around the world, locating public parking lots with vacant parking spots is a major problem, costing commuters time and adding to traffic congestion. This work illustrates how a dataset of Geo-tagged images from a mobile phone camera, can be used in navigating to the most convenient public parking lot in Johannesburg with an available parking space, detected by a neural network powered-public camera. The images are used to fine-tune a Detectron2 model pre-trained on the ImageNet dataset to demonstrate detection and segmentation of vacant parking spots, we then add the parking lot's corresponding longitude and latitude coordinates to recommend the most convenient parking lot to the driver based on the Haversine distance and number of available parking spots. Using the VGG Image Annotation (VIA) we use images from an expanding dataset of images, and annotate these with polygon outlines of the four different types of objects of interest: cars, open parking spots, people, and car number plates. We use the segmentation model to ensure number plates can be occluded in production for car registration anonymity purposes. We get an 89% and 82% intersection over union cover score on cars and parking spaces respectively. This work has the potential to help reduce the amount of time commuters spend searching for free public parking, hence easing traffic congestion in and around shop** complexes and other public places, and maximize people's utility with respect to driving on public roads. △ Less

Submitted 17 October, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: Accepted for presentation at SACAIR 2022. 11 pages,5 figures

Showing 1–6 of 6 results for author: Mots'oehli, M