-
PICS: Pipeline for Image Captioning and Search
Authors:
Grant Rosario,
David Noever
Abstract:
The growing volume of digital images necessitates advanced systems for efficient categorization and retrieval, presenting a significant challenge in database management and information retrieval. This paper introduces PICS (Pipeline for Image Captioning and Search), a novel approach designed to address the complexities inherent in organizing large-scale image repositories. PICS leverages the advancements in Large Language Models (LLMs) to automate the process of image captioning, offering a solution that transcends traditional manual annotation methods. The approach is rooted in the understanding that meaningful, AI-generated captions can significantly enhance the searchability and accessibility of images in large databases. By integrating sentiment analysis into the pipeline, PICS further enriches the metadata, enabling nuanced searches that extend beyond basic descriptors. This methodology not only simplifies the task of managing vast image collections but also sets a new precedent for accuracy and efficiency in image retrieval. The significance of PICS lies in its potential to transform image database systems, harnessing the power of machine learning and natural language processing to meet the demands of modern digital asset management.
Submitted 31 January, 2024;
originally announced February 2024.
-
Satellite Captioning: Large Language Models to Augment Labeling
Authors:
Grant Rosario,
David Noever
Abstract:
With the growing capabilities of modern object detection networks and datasets to train them, it has gotten more straightforward and, importantly, less laborious to get up and running with a model that is quite adept at detecting any number of various objects. However, while image datasets for object detection have grown and continue to proliferate (the current most extensive public set, ImageNet, contains over 14m images with over 14m instances), the same cannot be said for textual caption datasets. While they have certainly been growing in recent years, caption datasets present a much more difficult challenge due to language differences, grammar, and the time it takes for humans to generate them. Current datasets have certainly provided many instances to work with, but it becomes problematic when a captioner may have a more limited vocabulary, one may not be adequately fluent in the language, or there are simple grammatical mistakes. These difficulties are increased when the images get more specific, such as remote sensing images. This paper aims to address this issue of potential information and communication shortcomings in caption datasets. To provide a more precise analysis, we specify our domain of images to be remote sensing images in the RSICD dataset and experiment with the captions provided here. Our findings indicate that ChatGPT grammar correction is a simple and effective way to increase the performance accuracy of caption models by making data captions more diverse and grammatically correct.
Submitted 17 December, 2023;
originally announced December 2023.
-
Grading Conversational Responses Of Chatbots
Authors:
Grant Rosario,
David Noever
Abstract:
Chatbots have long been capable of answering basic questions and even responding to obscure prompts, but recently their improvements have been far more significant. Modern chatbots like OpenAI's ChatGPT3 not only have the ability to answer basic questions but can write code and movie scripts and imitate well-known people. In this paper, we analyze ChatGPT's responses to various questions from a dataset of queries from the popular Quora forum. We submitted sixty questions to ChatGPT and scored the answers based on three industry-standard metrics for grading machine translation: BLEU, METEOR, and ROUGE. These metrics allow us to compare the machine responses with the most upvoted human answer to the same question to assess ChatGPT's ability to submit a humanistic reply. The results showed that while the responses and translation abilities of ChatGPT are remarkable, they still fall short of what a typical human reaction would be.
Submitted 31 January, 2023;
originally announced March 2023.
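The comparison against a human reference answer described in this abstract can be illustrated with a minimal sketch. It computes clipped unigram precision (the core idea behind BLEU-1, without the brevity penalty) and unigram recall (the core idea behind ROUGE-1). The function names and whitespace tokenization are assumptions for illustration only; this is not the authors' scoring code and it does not reproduce full BLEU, METEOR, or ROUGE.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real metrics use richer tokenizers.
    return text.lower().split()


def unigram_overlap(candidate, reference):
    """Return (precision, recall) of clipped unigram overlap.

    precision ~ BLEU-1 without brevity penalty,
    recall    ~ ROUGE-1 recall.
    """
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    return precision, recall


if __name__ == "__main__":
    machine_answer = "the cat sat"
    human_answer = "the cat sat on the mat"
    print(unigram_overlap(machine_answer, human_answer))
```

A short machine answer that only reuses words from the human answer gets high precision but low recall, which is one reason such metrics are usually reported together.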
-
Soft Labels for Rapid Satellite Object Detection
Authors:
Matthew Ciolino,
Grant Rosario,
David Noever
Abstract:
Soft labels in image classification are vector representations of an image's true classification. In this paper, we investigate soft labels in the context of satellite object detection. We propose using detections as the basis for a new dataset of soft labels. Much of the effort in creating a high-quality model is gathering and annotating the training data. If we could use a model to generate a dataset for us, we could not only rapidly create datasets, but also supplement existing open-source datasets. Using a subset of the xView dataset, we train a YOLOv5 model to detect cars, planes, and ships. We then use that model to generate soft labels for a second training set, on which we train a new model and compare it to the original model. We show that soft labels can be used to train a model that is almost as accurate as a model trained on the original data.
Submitted 27 January, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
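One way to picture "using detections as the basis for a new dataset" is to turn a model's predictions into label files for a second training run. The sketch below converts pixel-space detections into YOLO-style label lines (`class x_center y_center width height`, normalized to the image size). The function name, the input tuple layout, and the `min_conf` threshold are assumptions for illustration; the abstract does not say how the authors filtered or stored their detections.

```python
def detections_to_yolo_labels(detections, img_w, img_h, min_conf=0.5):
    """Convert detections to YOLO-format label lines.

    detections: iterable of (class_id, confidence, (x1, y1, x2, y2)) in pixels.
    min_conf:   hypothetical confidence cutoff; not taken from the paper.
    """
    lines = []
    for class_id, conf, (x1, y1, x2, y2) in detections:
        if conf < min_conf:
            continue  # drop low-confidence predictions
        # Normalize box center and size to the [0, 1] range.
        x_center = (x1 + x2) / 2 / img_w
        y_center = (y1 + y2) / 2 / img_h
        width = (x2 - x1) / img_w
        height = (y2 - y1) / img_h
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


if __name__ == "__main__":
    preds = [
        (0, 0.90, (10, 20, 30, 60)),  # kept
        (2, 0.30, (0, 0, 5, 5)),      # filtered out by min_conf
    ]
    for line in detections_to_yolo_labels(preds, img_w=100, img_h=200):
        print(line)
```

Lowering `min_conf` keeps more, noisier labels; raising it keeps fewer, cleaner ones.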
-
Soft-labeling Strategies for Rapid Sub-Typing
Authors:
Grant Rosario,
David Noever,
Matt Ciolino
Abstract:
The challenge of labeling large example datasets for computer vision continues to limit the availability and scope of image repositories. This research provides a new method for automated data collection, curation, labeling, and iterative training with minimal human intervention for the case of overhead satellite imagery and object detection. The new operational scale effectively scanned an entire city (68 square miles) in grid search and yielded a prediction of car color from space observations. A partially trained YOLOv5 model served as an initial inference seed to output further, more refined model predictions in iterative cycles. Soft labeling here refers to accepting label noise as a potentially valuable augmentation to reduce overfitting and enhance generalized predictions to previously unseen test data. The approach takes advantage of a real-world instance where a cropped image of a car can automatically receive sub-type information as white or colorful from pixel values alone, thus completing an end-to-end pipeline without overdependence on human labor.
Submitted 19 January, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
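The idea that a cropped car image can be sub-typed as white or colorful "from pixel values alone" can be sketched as a simple rule on the crop's mean color. The rule below (low saturation and high brightness means white) and both thresholds are assumptions chosen for illustration; the abstract does not state the authors' actual criterion. Under this rule, dark or gray crops also fall into the "colorful" bucket, which is one source of the label noise that soft labeling tolerates.

```python
import colorsys


def car_subtype(pixels, max_saturation=0.2, min_value=0.7):
    """Label a car crop as 'white' or 'colorful' from its mean RGB value.

    pixels: non-empty list of (r, g, b) tuples with channels in 0..255.
    max_saturation, min_value: hypothetical thresholds in HSV space.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = len(pixels)
    # Mean of each channel, scaled to the 0..1 range expected by colorsys.
    r = sum(p[0] for p in pixels) / n / 255
    g = sum(p[1] for p in pixels) / n / 255
    b = sum(p[2] for p in pixels) / n / 255
    _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if saturation <= max_saturation and value >= min_value:
        return "white"
    return "colorful"


if __name__ == "__main__":
    print(car_subtype([(250, 250, 250), (240, 245, 250)]))
    print(car_subtype([(200, 30, 30)]))
```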