-
Classification Tree-based Active Learning: A Wrapper Approach
Authors:
Ashna Jose,
Emilie Devijver,
Massih-Reza Amini,
Noel Jakse,
Roberta Poloni
Abstract:
Supervised machine learning often requires large training sets to train accurate models, yet obtaining large amounts of labeled data is not always feasible. Hence, it becomes crucial to explore active learning methods for reducing the size of training sets while maintaining high accuracy. The aim is to select the optimal subset of data for labeling from an initial unlabeled set, ensuring precise prediction of outcomes. However, conventional active learning approaches are comparable to classical random sampling. This paper proposes a wrapper active learning method for classification, organizing the sampling process into a tree structure, that improves on state-of-the-art algorithms. A classification tree constructed on an initial set of labeled samples is used to decompose the space into low-entropy regions. Input-space based criteria are then used to sub-sample from these regions, with the total number of points to be labeled divided among the regions. This adaptation proves to be a significant enhancement over existing active learning methods. Through experiments conducted on various benchmark data sets, the paper demonstrates the efficacy of the proposed framework in constructing accurate classification models, even when provided with a severely restricted labeled data set.
Submitted 15 April, 2024;
originally announced April 2024.
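As a rough illustration of how a labeling budget might be divided among the tree's regions, the sketch below splits the budget in proportion to region size, using largest-remainder rounding. The proportional rule, the function name `allocate_budget`, and the example region sizes are assumptions made for illustration; the abstract does not say which allocation rule the paper uses.

```python
def allocate_budget(region_sizes, budget):
    """Split `budget` labels across regions in proportion to region size.

    Hypothetical helper: the proportional rule is an assumption, not
    necessarily the criterion used in the paper.
    """
    total = sum(region_sizes)
    if total <= 0:
        raise ValueError("region_sizes must contain at least one point")
    if not 0 <= budget <= total:
        raise ValueError("budget must be between 0 and the number of points")

    # Integer arithmetic avoids floating-point rounding surprises.
    floors = [budget * size // total for size in region_sizes]
    remainders = [budget * size % total for size in region_sizes]

    # Hand the labels lost to rounding to the regions with the largest remainders.
    leftover = budget - sum(floors)
    by_remainder = sorted(range(len(region_sizes)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        floors[i] += 1
    return floors


# Three leaves holding 50, 30 and 20 unlabeled points, 10 labels to spend.
print(allocate_budget([50, 30, 20], 10))
```

Within each region, the points to label would then be chosen by an input-space criterion, which this sketch leaves out.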
-
Enhancing Lidar-based Object Detection in Adverse Weather using Offset Sequences in Time
Authors:
Raphael van Kempen,
Tim Rehbronn,
Abin Jose,
Johannes Stegmaier,
Bastian Lampe,
Timo Woopen,
Lutz Eckstein
Abstract:
Automated vehicles require an accurate perception of their surroundings for safe and efficient driving. Lidar-based object detection is a widely used method for environment perception, but its performance is significantly affected by adverse weather conditions such as rain and fog. In this work, we investigate various strategies for enhancing the robustness of lidar-based object detection by processing sequential data samples generated by lidar sensors. Our approaches leverage temporal information to improve a lidar object detection model, without the need for additional filtering or pre-processing steps. We compare 10 different neural network architectures that process point cloud sequences, including a novel augmentation strategy that introduces a temporal offset between frames of a sequence during training, and evaluate the effectiveness of all strategies on lidar point clouds under adverse weather conditions through experiments. Our research provides a comprehensive study of effective methods for mitigating the effects of adverse weather on the reliability of lidar-based object detection using sequential data, evaluated on public datasets such as nuScenes, Dense, and the Canadian Adverse Driving Conditions Dataset. Our findings demonstrate that our novel method, involving temporal offset augmentation through randomized frame skipping in sequences, enhances object detection accuracy compared to both the baseline model (Pillar-based Object Detection) and no augmentation.
Submitted 17 January, 2024;
originally announced January 2024.
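The temporal offset augmentation mentioned in the abstract above (randomized frame skipping within a sequence) can be pictured with a minimal sketch like the following. The function name `offset_frame_indices`, the `max_skip` parameter, and the choice to clamp at frame 0 are assumptions for illustration, not details taken from the paper.

```python
import random


def offset_frame_indices(end, seq_len, max_skip, rng=None):
    """Pick `seq_len` frame indices ending at `end`, skipping frames at random.

    Between consecutive selected frames, 0..max_skip frames are skipped.
    Hypothetical sketch: indices are clamped at 0, so very early sequences
    may repeat the first frame.
    """
    rng = rng or random.Random()
    indices = [end]
    for _ in range(seq_len - 1):
        step = 1 + rng.randint(0, max_skip)  # 1 = adjacent frame, no skip
        indices.append(max(indices[-1] - step, 0))
    return indices[::-1]  # oldest frame first


# With max_skip=0 the sequence is simply the last four consecutive frames.
print(offset_frame_indices(end=10, seq_len=4, max_skip=0))
# With max_skip=3 the gaps vary from sample to sample during training.
print(offset_frame_indices(end=100, seq_len=5, max_skip=3, rng=random.Random(0)))
```

Setting `max_skip=0` recovers the unaugmented case, which makes it easy to compare training with and without the offset.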
-
MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis
Authors:
Adam Byfield,
William Poulett,
Ben Wallace,
Anusha Jose,
Shatakshi Tyagi,
Smita Shembekar,
Adnan Qayyum,
Junaid Qadir,
Muhammad Bilal
Abstract:
Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, which can pose serious risks to patient health and could even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do not directly apply to ML models, since these algorithms are adaptable and learn from curated datasets through a training process. However, adapting established principles, such as boundary testing using synthetic test data, can effectively bridge this gap. To this end, we present a novel technique called Mix-Up Boundary Analysis (MUBA) that facilitates evaluating image classifiers in terms of prediction fairness. We evaluated MUBA for two important medical imaging tasks -- brain tumour classification and breast cancer classification -- and achieved promising results. This research aims to showcase the importance of adapting traditional assurance principles for assessing ML models to enhance the safety and reliability of healthcare technologies. To facilitate future research, we plan to publicly release our code for MUBA.
Submitted 23 November, 2023;
originally announced November 2023.
-
Data Filtering Networks
Authors:
Alex Fang,
Albin Madappally Jose,
Amit Jain,
Ludwig Schmidt,
Alexander Toshev,
Vaishaal Shankar
Abstract:
Large training sets have become a cornerstone of machine learning and are the foundation for recent advances in language modeling and multimodal learning. While data curation for pre-training is often still ad-hoc, one common paradigm is to first collect a massive pool of data from the Web and then filter this candidate pool down to an actual training set via various heuristics. In this work, we study the problem of learning a data filtering network (DFN) for this second step of filtering a large uncurated dataset. Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks: for instance, a model that performs well on ImageNet can yield worse training sets than a model with low ImageNet accuracy that is trained on a small amount of high-quality data. Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets. Specifically, our best performing dataset DFN-5B enables us to train state-of-the-art CLIP models for their compute budgets: among other improvements on a variety of tasks, a ViT-H trained on our dataset achieves 84.4% zero-shot transfer accuracy on ImageNet, out-performing models trained on other datasets such as LAION-2B, DataComp-1B, or OpenAI's WIT. In order to facilitate further research in dataset design, we also release a new 2 billion example dataset DFN-2B and show that high performance data filtering networks can be trained from scratch using only publicly available data.
Submitted 5 November, 2023; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers
Authors:
Reza Azad,
Amirhossein Kazerouni,
Alaa Sulaiman,
Afshin Bozorgpour,
Ehsan Khodapanah Aghdam,
Abin Jose,
Dorit Merhof
Abstract:
Medical image segmentation is a critical task that plays a vital role in diagnosis, treatment planning, and disease monitoring. Accurate segmentation of anatomical structures and abnormalities from medical images can aid in the early detection and treatment of various diseases. In this paper, we address the local feature deficiency of the Transformer model by carefully re-designing the self-attention map to produce accurate dense prediction in medical images. To this end, we first apply the wavelet transformation to decompose the input feature map into low-frequency (LF) and high-frequency (HF) subbands. The LF segment is associated with coarse-grained features while the HF components preserve fine-grained features such as texture and edge information. Next, we reformulate the self-attention operation using the efficient Transformer to perform both spatial and context attention on top of the frequency representation. Furthermore, to intensify the importance of the boundary information, we impose an additional attention map by creating a Gaussian pyramid on top of the HF components. Moreover, we propose a multi-scale context enhancement block within skip connections to adaptively model inter-scale dependencies to overcome the semantic gap among stages of the encoder and decoder modules. Through comprehensive experiments, we demonstrate the effectiveness of our strategy on multi-organ and skin lesion segmentation benchmarks. The implementation code will be available upon acceptance on GitHub (https://github.com/mindflow-institue/WaveFormer).
Submitted 12 September, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
An Encoder-Decoder Approach for Packing Circles
Authors:
Akshay Kiran Jose,
Gangadhar Karevvanavar,
Rajshekhar V Bhat
Abstract:
The problem of packing smaller objects within a larger object has been of interest for decades. In these problems, in addition to the requirement that the smaller objects must lie completely inside the larger object, they are expected not to overlap, or to overlap minimally, with each other. As a result, packing is a non-convex problem whose optimal solution is challenging to obtain. As such, several heuristic approaches have been used for obtaining sub-optimal solutions in general, and provably optimal solutions for some special instances. In this paper, we propose a novel encoder-decoder architecture consisting of an encoder block, a perturbation block and a decoder block, for packing identical circles within a larger circle. In our approach, the encoder takes the index of a circle to be packed as an input and outputs its center through a normalization layer, the perturbation layer adds controlled perturbations to the center, ensuring that it does not deviate beyond the radius of the smaller circle to be packed, and the decoder takes the perturbed center as input and estimates the index of the intended circle for packing. We parameterize the encoder and decoder by a neural network and optimize it to reduce an error between the decoder's estimated index and the actual index of the circle provided as input to the encoder. The proposed approach can be generalized to pack objects of higher dimensions and different shapes by carefully choosing normalization and perturbation layers. The approach gives a sub-optimal solution and is able to pack smaller objects within a larger object with competitive performance with respect to classical methods.
Submitted 11 August, 2023;
originally announced August 2023.
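The geometric roles of the normalization and perturbation layers described in the abstract above can be sketched without any neural network. In this minimal sketch, normalization pulls a center back so that a small circle of radius `small_radius` stays inside the large circle, and perturbation moves the center by at most `small_radius`. The function names, the radial rescaling, and the uniform-in-disk noise are assumptions for illustration; the paper's actual layers are learned and may be defined differently.

```python
import math
import random


def normalize_center(x, y, big_radius, small_radius):
    """Rescale a center so the small circle lies fully inside the big one."""
    limit = big_radius - small_radius  # farthest a valid center can be from the origin
    norm = math.hypot(x, y)
    if norm <= limit:
        return (x, y)
    scale = limit / norm
    return (x * scale, y * scale)


def perturb_center(x, y, small_radius, rng):
    """Move a center by a random offset no longer than `small_radius`.

    Offsets are drawn uniformly over a disk (hypothetical noise model).
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = small_radius * math.sqrt(rng.random())
    return (x + dist * math.cos(angle), y + dist * math.sin(angle))


rng = random.Random(0)
cx, cy = normalize_center(3.0, 4.0, big_radius=2.0, small_radius=0.5)
px, py = perturb_center(cx, cy, small_radius=0.5, rng=rng)
print(math.hypot(cx, cy))            # distance of the normalized center from the origin
print(math.hypot(px - cx, py - cy))  # size of the perturbation
```

The intuition is that if a decoder can still recover the circle index after such a perturbation, the centers must be far enough apart that the circles do not overlap much.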
-
A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
Authors:
Marcelo Archanjo Jose,
Fabio Gagliardi Cozman
Abstract:
Long sequences of text are challenging in the context of transformers, due to the quadratic memory increase in the self-attention mechanism. As this issue directly affects the translation from natural language to SQL queries (as techniques usually take as input a concatenated text with the question and the database schema), we present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens. We propose a training process with database schema pruning (removal of table and column names that are useless for the query of interest). In addition, we used a multilingual approach with the mT5-large model fine-tuned with a data-augmented Spider dataset in four languages simultaneously: English, Portuguese, Spanish, and French. Our proposed technique used the Spider dataset and increased the exact set match accuracy results from 0.718 to 0.736 in a validation dataset (Dev). Source code, evaluations, and checkpoints are available at: https://github.com/C4AI/gap-text2sql
Submitted 25 June, 2023;
originally announced June 2023.
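The schema pruning idea in the abstract above (dropping table and column names that the target query does not need) can be illustrated with a small sketch. It assumes the gold SQL query is known at training time and keeps only the tables and columns whose names appear in it. The function name `prune_schema`, the dictionary schema format, and the simple token matching are assumptions for illustration, not the authors' implementation.

```python
import re


def prune_schema(schema, sql):
    """Keep only the tables and columns whose names occur in `sql`.

    `schema` maps table name -> list of column names (hypothetical format).
    Matching is by case-insensitive identifier token, so a query using
    `SELECT *` would keep the table but none of its columns.
    """
    tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql)}
    pruned = {}
    for table, columns in schema.items():
        if table.lower() in tokens:
            pruned[table] = [c for c in columns if c.lower() in tokens]
    return pruned


schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "year"],
}
print(prune_schema(schema, "SELECT name FROM singer WHERE age > 30"))
```

A shorter schema means fewer tokens in the concatenated question-plus-schema input, which is what helps it fit within a 512-token limit.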
-
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Authors:
Chen Chen,
Bowen Zhang,
Liangliang Cao,
Jiguang Shen,
Tom Gunter,
Albin Madappally Jose,
Alexander Toshev,
Jonathon Shlens,
Ruoming Pang,
Yinfei Yang
Abstract:
Image and text retrieval is one of the foundational tasks in the vision and language domain with multiple real-world applications. State-of-the-art approaches, e.g. CLIP and ALIGN, represent images and texts as dense embeddings and calculate the similarity in the dense embedding space as the matching score. On the other hand, sparse semantic features like bag-of-words models are more interpretable, but are believed to be less accurate than dense representations. In this work, we show that it is possible to build a sparse semantic representation that is as powerful as, or even better than, dense representations. We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space. Each token in the space is a (sub-)word in the vocabulary, which is not only interpretable but also easy to integrate with existing information retrieval systems. The STAIR model significantly outperforms a CLIP model, with +4.9% and +4.3% absolute Recall@1 improvement on COCO-5k text-to-image and image-to-text retrieval respectively. It also achieved better performance on both ImageNet zero-shot and linear probing compared to CLIP.
Submitted 7 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Denoising Diffusion Probabilistic Models for Generation of Realistic Fully-Annotated Microscopy Image Data Sets
Authors:
Dennis Eschweiler,
Rüveyda Yilmaz,
Matisse Baumann,
Ina Laube,
Rijo Roy,
Abin Jose,
Daniel Brückner,
Johannes Stegmaier
Abstract:
Recent advances in computer vision have led to significant progress in the generation of realistic image data, with denoising diffusion probabilistic models proving to be a particularly effective method. In this study, we demonstrate that diffusion models can effectively generate fully-annotated microscopy image data sets through an unsupervised and intuitive approach, using rough sketches of desired structures as the starting point. The proposed pipeline helps to reduce the reliance on manual annotations when training deep learning-based segmentation approaches and enables the segmentation of diverse datasets without the need for human annotations. This approach holds great promise in streamlining the data generation process and enabling a more efficient and scalable training of segmentation models, as we show in the example of different practical experiments involving various organisms and cell types.
Submitted 8 August, 2023; v1 submitted 2 January, 2023;
originally announced January 2023.
-
Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review
Authors:
Reza Azad,
Amirhossein Kazerouni,
Moein Heidari,
Ehsan Khodapanah Aghdam,
Amirali Molaei,
Yiwei Jia,
Abin Jose,
Rijo Roy,
Dorit Merhof
Abstract:
The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad interest in Computer Vision. Among other merits, Transformers are witnessed as capable of learning long-range dependencies and spatial correlations, which is a clear advantage over convolutional neural networks (CNNs), which have been the de facto standard in Computer Vision problems so far. Thus, Transformers have become an integral part of modern medical image analysis. In this review, we provide an encyclopedic review of the applications of Transformers in medical imaging. Specifically, we present a systematic and thorough review of relevant recent Transformer literature for different medical image analysis tasks, including classification, segmentation, detection, registration, synthesis, and clinical report generation. For each of these applications, we investigate the novelty, strengths and weaknesses of the different proposed strategies and develop taxonomies highlighting key properties and contributions. Further, if applicable, we outline current benchmarks on different datasets. Finally, we summarize key challenges and discuss different future research directions. In addition, we have provided cited papers with their corresponding implementations in https://github.com/mindflow-institue/Awesome-Transformer.
Submitted 5 November, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
The BLue Amazon Brain (BLAB): A Modular Architecture of Services about the Brazilian Maritime Territory
Authors:
Paulo Pirozelli,
Ais B. R. Castro,
Ana Luiza C. de Oliveira,
André S. Oliveira,
Flávio N. Cação,
Igor C. Silveira,
João G. M. Campos,
Laura C. Motheo,
Leticia F. Figueiredo,
Lucas F. A. O. Pellicer,
Marcelo A. José,
Marcos M. José,
Pedro de M. Ligabue,
Ricardo S. Grava,
Rodrigo M. Tavares,
Vinícius B. Matos,
Yan V. Sym,
Anna H. R. Costa,
Anarosa A. F. Brandão,
Denis D. Mauá,
Fabio G. Cozman,
Sarajane M. Peres
Abstract:
We describe the first steps in the development of an artificial agent focused on the Brazilian maritime territory, a large region within the South Atlantic also known as the Blue Amazon. The "BLue Amazon Brain" (BLAB) integrates a number of services aimed at disseminating information about this region and its importance, functioning as a tool for environmental awareness. The main service provided by BLAB is a conversational facility that deals with complex questions about the Blue Amazon, called BLAB-Chat; its central component is a controller that manages several task-oriented natural language processing modules (e.g., question answering and summarizer systems). These modules have access to an internal data lake as well as to third-party databases. A news reporter (BLAB-Reporter) and a purposely-developed wiki (BLAB-Wiki) are also part of the BLAB service architecture. In this paper, we describe our current version of BLAB's architecture (interface, backend, web services, NLP modules, and resources) and comment on the challenges we have faced so far, such as the lack of training data and the scattered state of domain information. Solving these issues presents a considerable challenge in the development of artificial intelligence for technical domains.
Submitted 6 September, 2022;
originally announced September 2022.
-
Integrating question answering and text-to-SQL in Portuguese
Authors:
Marcos Menon José,
Marcelo Archanjo José,
Denis Deratani Mauá,
Fábio Gagliardi Cozman
Abstract:
Deep learning transformers have drastically improved systems that automatically answer questions in natural language. However, different questions demand different answering techniques; here we propose, build and validate an architecture that integrates different modules to answer two distinct kinds of queries. Our architecture takes a free-form natural language text and classifies it to send it either to a Neural Question Answering Reasoner or a Natural Language parser to SQL. We implemented a complete system for the Portuguese language, using some of the main tools available for the language and translating training and testing datasets. Experiments show that our system selects the appropriate answering method with high accuracy (over 99%), thus validating a modular question answering strategy.
Submitted 21 September, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Inclusive Design: Accessibility Settings for People with Cognitive Disabilities
Authors:
Trae Waggoner,
Julia Ann Jose,
Ashwin Nair,
Sudarsan Manikandan
Abstract:
Technology has advanced faster than any other field in the world, and as new technologies develop, it is important to make sure that these tools can be used by everyone, including people with disabilities. Accessibility options in computing devices help ensure that everyone has the same access to advanced technologies. Unfortunately, for those who require more unique and sometimes challenging accommodations, such as people with amyotrophic lateral sclerosis (ALS), the most commonly used accessibility features are simply not enough. While assistive technology for those with ALS does exist, it requires multiple peripheral devices that can become quite expensive collectively. The purpose of this paper is to suggest a more affordable and readily available option for ALS assistive technology that can be implemented on a smartphone or tablet.
Submitted 11 October, 2021;
originally announced October 2021.
-
BotNet Detection on Social Media
Authors:
Aniket Chandrakant Devle,
Julia Ann Jose,
Abhay Shrinivas Saraswathula,
Shubham Mehta,
Siddhant Srivastava,
Sirisha Kona,
Sudheera Daggumalli
Abstract:
As our reliance on social media platforms and web services increases day by day, exploiters view these platforms as an opportunity to manipulate our thoughts and actions. These platforms have become an open playground for social bot accounts. Social bots not only learn human conversations, manners, and presence but also manipulate public opinion, act as scammers, manipulate stock markets, and so on. There has been evidence of bots manipulating people's opinions and thoughts, which can be a great threat to democracy. Identifying and preventing the campaigns that release or create these bots has become critical. Our goal in this paper is to leverage web mining techniques to help detect fake bots on social media platforms such as Twitter, thereby mitigating the spread of disinformation.
Submitted 27 November, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer
Authors:
Marcelo Archanjo José,
Fabio Gagliardi Cozman
Abstract:
The translation of natural language questions to SQL queries has attracted growing attention, in particular in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we thus investigated translation to SQL when input questions are given in the Portuguese language. To do so, we properly adapted state-of-the-art tools and resources. We changed the RAT-SQL+GAP system by relying on a multilingual BART model (we report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with original and translated training datasets together, even if a single target language is desired. This multilingual BART model fine-tuned with a double-size training dataset (English and Portuguese) achieved 83% of the baseline, making inferences for the Portuguese test dataset. This investigation can help other researchers to produce results in Machine Learning in a language different from English. Our multilingual ready version of RAT-SQL+GAP and the data are available, open-sourced as mRAT-SQL+GAP at: https://github.com/C4AI/gap-text2sql
Submitted 29 November, 2021; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Reversible Colour Density Compression of Images using cGANs
Authors:
Arun Jose,
Abraham Francis
Abstract:
Image compression using colour densities is historically impractical to decompress losslessly. We examine the use of conditional generative adversarial networks in making this transformation more feasible, through learning a mapping between the images and a loss function to train on. We show that this method is effective at producing visually lossless generations, indicating that efficient colour compression is viable.
Submitted 19 June, 2021;
originally announced June 2021.
-
Continuous Glucose Monitoring Prediction
Authors:
Julia Ann Jose,
Trae Waggoner,
Sudarsan Manikandan
Abstract:
Diabetes is one of the deadliest diseases in the world and affects nearly 10 percent of the global adult population. Fortunately, powerful new technologies allow for a consistent and reliable treatment plan for people with diabetes. One major development is a system called continuous blood glucose monitoring (CGM). In this review, we look at three different continuous meal detection algorithms that were developed using given CGM data from patients with diabetes. From this analysis, an initial meal prediction algorithm was also developed utilizing these methods.
Submitted 4 January, 2021;
originally announced January 2021.
-
Optimized Feature Space Learning for Generating Efficient Binary Codes for Image Retrieval
Authors:
Abin Jose,
Erik Stefan Ottlik,
Christian Rohlfing,
Jens-Rainer Ohm
Abstract:
In this paper we propose an approach for learning a low-dimensional optimized feature space with minimum intra-class variance and maximum inter-class variance. We address the problem of high dimensionality of feature vectors extracted from neural networks by taking into account the global statistics of the feature space. The classical approach of Linear Discriminant Analysis (LDA) is generally used for generating an optimized low-dimensional feature space for single-labeled images. Since image retrieval involves both multi-labeled and single-labeled images, we utilize the equivalence between LDA and Canonical Correlation Analysis (CCA) to generate an optimized feature space for single-labeled images and use CCA to generate an optimized feature space for multi-labeled images. Our approach correlates the projections of feature vectors with label vectors in our CCA-based network architecture. The neural network minimizes a loss function that maximizes the correlation coefficients. We binarize our generated feature vectors with the popular Iterative Quantization (ITQ) approach and also propose an ensemble network to generate binary codes of the desired bit length for image retrieval. Our measurement of mean average precision shows results competitive with the state of the art on single-labeled and multi-labeled image retrieval datasets.
Submitted 30 January, 2020;
originally announced January 2020.
-
OpenUAV Cloud Testbed: a Collaborative Design Studio for Field Robotics
Authors:
Harish Anand,
Stephen A. Rees,
Zhiang Chen,
Ashwin Jose,
Sarah Bearman,
Prasad Antervedi,
Jnaneshwar Das
Abstract:
Simulations play a crucial role in robotics research and education. This paper presents the OpenUAV testbed, an open-source, easy-to-use, web-based, and reproducible software system that enables students and researchers to run robotic simulations on the cloud. We have built upon our previous work and have addressed some of the educational and research challenges associated with it. The paper makes three main contributions to the robotics and automation community. First, OpenUAV saves students and researchers from tedious and complicated software setups by providing web-browser-based Linux desktop sessions with standard robotics software like Gazebo, ROS, and a flight autonomy stack. Second, it offers a method for saving an individual's research work together with its dependencies, so that the work can be reproduced in the future. Third, the platform provides a mechanism to support photorealistic robotics simulations by combining Unity game engine-based camera rendering and Gazebo physics. The paper addresses a research need for photorealistic simulations and describes a methodology for creating a photorealistic aquatic simulation. We also present the various academic and research use-cases of this platform to improve robotics education and research, especially during times like the COVID-19 pandemic, when virtual collaboration is necessary.
Submitted 6 May, 2021; v1 submitted 1 October, 2019;
originally announced October 2019.