Search | arXiv e-print repository

arXiv:2407.00092 [pdf]

Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges

Authors: Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, Andry Rakotonirainy

Abstract: Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems, including zero-shot in-context learning scenarios. This study explores the ability of MLLMs in visually solving the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensi… ▽ More Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems, including zero-shot in-context learning scenarios. This study explores the ability of MLLMs in visually solving the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensional plane. We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these combinatorial challenges. Our experimental investigation includes rigorous evaluations across zero-shot settings and introduces innovative multi-agent zero-shot in-context scenarios. The results demonstrated that both multi-agent models. Multi-Agent 1, which includes the Initializer, Critic, and Scorer agents, and Multi-Agent 2, which comprises only the Initializer and Critic agents; significantly improved solution quality for TSP and mTSP problems. Multi-Agent 1 excelled in environments requiring detailed route refinement and evaluation, providing a robust framework for sophisticated optimizations. In contrast, Multi-Agent 2, focusing on iterative refinements by the Initializer and Critic, proved effective for rapid decision-making scenarios. These experiments yield promising outcomes, showcasing the robust visual reasoning capabilities of MLLMs in addressing diverse combinatorial problems. The findings underscore the potential of MLLMs as powerful tools in computational optimization, offering insights that could inspire further advancements in this promising field. Project link: https://github.com/ahmed-abdulhuy/Solving-TSP-and-mTSP-Combinatorial-Challenges-using-Visual-Reasoning-and-Multi-Agent-Approach-MLLMs-.git △ Less

Submitted 26 June, 2024; originally announced July 2024.

arXiv:2406.06865 [pdf]

Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems

Authors: Mohammed Elhenawy, Ahmed Abdelhay, Taqwa I. Alhadidi, Huthaifa I Ashqar, Shadi Jaradat, Ahmed Jaber, Sebastien Glaser, Andry Rakotonirainy

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing di-verse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' vi… ▽ More Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing di-verse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2402.12810 [pdf, other]

PIP-Net: Pedestrian Intention Prediction in the Wild

Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang, Sebastien Glaser

Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spat… ▽ More Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spatial features from the driving scene, the proposed model employs a recurrent and temporal attention-based solution, outperforming state-of-the-art performance. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map, combined with a local motion flow feature, providing rich insights into the scene dynamics. Additionally, we explore the impact of expanding the camera's field of view, from one to three cameras surrounding the ego vehicle, leading to enhancement in the model's contextual perception. Depending on the traffic scenario and road environment, the model excels in predicting pedestrian crossing intentions up to 4 seconds in advance which is a breakthrough in current research studies in pedestrian intention prediction. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset, with multi-camera annotations in real-world automated driving scenarios. △ Less

Submitted 1 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2110.03104 [pdf]

doi 10.1371/journal.pone.0260995

Hybrid Pointer Networks for Traveling Salesman Problems Optimization

Authors: Ahmed Stohy, Heba-Tullah Abdelhakam, Sayed Ali, Mohammed Elhenawy, Abdallah A Hassan, Mahmoud Masoud, Sebastien Glaser, Andry Rakotonirainy

Abstract: In this work, a novel idea is presented for combinatorial optimization problems, a hybrid network, which results in a superior outcome. We applied this method to graph pointer networks [1], expanding its capabilities to a higher level. We proposed a hybrid pointer network (HPN) to solve the travelling salesman problem trained by reinforcement learning. Furthermore, HPN builds upon graph pointer ne… ▽ More In this work, a novel idea is presented for combinatorial optimization problems, a hybrid network, which results in a superior outcome. We applied this method to graph pointer networks [1], expanding its capabilities to a higher level. We proposed a hybrid pointer network (HPN) to solve the travelling salesman problem trained by reinforcement learning. Furthermore, HPN builds upon graph pointer networks which is an extension of pointer networks with an additional graph embedding layer. HPN outperforms the graph pointer network in solution quality due to the hybrid encoder, which provides our model with a verity encoding type, allowing our model to converge to a better policy. Our network significantly outperforms the original graph pointer network for small and large-scale problems increasing its performance for TSP50 from 5.959 to 5.706 without utilizing 2opt, Pointer networks, Attention model, and a wide range of models, producing results comparable to highly tuned and specialized algorithms. We make our data, models, and code publicly available [2]. △ Less

Submitted 13 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

arXiv:2008.06727 [pdf]

doi 10.1109/ACCESS.2022.3154088

A Review on Drivers Red Light Running Behavior Predictions and Technology Based Countermeasures

Authors: Md Mostafizur Rahman Komol, Jack Pinnow, Mohammed Elhenawy, Shamsunnahar Yasmin, Mahmoud Masoud, Sebastien Glaser, Andry Rakotonirainy

Abstract: Red light running at signalised intersections is a growing road safety issue worldwide, leading to the rapid development of advanced intelligent transportation technologies and countermeasures. However, existing studies have yet to summarise and present the effect of these technology based innovations in improving safety. This paper represents a comprehensive review of red light running behaviour… ▽ More Red light running at signalised intersections is a growing road safety issue worldwide, leading to the rapid development of advanced intelligent transportation technologies and countermeasures. However, existing studies have yet to summarise and present the effect of these technology based innovations in improving safety. This paper represents a comprehensive review of red light running behaviour prediction methodologies and technology-based countermeasures. Specifically, the major focus of this study is to provide a comprehensive review on two streams of literature targeting red light running and stop and go behaviour at signalised intersection (1) studies focusing on modelling and predicting the red light running and stop and go related driver behaviour and (2) studies focusing on the effectiveness of different technology based countermeasures which combat such unsafe behaviour. The study provides a systematic guide to assist researchers and stakeholders in understanding how to best identify red light running and stop and go associated driving behaviour and subsequently implement countermeasures to combat such risky behaviour and improve the associated safety. △ Less

Submitted 13 March, 2022; v1 submitted 15 August, 2020; originally announced August 2020.

Comments: Published in IEEE ACCESS

Journal ref: in IEEE Access, vol. 10, pp. 25309-25326, 2022

arXiv:1911.05312 [pdf]

doi 10.13140/RG.2.2.16800.17922

Topological Stability: a New Algorithm for Selecting The Nearest Neighbors in Non-Linear Dimensionality Reduction Techniques

Authors: Mohammed Elhenawy, Mahmoud Masoud, Sebastian Glaser, Andry Rakotonirainy

Abstract: In the machine learning field, dimensionality reduction is an important task. It mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new (non-linear) techniques for dimensionality reduction. Most of these techniques are based on the intuition that data… ▽ More In the machine learning field, dimensionality reduction is an important task. It mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new (non-linear) techniques for dimensionality reduction. Most of these techniques are based on the intuition that data lies on or near a complex low-dimensional manifold that is embedded in the high-dimensional space. New techniques for dimensionality reduction aim at identifying and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the nearest neighbours based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to choose the nearest neighbours to reduce the number of short-circuit errors and hence improves the topological stability. Because at any point on the manifold, that point and its nearest neighbours form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. The prposed algorithmuses the point itself and its two nearest neighbours to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. The proposed algorithm then adds new points to the two nearest neighbours based on the distance and the angle between each new point and the orthogonal to the subspace. The superior performance of the new algorithm in choosing the nearest neighbours is confirmed through experimental work with several datasets. △ Less

Submitted 16 November, 2019; v1 submitted 13 November, 2019; originally announced November 2019.

arXiv:1910.10371 [pdf, other]

Semi-supervised Multi-domain Multi-task Training for Metastatic Colon Lymph Node Diagnosis From Abdominal CT

Authors: Saskia Glaser, Gabriel Maicas, Sergei Bedrikovetski, Tarik Sammour, Gustavo Carneiro

Abstract: The diagnosis of the presence of metastatic lymph nodes from abdominal computed tomography (CT) scans is an essential task performed by radiologists to guide radiation and chemotherapy treatment. State-of-the-art deep learning classifiers trained for this task usually rely on a training set containing CT volumes and their respective image-level (i.e., global) annotation. However, the lack of annot… ▽ More The diagnosis of the presence of metastatic lymph nodes from abdominal computed tomography (CT) scans is an essential task performed by radiologists to guide radiation and chemotherapy treatment. State-of-the-art deep learning classifiers trained for this task usually rely on a training set containing CT volumes and their respective image-level (i.e., global) annotation. However, the lack of annotations for the localisation of the regions of interest (ROIs) containing lymph nodes can limit classification accuracy due to the small size of the relevant ROIs in this problem. The use of lymph node ROIs together with global annotations in a multi-task training process has the potential to improve classification accuracy, but the high cost involved in obtaining the ROI annotation for the same samples that have global annotations is a roadblock for this alternative. We address this limitation by introducing a new training strategy from two data sets: one containing the global annotations, and another (publicly available) containing only the lymph node ROI localisation. We term our new strategy semi-supervised multi-domain multi-task training, where the goal is to improve the diagnosis accuracy on the globally annotated data set by incorporating the ROI annotations from a different domain. Using a private data set containing global annotations and a public data set containing lymph node ROI localisation, we show that our proposed training mechanism improves the area under the ROC curve for the classification task compared to several training method baselines. △ Less

Submitted 23 October, 2019; originally announced October 2019.

Comments: Under review at ISBI 2020

arXiv:1212.4804 [pdf]

Low Speed Automation, a French Initiative

Authors: Sébastien Glaser, Maurice Cour, Lydie Nouveliere, Alain Lambert, Fawzi Nashashibi, Jean-Christophe Popieul, Benjamin Mourllion

Abstract: Nowadays, vehicle safety is constantly increasing thanks to the improvement of vehicle passive and active safety. However, on a daily usage of the car, traffic jams remains a problem. With limited space for road infrastructure, automation of the driving task on specific situation seems to be a possible solution. The French project ABV, which stands for low speed automation, tries to demonstrate th… ▽ More Nowadays, vehicle safety is constantly increasing thanks to the improvement of vehicle passive and active safety. However, on a daily usage of the car, traffic jams remains a problem. With limited space for road infrastructure, automation of the driving task on specific situation seems to be a possible solution. The French project ABV, which stands for low speed automation, tries to demonstrate the feasibility of the concept and to prove the benefits. In this article, we describe the scientific background of the project and expected outputs. △ Less

Submitted 19 December, 2012; originally announced December 2012.

Comments: TRA (2012)

Showing 1–8 of 8 results for author: Glaser, S