Search | arXiv e-print repository

Real-Time Scene Graph Generation

Authors: Maëlic Neau, Paulo E. Santos, Karl Sammut, Anne-Gwenn Bosser, Cédric Buche

Abstract: Scene Graph Generation (SGG) can extract abstract semantic relations between entities in images as graph representations. This task holds strong promise for other downstream tasks such as the embodied cognition of an autonomous agent. However, to power such applications, SGG needs to close the gap to real-time latency. In this work, we investigate the bottlenecks of current approaches for applications with real-time constraints. We then propose a simple yet effective implementation of a real-time SGG approach using YOLOv8 as an object detection backbone. Our implementation is the first to obtain more than 48 FPS for the task with no loss of accuracy, outperforming all other lightweight approaches. Our code is freely available at https://github.com/Maelic/SGG-Benchmark.

Submitted 25 May, 2024; originally announced May 2024.
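The real-time SGG entry above describes a two-stage pipeline: an object detector (YOLOv8) proposes entities, and relations predicted between them form the graph. A minimal sketch of the graph-building and throughput-measuring side follows, assuming hypothetical `Detection`, `build_scene_graph`, `fake_detector` and `fake_relation` stand-ins; none of this is the SGG-Benchmark or Ultralytics API. For reference, 48 FPS leaves a per-frame budget of about 1000 / 48 ≈ 20.8 ms.

```python
import time
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Triple = Tuple[str, str, str]            # (subject, predicate, object)


@dataclass(frozen=True)
class Detection:
    label: str
    box: Box
    score: float


def build_scene_graph(detections: List[Detection],
                      predict_relation: Callable[[Detection, Detection], Tuple[str, float]],
                      threshold: float = 0.5) -> List[Triple]:
    """Score every ordered (subject, object) pair and keep confident relations."""
    triples: List[Triple] = []
    for subj, obj in permutations(detections, 2):
        predicate, confidence = predict_relation(subj, obj)
        if confidence >= threshold:
            triples.append((subj.label, predicate, obj.label))
    return triples


def measure_fps(process_frame: Callable[[], object], n_frames: int = 100) -> float:
    """Frames per second = processed frames / elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return n_frames / elapsed


# Hypothetical stand-ins for a detector and a relation head (NOT YOLOv8's API).
def fake_detector() -> List[Detection]:
    return [Detection("person", (10, 20, 110, 300), 0.9),
            Detection("horse", (80, 120, 400, 330), 0.8)]


def fake_relation(subj: Detection, obj: Detection) -> Tuple[str, float]:
    # Toy rule: a person whose box bottom falls inside the other box is "riding" it.
    if subj.label == "person" and obj.box[1] <= subj.box[3] <= obj.box[3]:
        return "riding", 0.9
    return "near", 0.1


graph = build_scene_graph(fake_detector(), fake_relation)
print(graph)  # [('person', 'riding', 'horse')]
fps = measure_fps(lambda: build_scene_graph(fake_detector(), fake_relation))
print(f"{fps:.0f} FPS ({1000 / fps:.3f} ms per frame)")
```

In this sketch, pairwise scoring grows quadratically with the number of detections, which is one plausible place for latency to accumulate in a relation-prediction stage.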

arXiv:2404.11882

doi 10.1609/aaaiss.v2i1.27643

Hybrid Navigation Acceptability and Safety

Authors: Benoit Clement, Marie Dubromel, Paulo E. Santos, Karl Sammut, Michelle Oppert, Feras Dayoub

Abstract: Autonomous vessels have emerged as a prominent and accepted solution, particularly in the naval defence sector. However, achieving full autonomy for marine vessels demands robust and reliable control and guidance systems that can handle various encounters with manned and unmanned vessels while operating effectively under diverse weather and sea conditions. A significant challenge in this pursuit is ensuring the autonomous vessels' compliance with the International Regulations for Preventing Collisions at Sea (COLREGs). These regulations are a formidable hurdle for human-level understanding by autonomous systems, as they were originally derived from common navigation practices dating back to the mid-19th century. Their ambiguous language assumes interpretation and execution by experienced sailors, and therefore demands a high-level (cognitive) understanding of language and agent intentions. These capabilities exceed the current state of the art in intelligent systems. This position paper highlights the critical requirements for a trustworthy control and guidance system, exploring the complexity of adapting COLREGs for safe vessel-on-vessel encounters in which autonomous maritime technology competes and/or cooperates with manned vessels.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2312.01568

Multimodal Speech Emotion Recognition Using Modality-specific Self-Supervised Frameworks

Authors: Rutherford Agbeshi Patamia, Paulo E. Santos, Kingsley Nketia Acheampong, Favour Ekong, Kwabena Sarpong, She Kun

Abstract: Emotion recognition is a topic of significant interest in assistive robotics, because robots need to comprehend human behavior in order to interact effectively in our society. Consequently, efficient and dependable emotion recognition systems that support optimal human-machine communication are required. Multiple modalities (including speech, audio, text, images, and videos) are typically exploited in emotion recognition tasks. Much relevant research is based on merging multiple data modalities and training deep learning models on low-level data representations. However, most existing emotion databases are not large (or complex) enough to allow machine learning approaches to learn detailed representations. This paper explores modality-specific pre-trained transformer frameworks for self-supervised learning of speech and text representations, enabling data-efficient emotion recognition while achieving state-of-the-art performance. The model applies feature-level fusion using nonverbal cue data points from motion capture to provide multimodal speech emotion recognition. It was trained on the publicly available IEMOCAP dataset, achieving an overall accuracy of 77.58% for four emotions and outperforming state-of-the-art approaches.

Submitted 3 December, 2023; originally announced December 2023.
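Feature-level (early) fusion, as described in the emotion-recognition abstract above, combines per-modality feature vectors into one vector before classification. A minimal sketch, assuming toy embeddings in place of the pre-trained speech and text encoders, a single linear layer as the classifier, and a hypothetical four-label set (`EMOTIONS`); the paper's actual architecture is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical four-class label set; the order is arbitrary.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in v]


def fuse_features(speech: Sequence[float], text: Sequence[float],
                  mocap: Sequence[float]) -> List[float]:
    """Feature-level fusion: normalize each modality, then concatenate."""
    return l2_normalize(speech) + l2_normalize(text) + l2_normalize(mocap)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Tuple[str, List[float]]:
    """One linear layer followed by softmax over the fused vector."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return EMOTIONS[probs.index(max(probs))], probs


# Toy inputs; real features would come from pre-trained encoders and motion capture.
fused = fuse_features(speech=[3.0, 4.0], text=[1.0, 0.0], mocap=[0.0, 2.0])
print(fused)  # [0.6, 0.8, 1.0, 0.0, 0.0, 1.0]

weights = [[0.0] * 6, [1.0] * 6, [0.0] * 6, [0.0] * 6]  # hypothetical trained weights
label, probs = classify(fused, weights, bias=[0.0] * 4)
print(label)  # happy
```

Normalizing each modality before concatenation keeps one modality's scale from dominating the fused vector; this is a common design choice, not necessarily the paper's.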

arXiv:2305.18668

Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation

Authors: Neau Maëlic, Paulo E. Santos, Anne-Gwenn Bosser, Cédric Buche

Abstract: Learning to compose visual relationships from raw images in the form of scene graphs is a highly challenging task due to contextual dependencies, but it is essential in computer vision applications that depend on scene understanding. However, no current approach in Scene Graph Generation (SGG) aims at providing useful graphs for downstream tasks. Instead, the main focus has been on unbiasing the data distribution to predict more fine-grained relations. That said, not all fine-grained relations are equally relevant, and at least some of them are of no use for real-world applications. In this work, we introduce the task of Efficient SGG, which prioritizes the generation of relevant relations, facilitating the use of scene graphs in downstream tasks such as image generation. To support further approaches, we present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset. We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually used in SGG. Finally, we show the efficiency of this dataset in the task of Image Generation from Scene Graphs.

Submitted 25 September, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2105.07579

Monitoring electrical systems data-network equipment by means of Fuzzy and Paraconsistent Annotated Logic

Authors: Hyghor Miranda Cortes, Paulo Eduardo Santos, Joao Inacio da Silva Filho

Abstract: The amount and complexity of information obtained from IT data network elements, which is needed for their correct monitoring and management, is constantly increasing. The same holds for data networks in electrical systems that provide effective supervision and control of substations and hydroelectric plants. Contributing to this are the growing number of installations and new environments monitored by such data networks, and the constant evolution of the technologies involved. This situation potentially leads to incomplete and/or contradictory data, issues that must be addressed in order to maintain a good level of monitoring and, consequently, management of these systems. In this paper, a prototype of an expert system is developed to monitor the status of equipment in the data networks of electrical systems, which deals with inconsistencies without trivialising the inferences. This is accomplished in the context of the remote control of hydroelectric plants and substations by a Regional Operation Centre (ROC). The expert system is developed with algorithms defined upon a combination of Fuzzy logic and Paraconsistent Annotated Logic with Annotation of Two Values (PAL2v) in order to analyse uncertain signals and generate the operating conditions (faulty, normal, unstable or inconsistent/indeterminate) of the equipment identified as important for the remote control of hydroelectric plants and substations.
A prototype of this expert system was installed on a virtualised server with CLP500 software (from the manufacturer EFACEC), which was applied to investigate scenarios consisting of a Regional (Brazilian) Operation Centre with a Generic Substation and a Generic Hydroelectric Plant, representing a remote control environment.

Submitted 23 May, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

Comments: 38 pages; 14 figures; Under submission
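In PAL2v, each proposition carries a favourable evidence degree μ and an unfavourable evidence degree λ, both in [0, 1], from which a certainty degree Dc = μ − λ and a contradiction degree Dct = μ + λ − 1 are computed. A minimal sketch of how fuzzy memberships could feed those degrees and be mapped to the four operating conditions named in the abstract above; the signals (`latency_ms`, `packet_loss_pct`), the ramp breakpoints, and the `limit` threshold are all hypothetical, not the paper's expert-system rules.

```python
from typing import Tuple


def ramp_down(x: float, low: float, high: float) -> float:
    """Fuzzy membership: 1 at or below `low`, 0 at or above `high`, linear in between."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return (high - x) / (high - low)


def pal2v_degrees(mu: float, lam: float) -> Tuple[float, float]:
    """PAL2v degrees from favourable (mu) and unfavourable (lam) evidence in [0, 1]."""
    certainty = mu - lam            # Dc in [-1, 1]
    contradiction = mu + lam - 1.0  # Dct in [-1, 1]
    return certainty, contradiction


def operating_condition(mu: float, lam: float, limit: float = 0.5) -> str:
    """Hypothetical mapping of the two degrees onto the four operating conditions."""
    dc, dct = pal2v_degrees(mu, lam)
    if abs(dct) >= limit:
        return "inconsistent/indeterminate"  # evidence conflicts, or is missing
    if dc >= limit:
        return "normal"
    if dc <= -limit:
        return "faulty"
    return "unstable"


def equipment_status(latency_ms: float, packet_loss_pct: float) -> str:
    # Hypothetical signals: fast replies are evidence FOR health, packet loss AGAINST it.
    mu = ramp_down(latency_ms, 50.0, 200.0)
    lam = 1.0 - ramp_down(packet_loss_pct, 1.0, 10.0)
    return operating_condition(mu, lam)


print(equipment_status(20.0, 0.5))    # normal
print(equipment_status(300.0, 20.0))  # faulty
print(equipment_status(20.0, 20.0))   # inconsistent/indeterminate
print(equipment_status(125.0, 5.5))   # unstable
```

Checking the contradiction degree first means that conflicting signals (fast replies but heavy packet loss) are reported as inconsistent rather than being averaged away, which reflects the abstract's goal of handling inconsistencies without trivialising the inferences.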

arXiv:2011.01397

doi 10.1080/13875868.2020.1857386

Guided Navigation from Multiple Viewpoints using Qualitative Spatial Reasoning

Authors: Danilo Perico, Paulo E. Santos, Reinaldo Bianchi

Abstract: Navigation is an essential ability for mobile agents to be completely autonomous and able to perform complex actions. However, the problem of navigation for agents with limited (or no) perception of the world, or devoid of a fully defined motion model, has received little attention from research in AI and Robotics. One way to tackle this problem is guided navigation, in which other autonomous agents, endowed with perception, combine their distinct viewpoints to infer the localisation of a sensory-deprived agent and the appropriate commands to guide it through a particular path. Due to the limited knowledge about the physical and perceptual characteristics of the guided agent, this task should be conducted at a level of abstraction that allows the use of a generic motion model and high-level commands that can be applied by any type of autonomous agent, including humans. The main task considered in this work is, given a group of autonomous agents perceiving their common environment with their independent, egocentric and local vision sensors, the development and evaluation of algorithms capable of producing a set of high-level commands (involving qualitative directions, e.g. move left, go straight ahead) capable of guiding a sensory-deprived robot to a goal location.

Submitted 2 November, 2020; originally announced November 2020.

Comments: 26 pages

Journal ref: Spatial Cognition and Computation - 2020
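The guided-navigation abstract above turns observers' estimates into qualitative commands such as "move left" or "go straight ahead". A minimal sketch, assuming that observer estimates are fused by simple averaging (`fuse_estimates`) and that commands are chosen by sectors of the goal's bearing relative to the robot's heading; the sector widths and the `stop` radius are hypothetical, and this is not the paper's qualitative spatial reasoning calculus.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fuse_estimates(estimates: List[Point]) -> Point:
    """Combine position estimates from several observers by averaging (an assumption)."""
    n = len(estimates)
    return (sum(p[0] for p in estimates) / n, sum(p[1] for p in estimates) / n)


def qualitative_command(robot: Point, heading_deg: float, goal: Point,
                        straight_tol: float = 22.5, reached: float = 0.5) -> str:
    """Map the goal's bearing relative to the robot's heading to a high-level command.

    Angles are counter-clockwise from the +x axis, so a positive relative angle
    means the goal lies to the robot's left.
    """
    dx, dy = goal[0] - robot[0], goal[1] - robot[1]
    if math.hypot(dx, dy) <= reached:
        return "stop"
    bearing = math.degrees(math.atan2(dy, dx))
    relative = (bearing - heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    if abs(relative) <= straight_tol:
        return "go straight ahead"
    if abs(relative) >= 180.0 - straight_tol:
        return "turn around"
    return "move left" if relative > 0 else "move right"


robot = fuse_estimates([(0.0, 0.0), (0.2, -0.2)])  # two observers' estimates
print(robot)  # (0.1, -0.1)
print(qualitative_command(robot, 90.0, (0.1, 5.0)))   # go straight ahead
print(qualitative_command(robot, 90.0, (-5.0, -0.1)))  # move left
print(qualitative_command(robot, 90.0, (5.0, -0.1)))   # move right
print(qualitative_command(robot, 90.0, (0.1, -5.0)))   # turn around
print(qualitative_command(robot, 90.0, (0.3, 0.0)))    # stop
```

Because the output is a small vocabulary of directions rather than motor commands, the same sketch could in principle address any agent that understands those words, in line with the generic motion model the abstract calls for.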

arXiv:2010.00839

CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns

Authors: Leonardo Anjoletto Ferreira, Douglas De Rizzo Meneghetti, Paulo Eduardo Santos

Abstract: Recently, Deep Learning (DL) methods have shown excellent performance in image captioning and visual question answering. However, despite their performance, DL methods do not learn the semantics of the words used to describe a scene, making it difficult to spot incorrect words in captions or to interchange words that have similar meanings. This work proposes a combination of DL methods for object detection and natural language processing to validate image captions. We test our method on the FOIL-COCO data set, since it provides correct and incorrect captions for various images using only objects represented in the MS-COCO image data set. Results show that our method has good overall performance, in some cases similar to human performance.

Submitted 2 October, 2020; originally announced October 2020.

Comments: Published at the First Annual International Workshop on Interpretability: Methodologies and algorithms (IMA 2019)
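The CAPTION abstract above validates captions by comparing the nouns they mention with the objects a detector finds. A minimal sketch of that comparison, assuming a tiny hypothetical noun lexicon (`OBJECT_NOUNS`) in place of a real POS tagger and the MS-COCO vocabulary, and a hypothetical synonym map (`CANONICAL`) so that words with similar meanings are not flagged; the detected labels are supplied by hand rather than by a detector.

```python
import re
from typing import Iterable, List, Tuple

# Tiny hypothetical lexicon standing in for a POS tagger + MS-COCO object vocabulary.
OBJECT_NOUNS = {"person", "man", "woman", "dog", "cat", "frisbee", "horse", "bicycle", "car", "bus"}
# Hypothetical synonym map: interchangeable words resolve to one detector label.
CANONICAL = {"man": "person", "woman": "person"}


def caption_nouns(caption: str) -> List[str]:
    """Return the object nouns in a caption (lexicon lookup instead of POS tagging)."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    return [t for t in tokens if t in OBJECT_NOUNS]


def validate_caption(caption: str, detected_labels: Iterable[str]) -> Tuple[bool, List[str]]:
    """Accept a caption only if every object noun it mentions was detected."""
    detected = {CANONICAL.get(label, label) for label in detected_labels}
    suspicious = [n for n in caption_nouns(caption) if CANONICAL.get(n, n) not in detected]
    return not suspicious, suspicious


labels = ["person", "frisbee", "dog"]  # would come from an object detector
print(validate_caption("A man throws a frisbee to a dog", labels))  # (True, [])
print(validate_caption("A man throws a frisbee to a cat", labels))  # (False, ['cat'])
```

The second caption mimics a FOIL-style error, where one object word is swapped for a plausible but wrong one.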

arXiv:1903.03411

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

Authors: Thiago Freitas dos Santos, Paulo E. Santos, Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Pedro Cabalar

Abstract: Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in everyday human activities. The goal of this work is to investigate the automated solution of this kind of puzzle by adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Processes (MDP), called oASP(MDP), to use heuristics that accelerate the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage over the non-heuristic versions of oASP(MDP) and Q-Learning.

Submitted 15 February, 2019; originally announced March 2019.

Comments: Submitted to Journal of Heuristics
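One common way to inject a heuristic into Q-Learning, used in heuristically accelerated Q-Learning (HAQL), is to bias only the greedy choice: a = argmax_a [Q(s, a) + ξ·H(s, a)], while the Q update itself stays unchanged. The puzzle paper above may formulate this differently; the sketch below swaps the spatial puzzles for a hypothetical 1-D chain (states 0 to 4) and uses a hypothetical `relaxed_heuristic` that plays the role of a solution to a relaxed problem.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

ACTIONS = (-1, +1)  # move left / move right on a 1-D chain
GOAL = 4            # toy chain with states 0..4; state 4 is terminal


def step(state: int, action: int) -> Tuple[int, float, bool]:
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else -0.01
    return nxt, reward, nxt == GOAL


def haql(heuristic: Callable[[int, int], float], episodes: int = 200, alpha: float = 0.5,
         gamma: float = 0.9, epsilon: float = 0.1, xi: float = 1.0, seed: int = 0):
    """Q-Learning whose greedy choice is argmax_a [Q(s, a) + xi * H(s, a)].

    The heuristic only biases action selection; the Q update is plain Q-Learning.
    """
    rng = random.Random(seed)
    q: Dict[Tuple[int, int], float] = defaultdict(float)
    steps_per_episode: List[int] = []
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # epsilon-greedy exploration
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)] + xi * heuristic(s, b))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


def relaxed_heuristic(s: int, a: int) -> float:
    # Hypothetical: a relaxed version of the task suggests "always move towards the goal".
    return 1.0 if a == +1 else 0.0


q, steps = haql(relaxed_heuristic)
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

Passing `lambda s, a: 0.0` as the heuristic recovers plain epsilon-greedy Q-Learning, which gives a baseline for comparing the per-episode step counts in `steps`.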

arXiv:1706.01417

A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

Authors: Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras

Abstract: Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.

Submitted 5 June, 2017; originally announced June 2017.

Comments: Submitted to IJCAI 17
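The oASP(MDP) abstract above builds the state set online: states and transitions enter the model only once the agent observes them, and a change in the domain updates the model without discarding learnt action values. A minimal sketch of that bookkeeping, assuming a hypothetical `OnlineStateSet` class in which plain Python sets and dictionaries replace the ASP programs, and where the only domain constraint is "drop actions observed to leave the state unchanged".

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Action = str


class OnlineStateSet:
    """Hypothetical stand-in for oASP(MDP) bookkeeping (no ASP solver involved)."""

    def __init__(self, actions: List[Action]):
        self.actions = list(actions)
        self.states: Set[State] = set()  # grows only as states are observed
        self.transitions: Dict[Tuple[State, Action], State] = {}
        self.q: Dict[Tuple[State, Action], float] = defaultdict(float)  # kept across changes

    def observe(self, s: State, a: Action, s_next: State) -> bool:
        """Record a transition; return True if it contradicts an earlier observation."""
        changed = (s, a) in self.transitions and self.transitions[(s, a)] != s_next
        self.states.update((s, s_next))
        self.transitions[(s, a)] = s_next  # the newest observation replaces the old one
        return changed

    def useful_actions(self, s: State) -> List[Action]:
        """Simple domain constraint: drop actions observed to leave the state unchanged."""
        return [a for a in self.actions if self.transitions.get((s, a)) != s]

    def update(self, s: State, a: Action, reward: float, s_next: State,
               alpha: float = 0.5, gamma: float = 0.9) -> bool:
        """Q-Learning step over the states discovered so far."""
        changed = self.observe(s, a, s_next)
        best_next = max((self.q[(s_next, b)] for b in self.useful_actions(s_next)), default=0.0)
        self.q[(s, a)] += alpha * (reward + gamma * best_next - self.q[(s, a)])
        return changed


model = OnlineStateSet(["left", "right"])
model.update((0, 0), "right", 0.0, (1, 0))
model.update((1, 0), "right", 1.0, (2, 0))
print(len(model.states))  # 3
# The domain changes: the passage between (1, 0) and (2, 0) closes.
changed = model.update((1, 0), "right", 0.0, (1, 0))
print(changed, model.useful_actions((1, 0)))  # True ['left']
```

Only the contradicted transition is rewritten; every other state, transition and Q-value learnt so far is left in place.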

arXiv:1705.01399

Answer Set Programming for Non-Stationary Markov Decision Processes

Authors: Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras

Abstract: Non-stationary domains, where unforeseen changes happen, present a challenge for agents seeking an optimal policy for a sequential decision-making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from which Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.

Submitted 3 May, 2017; originally announced May 2017.
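The ASP(RL) abstract above splits the work in two: first enumerate the possible trajectories, then run RL only over what those trajectories allow. A minimal sketch on a hypothetical 3×3 grid, where a depth-first search (`enumerate_trajectories`) stands in for the ASP solver and tabular Q-Learning is restricted to the state-action pairs that occur on some trajectory; the grid, rewards and hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def enumerate_trajectories(start: Cell, goal: Cell, size: int, blocked: Set[Cell],
                           max_len: int = 8) -> List[List[Tuple[str, Cell]]]:
    """Stand-in for the ASP step: every loop-free path to the goal that stays on
    the grid and avoids blocked cells, as (action, next_cell) pairs."""
    paths: List[List[Tuple[str, Cell]]] = []

    def dfs(cell: Cell, path: List[Tuple[str, Cell]]) -> None:
        if cell == goal:
            paths.append(path)
            return
        if len(path) >= max_len:
            return
        visited = {start} | {c for _, c in path}
        for name, (dx, dy) in MOVES.items():
            nxt = (cell[0] + dx, cell[1] + dy)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in visited):
                dfs(nxt, path + [(name, nxt)])

    dfs(start, [])
    return paths


def allowed_actions(start: Cell, paths) -> Dict[Cell, Set[str]]:
    """Restrict the MDP to state-action pairs that occur on some trajectory."""
    allowed: Dict[Cell, Set[str]] = defaultdict(set)
    for path in paths:
        cell = start
        for action, nxt in path:
            allowed[cell].add(action)
            cell = nxt
    return allowed


def move(cell: Cell, action: str) -> Cell:
    dx, dy = MOVES[action]
    return (cell[0] + dx, cell[1] + dy)


def q_learning(start, goal, allowed, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=1):
    """Tabular Q-Learning that only ever considers the pre-computed actions."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s = start
        for _ in range(50):
            acts = sorted(allowed[s])
            a = rng.choice(acts) if rng.random() < epsilon else max(acts, key=lambda b: q[(s, b)])
            s2 = move(s, a)
            reward = 1.0 if s2 == goal else -0.05
            future = 0.0 if s2 == goal else max(q[(s2, b)] for b in allowed[s2])
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s2
            if s == goal:
                break
    return q


def greedy_rollout(start, goal, allowed, q, limit=20):
    s, steps = start, 0
    while s != goal and steps < limit:
        s = move(s, max(sorted(allowed[s]), key=lambda b: q[(s, b)]))
        steps += 1
    return s, steps


start, goal = (0, 0), (2, 2)
paths = enumerate_trajectories(start, goal, size=3, blocked={(1, 0)})
allowed = allowed_actions(start, paths)
q = q_learning(start, goal, allowed)
print(len(paths), greedy_rollout(start, goal, allowed, q))  # 4 ((2, 2), 4)
```

If the domain changes (for example, a different `blocked` set), this sketch would simply re-enumerate the trajectories and continue learning over the new allowed actions.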

Showing 1–10 of 10 results for author: Santos, P E