
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Authors: Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

Abstract: Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities. Compositional visual reasoning approaches have emerged as effective strategies; however, they heavily rely on the commonsense knowledge encoded in Large Language Models (LLMs) to perform planning, reasoning, or both, without considering the effect of their decisions on the visual reasoning process, which can lead to errors or failed procedures. To address these challenges, we introduce HYDRA, a multi-stage dynamic compositional visual reasoning framework designed for reliable and incrementally progressive general reasoning. HYDRA integrates three essential modules: a planner, a Reinforcement Learning (RL) agent serving as a cognitive controller, and a reasoner. The planner and reasoner modules utilize an LLM to generate instruction samples and executable code from the selected instruction, respectively, while the RL agent dynamically interacts with these modules, making high-level decisions on selection of the best instruction sample given information from the historical state stored through a feedback loop. This adaptable design enables HYDRA to adjust its actions based on previous feedback received during the reasoning process, leading to more reliable reasoning outputs and ultimately enhancing its overall effectiveness. Our framework demonstrates state-of-the-art performance in various VR tasks on four different widely-used datasets.

Submitted 19 March, 2024; originally announced March 2024.
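
The planner / controller / reasoner loop described in the HYDRA abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy planner, the hand-written controller policy, and the step limit are hypothetical stand-ins, not the authors' implementation. In the paper the planner and reasoner are LLM-backed and the controller is a trained RL agent.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
Planner = Callable[[str, List[str]], List[str]]      # query, history -> candidate instructions
Controller = Callable[[List[str], List[str]], int]   # candidates, history -> chosen index
Reasoner = Callable[[str], Tuple[str, bool]]         # instruction -> (feedback, done)


def run_loop(query: str, planner: Planner, controller: Controller,
             reasoner: Reasoner, max_steps: int = 5) -> List[str]:
    """Iterate plan -> select -> execute, storing feedback as history."""
    history: List[str] = []
    for _ in range(max_steps):
        candidates = planner(query, history)      # planner proposes instruction samples
        choice = controller(candidates, history)  # controller picks one using the history
        feedback, done = reasoner(candidates[choice])  # reasoner executes the instruction
        history.append(feedback)                  # feedback loop: result informs next step
        if done:
            break
    return history


# Toy stand-ins so the sketch runs without any model.
def toy_planner(query: str, history: List[str]) -> List[str]:
    step = len(history) + 1
    return [f"inspect region {step}", f"answer '{query}' now"]


def toy_controller(candidates: List[str], history: List[str]) -> int:
    # Hypothetical fixed policy: gather evidence twice, then answer.
    return 0 if len(history) < 2 else 1


def toy_reasoner(instruction: str) -> Tuple[str, bool]:
    done = instruction.startswith("answer")
    return f"executed: {instruction}", done


trace = run_loop("what colour is the cup?", toy_planner, toy_controller, toy_reasoner)
```

The point of the sketch is only the control flow: each selection depends on the accumulated feedback, which is the property the abstract attributes to the RL agent.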

arXiv:1606.03502 [pdf]

Intelligent audit code generation from free text in the context of neurosurgery

Authors: Sedigheh Khademi, Christopher Palmer, Pari Delir Haghighi, Philip Lewis, Frada Burstein

Abstract: Clinical auditing requires codified data for aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult, as the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges, and often requires a fresh approach. This research is devoted to creating a novel semi-automated method for generating codified auditing data from clinical notes recorded in a neurosurgical department in an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed up by dictionary-based matching of the remaining text.

Submitted 10 June, 2016; originally announced June 2016.

Comments: ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

Report number: ACIS/2015/189
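
The two-pass idea in the abstract above (specialist rules first, dictionary matching on whatever text remains) might look roughly like the following sketch. The rule patterns, dictionary terms, and audit code names are all invented for illustration and are not taken from the paper.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical specialist rules: (pattern, audit code). Not from the paper.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\bno\s+(?:post-?operative\s+)?complications?\b", re.I), "COMPLICATION_NONE"),
    (re.compile(r"\breturn(?:ed)?\s+to\s+theatre\b", re.I), "RETURN_TO_THEATRE"),
]

# Hypothetical fallback dictionary: lower-case term -> audit code.
DICTIONARY: Dict[str, str] = {
    "infection": "COMPLICATION_INFECTION",
    "csf leak": "COMPLICATION_CSF_LEAK",
}


def extract_codes(note: str) -> List[str]:
    """Apply rules first, then dictionary-match the text the rules left over."""
    codes: List[str] = []
    remaining = note
    for pattern, code in RULES:
        if pattern.search(remaining):
            codes.append(code)
            # Blank out the matched span so the dictionary pass cannot re-match it.
            remaining = pattern.sub(" ", remaining)
    lowered = remaining.lower()
    for term, code in DICTIONARY.items():
        if term in lowered:
            codes.append(code)
    return codes


codes = extract_codes("Patient returned to theatre; wound infection noted.")
```

Because the method is described as semi-automated, output like `codes` would presumably be reviewed by a clinician before entering the audit record; that review step is not modelled here.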

arXiv:1606.00751 [pdf]

A Crowd Monitoring Framework using Emotion Analysis of Social Media for Emergency Management in Mass Gatherings

Authors: Minh Quan Ngo, Pari Delir Haghighi, Frada Burstein

Abstract: In emergency management for mass gatherings, knowledge about crowd types can greatly assist with providing timely response and effective resource allocation. Crowd monitoring can be achieved using computer vision based approaches and sensory data analysis. The emergence of social media platforms presents an opportunity to capture valuable information about how people feel and think. However, a review of current work shows that only a limited number of studies use social media in crowd monitoring and/or incorporate a unified crowd model for consistency and interoperability. This paper presents a novel framework for crowd monitoring using social media. It includes a standard crowd model to represent different types of crowds. The proposed framework considers the effect of emotion on crowd behaviour and uses the emotion analysis of social media to identify the crowd types in an event. An experiment using historical data of a past event to validate our framework and model is described.

Submitted 28 May, 2016; originally announced June 2016.

Comments: ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

Report number: ACIS/2015/85

Showing 1–3 of 3 results for author: Haghighi, P D