Search | arXiv e-print repository

Evaluating the Ability of Large Language Models to Reason about Cardinal Directions

Authors: Anthony G Cohn, Robert E Blackwell

Abstract: We investigate the abilities of a representative set of Large Language Models (LLMs) to reason about cardinal directions (CDs). To do so, we create two datasets: the first, co-created with ChatGPT, focuses largely on recall of world knowledge about CDs; the second is generated from a set of templates, comprehensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Our experiments show that although LLMs are able to perform well on the simpler dataset, no LLM is able to reliably determine the correct CD on the second, more complex dataset, even with a temperature setting of zero.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 9 pages, 3 figures, 1 table. Short paper accepted by COSIT 24, The 16th Conference on Spatial Information Theory
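To make the kind of templated scenario described above concrete, here is a minimal Python sketch that renders one question and computes its gold cardinal direction. The template wording, the turn vocabulary, and the names `make_item` and `final_direction` are hypothetical illustrations, not the authors' templates.

```python
from typing import Sequence, Tuple

CARDINALS = ["north", "east", "south", "west"]  # clockwise order
TURNS = {"right": 1, "left": -1, "around": 2}   # quarter turns, clockwise positive

# Hypothetical person variants: (subject, form of "to be", form of "to turn").
PERSON = {
    1: ("I", "am", "turn"),
    2: ("You", "are", "turn"),
    3: ("She", "is", "turns"),
}


def final_direction(start: str, turns: Sequence[str]) -> str:
    """Cardinal direction faced after applying the turns in order."""
    index = CARDINALS.index(start)
    for turn in turns:
        index = (index + TURNS[turn]) % 4
    return CARDINALS[index]


def make_item(person: int, locomotion: str, start: str, turns: Sequence[str]) -> Tuple[str, str]:
    """Render one hypothetical templated question together with its gold answer."""
    subject, be, turn_verb = PERSON[person]
    pronoun = subject if subject == "I" else subject.lower()
    steps = ", then ".join(f"{turn_verb} {t}" for t in turns)
    question = (
        f"{subject} {be} {locomotion} {start}. "
        f"{subject} {steps}. Which direction {be} {pronoun} facing now?"
    )
    return question, final_direction(start, turns)


question, answer = make_item(2, "cycling", "north", ["left", "left"])
print(question)  # You are cycling north. You turn left, then turn left. Which direction are you facing now?
print(answer)    # south
```

Varying `person` and `locomotion` while keeping the turns fixed changes only the surface text, so the gold answer stays the same; that is the property such templates would rely on to probe whether a model's answer depends on irrelevant wording.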

arXiv:2406.14336 [pdf, other]

Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction

Authors: Erum Haris, Anthony G. Cohn, John G. Stell

Abstract: Navigating historical narratives poses a challenge in unveiling the spatial intricacies of past landscapes. The proposed work addresses this challenge within the context of the English Lake District, employing the Corpus of the Lake District Writing. The method utilizes a generative pre-trained transformer model to extract spatial relations from the textual descriptions in the corpus. The study applies this large language model to comprehensively understand the spatial dimensions inherent in historical narratives. The outcomes are presented as semantic triples, capturing the nuanced connections between entities and locations, and visualized as a network, offering a graphical representation of the spatial narrative. The study contributes to a deeper comprehension of the English Lake District's spatial tapestry and provides an approach to uncovering spatial relations within diverse historical contexts.

Submitted 20 June, 2024; originally announced June 2024.
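Turning extracted semantic triples into a network, as the abstract describes, can be sketched as grouping (subject, relation, object) triples into an adjacency list. The triples and the name `build_network` below are hypothetical placeholders, not output from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_network(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group triples into an adjacency list: subject -> [(relation, object), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph.setdefault(obj, [])  # make sure objects also appear as nodes
    return dict(graph)


# Hypothetical triples with placeholder place names.
triples = [
    ("Traveller", "walked_to", "Lake A"),
    ("Lake A", "lies_below", "Fell B"),
    ("Village C", "near", "Lake A"),
]
graph = build_network(triples)
for node, edges in sorted(graph.items()):
    print(node, edges)
```

An adjacency list like this can be handed to any graph-drawing library for the kind of network visualization the abstract mentions.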

arXiv:2406.11911 [pdf, other]

A Notion of Complexity for Theory of Mind via Discrete World Models

Authors: X. Angelo Huang, Emanuele La Malfa, Samuele Marro, Andrea Asperti, Anthony Cohn, Michael Wooldridge

Abstract: Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states of a ToM problem designed to make it apparently harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://flecart.github.com/complexity-tom-dwm

arXiv:2406.01931 [pdf, other]

Dishonesty in Helpful and Harmless Alignment

Authors: Youcheng Huang, **gkun Tang, Duanyu Feng, Zheng Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn

Abstract: People tell lies when seeking rewards. Large language models (LLMs) are aligned to human values with reinforcement learning, in which they receive rewards if they satisfy human preferences. We find that this also induces dishonesty in helpful and harmless alignment, where LLMs tell lies when generating harmless responses. Using the latest interpretability tools, we detect dishonesty, show how LLMs can become harmful if their honesty is increased, and analyze such conflicts at the parameter level. Given these preliminaries and the hypothesis that reward-seeking stimulates dishonesty, we show theoretically that dishonesty can in turn decrease alignment performance, and we augment reward-seeking alignment with representation regularization. Extensive results, including GPT-4-annotated win rates, perplexities, and case studies, demonstrate that we can train more honest, helpful, and harmless LLMs. We will open-source all our code and results upon this paper's acceptance.

Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.15064 [pdf, other]

Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning

Authors: Fangjun Li, David C. Hogg, Anthony G. Cohn

Abstract: Spatial reasoning plays a vital role in both human cognition and machine intelligence, prompting new research into language models' (LMs) capabilities in this regard. However, existing benchmarks reveal shortcomings in evaluating qualitative spatial reasoning (QSR). These benchmarks typically present oversimplified scenarios or unclear natural language descriptions, hindering effective evaluation. We present a novel benchmark for assessing QSR in LMs, which is grounded in realistic 3D simulation data, offering a series of diverse room layouts with various objects and their spatial relationships. This approach provides a more detailed and context-rich narrative for spatial reasoning evaluation, diverging from traditional, toy-task-oriented scenarios. Our benchmark encompasses a broad spectrum of qualitative spatial relationships, including topological, directional, and distance relations. These are presented with different viewing points, varied granularities, and density of relation constraints to mimic real-world complexities. A key contribution is our logic-based consistency-checking tool, which enables the assessment of multiple plausible solutions, aligning with real-world scenarios where spatial relationships are often open to interpretation. Our benchmark evaluation of advanced LMs reveals their strengths and limitations in spatial reasoning. They face difficulties with multi-hop spatial reasoning and interpreting a mix of different view descriptions, pointing to areas for future improvement.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Camera-Ready version for IJCAI 2024

arXiv:2402.02805 [pdf, other]

Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

Authors: Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert

Abstract: Planning is a fundamental property of human intelligence. Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents. Our code and data are available at https://github.com/fangru-lin/graph-llm-asynchow-plan.

Submitted 3 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted at ICML-2024
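Optimizing the time cost of an asynchronous plan, as described above, comes down to finding the longest chain of dependent steps when independent steps may run in parallel. The following is a minimal Python sketch under stated assumptions (unlimited parallel workers, acyclic prerequisites, hypothetical step names and durations); it is not the AsyncHow benchmark code.

```python
from typing import Dict, List


def min_completion_time(durations: Dict[str, float], prereqs: Dict[str, List[str]]) -> float:
    """Earliest finish time of the whole plan, assuming unlimited parallelism.

    Each step starts as soon as all of its prerequisites have finished, so the
    result is the length of the longest (critical) path through the dependency DAG.
    The prerequisite graph is assumed to be acyclic.
    """
    finish: Dict[str, float] = {}  # memoized earliest finish time per step

    def earliest_finish(step: str) -> float:
        if step not in finish:
            start = max((earliest_finish(p) for p in prereqs.get(step, [])), default=0.0)
            finish[step] = start + durations[step]
        return finish[step]

    return max(earliest_finish(step) for step in durations)


# Hypothetical breakfast plan (minutes); brewing needs both boiled water and ground beans.
durations = {"boil water": 5, "grind beans": 2, "brew coffee": 4, "toast bread": 3}
prereqs = {"brew coffee": ["boil water", "grind beans"]}
print(min_completion_time(durations, prereqs))  # 9.0, versus 14 if done strictly in sequence
```

The gap between the parallel optimum and the sequential sum is exactly what a model must reason about to answer such questions correctly.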

arXiv:2401.09074 [pdf, other]

Code Simulation Challenges for Large Language Models

Authors: Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Abstract: Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line, following the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine-simulation reasoning tasks.

Submitted 12 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Code: https://github.com/EmanueleLM/CodeSimulation
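Simulating a straight-line program line by line, as CoSm asks a model to do, has a mechanical counterpart that records the program state after every assignment. This is a minimal Python sketch, assuming programs made only of integer assignments using `+`, `-` and `*`; the names `simulate` and `eval_expr` are hypothetical and this is not the paper's benchmark code.

```python
import ast
import operator
from typing import Dict, List, Tuple

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def eval_expr(node: ast.expr, env: Dict[str, int]) -> int:
    """Evaluate an integer expression built from constants, variables, and + - *."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](eval_expr(node.left, env), eval_expr(node.right, env))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def simulate(program: str) -> List[Tuple[str, Dict[str, int]]]:
    """Execute one assignment per line, recording a snapshot of the state after each line."""
    env: Dict[str, int] = {}
    trace: List[Tuple[str, Dict[str, int]]] = []
    for raw in program.splitlines():
        line = raw.strip()
        if not line:
            continue
        stmt = ast.parse(line).body[0]
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError(f"not a simple assignment: {line!r}")
        env[stmt.targets[0].id] = eval_expr(stmt.value, env)
        trace.append((line, dict(env)))
    return trace


trace = simulate("""
x = 3
y = x + 2
x = x * y
y = y - x
""")
for line, state in trace:
    print(f"{line:<12} {state}")
```

A per-line trace like this gives a reference against which each intermediate step of a model's simulation, not just its final answer, can be checked.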

arXiv:2401.03991 [pdf, other]

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Authors: Fangjun Li, David C. Hogg, Anthony G. Cohn

Abstract: Artificial intelligence (AI) has made remarkable progress across various domains, with large language models like ChatGPT gaining substantial attention for their human-like text-generation capabilities. Despite these achievements, spatial reasoning remains a significant challenge for these models. Benchmarks like StepGame evaluate AI spatial reasoning, where ChatGPT has shown unsatisfactory performance. However, the presence of template errors in the benchmark has an impact on the evaluation results. Thus there is potential for ChatGPT to perform better if these template errors are addressed, leading to more accurate assessments of its spatial reasoning capabilities. In this study, we refine the StepGame benchmark, providing a more accurate dataset for model evaluation. We analyze GPT's spatial reasoning performance on the rectified benchmark, identifying proficiency in mapping natural language text to spatial relations but limitations in multi-hop reasoning. We provide a flawless solution to the benchmark by combining template-to-relation mapping with logic-based reasoning. This combination demonstrates proficiency in performing qualitative reasoning on StepGame without encountering any errors. We then address the limitations of GPT models in spatial reasoning. We deploy Chain-of-thought and Tree-of-thoughts prompting strategies, offering insights into GPT's "cognitive process", and achieving remarkable improvements in accuracy.
Our investigation not only sheds light on model deficiencies but also proposes enhancements, contributing to the advancement of AI with more robust spatial reasoning capabilities.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Camera-Ready version for AAAI 2024
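The combination of relation mapping and logic-based reasoning described above can be illustrated with a small multi-hop solver. This Python sketch assumes, as a simplification, that each StepGame-style relation is a unit step on a grid; the offset table and the name `infer` are hypothetical and this is not the authors' logic program.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical mapping from relation names to unit grid offsets (dx, dy), y pointing up.
OFFSETS: Dict[str, Tuple[int, int]] = {
    "left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1), "overlap": (0, 0),
}
# Inverse lookup from a sign pair back to a relation name (all nine pairs are covered).
NAMES = {offset: name for name, offset in OFFSETS.items()}


def sign(v: int) -> int:
    return (v > 0) - (v < 0)


def infer(facts: List[Tuple[str, str, str]], source: str, target: str) -> str:
    """Relation of `source` to `target`, given facts (a, rel, b) meaning 'a is <rel> of b'."""
    pos = {target: (0, 0)}  # place the target at the origin
    queue = deque([target])
    while queue:  # breadth-first propagation of coordinates through the facts
        node = queue.popleft()
        x, y = pos[node]
        for a, rel, b in facts:
            dx, dy = OFFSETS[rel]
            if b == node and a not in pos:
                pos[a] = (x + dx, y + dy)
                queue.append(a)
            if a == node and b not in pos:
                pos[b] = (x - dx, y - dy)
                queue.append(b)
    if source not in pos:
        raise ValueError(f"{source!r} is not connected to {target!r}")
    sx, sy = pos[source]
    return NAMES[(sign(sx), sign(sy))]


facts = [("A", "left", "B"), ("B", "above", "C")]
print(infer(facts, "A", "C"))  # upper-left
print(infer(facts, "C", "A"))  # lower-right
```

Chaining hops by adding offsets is the multi-hop step that the abstract reports language models struggling with, while the mapping from text to `OFFSETS` entries corresponds to the part they handle well.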

arXiv:2309.16573 [pdf, other]

Language Models as a Service: Overview of a New Paradigm and its Challenges

Authors: Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael Wooldridge

Abstract: Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or software programming interfaces. This is the Language-Models-as-a-Service (LMaaS) paradigm. In contrast with scenarios where full model access is available, as in the case of open-source models, such closed-off language models present specific challenges for evaluating, benchmarking, and testing them. This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects. We conduct a detailed analysis of existing solutions and put forth a number of considered recommendations, and highlight the directions for future advancements. On the other hand, it serves as a comprehensive resource for existing knowledge on current, major LMaaS, offering a synthesized overview of the licences and capabilities their interfaces offer.

Submitted 30 November, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.15577 [pdf, other]

An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8

Authors: Anthony G Cohn

Abstract: Qualitative Spatial Reasoning (QSR) is a well-explored area of Commonsense Reasoning and has multiple applications ranging from Geographical Information Systems to Robotics and Computer Vision. Recently, many claims have been made about the capabilities of Large Language Models (LLMs). In this paper we investigate the extent to which one particular LLM can perform classical qualitative spatial reasoning tasks on the mereotopological calculus, RCC-8.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 10 figures. 8 pages. Accepted for presentation at 36th International Workshop on Qualitative Reasoning (QR-23), in conjunction with ECAI2023 in Krakow, Poland
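A classical RCC-8 reasoning task is composition: given R1(x, y) and R2(y, z), which relations can hold between x and z? The Python sketch below encodes only a handful of entries of the RCC-8 composition table (plus the identity behaviour of EQ) to show the shape of the lookup; it deliberately raises on any pair outside that partial table, and the names `PARTIAL_TABLE` and `compose` are hypothetical.

```python
from typing import FrozenSet

ALL: FrozenSet[str] = frozenset({"DC", "EC", "PO", "EQ", "TPP", "NTPP", "TPPi", "NTPPi"})

# A few entries of the RCC-8 composition table: (R1, R2) -> possible R,
# where R1(x, y) and R2(y, z) entail that R(x, z) is one of the listed relations.
PARTIAL_TABLE = {
    ("NTPP", "NTPP"): frozenset({"NTPP"}),
    ("TPP", "NTPP"): frozenset({"NTPP"}),
    ("NTPP", "TPP"): frozenset({"NTPP"}),
    ("TPP", "TPP"): frozenset({"TPP", "NTPP"}),
    ("NTPP", "DC"): frozenset({"DC"}),
    ("NTPP", "EC"): frozenset({"DC"}),
    ("TPP", "DC"): frozenset({"DC"}),
    ("TPP", "EC"): frozenset({"DC", "EC"}),
    ("DC", "DC"): ALL,         # no information: any relation is possible
    ("NTPP", "NTPPi"): ALL,    # two interior parts of the same region
}


def compose(r1: str, r2: str) -> FrozenSet[str]:
    """Possible relations R(x, z) given r1(x, y) and r2(y, z), for pairs in this sketch."""
    if r1 == "EQ":
        return frozenset({r2})
    if r2 == "EQ":
        return frozenset({r1})
    if (r1, r2) not in PARTIAL_TABLE:
        raise KeyError(f"({r1}, {r2}) is not covered by this partial table")
    return PARTIAL_TABLE[(r1, r2)]


# x is a tangential proper part of y, and y is a non-tangential proper part of z.
print(sorted(compose("TPP", "NTPP")))  # ['NTPP']
print(sorted(compose("TPP", "TPP")))   # ['NTPP', 'TPP']
```

Entries that return a single relation are the easy cases; disjunctive entries such as TPP∘TPP and the uninformative ones are where an evaluation of an LLM's answers has to check for the full set of possibilities rather than one answer.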

arXiv:2304.11164 [pdf, other]

Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs

Authors: Anthony G Cohn, Jose Hernandez-Orallo

Abstract: Language models have become very popular recently and many claims have been made about their abilities, including for commonsense reasoning. Given the increasingly better results of current language models on previous static benchmarks for commonsense reasoning, we explore an alternative dialectical evaluation. The goal of this kind of evaluation is not to obtain an aggregate performance value but to find failures and map the boundaries of the system. Dialoguing with the system gives the opportunity to check for consistency and get more reassurance of these boundaries beyond anecdotal evidence. In this paper we conduct some qualitative investigations of this kind of evaluation for the particular case of spatial reasoning (which is a fundamental aspect of commonsense reasoning). We conclude with some suggestions for future work both to improve the capabilities of language models and to systematise this kind of dialectical evaluation.

Submitted 22 April, 2023; originally announced April 2023.

Comments: 11 pages in main paper + 71 pages in appendix

arXiv:2304.05989 [pdf, other]

Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings

Authors: Alexia Toumpa, Anthony G. Cohn

Abstract: Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks. As humans tend to use objects in many different ways depending on the scene and the objects' availability, learning object affordances in everyday-life scenarios is a challenging task, particularly in the presence of an open set of interactions and objects. We address the problem of affordance categorization for class-agnostic objects with an open set of interactions; we achieve this by learning similarities between object interactions in an unsupervised way and thus inducing clusters of object affordances. A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs (AGs), which abstract from the continuous representation of spatio-temporal interactions in RGB-D videos. These AGs are clustered to obtain groups of objects with similar affordances. Our experiments in a real-world scenario demonstrate that our method learns to create object affordance clusters with a high V-measure even in cluttered scenes. The proposed approach handles object occlusions by effectively capturing possible interactions, without imposing any object or scene constraints.

Submitted 30 March, 2023; originally announced April 2023.

Comments: Accepted at Journal of Artificial Intelligence Research (JAIR)
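The V-measure used above to score the affordance clusters is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls into a single cluster). A self-contained Python sketch of the standard definition follows; the function names are illustrative, and libraries such as scikit-learn provide an equivalent ready-made implementation.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def _entropy(counts: Iterable[int], n: int) -> float:
    """Shannon entropy (natural log) of a distribution given as counts summing to n."""
    return -sum((c / n) * math.log(c / n) for c in counts if c)


def v_measure(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness for a clustering."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    joint = Counter(zip(labels_true, labels_pred))
    classes = Counter(labels_true)
    clusters = Counter(labels_pred)
    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum((nck / n) * math.log(nck / clusters[k]) for (c, k), nck in joint.items())
    h_k_given_c = -sum((nck / n) * math.log(nck / classes[c]) for (c, k), nck in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: cluster ids need not match class ids
print(v_measure([0, 0, 1, 1], [0, 1, 2, 3]))  # about 0.667: homogeneous but not complete
```

Because the score is invariant to how clusters are numbered, it suits an unsupervised setting like the one described, where induced affordance clusters have no predefined labels.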

arXiv:2212.08659 [pdf]

A Hierarchical Framework for Collaborative Artificial Intelligence

Authors: James L. Crowley, Joëlle L Coutaz, Jasmin Grosinger, Javier Vázquez-Salceda, Cecilio Angulo, Alberto Sanfeliu, Luca Iocchi, Anthony G. Cohn

Abstract: We propose a hierarchical framework for collaborative intelligent systems. This framework organizes research challenges based on the nature of the collaborative activity and the information that must be shared, with each level building on capabilities provided by lower levels. We review research paradigms at each level, with a description of classical engineering-based approaches and modern alternatives based on machine learning, illustrated with a running example using a hypothetical personal service robot. We discuss cross-cutting issues that occur at all levels, focusing on the problem of communicating and sharing comprehension, the role of explanation and the social nature of collaboration. We conclude with a summary of research challenges and a discussion of the potential for economic and societal impact provided by technologies that enhance human abilities and empower people and society through collaboration with Intelligent Systems.

Submitted 14 December, 2022; originally announced December 2022.

Journal ref: IEEE Pervasive Computing, 2022

arXiv:2208.01136 [pdf, other]

Exploring the GLIDE model for Human Action-effect Prediction

Authors: Fangjun Li, David C. Hogg, Anthony G. Cohn

Abstract: We address the following action-effect prediction task. Given an image depicting an initial state of the world and an action expressed in text, predict an image depicting the state of the world following the action. The prediction should have the same scene context as the input image. We explore the use of the recently proposed GLIDE model for performing this task. GLIDE is a generative neural network that can synthesize (inpaint) masked areas of an image, conditioned on a short piece of text. Our idea is to mask-out a region of the input image where the effect of the action is expected to occur. GLIDE is then used to inpaint the masked region conditioned on the required action. In this way, the resulting image has the same background context as the input image, updated to show the effect of the action. We give qualitative results from experiments using the EPIC dataset of ego-centric videos labelled with actions.

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2208.00783 [pdf, other]

Location retrieval using visible landmarks based qualitative place signatures

Authors: Lijun Wei, Valerie Gouet-Brunet, Anthony Cohn

Abstract: Location retrieval based on visual information aims to retrieve the location of an agent (e.g. a human or robot), or the area they see, by comparing their observations with some representation of the environment. Existing methods generally require precise measurement and storage of the observed environment features, which may not always be robust to changes in season, viewpoint, occlusion, etc. They are also challenging to scale up and may not be applicable for humans due to the lack of measuring/imaging devices. Considering that humans often use less precise but easily produced qualitative spatial language and high-level semantic landmarks when describing an environment, a qualitative location retrieval method is proposed in this work by describing locations/places using qualitative place signatures (QPS), defined as the perceived spatial relations between ordered pairs of co-visible landmarks from viewers' perspective. After dividing the space into place cells each with individual signatures attached, a coarse-to-fine location retrieval method is proposed to efficiently identify the possible location(s) of viewers based on their qualitative observations. The usability and effectiveness of the proposed method were evaluated using openly available landmark datasets, together with simulated observations by considering the possible perception error.

Submitted 26 July, 2022; originally announced August 2022.
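The idea of a qualitative place signature, relations between ordered pairs of co-visible landmarks as seen by a viewer, can be illustrated with a deliberately simplified Python sketch. It assumes 2D landmark coordinates, uses only a left/right/aligned relation from the sign of a cross product, and scores place cells by counting matching pairs; the landmark names, cell names, and functions are hypothetical and much coarser than the paper's method.

```python
from itertools import permutations
from typing import Dict, Tuple

Point = Tuple[float, float]
Signature = Dict[Tuple[str, str], str]


def side(viewer: Point, a: Point, b: Point) -> str:
    """Whether b appears to the left or right of a as seen from viewer ('aligned' if collinear)."""
    cross = (a[0] - viewer[0]) * (b[1] - viewer[1]) - (a[1] - viewer[1]) * (b[0] - viewer[0])
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "aligned"


def signature(viewer: Point, landmarks: Dict[str, Point]) -> Signature:
    """Relation for every ordered pair of (assumed co-visible) landmarks."""
    return {(p, q): side(viewer, landmarks[p], landmarks[q]) for p, q in permutations(landmarks, 2)}


def retrieve(observed: Signature, cells: Dict[str, Signature]) -> str:
    """Name of the place cell whose stored signature agrees with the most observed pairs."""
    return max(cells, key=lambda c: sum(cells[c].get(pair) == rel for pair, rel in observed.items()))


landmarks = {"church": (0, 10), "tower": (10, 10), "bridge": (10, 0)}
cells = {"cell_a": signature((0, 0), landmarks), "cell_b": signature((20, 20), landmarks)}
observed = signature((1, 1), landmarks)  # a viewer standing near cell_a
print(retrieve(observed, cells))  # cell_a
```

Only the qualitative ordering of landmarks is stored, so no precise measurement is needed; that is the property the abstract emphasizes.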

arXiv:2109.11969 [pdf, other]

Rethinking Crowd Sourcing for Semantic Similarity

Authors: Shaul Solomon, Adam Cohn, Hernan Rosenblum, Chezi Hershkovitz, Ivan P. Yamshchikov

Abstract: Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators that treat semantic similarity as a binary category (two sentences are either similar or not similar and there is no middle ground) play the most important role in the labeling. The paper offers heuristics to filter out unreliable annotators and stimulates further discussions on human perception of semantic similarity.

Submitted 24 September, 2021; originally announced September 2021.

ACM Class: I.2.7; H.5.2; K.6.1

arXiv:2108.04621 [pdf, other]

Refactoring the Whitby Intelligent Tutoring System for Clean Architecture

Authors: Paul S. Brown, Vania Dimitrova, Glen Hart, Anthony G. Cohn, Paulo Moura

Abstract: Whitby is the server-side of an Intelligent Tutoring System application for learning System-Theoretic Process Analysis (STPA), a methodology used to ensure the safety of anything that can be represented with a systems model. The underlying logic driving the reasoning behind Whitby is Situation Calculus, which is a many-sorted logic with situation, action, and object sorts. The Situation Calculus is applied to Ontology Authoring and Contingent Scaffolding: the primary activities within Whitby. Thus many fluents and actions are aggregated in Whitby from these two sub-applications and from Whitby itself, but all are available through a common situation query interface that does not depend upon any of the fluents or actions. Each STPA project in Whitby is a single situation term, which is queried for fluents that include the ontology, and to determine what pedagogical interventions to offer. Initially Whitby was written in Prolog using a module system. In the interest of a cleaner architecture and implementation with improved code reuse and extensibility, the initial application was refactored into Logtalk. This refactoring includes decoupling the Situation Calculus reasoner, Ontology Authoring framework, and Contingent Scaffolding framework into third-party libraries that can be reused in other applications. This extraction was achieved by inverting dependencies via Logtalk protocols and categories, which are reusable interfaces and components that provide functionally cohesive sets of predicate declarations and predicate definitions.
In this paper the architectures of two iterations of Whitby are evaluated with respect to the motivations behind the refactor: clean architecture enabling code reuse and extensibility.

Submitted 10 August, 2021; originally announced August 2021.

Comments: Under consideration for acceptance in TPLP. Paper presented at the 37th International Conference on Logic Programming (ICLP 2021), 16 pages
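The Situation Calculus idea described above (a project as a single situation term, fluents queried through one interface that is independent of any particular fluent) can be illustrated outside Prolog/Logtalk. The following is a minimal Python sketch, not Whitby's code: situations are modelled as action histories, and the `add_hazard`/`remove_hazard` actions, the `hazard_recorded` fluent, and the `holds` registry are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Action = Tuple[str, str]        # (action name, argument), e.g. ("add_hazard", "H1")
Situation = Tuple[Action, ...]  # the initial situation S0 is the empty history

S0: Situation = ()


def do(action: Action, situation: Situation) -> Situation:
    """The situation that results from performing `action` in `situation`."""
    return situation + (action,)


def hazard_recorded(hazard: str, situation: Situation) -> bool:
    """Successor-state axiom for a hypothetical fluent: it holds after do(a, s)
    iff a adds the hazard, or it already held in s and a does not remove it."""
    if situation == S0:
        return False
    previous, last = situation[:-1], situation[-1]
    return last == ("add_hazard", hazard) or (
        hazard_recorded(hazard, previous) and last != ("remove_hazard", hazard)
    )


# A common query interface that knows nothing about individual fluents.
FLUENTS: Dict[str, Callable[..., bool]] = {"hazard_recorded": hazard_recorded}


def holds(fluent: str, *args: str, situation: Situation) -> bool:
    return FLUENTS[fluent](*args, situation)


s = do(("remove_hazard", "H1"), do(("add_hazard", "H2"), do(("add_hazard", "H1"), S0)))
print(holds("hazard_recorded", "H1", situation=s))  # False
print(holds("hazard_recorded", "H2", situation=s))  # True
```

New fluents would be added by registering another function in `FLUENTS` without touching `holds`, which loosely mirrors the decoupling the refactoring aims for.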

arXiv:2102.09896 [pdf, other]

Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace

Authors: Zhiyi Pan, Peng Jiang, Yunhai Wang, Changhe Tu, Anthony G. Cohn

Abstract: Scribble-supervised semantic segmentation has gained much attention recently for its promising performance without high-quality annotations. Due to the lack of supervision, confident and consistent predictions are usually hard to obtain. Typically, these problems are handled either by adopting an auxiliary task with a well-labeled dataset or by incorporating a graphical model with additional requirements on scribble annotations. Instead, this work aims to achieve semantic segmentation by scribble annotations directly without extra information and other limitations. Specifically, we propose holistic operations, including minimizing entropy and a network embedded random walk on neural representation to reduce uncertainty. Given the probabilistic transition matrix of a random walk, we further train the network with self-supervision on its neural eigenspace to impose consistency on predictions between related images. Comprehensive experiments and ablation studies verify the proposed approach, which demonstrates superiority over others; it is even comparable to some full-label supervised ones and works well when scribbles are randomly shrunk or dropped.

Submitted 19 February, 2021; originally announced February 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.05621
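The abstract of arXiv:2102.09896 names two uncertainty-reduction operations: entropy minimization and a random walk on the neural representation. The NumPy sketch below illustrates the general idea only, under our own assumptions. The transition matrix is a softmax over cosine similarities of hypothetical pixel embeddings `feats` (with an assumed `temperature`), and one walk step propagates the class probabilities `probs`. It is not the authors' network-embedded implementation, and the eigenspace self-supervision is not shown.

```python
import numpy as np


def mean_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Average Shannon entropy (nats) of per-pixel class distributions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())


def transition_matrix(feats: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Row-stochastic transition matrix from cosine similarity of embeddings.

    Assumption: affinities are a softmax over cosine similarities; the
    paper's own affinity definition may differ.
    """
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    logits = unit @ unit.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def random_walk(probs: np.ndarray, trans: np.ndarray, steps: int = 1) -> np.ndarray:
    """Propagate class probabilities along the walk; rows remain distributions."""
    for _ in range(steps):
        probs = trans @ probs
    return probs


rng = np.random.default_rng(0)
# Hypothetical embeddings: two clusters of 5 pixels each, 4-D features.
feats = np.vstack([rng.normal([3.0, 0.0, 0.0, 0.0], 0.1, (5, 4)),
                   rng.normal([0.0, 3.0, 0.0, 0.0], 0.1, (5, 4))])
# Weak per-pixel predictions over 2 classes; pixel 0 disagrees with its cluster.
probs = np.vstack([np.tile([0.6, 0.4], (5, 1)), np.tile([0.45, 0.55], (5, 1))])
probs[0] = [0.4, 0.6]

trans = transition_matrix(feats)
smoothed = random_walk(probs, trans)
print(smoothed.argmax(axis=1))   # labels after propagation
print(mean_entropy(smoothed))    # quantity an entropy-minimization loss would reduce
```

Averaging along the walk pulls the inconsistent pixel 0 toward its cluster's label, which illustrates the consistency effect. Averaging on its own does not lower entropy, so `mean_entropy` is shown as the separate quantity that an entropy-minimization term would penalize.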

arXiv:2003.13120 [pdf, other]

Defect segmentation: Mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network

Authors: Senlin Yang, Zhengfang Wang, **g Wang, Anthony G. Cohn, Jiaqi Zhang, Peng Jiang, Peng Jiang, Qingmei Sui

Abstract: This research proposes a Ground Penetrating Radar (GPR) data processing method for non-destructive detection of tunnel lining internal defects, called defect segmentation. To perform this critical step of automatic tunnel lining detection, the method uses a CNN called SegNet combined with the Lovász softmax loss function to map the internal defect structure with GPR synthetic data, which improves the accuracy, automation and efficiency of defect detection. The novel method we present overcomes several difficulties of traditional GPR data interpretation, as demonstrated by an evaluation on both synthetic and real data: to verify the method on real data, a test model containing a known defect was designed and built, and GPR data was obtained and analyzed.

Submitted 29 March, 2020; originally announced March 2020.

Comments: 24 pages, 11 figures
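The abstract of arXiv:2003.13120 pairs SegNet with the Lovász softmax loss. As a hedged illustration, the sketch below computes the forward value of the Lovász-softmax loss, following the formulation of Berman et al. (CVPR 2018), for flattened pixels in plain NumPy. How the paper batches, weights, or back-propagates the loss is not stated in the abstract, and the three class labels used here are hypothetical.

```python
import numpy as np


def lovasz_grad(gt_sorted: np.ndarray) -> np.ndarray:
    """Gradient of the Lovász extension of the Jaccard loss (errors sorted desc)."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    if len(gt_sorted) > 1:
        jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard


def lovasz_softmax(probs: np.ndarray, labels: np.ndarray) -> float:
    """Forward value of the Lovász-softmax loss over flattened pixels.

    probs: (P, C) softmax outputs; labels: (P,) integer class ids.
    Averages over the classes that appear in `labels`.
    """
    losses = []
    for c in range(probs.shape[1]):
        fg = (labels == c).astype(float)
        if fg.sum() == 0:
            continue  # class absent from the ground truth
        errors = np.abs(fg - probs[:, c])
        order = np.argsort(-errors)  # largest errors first
        losses.append(float(np.dot(errors[order], lovasz_grad(fg[order]))))
    return float(np.mean(losses)) if losses else 0.0


labels = np.array([0, 1, 2, 1])        # hypothetical: background / lining / defect
perfect = np.eye(3)[labels]            # one-hot, exactly correct
softer = 0.8 * perfect + 0.2 / 3       # confident but not one-hot
print(lovasz_softmax(perfect, labels), lovasz_softmax(softer, labels))
```

The loss is 0 for a perfect one-hot prediction and reaches 1 when every pixel is confidently assigned to a wrong class, mirroring the range of the Jaccard loss it extends.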

arXiv:2002.12738 [pdf, other]

Human-like Planning for Reaching in Cluttered Environments

Authors: Mohamed Hasan, Matthew Warburton, Wisdom C. Agboh, Mehmet R. Dogar, Matteo Leonetti, He Wang, Faisal Mushtaq, Mark Mon-Williams, Anthony G. Cohn

Abstract: Humans, in comparison to robots, are remarkably adept at reaching for objects in cluttered environments. The best existing robot planners are based on random sampling of configuration space, which becomes excessively high-dimensional with a large number of objects. Consequently, most planners often fail to efficiently find object manipulation plans in such environments. We addressed this problem by identifying high-level manipulation plans in humans and transferring these skills to robot planners. We used virtual reality to capture human participants reaching for a target object on a tabletop cluttered with obstacles. From this, we devised a qualitative representation of the task space to abstract the decision making, irrespective of the number of obstacles. Based on this representation, human demonstrations were segmented and used to train decision classifiers. Using these classifiers, our planner produced a list of waypoints in task space. These waypoints provided a high-level plan, which could be transferred to an arbitrary robot model and used to initialise a local trajectory optimiser. We evaluated this approach through testing on unseen human VR data, a physics-based robot simulation, and a real robot (dataset and code are publicly available). We found that the human-like planner outperformed a state-of-the-art standard trajectory optimisation algorithm and was able to generate effective strategies for rapid planning, irrespective of the number of obstacles in the environment.

Submitted 3 March, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: To be published in ICRA 2020

arXiv:1912.05759 [pdf]

doi 10.1109/TGRS.2020.3046454

GPRInvNet: Deep Learning-Based Ground Penetrating Radar Data Inversion for Tunnel Lining

Authors: Bin Liu, Yuxiao Ren, Hanchi Liu, Hui Xu, Zhengfang Wang, Anthony G. Cohn, Peng Jiang

Abstract: A DNN architecture referred to as GPRInvNet was proposed to tackle the challenges of mapping ground-penetrating radar (GPR) B-Scan data to complex permittivity maps of subsurface structures. GPRInvNet consists of a trace-to-trace encoder and a decoder. It was specially designed to take into account the characteristics of GPR inversion when faced with complex GPR B-Scan data, as well as to address the spatial alignment issues between time-series B-Scan data and spatial permittivity maps. It fuses features from several adjacent traces on the B-Scan data to enhance each trace, and then further condenses the features of each trace separately. As a result, the sensitive zones on the permittivity maps spatially aligned to the enhanced trace can be reconstructed accurately. GPRInvNet has been used to reconstruct the permittivity map of tunnel linings, and a diverse range of dielectric models of tunnel linings containing complex defects has been reconstructed with it. The results demonstrate that GPRInvNet is capable of effectively reconstructing complex tunnel lining defects with clear boundaries, and comparisons with existing baseline methods also demonstrate its superiority. To generalize GPRInvNet to real GPR data, background noise patches recorded from practical model testing were integrated into the synthetic GPR data to retrain the network. Model testing was conducted for validation, and the experimental results revealed that GPRInvNet also achieved satisfactory results on real data.

Submitted 26 September, 2021; v1 submitted 11 December, 2019; originally announced December 2019.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 10, pp. 8305-8325, Oct. 2021

arXiv:1810.08615 [pdf]

Autonomous Functional Locomotion in a Tendon-Driven Limb via Limited Experience

Authors: Ali Marjaninejad, Darío Urbina-Meléndez, Brian A. Cohn, Francisco J. Valero-Cuevas

Abstract: Robots will become ubiquitously useful only when they can use few attempts to teach themselves to perform different tasks, even with complex bodies and in dynamical environments. Vertebrates, in fact, successfully use trial-and-error to learn multiple tasks in spite of their intricate tendon-driven anatomies. Roboticists find such tendon-driven systems particularly hard to control because they are simultaneously nonlinear, under-determined (many tendon tensions combine to produce few net joint torques), and over-determined (few joint rotations define how many tendons need to be reeled-in/payed-out). We demonstrate, for the first time in simulation and in hardware, how a model-free approach allows few-shot autonomous learning to produce effective locomotion in a 3-tendon/2-joint tendon-driven leg. Initially, an artificial neural network fed by sparsely sampled data collected using motor babbling creates an inverse map from limb kinematics to motor activations, which is analogous to juvenile vertebrates playing during development. Thereafter, iterative reward-driven exploration of candidate motor activations simultaneously refines the inverse map and autonomously finds a functional locomotor limit cycle. This biologically-inspired algorithm, which we call G2P (General to Particular), enables versatile adaptation of robots to changes in the target task, the mechanics of their bodies, and the environment. Moreover, this work empowers future studies of few-shot autonomous learning in biological systems, which is the foundation of their enviable functional versatility.

Submitted 19 October, 2018; originally announced October 2018.

Comments: 39 pages, 6 figures
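The first G2P stage in the abstract of arXiv:1810.08615 builds an inverse map from limb kinematics to motor activations using motor-babbling data. The sketch below shows only that data flow, with loud simplifications: the tendon-driven leg is replaced by a hypothetical linear `PLANT` matrix (3 activations to 2 joint angles), and ordinary least squares stands in for the paper's artificial neural network. The reward-driven refinement stage is not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear stand-in for the leg: 3 tendon activations -> 2 joint angles.
# The real system is nonlinear; this only illustrates the babbling -> inverse-map flow.
PLANT = np.array([[0.8, -0.5, 0.1],
                  [0.2, 0.6, -0.7]])


def forward(activations: np.ndarray) -> np.ndarray:
    """Map (N, 3) tendon activations to (N, 2) joint kinematics."""
    return activations @ PLANT.T


# 1) Motor babbling: sparse random activations and the kinematics they produce.
babble_act = rng.uniform(0.0, 1.0, size=(50, 3))
babble_kin = forward(babble_act)

# 2) Fit an inverse map kinematics -> activations.
#    Linear least squares replaces the paper's neural network here.
inverse_w, *_ = np.linalg.lstsq(babble_kin, babble_act, rcond=None)


def inverse(kinematics: np.ndarray) -> np.ndarray:
    """Predict (N, 3) activations for desired (N, 2) kinematics."""
    return kinematics @ inverse_w


target = np.array([[0.3, -0.1], [0.0, 0.2]])
print(forward(inverse(target)))  # should reproduce the target kinematics
```

Because the plant is under-determined, the fitted map picks one of many activation vectors for each target; it also does not enforce non-negative activations, which physical tendons would require.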

arXiv:1809.05970 [pdf]

Quantifying and attenuating pathologic tremor in virtual reality

Authors: Brian A. Cohn, Dilan D. Shah, Ali Marjaninejad, Martin Shapiro, Serhan Ulkumen, Christopher M. Laine, Francisco J. Valero-Cuevas, Kenneth H. Hayashida, Sarah Ingersoll

Abstract: We present a virtual reality (VR) experience that creates a research-grade benchmark for assessing patients with active upper-limb tremor, while simultaneously offering patients the opportunity to engage with VR experiences without their pathologic tremor. Accurate and precise use of handheld motion controllers in VR gaming applications may be limited for patients with upper-limb tremor. In parallel, objective tools measuring tremor are not in widespread, routine clinical use. We used a commercially available VR system and designed a challenging virtual-balloon-popping test mimicking a common nose-to-target pointing task used by medical practitioners to subjectively evaluate tremor in the exam room. Within our VR experience, we offer a software mode which uses a low-pass filter to adjust hand position and pointing orientation over a series of past data points. This digital filter creates a smoothing function for hand movement which effectively removes the patient's tremor in the VR representation. While the patient completes trials of the reaching task, quantitative data on the pathologic tremor is digitally recorded. With speed, accuracy, and the tremor components computed across three axes of movement, patients can be evaluated for their tremor amplitudes in a quantitative, replicable, and enjoyable manner. Removing tremor in digital space may give patients with significant upper-limb tremor an objective clinical measurement of their symptoms while also providing them with positive feedback and interaction.

Submitted 16 September, 2018; originally announced September 2018.

Comments: 3 pages; 3 figures
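The abstract of arXiv:1809.05970 describes a low-pass filter applied over a series of past data points to remove tremor from the rendered hand. The filter type and parameters are not given, so the sketch below assumes a causal moving average (one simple low-pass filter), a hypothetical 90 Hz tracking rate, and a synthetic 5 Hz tremor on one axis.

```python
import numpy as np


def causal_moving_average(samples: np.ndarray, window: int) -> np.ndarray:
    """Smooth (T, 3) hand positions with the mean of the last `window` samples.

    Only past and current samples are used, so it can run in real time; the
    first few outputs average over the shorter history that exists.
    """
    samples = np.asarray(samples, dtype=float)
    out = np.empty_like(samples)
    for t in range(len(samples)):
        start = max(0, t - window + 1)
        out[t] = samples[start:t + 1].mean(axis=0)
    return out


fs = 90.0                                  # assumed tracking rate (Hz)
t = np.arange(0, 2.0, 1 / fs)              # about 2 s of samples
zeros = np.zeros_like(t)
# Slow voluntary reach (m) and a small 5 Hz tremor (m), both along x.
reach = np.stack([0.3 * np.sin(2 * np.pi * 0.5 * t), zeros, zeros], axis=1)
tremor = np.stack([0.01 * np.sin(2 * np.pi * 5.0 * t), zeros, zeros], axis=1)

window = 18                                # 0.2 s = one full 5 Hz cycle at 90 Hz
smoothed = causal_moving_average(reach + tremor, window)
tremor_estimate = (reach + tremor) - smoothed   # crude residual; also contains filter lag
```

A window spanning exactly one tremor period cancels a pure sinusoidal tremor once the window is full. Real tremor is not a pure tone, and a longer window adds more lag to voluntary motion, so the window length is a trade-off.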

arXiv:1802.07490 [pdf, other]

ViTac: Feature Sharing between Vision and Tactile Sensing for Cloth Texture Recognition

Authors: Shan Luo, Wenzhen Yuan, Edward Adelson, Anthony G. Cohn, Raul Fuentes

Abstract: Vision and touch are two important sensing modalities for humans, and they offer complementary information for sensing the environment. Robots could also benefit from such a multi-modal sensing ability. In this paper, addressing for the first time (to the best of our knowledge) texture recognition from tactile images and vision, we propose a new fusion method named Deep Maximum Covariance Analysis (DMCA) to learn a joint latent space for sharing features between vision and tactile sensing. The features of camera images and of tactile data acquired from a GelSight sensor are learned by deep neural networks. However, the learned features are high-dimensional and redundant due to the differences between the two sensing modalities, which degrades perception performance. To address this, the learned features are paired using maximum covariance analysis. Results of the algorithm on a newly collected dataset of paired visual and tactile data relating to cloth textures show that a recognition performance greater than 90% can be achieved using the proposed DMCA framework. In addition, we find that the perception performance of either vision or tactile sensing can be improved by employing the shared representation space, compared to learning from unimodal data.

Submitted 13 March, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

Comments: 6 pages, 5 figures, Accepted for 2018 IEEE International Conference on Robotics and Automation
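The abstract of arXiv:1802.07490 pairs the learned visual and tactile features using maximum covariance analysis. Leaving out the deep networks of DMCA, the sketch below applies plain linear MCA to synthetic stand-in features: it takes the SVD of the cross-covariance matrix and projects each modality onto its leading singular vectors. The feature dimensions and the single shared factor are hypothetical.

```python
import numpy as np


def maximum_covariance_analysis(x: np.ndarray, y: np.ndarray, k: int):
    """Project paired feature sets onto their top-k covariance directions.

    x: (n, p) features from one modality; y: (n, q) from the other, rows paired.
    Returns the (n, k) projections of x and y and the top-k singular values,
    which equal the sample covariances of each projected pair.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / (len(x) - 1)          # (p, q) cross-covariance
    u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
    return xc @ u[:, :k], yc @ vt[:k].T, s[:k]


rng = np.random.default_rng(42)
n = 200
shared = rng.normal(size=(n, 1))                  # hypothetical shared texture factor
vision = shared @ rng.normal(size=(1, 10)) + 0.1 * rng.normal(size=(n, 10))
tactile = shared @ rng.normal(size=(1, 8)) + 0.1 * rng.normal(size=(n, 8))

px, py, s = maximum_covariance_analysis(vision, tactile, k=2)
print(s, np.corrcoef(px[:, 0], py[:, 0])[0, 1])
```

The leading projected pair captures the factor common to both modalities, while the second singular value is small because only noise remains; this is the kind of redundancy reduction the abstract attributes to the pairing step.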

arXiv:1709.03456 [pdf, other]

doi 10.5518/249

CLAD: A Complex and Long Activities Dataset with Rich Crowdsourced Annotations

Authors: Jawad Tayyub, Majd Hawasly, David C. Hogg, Anthony G. Cohn

Abstract: This paper introduces a novel activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset presents a set of videos of actors performing everyday activities in a natural and unscripted manner. The dataset was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations. Furthermore, we describe the methodology used to acquire annotations through crowdsourcing. Finally, some activity recognition benchmarks are presented using current state-of-the-art techniques. We believe that this dataset is particularly suitable as a testbed for activity recognition research, but it can also be applied to other common tasks in robotics and computer vision research, such as object detection and human skeleton tracking.

Submitted 21 September, 2017; v1 submitted 11 September, 2017; originally announced September 2017.

arXiv:1604.04384 [pdf, other]

doi 10.1109/MRA.2016.2636359

The STRANDS Project: Long-Term Autonomy in Everyday Environments

Authors: Nick Hawes, Chris Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrová, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Körtner, Rares Ambrus, Nils Bore, John Folkesson, Patric Jensfelt, Lucas Beyer, Alexander Hermans, Bastian Leibe, Aitor Aldoma, Thomas Fäulhammer, Michael Zillich, Markus Vincze, Eris Chinellato, Muhannad Al-Omari, Paul Duckworth, Yiannis Gatsoulis, et al. (8 additional authors not shown)

Abstract: Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days, autonomously performing end-user-defined tasks and covering 116 km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance.

Submitted 14 October, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

arXiv:1401.5693 [pdf]

doi 10.1613/jair.2655

Sentence Compression as Tree Transduction

Authors: Trevor Anthony Cohn, Mirella Lapata

Abstract: This paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large-margin framework. Experimental results on sentence compression show significant improvements over a state-of-the-art model.

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Journal of Artificial Intelligence Research, Volume 34, pages 637-674, 2009

arXiv:1201.1530 [pdf, ps, other]

N-k-e Survivable Power System Design

Authors: Richard Li-Yang Chen, Amy Cohn, Neng Fan, Ali Pinar

Abstract: We consider the problem of designing (or augmenting) an electric power system such that it satisfies the N-k-e survivability criterion while minimizing total cost. The survivability criterion requires that at least a (1-e) fraction of the total demand can still be met even if any k (or fewer) of the system components fail. We formulate this problem, taking into account both transmission and generation expansion planning, as a mixed-integer program. Two algorithms are designed and tested on modified instances from the IEEE-30-Bus and IEEE-57-Bus systems.

Submitted 6 January, 2012; originally announced January 2012.
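To make the N-k-e criterion in the abstract of arXiv:1201.1530 concrete, the sketch below checks it by brute force on a hypothetical single-bus system: it enumerates every failure set of at most k generators and tests whether at least a (1 - e) fraction of demand can still be served. Transmission constraints, which the paper's mixed-integer program includes, are deliberately ignored, and the capacities are illustrative.

```python
from itertools import combinations
from typing import Sequence


def satisfies_n_k_e(capacities: Sequence[float], demand: float, k: int, e: float) -> bool:
    """Check the N-k-e criterion on a hypothetical single-bus system.

    Every failure of at most k generators must still leave enough capacity to
    serve at least a (1 - e) fraction of the demand. Transmission limits are
    ignored here, unlike in the paper's formulation.
    """
    required = (1.0 - e) * demand
    n = len(capacities)
    for r in range(min(k, n) + 1):
        for failed in combinations(range(n), r):
            lost = set(failed)
            remaining = sum(c for i, c in enumerate(capacities) if i not in lost)
            if min(demand, remaining) < required:
                return False
    return True


units = [50.0, 40.0, 30.0]   # hypothetical generator capacities (MW)
print(satisfies_n_k_e(units, demand=80.0, k=1, e=0.10))  # False: losing 50 MW leaves 70 < 72
print(satisfies_n_k_e(units, demand=80.0, k=1, e=0.15))  # True: worst case 70 >= 68
```

Enumeration grows combinatorially with the number of components and k, which is one reason the paper formulates design and survivability jointly as an optimization problem rather than checking scenarios one by one.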

arXiv:1109.1801 [pdf, other]

An Implicit Optimization Approach for Survivable Network Design

Authors: Richard Chen, Amy Cohn, Ali Pinar

Abstract: We consider the problem of designing a network of minimum cost while satisfying a prescribed survivability criterion. The survivability criterion requires that a feasible flow must still exist (i.e., all demands can be satisfied without violating arc capacities) even after the disruption of a subset of the network's arcs. Specifically, we consider the case in which a disruption (random or malicious) can destroy a subset of the arcs, with the cost of the disruption not to exceed a disruption budget. This problem takes the form of a tri-level, two-player game, in which the network operator designs (or augments) the network, then the attacker launches a disruption that destroys a subset of arcs, and then the network operator attempts to find a feasible flow over the residual network. We first show how this can be modeled as a two-stage stochastic program from the network operator's perspective, with each of the exponentially many potential attacks considered as a disruption scenario. We then reformulate this problem, via a Benders decomposition, to consider the recourse decisions implicitly, greatly reducing the number of variables at the expense of an exponential increase in the number of constraints. We next develop a cut-generation-based algorithm. Rather than explicitly considering each disruption scenario to identify these Benders cuts, however, we develop a bi-level program and a corresponding separation algorithm that enable us to implicitly evaluate the exponential set of disruption scenarios. Our computational results demonstrate the efficacy of this approach.

Submitted 8 September, 2011; originally announced September 2011.
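The tri-level game in the abstract of arXiv:1109.1801 can be checked naively for a fixed design: enumerate every attack within the budget and test whether a feasible flow remains. The sketch below does exactly that for a single source-sink demand, using the Edmonds-Karp max-flow algorithm. This explicit enumeration is the exponential approach that the paper's implicit, Benders-based method is designed to avoid, and the toy network, arc costs and demand are hypothetical.

```python
from collections import deque
from itertools import combinations


def max_flow(n, arcs, source, sink):
    """Edmonds-Karp max flow; arcs is a list of (u, v, capacity) on nodes 0..n-1."""
    cap = [[0.0] * n for _ in range(n)]            # residual capacities
    for u, v, c in arcs:
        cap[u][v] += c
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow
        # Bottleneck along the path, then update forward and reverse residuals.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def survives_all_attacks(n, arcs, attack_cost, budget, source, sink, demand):
    """Brute force: every attack within budget must leave a feasible flow."""
    m = len(arcs)
    for r in range(m + 1):
        for attacked in combinations(range(m), r):
            if sum(attack_cost[i] for i in attacked) > budget:
                continue
            hit = set(attacked)
            residual = [a for i, a in enumerate(arcs) if i not in hit]
            if max_flow(n, residual, source, sink) < demand - 1e-9:
                return False
    return True


arcs = [(0, 1, 5.0), (1, 3, 5.0), (0, 2, 5.0), (2, 3, 5.0)]  # two disjoint 0->3 paths
cost = [1.0, 1.0, 1.0, 1.0]
print(max_flow(4, arcs, 0, 3))                                               # 10.0
print(survives_all_attacks(4, arcs, cost, budget=1.0, source=0, sink=3, demand=5.0))  # True
print(survives_all_attacks(4, arcs, cost, budget=2.0, source=0, sink=3, demand=5.0))  # False
```

With a budget of 2 the attacker can cut both paths, so the design fails the criterion; a designer would then add or reinforce arcs, which is the outer level of the game.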

arXiv:0909.0122 [pdf, ps, other]

Reasoning with Topological and Directional Spatial Information

Authors: Sanjiang Li, Anthony G. Cohn

Abstract: Current research on qualitative spatial representation and reasoning mainly focuses on a single aspect of space. In real-world applications, however, multiple spatial aspects are often involved simultaneously. This paper investigates problems arising in reasoning with combined topological and directional information. We use the RCC8 algebra and the Rectangle Algebra (RA) for expressing topological and directional information, respectively. We give examples to show that the bipath-consistency algorithm BIPATH is incomplete for solving even basic RCC8 and RA constraints. If topological constraints are taken from some maximal tractable subclasses of RCC8, and directional constraints are taken from a subalgebra of RA termed DIR49, then we show that BIPATH is able to separate topological constraints from directional ones. This means that, given a set of hybrid topological and directional constraints from the above subclasses of RCC8 and RA, we can reduce the joint satisfaction problem in polynomial time to two independent satisfaction problems in RCC8 and RA. For general RA constraints, we give a method to compute solutions that satisfy all topological constraints and approximately satisfy each RA constraint to any prescribed precision.

Submitted 1 September, 2009; originally announced September 2009.

Journal ref: Computational Intelligence, 2012, 28(4):579-616
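In the Rectangle Algebra mentioned in the abstract of arXiv:0909.0122, a basic relation between two axis-parallel rectangles is a pair of Allen interval relations, one per axis. The sketch below is an illustration, not the paper's BIPATH algorithm: it computes that pair for closed rectangles and derives the RCC8 relation the two rectangles must then stand in, showing how directional and topological information interact. The relation abbreviations (`b`, `m`, `o`, and so on) follow a common convention and are our choice.

```python
from typing import Tuple

Interval = Tuple[float, float]
Rect = Tuple[Interval, Interval]  # (x-interval, y-interval), closed, lo < hi


def allen(a: Interval, b: Interval) -> str:
    """Basic Allen relation of interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "b"    # before
    if a2 == b1:
        return "m"    # meets
    if b2 < a1:
        return "bi"   # after
    if b2 == a1:
        return "mi"   # met by
    if a1 == b1 and a2 == b2:
        return "eq"
    if a1 == b1:
        return "s" if a2 < b2 else "si"    # starts / started by
    if a2 == b2:
        return "f" if a1 > b1 else "fi"    # finishes / finished by
    if b1 < a1 and a2 < b2:
        return "d"    # during
    if a1 < b1 and b2 < a2:
        return "di"   # contains
    return "o" if a1 < b1 else "oi"        # overlaps / overlapped by


def ra_relation(a: Rect, b: Rect) -> Tuple[str, str]:
    """Rectangle Algebra basic relation: one Allen relation per axis."""
    return allen(a[0], b[0]), allen(a[1], b[1])


INSIDE = {"s", "d", "f", "eq"}
CONTAINS = {"si", "di", "fi", "eq"}


def rcc8_from_ra(rel: Tuple[str, str]) -> str:
    """RCC8 relation implied by an RA basic relation between closed rectangles."""
    if any(r in ("b", "bi") for r in rel):
        return "DC"   # separated along some axis
    if any(r in ("m", "mi") for r in rel):
        return "EC"   # boundaries touch, interiors disjoint
    if rel == ("eq", "eq"):
        return "EQ"
    if all(r in INSIDE for r in rel):
        return "NTPP" if rel == ("d", "d") else "TPP"
    if all(r in CONTAINS for r in rel):
        return "NTPPi" if rel == ("di", "di") else "TPPi"
    return "PO"


a = ((0.0, 2.0), (0.0, 2.0))
b = ((3.0, 5.0), (1.0, 4.0))
print(ra_relation(a, b), rcc8_from_ra(ra_relation(a, b)))  # ('b', 'o') DC
```

Each RA basic relation fixes exactly one RCC8 relation for rectangles, but not conversely: many RA relations map to `PO`, for example, which is why combined reasoning over both calculi is not trivial.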

arXiv:cs/9603104 [pdf, ps]

Active Learning with Statistical Models

Authors: D. A. Cohn, Z. Ghahramani, M. I. Jordan

Abstract: For many types of machine learning algorithms, one can compute the statistically 'optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

Submitted 29 February, 1996; originally announced March 1996.

Comments: See http://www.jair.org/ for any accompanying files

Journal ref: Journal of Artificial Intelligence Research, Vol 4, (1996), 129-145
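The abstract of arXiv:cs/9603104 selects training data by a statistical optimality criterion. The sketch below applies a variance-minimizing criterion, in the spirit of that 'optimal' data selection, to a simpler learner than the paper's mixtures of Gaussians or locally weighted regression: an ordinary least-squares line. Each candidate query is scored by the average predictive variance over hypothetical reference inputs after adding it, and the lowest-scoring candidate is chosen.

```python
import numpy as np


def design(x: np.ndarray) -> np.ndarray:
    """Design matrix [1, x] for a straight-line model."""
    return np.column_stack([np.ones_like(x), x])


def mean_predictive_variance(x_train: np.ndarray, x_ref: np.ndarray,
                             noise_var: float = 1.0) -> float:
    """Average variance of the fitted line's prediction over reference inputs."""
    phi = design(x_train)
    param_cov = noise_var * np.linalg.inv(phi.T @ phi)   # OLS parameter covariance
    ref = design(x_ref)
    # Row-wise quadratic form ref_i^T * param_cov * ref_i.
    return float(np.mean(np.einsum("ij,jk,ik->i", ref, param_cov, ref)))


def select_query(x_train: np.ndarray, candidates: np.ndarray, x_ref: np.ndarray) -> float:
    """Pick the candidate input whose addition minimises mean predictive variance."""
    scores = [mean_predictive_variance(np.append(x_train, c), x_ref) for c in candidates]
    return float(candidates[int(np.argmin(scores))])


x_train = np.array([-0.1, 0.0, 0.1])      # hypothetical data clustered near the origin
candidates = np.array([0.0, 0.5, 1.0])
x_ref = np.linspace(-1.0, 1.0, 201)       # inputs where accuracy matters
print(select_query(x_train, candidates, x_ref))
```

For this linear-Gaussian model the variance depends only on input locations, so the criterion can be evaluated before the output at the chosen point is observed; here it favors the far candidate, which pins down the slope best.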

Showing 1–31 of 31 results for author: Cohn, A