-
Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
Authors:
Nikolaos Flemotomos,
Victor R. Martinez,
Zhuohao Chen,
Karan Singla,
Victor Ardulov,
Raghuveer Peri,
Derek D. Caperton,
James Gibson,
Michael J. Tanana,
Panayiotis Georgiou,
Jake Van Epps,
Sarah P. Lord,
Tad Hirsch,
Zac E. Imel,
David C. Atkins,
Shrikanth Narayanan
Abstract:
With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This is however a cost-prohibitive and time-consuming method that leads to poor feasibility and limited use in real-world settings. To facilitate this process, we have developed an automated competency rating tool able to process the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called Motivational Interviewing, our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), as well as other high-level behavioral constructs (e.g., the extent to which the therapist understands the clients' perspective). We describe our platform and its performance using a dataset of more than 5,000 recordings drawn from its deployment in a real-world clinical setting used to assist training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
Submitted 27 March, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Measuring Expert Performance at Manually Classifying Domain Entities under Upper Ontology Classes
Authors:
Robert Stevens,
Phillip Lord,
James Malone,
Nicolas Matentzoglu
Abstract:
Classifying entities in domain ontologies under upper ontology classes is a recommended task in ontology engineering to facilitate semantic interoperability and modelling consistency. Integrating upper ontologies this way is difficult and, despite emerging automated methods, remains a largely manual task.
Little is known about how well experts perform at upper ontology integration. To develop methodological and tool support, we first need to understand how well experts do this task. We designed a study to measure the performance of human experts at manually classifying classes in a general knowledge domain ontology with entities in the Basic Formal Ontology (BFO), an upper ontology used widely in the biomedical domain.
We conclude that manually classifying domain entities under upper ontology classes is indeed very difficult to do correctly. Given the importance of the task and the high degree of inconsistent classifications we encountered, we further conclude that it is necessary to improve the methodological framework surrounding the manual integration of domain and upper ontologies.
Submitted 11 October, 2018;
originally announced October 2018.
-
Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation
Authors:
Phillip Lord,
Robert Stevens
Abstract:
There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first of these describes a standardized organisation for an ontology, with singly inherited self-standing entities, and a number of small taxonomies of refining entities. The former are described and defined in terms of the latter and used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.
Submitted 20 November, 2017;
originally announced November 2017.
-
A Literature Based Approach to Define the Scope of Biomedical Ontologies: A Case Study on a Rehabilitation Therapy Ontology
Authors:
Mohammad K. Halawani,
Rob Forsyth,
Phillip Lord
Abstract:
In this article, we investigate our early attempts at building an ontology describing rehabilitation therapies following brain injury. These therapies are wide-ranging, involving interventions of many different kinds. As a result, these therapies are hard to describe. As well as restricting actual practice, this is also a major impediment to evidence-based medicine as it is hard to meaningfully compare two treatment plans.
Ontology development requires significant effort from both ontologists and domain experts. Knowledge elicited from domain experts forms the scope of the ontology. The process of knowledge elicitation is expensive, consumes experts' time and might have biases depending on the selection of the experts. Various methodologies and techniques exist for enabling this knowledge elicitation, including community groups and open development practices. A related problem is that of defining scope. By defining the scope, we can decide whether a concept (i.e. term) should be represented in the ontology. This is the opposite of knowledge elicitation, in the sense that it defines what should not be in the ontology. This can be addressed by pre-defining a set of competency questions.
These approaches are, however, expensive and time-consuming. Here, we describe our work toward an alternative approach, bootstrapping the ontology from an initially small corpus of literature that will define the scope of the ontology, expanding this to a set covering the domain, then using information extraction to define an initial terminology to provide the basis and the competencies for the ontology. Here, we discuss four approaches to building a suitable corpus that is both sufficiently covering and precise.
Submitted 27 September, 2017;
originally announced September 2017.
-
Identitas: A Better Way To Be Meaningless
Authors:
Nizal Alshammry,
Phillip Lord
Abstract:
It is often recommended that identifiers for ontology terms should be semantics-free or meaningless. In practice, ontology developers tend to use numeric identifiers, starting at 1 and working upwards. In this paper we present a critique of current ontology semantics-free identifiers; monotonically increasing numbers have a number of significant usability flaws which make them unsuitable as a default option, and we present a series of alternatives. We have provided an implementation of these alternatives which can be freely combined.
Submitted 28 September, 2017; v1 submitted 26 September, 2017;
originally announced September 2017.
-
User and Developer Interaction with Editable and Readable Ontologies
Authors:
Aisha Blfgeh,
Phillip Lord
Abstract:
The process of building ontologies is a difficult task that involves collaboration between ontology developers and domain experts and requires an ongoing interaction between them. This collaboration is made more difficult because they tend to use different tool sets, which can hamper this interaction. In this paper, we propose to decrease this distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment, such as Tawny-OWL, we are now able to generate these readable/editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on translation to a Word document that also enables editing. Taken together, this should provide a significant new route for collaboration between the ontologist and domain specialist.
Submitted 28 September, 2017; v1 submitted 26 September, 2017;
originally announced September 2017.
-
On Patterns and Re-Use in Bioinformatics Databases
Authors:
Michael J Bell,
Phillip Lord
Abstract:
As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others.
In this paper, we test this widely-held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design.
We show that reuse of annotation is common within many different databases, and that also there is a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors.
Submitted 24 May, 2017;
originally announced May 2017.
-
A Highly Literate Approach to Ontology Building
Authors:
Phillip Lord,
Jennifer Warrendar
Abstract:
Ontologies present an attractive technology for describing bio-medicine, because they can be shared, and have rich computational properties. However, they lack the rich expressivity of English and fit poorly with the current scientific "publish or perish" model. While there have been attempts to combine free text and ontologies, most of these perform post-hoc annotation of text. In this paper, we introduce our new environment which borrows from literate programming, to allow an author to co-develop both text and ontological description. We are currently using this environment to document the Karyotype Ontology which allows rich descriptions of the chromosomal complement in humans. We explore some of the advantages and difficulties of this form of ontology development.
Submitted 14 December, 2015;
originally announced December 2015.
-
Scaffolding the Mitochondrial Disease Ontology from extant knowledge sources
Authors:
Jennifer D. Warrender,
Phillip Lord
Abstract:
Bio-medical ontologies can contain a large number of concepts. Often many of these concepts are very similar to each other, and similar or identical to concepts found in other bio-medical databases. This presents both a challenge and opportunity: maintaining many similar concepts is tedious and fastidious work, which could be substantially reduced if the data could be derived from pre-existing knowledge sources. In this paper, we describe how we have achieved this for an ontology of the mitochondria using our novel ontology development environment, the Tawny-OWL library.
Submitted 15 May, 2015;
originally announced May 2015.
-
How, What and Why to test an ontology
Authors:
Jennifer D. Warrender,
Phillip Lord
Abstract:
Ontology development relates to software development in that they both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be used for ontologies; for example, in software engineering testing is a well-established process, and part of many different methodologies.
The application of testing to ontologies, therefore, seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library. This provides a fully programmatic environment for ontology development, which includes a complete test harness.
In this paper, we describe how we have used this harness to build an extensive series of tests and how we have used a commodity continuous integration system to link testing deeply into our development process; this environment is applicable to any OWL ontology, whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of what our different tests are. For each class of test, we describe why we use these tests, also by comparison to software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.
Submitted 15 May, 2015;
originally announced May 2015.
-
A pattern-driven approach to biomedical ontology engineering
Authors:
Jennifer D. Warrender,
Phillip Lord
Abstract:
Developing ontologies can be expensive, time-consuming, as well as difficult to develop and maintain. This is especially true for more expressive and/or larger ontologies. Some ontologies are, however, relatively repetitive, reusing design patterns; building these with both generic and bespoke patterns should reduce duplication and increase regularity which in turn should impact on the cost of development.
Here we report on the usage of patterns applied to two biomedical ontologies: firstly a novel ontology for karyotypes which has been built ground-up using a pattern based approach; and, secondly, our initial refactoring of the SIO ontology to make explicit use of patterns at development time. To enable this, we use the Tawny-OWL library which enables full-programmatic development of ontologies. We show how this approach can generate large numbers of classes from much simpler data structures which is highly beneficial within biomedical ontology engineering.
Submitted 2 December, 2013;
originally announced December 2013.
-
An evolutionary approach to Function
Authors:
Phillip Lord
Abstract:
Background: Understanding the distinction between function and role is vexing and difficult. While it appears to be useful, in practice this distinction is hard to apply, particularly within biology.
Results: I take an evolutionary approach, considering a series of examples, to develop and generate definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice.
Conclusions: The definitions in this paper are applicable, formalizing current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.
Submitted 23 September, 2013;
originally announced September 2013.
-
Can inferred provenance and its visualisation be used to detect erroneous annotation? A case study using UniProtKB
Authors:
Michael J. Bell,
Matthew Collison,
Phillip Lord
Abstract:
A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process, annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated from. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over 100,000 sentences. Over 8000 sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately 30% are erroneous, whilst 35% appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website.
Submitted 21 August, 2013;
originally announced August 2013.
-
The Karyotype Ontology: a computational representation for human cytogenetic patterns
Authors:
Jennifer D. Warrender,
Phillip Lord
Abstract:
The karyotype ontology describes the human chromosome complement as determined cytogenetically, and is designed as an initial step toward the goal of replacing the current system which is based on semantically meaningful strings. This ontology uses a novel, semi-programmatic methodology based around the tawny library to construct many classes rapidly. Here, we describe our use case, methodology and the event-based approach that we use to represent karyotypes.
The ontology is available at http://www.purl.org/ontolink/karyotype/. The Clojure code is available at http://code.google.com/p/karyotype-clj/.
Submitted 16 May, 2013;
originally announced May 2013.
-
Twenty-Five Shades of Greycite: Semantics for referencing and preservation
Authors:
Phillip Lord,
Lindsay Marshall
Abstract:
Semantic publishing can enable richer documents with clearer, computationally interpretable properties. For this vision to become reality, however, authors must benefit from this process, so that they are incentivised to add these semantics. Moreover, the publication process that generates final content must allow and enable this semantic content. Here we focus on author-led or "grey" literature, which uses a convenient and simple publication pipeline. We describe how we have used metadata in articles to enable richer referencing of these articles and how we have customised the addition of these semantics to articles. Finally, we describe how we use the same semantics to aid in digital preservation and non-repudiability of research articles.
Submitted 26 April, 2013;
originally announced April 2013.
-
The Semantic Web takes Wing: Programming Ontologies with Tawny-OWL
Authors:
Phillip Lord
Abstract:
The Tawny-OWL library provides a fully-programmatic environment for ontology building; it enables the use of a rich set of tools for ontology development, by recasting development as a form of programming. It is built in Clojure, a modern Lisp dialect, and is backed by the OWL API. Used simply, it has a similar syntax to OWL Manchester syntax, but it provides arbitrary extensibility and abstraction. It builds on existing facilities for Clojure, which provides a rich and modern programming tool chain, for versioning, distributed development, build, testing and continuous integration. In this paper, we describe the library, this environment and its potential implications for the ontology development process.
Submitted 1 March, 2013;
originally announced March 2013.
-
An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB
Authors:
Michael J. Bell,
Colin S. Gillespie,
Daniel Swan,
Phillip Lord
Abstract:
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use UniProt Knowledge Base (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
Availability: Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation.
Contact: [email protected]
Submitted 10 August, 2012;
originally announced August 2012.
-
Three Steps to Heaven: Semantic Publishing in a Real World Workflow
Authors:
Phillip Lord,
Simon Cockell,
Robert Stevens
Abstract:
Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a 'lumpen' PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine.
Submitted 22 June, 2012;
originally announced June 2012.
-
Lost in translation: data integration tools meet the Semantic Web (experiences from the Ondex project)
Authors:
Andrea Splendiani,
Chris J Rawlings,
Shao-Chih Kuo,
Robert Stevens,
Phillip Lord
Abstract:
More information is now being published in machine-processable form on the web and, as de-facto distributed knowledge bases are materializing, partly encouraged by the vision of the Semantic Web, the focus is shifting from the publication of this information to its consumption. Platforms for data integration, visualization and analysis that are based on a graph representation of information appear to be first candidates to be consumers of web-based information that is readily expressible as graphs. The question is whether the adoption of these platforms to information available on the Semantic Web requires some adaptation of their data structures and semantics. Ondex is a network-based data integration, analysis and visualization platform which has been developed in a Life Sciences context. A number of features, including semantic annotation via ontologies and an attention to provenance and evidence, make this an ideal candidate to consume Semantic Web information, as well as a prototype for the application of network analysis tools in this context. By analyzing the Ondex data structure and its usage, we have found a set of discrepancies and errors arising from the semantic mismatch between a procedural approach to network analysis and the implications of a web-based representation of information. We report in the paper on the simple methodology that we have adopted to conduct such analysis, and on issues that we have found which may be relevant for a range of similar platforms.
Submitted 24 March, 2011;
originally announced March 2011.