-
Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts
Authors:
Juntao Yu,
Silviu Paun,
Maris Camilleri,
Paloma Carretero Garcia,
Jon Chamberlain,
Udo Kruschwitz,
Massimo Poesio
Abstract:
Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet the approaches proposed to scale up anaphoric annotation have not so far resulted in datasets that overcome these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference, due in part to substantial activity by the players and in part to a new resolve-and-aggregate paradigm that 'completes' markable annotations by combining an anaphoric resolver with an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no datasets of comparable size exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length).
Submitted 11 October, 2022;
originally announced October 2022.
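The resolve-and-aggregate idea can be illustrated with a minimal sketch. It assumes each markable has a list of player-chosen labels (an antecedent markable id, or a hypothetical `"DN"` label for discourse-new) plus one label predicted by an automatic resolver, and that the resolver's vote is only added for markables with fewer than `min_judgments` player judgments. The function names, the threshold of 3, and the plain weighted majority vote are illustrative assumptions: a simple stand-in for the aggregation method described in the paper, not a reproduction of it.

```python
from typing import Dict, List, Optional

DISCOURSE_NEW = "DN"  # hypothetical label meaning "no antecedent"


def aggregate_markable(
    player_labels: List[str],
    resolver_label: Optional[str] = None,
    min_judgments: int = 3,
    resolver_weight: float = 1.0,
) -> Optional[str]:
    """Weighted majority vote over player labels, 'completed' by a resolver vote.

    The resolver is consulted only when players supplied fewer than
    `min_judgments` judgments (an assumed completion rule).
    """
    votes: Dict[str, float] = {}
    if resolver_label is not None and len(player_labels) < min_judgments:
        # Inserted first so that, on a tie, max() keeps the resolver's label.
        votes[resolver_label] = resolver_weight
    for label in player_labels:
        votes[label] = votes.get(label, 0.0) + 1.0
    if not votes:
        return None  # no players and no resolver output: leave unannotated
    return max(votes.items(), key=lambda kv: kv[1])[0]


def complete_document(
    judgments: Dict[str, List[str]],
    resolver_output: Dict[str, str],
    min_judgments: int = 3,
) -> Dict[str, Optional[str]]:
    """Aggregate every markable in a document, filling gaps with resolver output."""
    return {
        markable: aggregate_markable(
            judgments.get(markable, []),
            resolver_output.get(markable),
            min_judgments,
        )
        for markable in sorted(set(judgments) | set(resolver_output))
    }


if __name__ == "__main__":
    players = {"m1": [DISCOURSE_NEW, DISCOURSE_NEW], "m2": []}
    resolver = {"m1": "m0", "m2": "m1"}
    print(complete_document(players, resolver))
```

Here markable `m2` has no player judgments, so its label comes entirely from the resolver, while the two player votes for `m1` outweigh the resolver's single vote.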
-
Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?
Authors:
Alba García Seco De Herrera,
Rukiye Savran Kiziltepe,
Jon Chamberlain,
Mihai Gabriel Constantin,
Claire-Hélène Demarty,
Faiyaz Doctor,
Bogdan Ionescu,
Alan F. Smeaton
Abstract:
This paper describes the MediaEval 2020 Predicting Media Memorability task. First proposed at MediaEval 2018, the task is now in its third edition, as predicting short-term and long-term video memorability (VM) remains challenging. The 2020 format is the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video-to-Text dataset, which contains more action-rich content than the videos used in the 2019 task. This paper describes the main characteristics of the task, the collection, the ground truth dataset, the evaluation metrics, and the requirements for participants' run submissions.
Submitted 31 December, 2020;
originally announced December 2020.
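The abstract lists evaluation metrics without naming them. Memorability prediction is commonly scored by the rank correlation between predicted and ground-truth memorability scores; the sketch below assumes Spearman's rank correlation as that metric and computes it in plain Python (average ranks for ties, then the Pearson correlation of the ranks). The function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(predicted) != len(ground_truth) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(ground_truth)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks matter, a run is rewarded for ordering videos correctly by memorability, not for matching the absolute ground-truth scores.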
-
Strategic Revenue Management of Preemptive versus Non-Preemptive Queues
Authors:
Jonathan Chamberlain,
David Starobinski
Abstract:
Consider a two-class unobservable priority queue, with Poisson arrivals, generally distributed service, and strategic customers. Customers are charged a fee when joining the premium class. We analyze the maximum revenue achievable under the non-preemptive (NP) and preemptive-resume (PR) policies, and show that a provider is always better off implementing the PR policy. Further, the maximum revenue under PR is sometimes achieved when only a fraction of the customers join the premium class.
Submitted 13 July, 2020;
originally announced July 2020.
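The NP versus PR revenue comparison can be explored numerically under simplifying assumptions that are ours, not necessarily the paper's: both classes share the same service-time moments E[S] and E[S^2], customers pay a linear waiting cost `c` per unit time, and, to sustain a premium fraction `x`, the provider sets the fee equal to the resulting saving in expected time in system, so revenue is λ·x·fee. Times in system use the standard two-class M/G/1 results (Cobham's formula for non-preemptive priority, the classic preemptive-resume formula); equilibrium stability is ignored. All function names are hypothetical.

```python
from typing import Callable, Tuple

Sojourn = Callable[[float, float, float, float], Tuple[float, float]]


def _loads(lam: float, x: float, m1: float) -> Tuple[float, float]:
    rho = lam * m1
    if not 0 <= x <= 1 or rho >= 1:
        raise ValueError("need 0 <= x <= 1 and total load lam * m1 < 1")
    return rho, x * rho  # total load, premium-class load


def sojourn_np(lam: float, x: float, m1: float, m2: float) -> Tuple[float, float]:
    """Mean time in system (premium, ordinary) under non-preemptive priority.

    lam: total arrival rate; x: premium fraction;
    m1, m2: E[S] and E[S^2], shared by both classes.
    """
    rho, sigma1 = _loads(lam, x, m1)
    r = lam * m2 / 2  # mean residual service work seen by an arrival
    w1 = r / (1 - sigma1)  # Cobham's formula: queueing delay, premium class
    w2 = r / ((1 - sigma1) * (1 - rho))  # Cobham's formula: ordinary class
    return w1 + m1, w2 + m1


def sojourn_pr(lam: float, x: float, m1: float, m2: float) -> Tuple[float, float]:
    """Mean time in system (premium, ordinary) under preemptive-resume priority."""
    rho, sigma1 = _loads(lam, x, m1)
    r1, r = x * lam * m2 / 2, lam * m2 / 2
    t1 = m1 + r1 / (1 - sigma1)  # premium customers never wait for ordinary ones
    t2 = m1 / (1 - sigma1) + r / ((1 - sigma1) * (1 - rho))
    return t1, t2


def revenue(sojourn: Sojourn, lam: float, x: float, m1: float, m2: float,
            c: float = 1.0) -> float:
    """Revenue lam * x * fee, with the fee equal to the premium class's saving."""
    t1, t2 = sojourn(lam, x, m1, m2)
    fee = c * (t2 - t1)  # fee that leaves customers indifferent at fraction x
    return lam * x * fee
```

With exponential service (E[S] = 1, E[S^2] = 2), λ = 0.5 and x = 0.5, `revenue` gives 1/6 under NP and 1/3 under PR. In this simplified model the PR time gap is never smaller than the NP one for the same x, which agrees with the abstract's conclusion but does not replace the paper's analysis.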
-
Social Welfare and Price of Anarchy in Preemptive Priority Queues
Authors:
Jonathan Chamberlain,
David Starobinski
Abstract:
Consider an unobservable $M|G|1$ queue with preemptive-resume scheduling and two priority classes. Customers are strategic and may join the premium class for a fee. We analyze the resulting equilibrium outcomes, equilibrium stability, and social welfare. We find that for service distributions with coefficient of variation greater than 1, there exists a unique and stable mixed equilibrium at low loads. We also establish a tight bound on the price of anarchy, which is $4/3$.
Submitted 2 March, 2020; v1 submitted 28 February, 2020;
originally announced February 2020.
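A similar sketch can locate equilibria of the preemptive-resume game for a given fee. Assumptions (ours): linear waiting cost `c`, both classes share service moments E[S] and E[S^2], and a premium fraction x is an equilibrium when the saving c·(T_ordinary(x) − T_premium(x)) equals the fee (or, at x = 0 and x = 1, when no customer wants to switch). A mixed equilibrium is treated as stable if the saving decreases in x, so a small influx into the premium class removes the incentive to join. In this model the saving is a linear-fractional function of x, hence monotone, so there is at most one interior root. The code does not compute social welfare or the price of anarchy, and its names are hypothetical.

```python
from typing import List, Tuple


def pr_saving(lam: float, x: float, m1: float, m2: float, c: float = 1.0) -> float:
    """Cost saving c * (T_ordinary - T_premium) under preemptive-resume priority.

    lam: total arrival rate; x: fraction in the premium class;
    m1, m2: E[S] and E[S^2] of the service time shared by both classes.
    """
    rho, sigma1 = lam * m1, x * lam * m1
    if rho >= 1:
        raise ValueError("queue is unstable: need lam * m1 < 1")
    r1, r = x * lam * m2 / 2, lam * m2 / 2
    t1 = m1 + r1 / (1 - sigma1)
    t2 = m1 / (1 - sigma1) + r / ((1 - sigma1) * (1 - rho))
    return c * (t2 - t1)


def pr_equilibria(lam: float, m1: float, m2: float, fee: float,
                  c: float = 1.0, tol: float = 1e-12) -> List[Tuple[float, str]]:
    """Equilibrium premium fractions, labelled 'pure', 'stable' or 'unstable'."""
    def gap(x: float) -> float:
        return pr_saving(lam, x, m1, m2, c) - fee

    found: List[Tuple[float, str]] = []
    if gap(0.0) <= 0:
        found.append((0.0, "pure"))  # nobody gains by switching to premium
    if gap(1.0) >= 0:
        found.append((1.0, "pure"))  # nobody gains by switching to ordinary
    if gap(0.0) * gap(1.0) < 0:
        decreasing = gap(0.0) > 0
        lo, hi = 0.0, 1.0
        while hi - lo > tol:  # bisection on the sign change of gap(x)
            mid = (lo + hi) / 2
            if (gap(mid) > 0) == decreasing:
                lo = mid  # root lies to the right of mid
            else:
                hi = mid  # root lies to the left of mid
        # Decreasing saving: extra premium joiners remove the incentive to join.
        found.append(((lo + hi) / 2, "stable" if decreasing else "unstable"))
    return found
```

With E[S] = 1, E[S^2] = 4 (squared coefficient of variation 3), load 0.1 and fee 0.18, the sketch finds a single stable mixed equilibrium near x ≈ 0.515; with exponential service (E[S^2] = 2) and fee 0.115 it finds two pure equilibria and an unstable mixed one.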
-
Identification of Pediatric Sepsis Subphenotypes for Enhanced Machine Learning Predictive Performance: A Latent Profile Analysis
Authors:
Tom Velez,
Tony Wang,
Ioannis Koutroulis,
James Chamberlain,
Amit Uppal,
Seife Yohannes,
Tim Tschampel,
Emilia Apostolova
Abstract:
Background: While machine learning (ML) models are rapidly emerging as promising screening tools in critical care medicine, identifying homogeneous subphenotypes within populations with heterogeneous conditions such as pediatric sepsis may help these prognostic algorithms attain high predictive performance. This study aims to identify subphenotypes of pediatric sepsis and demonstrate the potential value of partitioned data/subtype-based training. Methods: This was a retrospective study of clinical data extracted from the medical records of 6,446 pediatric patients admitted to a major hospital system in the DC area. Vitals and labs associated with patients meeting the diagnostic criteria for sepsis were used to perform latent profile analysis. Modern ML algorithms were used to explore the predictive performance benefits of reducing training data heterogeneity via label profiling. Results: In total, 134 (2.1%) patients in this cohort met the diagnostic criteria for sepsis, and latent profile analysis identified four profiles/subphenotypes of pediatric sepsis. Profiles 1 and 3 had the lowest mortality and included pediatric patients from different age groups. Profile 2 was characterized by respiratory dysfunction; profile 4 by neurological dysfunction and the highest mortality rate (22.2%). Machine learning experiments comparing the predictive performance of models trained without training data profiling against profile-targeted models suggest that statistically significant improvements in predictive performance can be obtained. For example, the area under the ROC curve (AUC) obtained when predicting profile 4 with 24-hour data (AUC = .998, p < .0001) compared favorably with the AUC obtained from the model treating all profiles as a single homogeneous group (AUC = .918) with 24-hour data.
Submitted 23 August, 2019;
originally announced August 2019.
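AUC values like those reported above can be computed from predicted risk scores with the Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The sketch below also evaluates AUC separately within each profile, one way to inspect profile-level performance; the function names and toy data are hypothetical, and the latent profile analysis that assigns each patient a profile is assumed to have been done already.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for large cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_profile(labels: Sequence[int], scores: Sequence[float],
                   profiles: Sequence[int]) -> Dict[int, float]:
    """AUC computed separately within each profile that contains both outcomes."""
    result: Dict[int, float] = {}
    for profile in sorted(set(profiles)):
        idx = [i for i, p in enumerate(profiles) if p == profile]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a profile has only one outcome
        result[profile] = roc_auc(ys, [scores[i] for i in idx])
    return result
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` counts 3 correctly ordered positive/negative pairs out of 4, giving 0.75.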
-
Topic Modeling for Classification of Clinical Reports
Authors:
Efsun Sarioglu Kayi,
Kabir Yadav,
James M. Chamberlain,
Hyeong-Ah Choi
Abstract:
Electronic health records (EHRs) contain important clinical information about patients. Efficient and effective use of this information could supplement or even replace manual chart review as a means of studying and improving the quality and safety of healthcare delivery. However, some of these clinical data are in the form of free text and require pre-processing before use in automated systems. A common free-text data source is radiology reports, typically dictated by radiologists to explain their interpretations. We sought to demonstrate machine learning classification of computed tomography (CT) imaging reports into binary outcomes, i.e., positive or negative for fracture, using regular text classification and classifiers based on topic modeling. Topic modeling provides interpretable themes (topic distributions) in reports, a representation that is more compact than the commonly used bag-of-words representation and can be processed faster than raw text in subsequent automated processes. We demonstrate new classifiers based on this topic modeling representation of the reports. The aggregate topic classifier (ATC) and the confidence-based topic classifier (CTC) each classify the test reports using a single topic, selected from the training dataset by different measures. Alternatively, the similarity-based topic classifier (STC) measures the similarity between the reports' topic distributions to determine the predicted class. Our proposed topic modeling-based classifier systems are shown to be competitive with existing text classification techniques and provide an efficient and interpretable representation.
Submitted 19 June, 2017;
originally announced June 2017.
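The similarity-based topic classifier can be sketched as follows, assuming each report has already been mapped to a topic distribution (for example by a topic model such as LDA). The abstract does not specify the similarity measure or what a test report is compared against; this sketch assumes cosine similarity to the mean topic distribution of each training class, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_centroids(dists: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Mean topic distribution of the training reports in each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for dist, label in zip(dists, labels):
        acc = sums.setdefault(label, [0.0] * len(dist))
        for k, w in enumerate(dist):
            acc[k] += w
        counts[label] = counts.get(label, 0) + 1
    return {label: [w / counts[label] for w in acc] for label, acc in sums.items()}


def stc_predict(dist: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Predict the class whose centroid is most similar to the report's topics."""
    return max(centroids, key=lambda label: cosine(dist, centroids[label]))
```

Comparing against class centroids keeps prediction cost proportional to the number of classes rather than the number of training reports; comparing against individual training reports (nearest neighbour) would be an equally plausible reading of the abstract.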
-
Motivations for Participation in Socially Networked Collective Intelligence Systems
Authors:
Jon Chamberlain,
Udo Kruschwitz,
Massimo Poesio
Abstract:
One of the most significant challenges facing systems of collective intelligence is how to encourage participation on the scale required to produce high quality data. This paper details ongoing work with Phrase Detectives, an online game-with-a-purpose deployed on Facebook, and investigates user motivations for participation in social network gaming where the wisdom of crowds produces useful data.
Submitted 18 April, 2012;
originally announced April 2012.