-
The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks
Authors:
Dominik Schlechtweg,
Shafqat Mumtaz Virk,
Nikolay Arefyev
Abstract:
Lexical Semantic Change Detection (LSCD) is a complex, lemma-level task that is usually operationalized through two successively applied usage-level tasks. First, Word-in-Context (WiC) labels are derived for pairs of usages. Then, these labels are represented in a graph on which Word Sense Induction (WSI) is applied to derive sense clusters. Finally, LSCD labels are derived by comparing sense clusters over time. This modularity is reflected in most LSCD datasets and models. It also leads to a large heterogeneity in modeling options and task definitions, which is exacerbated by a variety of dataset versions, preprocessing options and evaluation metrics. This heterogeneity makes it difficult to evaluate models under comparable conditions, to choose optimal model combinations or to reproduce results. Hence, we provide a benchmark repository standardizing LSCD evaluation. Through transparent implementation, results become easily reproducible, and through standardization, different components can be freely combined. The repository reflects the task's modularity by allowing model evaluation for WiC, WSI and LSCD. This allows for careful evaluation of increasingly complex model components, providing new ways of model optimization.
Submitted 29 March, 2024;
originally announced April 2024.
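The three-step pipeline described in the abstract above (WiC labels, then WSI on a graph, then a comparison of sense clusters over time) can be illustrated with a minimal sketch. It is not the benchmark's implementation: the toy usages, the binary WiC labels, the use of connected components as the clustering step, and the choice of Jensen-Shannon distance as a graded change score are all assumptions made for illustration.

```python
from collections import Counter
from math import log2, sqrt

def induce_senses(usages, same_sense_pairs):
    """WSI stand-in: connected components over 'same sense' WiC edges."""
    parent = {u: u for u in usages}

    def find(u):
        # Follow parent links to the component root, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in same_sense_pairs:
        parent[find(a)] = find(b)
    return {u: find(u) for u in usages}

def sense_distribution(clusters, period_of, period, senses):
    """Relative frequency of each sense cluster within one time period."""
    counts = Counter(c for u, c in clusters.items() if period_of[u] == period)
    total = sum(counts.values())
    return [counts[s] / total for s in senses]

def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (range 0 to 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return sqrt((kl(p, m) + kl(q, m)) / 2)

# Hypothetical toy input: four usages of one lemma in two time periods.
period_of = {"u1": 1, "u2": 1, "u3": 2, "u4": 2}
# Step 1 (WiC): usage pairs judged to carry the same sense (assumed labels).
wic_same_sense = [("u1", "u2"), ("u1", "u3")]

# Step 2 (WSI): cluster the usage graph into senses.
clusters = induce_senses(list(period_of), wic_same_sense)
senses = sorted(set(clusters.values()))

# Step 3 (LSCD): compare the sense distributions of the two periods.
p1 = sense_distribution(clusters, period_of, 1, senses)
p2 = sense_distribution(clusters, period_of, 2, senses)
graded_change = jensen_shannon_distance(p1, p2)
# Binary change: some sense is attested in exactly one of the two periods.
binary_change = any((a == 0) != (b == 0) for a, b in zip(p1, p2))
```

Because each step only consumes the output of the previous one, any of the three functions could be swapped for a different model, which is the kind of free recombination the abstract attributes to standardization.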
-
The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change
Authors:
Dominik Schlechtweg,
Shafqat Mumtaz Virk,
Pauline Sander,
Emma Sköldberg,
Lukas Theuer Linke,
Tuo Zhang,
Nina Tahmasebi,
Jonas Kuhn,
Sabine Schulte im Walde
Abstract:
We present the DURel tool, which implements the annotation of semantic proximity between uses of words in an online, open-source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This allows word senses to be measured with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation effort. The tool offers additional functionalities to compare the agreement between annotators, to guarantee the inter-subjectivity of the obtained judgments, and to calculate summary statistics giving insights into sense frequency distributions, semantic variation or changes of senses over time.
Submitted 5 February, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Data Resource Profile: Egress Behavior from Select NYC COVID-19 Exposed Health Facilities March-May 2020
Authors:
Debra F. Laefer,
Thomas Kirchner,
Haoran Jiang,
Darlene Cheong,
Yunqi Jiang,
Aseah Khan,
Weiyi Qiu,
Nikki Tai,
Tiffany Truong,
Maimunah Virk
Abstract:
Vector control strategies are central to the mitigation and containment of COVID-19 and have come in the form of municipal ordinances that restrict the operational status of public and private spaces and associated services. Yet, little is known about specific population responses in terms of risk behaviors. To help understand the impact of those vector control variable strategies, a multi-week, multi-site observational study was undertaken outside of 19 New York City medical facilities during the peak of the city's initial COVID-19 wave (03/22/20-05/19/20). The aim was to capture perishable data on the touch, destination choice, and PPE usage behavior of individuals egressing hospitals and urgent care centers. A major goal was to establish an empirical basis for future research on the way people interact with three-dimensional vector environments. Anonymized data were collected via smartphones. Each data record includes the time, date, and location of an individual leaving a healthcare facility, their routing, and their interactions with the built environment, other individuals, and themselves. Most records also note their PPE usage, destination, intermediary stops, and transportation choices. The records were linked with 61 socio-economic factors by the facility zip code and with 7 contemporaneous weather factors, and then merged into a unified shapefile in an ArcGIS system. This paper describes the project team and protocols used to produce over 5,100 publicly accessible observational records and an affiliated codebook that can be used to study linkages between individual behaviors and on-the-ground conditions.
Submitted 18 January, 2021;
originally announced January 2021.
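The record linkage mentioned in the abstract above (observations joined to socio-economic factors by facility zip code and to weather factors) amounts to a keyed join. The sketch below is a hypothetical illustration only: every field name and value is a placeholder rather than data from the published dataset or codebook, and keying the weather factors by observation date is an assumption.

```python
# Placeholder observation records; field names and values are hypothetical.
observations = [
    {"record_id": 1, "facility_zip": "Z1", "date": "D1", "wore_mask": True},
    {"record_id": 2, "facility_zip": "Z2", "date": "D2", "wore_mask": False},
]

# Placeholder lookup tables standing in for the socio-economic factors
# (keyed by facility zip code) and the weather factors (assumed keyed by date).
socio_by_zip = {
    "Z1": {"socio_factor_a": 0.1},
    "Z2": {"socio_factor_a": 0.2},
}
weather_by_date = {
    "D1": {"weather_factor_a": 10.0},
    "D2": {"weather_factor_a": 12.0},
}

def link_records(observations, socio_by_zip, weather_by_date):
    """Left-join each observation with its zip-level and date-level factors."""
    linked = []
    for obs in observations:
        row = dict(obs)  # copy so the input records are left unchanged
        row.update(socio_by_zip.get(obs["facility_zip"], {}))
        row.update(weather_by_date.get(obs["date"], {}))
        linked.append(row)
    return linked

linked = link_records(observations, socio_by_zip, weather_by_date)
```

Using `dict.get` with an empty default keeps an observation in the output even when no matching zip code or date exists, so no observational record is silently dropped by the join.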