-
Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit
Authors:
Virginia Partridge,
Jasmine Mangat,
Rebecca Curran,
Ryan McGrady,
Ethan Zuckerman
Abstract:
We present a method for mapping Reddit communities that accounts for temporal shifts, using quantitative and qualitative analyses of clustering techniques to produce high-quality, stable, and meaningful maps for researchers, journalists and casual Reddit users. Building on previous work using community embeddings, we find that only a month of Reddit comments suffices to create snapshot embeddings that maintain quality while supporting insight into changes in Reddit communities over time. Comparing different clusterings of community embeddings with quantitative measures of quality and temporal stability, we describe properties of the models and what they tell us about the underlying Reddit data. Moreover, qualitative analysis of the resulting clusters illuminates which properties of clusterings are useful for analysis of Reddit communities. Although clusterings of subreddits have been used in many earlier works, we believe this is the first study to qualitatively analyze how these clusterings are perceived by social media researchers at a Reddit-wide scale.
Finally, we demonstrate how the temporal snapshots might be used in exploratory study. We are able to identify particularly stable communities during 2021-2022, such as the Reddit Public Access Network, as well as emerging communities, like one focused on NFT trading. This work informed the development of a webtool for exploring Reddit now available to the public at RedditMap.social.
Submitted 22 December, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
An optimization framework for route design and allocation of aircraft to multiple departure routes
Authors:
V. Ho-Huu,
S. Hartjes,
H. G. Visser,
R. Curran
Abstract:
In this article, we present the development of a two-step optimization framework to deal with the design and selection of aircraft departure routes and the allocation of flights among these routes. The aim of the framework is to minimize cumulative noise annoyance and fuel burn. In the first step of the framework, multi-objective trajectory optimization is used to compute and store a set of routes that will serve as inputs in the second step. In the second step, the selection of routes from the set of pre-computed optimal routes and the optimal allocation of flights among these routes are conducted simultaneously. To validate the proposed framework, we also conduct an analysis involving an integrated (one-step) approach, in which both trajectory optimization and route allocation are formulated as a single optimization problem. A comparison of both approaches is then performed, and their advantages and disadvantages are identified. The performance and capabilities of the present framework are demonstrated using a case study at Amsterdam Airport Schiphol in the Netherlands. The numerical results show that the proposed framework can generate solutions which can achieve a reduction in the number of people annoyed of up to 31% and a reduction in fuel consumption of 7.3% relative to the reference case solution.
Submitted 29 August, 2019;
originally announced August 2019.
-
NNE: A Dataset for Nested Named Entity Recognition in English Newswire
Authors:
Nicky Ringland,
Xiang Dai,
Ben Hachey,
Sarvnaz Karimi,
Cecile Paris,
James R. Curran
Abstract:
Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE---a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
Submitted 4 June, 2019;
originally announced June 2019.
-
Web-scale Surface and Syntactic n-gram Features for Dependency Parsing
Authors:
Dominick Ng,
Mohit Bansal,
James R. Curran
Abstract:
We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface $n$-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks.
Surface and syntactic $n$-grams both produce substantial and complementary gains in parsing accuracy across domains. Our best system combines the two feature sets, achieving up to 0.8% absolute UAS improvements on newswire and 1.4% on web text.
Submitted 24 February, 2015;
originally announced February 2015.