Search | arXiv e-print repository

Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit

Authors: Virginia Partridge, Jasmine Mangat, Rebecca Curran, Ryan McGrady, Ethan Zuckerman

Abstract: We present a method for mapping Reddit communities that accounts for temporal shifts, using quantitative and qualitative analyses of clustering techniques to produce high-quality, stable, and meaningful maps for researchers, journalists and casual Reddit users. Building on previous work using community embeddings, we find that only a month of Reddit comments suffices to create snapshot embeddings that maintain quality while supporting insight into changes in Reddit communities over time. Comparing different clusterings of community embeddings with quantitative measures of quality and temporal stability, we describe properties of the models and what they tell us about the underlying Reddit data. Moreover, qualitative analysis of the resulting clusters illuminates which properties of clusterings are useful for analysis of Reddit communities. Although clusterings of subreddits have been used in many earlier works, we believe this is the first study to qualitatively analyze how these clusterings are perceived by social media researchers at a Reddit-wide scale. Finally, we demonstrate how the temporal snapshots might be used in exploratory study. We are able to identify particularly stable communities during 2021-2022, such as the Reddit Public Access Network, as well as emerging communities, like one focused on NFT trading. This work informed the development of a webtool for exploring Reddit now available to the public at RedditMap.social.

Submitted 22 December, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 10 pages, 8 figures

arXiv:1908.11086 [pdf]

An optimization framework for route design and allocation of aircraft to multiple departure routes

Authors: V. Ho-Huu, S. Hartjes, H. G. Visser, R. Curran

Abstract: In this article, we present the development of a two-step optimization framework to deal with the design and selection of aircraft departure routes and the allocation of flights among these routes. The aim of the framework is to minimize cumulative noise annoyance and fuel burn. In the first step of the framework, multi-objective trajectory optimization is used to compute and store a set of routes that will serve as inputs in the second step. In the second step, the selection of routes from the set of pre-computed optimal routes and the optimal allocation of flights among these routes are conducted simultaneously. To validate the proposed framework, we also conduct an analysis involving an integrated (one-step) approach, in which both trajectory optimization and route allocation are formulated as a single optimization problem. A comparison of both approaches is then performed, and their advantages and disadvantages are identified. The performance and capabilities of the present framework are demonstrated using a case study at Amsterdam Airport Schiphol in The Netherlands. The numerical results show that the proposed framework can generate solutions which can achieve a reduction in the number of people annoyed of up to 31% and a reduction in fuel consumption of 7.3% relative to the reference case solution.

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1906.01359 [pdf, other]

NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Authors: Nicky Ringland, Xiang Dai, Ben Hachey, Sarvnaz Karimi, Cecile Paris, James R. Curran

Abstract: Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

Submitted 4 June, 2019; originally announced June 2019.

Comments: ACL 2019

arXiv:1502.07038 [pdf, other]

Web-scale Surface and Syntactic n-gram Features for Dependency Parsing

Authors: Dominick Ng, Mohit Bansal, James R. Curran

Abstract: We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface $n$-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks. Surface and syntactic $n$-grams both produce substantial and complementary gains in parsing accuracy across domains. Our best system combines the two feature sets, achieving up to 0.8% absolute UAS improvements on newswire and 1.4% on web text.

Submitted 24 February, 2015; originally announced February 2015.

Showing 1–4 of 4 results for author: Curran, R