Showing 1–35 of 35 results for author: Clark, E

Searching in archive cs.
  1. Stochastic Guidance of Buoyancy Controlled Vehicles under Ice Shelves using Ocean Currents

    Authors: Federico Rossi, Andrew Branch, Michael P. Schodlok, Timothy Stanton, Ian G. Fenty, Joshua Vander Hook, Evan B. Clark

    Abstract: We propose a novel technique for guidance of buoyancy-controlled vehicles in uncertain under-ice ocean flows. In-situ melt rate measurements collected at the grounding zone of Antarctic ice shelves, where the ice shelf meets the underlying bedrock, are essential to constrain models of future sea level rise. Buoyancy-controlled vehicles, which control their vertical position in the water column thr…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Presented at IROS 2021

  2. arXiv:2405.14199  [pdf, other]

    cs.RO cs.LG

    Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios

    Authors: Emma Clark, Kanghyun Ryu, Negar Mehr

    Abstract: Learning from Demonstration (LfD) can be an efficient way to train systems with analogous agents by enabling ``Student'' agents to learn from the demonstrations of the most experienced ``Teacher'' agent, instead of training their policy in parallel. However, when there are discrepancies in agent capabilities, such as divergent actuator power or joint angle constraints, naively replicating demonstr…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: To be published in L4DC 2024, 10 pages, 5 figures

  3. arXiv:2403.03906  [pdf, other]

    cs.DS

    On HTLC-Based Protocols for Multi-Party Cross-Chain Swaps

    Authors: Emily Clark, Chloe Georgiou, Katelyn Poon, Marek Chrobak

    Abstract: In his 2018 paper, Herlihy introduced an atomic protocol for multi-party asset swaps across different blockchains. His model represents an asset swap by a directed graph whose nodes are the participating parties and edges represent asset transfers, and rational behavior of the participants is captured by a preference relation between a protocol's outcomes. Asset transfers between parties are achie…

    Submitted 14 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    ACM Class: F.2

  4. arXiv:2401.03506  [pdf, other]

    eess.AS cs.LG cs.SD

    DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

    Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

    Abstract: In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (A…

    Submitted 26 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  5. arXiv:2311.09424  [pdf, other]

    cs.CV

    Predicting Spine Geometry and Scoliosis from DXA Scans

    Authors: Amir Jamaludin, Timor Kadir, Emma Clark, Andrew Zisserman

    Abstract: Our objective in this paper is to estimate spine curvature in DXA scans. To this end we first train a neural network to predict the middle spine curve in the scan, and then use an integral-based method to determine the curvature along the spine curve. We use the curvature to compare to the standard angle scoliosis measure obtained using the DXA Scoliosis Method (DSM). The performance improves over…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: CSI@MICCAI 2019 Submission

  6. arXiv:2305.16600  [pdf, other]

    cs.CY

    Temporal Evolution of Risk Behavior in a Disease Spread Simulation

    Authors: Ollin D. Langle-Chimal, Scott C. Merrill, Eric M. Clark, Gabriela Bucini, Tung-Lin Liu, Trisha R. Shrum, Christopher Koliba, Asim Zia, Julia M. Smith, Nicholas Cheney

    Abstract: Human behavior is a dynamic process that evolves with experience. Understanding the evolution of individual's risk propensity is critical to design public health interventions to propitiate the adoption of better biosecurity protocols and thus, prevent the transmission of an infectious disease. Using an experimental game that simulates the spread of a disease in a network of porcine farms, we meas…

    Submitted 1 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 12 pages, 1 table, 7 figures

    MSC Class: ACM-class: F.2.2; I.2.7

  7. arXiv:2305.14755  [pdf, other]

    cs.CL cs.AI

    Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

    Authors: Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

    Abstract: Most existing stylistic text rewriting methods and evaluation metrics operate on a sentence level, but ignoring the broader context of the text can lead to preferring generic, ambiguous, and incoherent rewrites. In this paper, we investigate integrating the preceding textual context into both the $\textit{rewriting}$ and $\textit{evaluation}$ stages of stylistic text rewriting, and introduce a new…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: emnlp 2023 main camera ready

  8. arXiv:2305.13194  [pdf, other]

    cs.CL

    SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

    Authors: Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh

    Abstract: Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensi…

    Submitted 1 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  9. arXiv:2305.01633  [pdf, other]

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13\% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68 ACM Class: I.2.7

  10. arXiv:2212.10622  [pdf, other]

    cs.CL

    mFACE: Multilingual Summarization with Factual Consistency Evaluation

    Authors: Roee Aharoni, Shashi Narayan, Joshua Maynez, Jonathan Herzig, Elizabeth Clark, Mirella Lapata

    Abstract: Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. Several recent efforts attempt to address this by devising models that automatically det…

    Submitted 5 January, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 28 pages with links to released data

  11. arXiv:2212.10397  [pdf, other]

    cs.CL

    Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

    Authors: Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

    Abstract: To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization. Thus, we investigate the recruitment of high-quality Amazon Mechanical Turk workers via a two-step pipeline. We show that we can successfully filter out subpar worke…

    Submitted 13 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  12. Pacific Lamprey Inspired Climbing

    Authors: Brian Van Stratum, Kourosh Shoele, Jonathan E. Clark

    Abstract: Snakes and their bio-inspired robot counterparts have demonstrated locomotion on a wide range of terrains. However, dynamic vertical climbing is one locomotion strategy that has received little attention in the existing snake robotics literature. We demonstrate a new scansorial gait and robot inspired by the locomotion of the Pacific Lamprey. This new gait allows a robot to steer while climbing on…

    Submitted 13 December, 2022; originally announced December 2022.

  13. arXiv:2211.00922  [pdf, other]

    cs.CL

    Dialect-robust Evaluation of Generated Text

    Authors: Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

    Abstract: Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, currently, there exists no way to quantify how metrics respond to change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as…

    Submitted 2 November, 2022; originally announced November 2022.

  14. arXiv:2206.11249  [pdf, other]

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  15. arXiv:2202.06935  [pdf, other]

    cs.CL cs.AI cs.LG

    Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

    Authors: Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam

    Abstract: Evaluation practices in natural language generation (NLG) have many known flaws, but improved evaluation approaches are rarely widely adopted. This issue has become more urgent, since neural NLG models have improved to the point where they can often no longer be distinguished based on the surface-level features that older metrics rely on. This paper surveys the issues with human and automatic mode…

    Submitted 14 February, 2022; originally announced February 2022.

  16. arXiv:2107.00061  [pdf, other]

    cs.CL

    All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

    Authors: Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith

    Abstract: Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text? We run a study assessing non-experts' ability to distinguish between human- and machine-authored text (GPT2 and GPT3) in three domains (stories, news articles, and recipes). We find that, without training, eva…

    Submitted 7 July, 2021; v1 submitted 30 June, 2021; originally announced July 2021.

    Comments: references added, corrected typo

  17. arXiv:2102.13476  [pdf, other]

    eess.SP cs.LG math.OC

    PySensors: A Python Package for Sparse Sensor Placement

    Authors: Brian M. de Silva, Krithika Manohar, Emily Clark, Bingni W. Brunton, Steven L. Brunton, J. Nathan Kutz

    Abstract: PySensors is a Python package for selecting and placing a sparse set of sensors for classification and reconstruction tasks. Specifically, PySensors implements algorithms for data-driven sparse sensor placement optimization for reconstruction (SSPOR) and sparse sensor placement optimization for classification (SSPOC). In this work we provide a brief description of the mathematical algorithms and t…

    Submitted 20 February, 2021; originally announced February 2021.

  18. arXiv:2008.12247  [pdf, other]

    cs.LG

    Bracketing brackets with bras and kets

    Authors: Emily Clark, Angelie Vincent, J. Nathan Kutz, Steven L. Brunton

    Abstract: Brackets are an essential component in aircraft manufacture and design, joining parts together, supporting weight, holding wires, and strengthening joints. Hundreds or thousands of unique brackets are used in every aircraft, but manufacturing a large number of distinct brackets is inefficient and expensive. Fortunately, many so-called "different" brackets are in fact very similar or even identical…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: 10 pages, 9 figures

  19. arXiv:2006.14799  [pdf, other]

    cs.CL cs.LG

    Evaluation of Text Generation: A Survey

    Authors: Asli Celikyilmaz, Elizabeth Clark, Jianfeng Gao

    Abstract: The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges still being fac…

    Submitted 18 May, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: 47 pages (revised version)

  20. arXiv:2004.12129  [pdf]

    physics.soc-ph cs.CY cs.SI

    Assessing the impact of the coronavirus lockdown on unhappiness, loneliness, and boredom using Google Trends

    Authors: Abel Brodeur, Andrew E. Clark, Sarah Fleche, Nattavudh Powdthavee

    Abstract: The COVID-19 pandemic has led many governments to implement lockdowns. While lockdowns may help to contain the spread of the virus, it is possible that substantial damage to population well-being will result. This study relies on Google Trends data and tests whether the lockdowns implemented in Europe and America led to changes in well-being related topic search terms. Using different methods to e…

    Submitted 25 April, 2020; originally announced April 2020.

  21. arXiv:2004.03607  [pdf, other]

    cs.CL

    TuringAdvice: A Generative and Dynamic Evaluation of Language Use

    Authors: Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi

    Abstract: We propose TuringAdvice, a new challenge task and dataset for language understanding models. Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language. Our evaluation framework tests a fundamental aspect of human language understanding: our ability to use language to resolve open-ended situations by communicating with each other. E…

    Submitted 12 April, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: NAACL 2021 camera ready. Project page at https://rowanzellers.com/advice

  22. Effects of Social Cues on Biosecurity Compliance in Livestock Facilities: Evidence from Experimental Simulations

    Authors: Luke Trinity, Scott C. Merrill, Eric Clark, Christopher J. Koliba, Asim Zia, Gabriela Bucini, Julia M. Smith

    Abstract: Disease outbreaks in U.S. animal livestock industries have economic impacts measured in hundreds of millions of dollars per year. Biosecurity, or procedures intended to protect animals against disease, is known to be effective at reducing infection risk at facilities. Yet to the detriment of animal health, humans do not always follow biosecurity protocols. Human behavioral factors have been shown…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 30 pages, 4 figures, 6 tables

  23. Using Digital Field Experiments To Elicit Risk Mitigation Behavioral Strategies For Disease Management Across Agricultural Production Systems

    Authors: Eric M. Clark, Scott C. Merrill, Luke Trinity, Gabriela Bucini, Nicholas Cheney, Ollin Langle-Chimal, Trisha Shrum, Christopher Koliba, Asim Zia, Julia M. Smith

    Abstract: Failing to mitigate propagation of disease spread can result in dire economic consequences for agricultural networks. Pathogens like Porcine Epidemic Diarrhea virus, can quickly spread among producers. Biosecurity is designed to prevent infection transmission. When considering biosecurity investments, management must balance the cost of protection versus the consequences of contracting an infectio…

    Submitted 1 October, 2019; v1 submitted 20 September, 2019; originally announced September 2019.

  24. arXiv:1909.04076  [pdf, other]

    cs.CL cs.AI

    Counterfactual Story Reasoning and Generation

    Authors: Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi

    Abstract: Counterfactual reasoning requires predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes. Despite being considered a necessary component of AI-complete systems, few resources have been developed for evaluating counterfactual reasoning in narratives. In this paper, we propose Counterfactual Story Rewriting: given an original story and an i…

    Submitted 12 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019

  25. arXiv:1805.09959  [pdf, other]

    cs.CL cs.SI

    A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter

    Authors: Eric M. Clark, Ted James, Chris A. Jones, Amulya Alapati, Promise Ukandu, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Background: Social media has the capacity to afford the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work, [Crannell et. al.], we have studied an active cancer patient population on Twitter and compiled a set of tweets describing the…

    Submitted 12 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

  26. arXiv:1804.10202  [pdf, other]

    cs.HC cs.AI cs.CL

    Sounding Board: A User-Centric and Content-Driven Social Chatbot

    Authors: Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf

    Abstract: We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa Prize. The system architecture consists of several components including spoken language processing, dialogue management, language generation, and content management, with emphasis on user-centric and content-driven design. We also share insights gained from large-scale online logs based on 160,000 conversations with real-wo…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: 5 pages, 3 figures, NAACL 2018

  27. arXiv:1802.06220  [pdf, ps, other]

    eess.SP cs.IT cs.MA eess.SY

    Fusion of finite set distributions: Pointwise consistency and global cardinality

    Authors: Murat Üney, Jérémie Houssineau, Emmanuel Delande, Simon J. Julier, Daniel E. Clark

    Abstract: A recent trend in distributed multi-sensor fusion is to use random finite set filters at the sensor nodes and fuse the filtered distributions algorithmically using their exponential mixture densities (EMDs). Fusion algorithms which extend the celebrated covariance intersection and consensus based approaches are such examples. In this article, we analyse the variational principle underlying EMDs an…

    Submitted 3 December, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

    Comments: accepted for publication in the IEEE Transactions on Aerospace and Electronics Systems

  28. arXiv:1708.00842  [pdf, ps, other]

    eess.SY cs.IT cs.MA stat.AP stat.CO

    Latent Parameter Estimation in Fusion Networks Using Separable Likelihoods

    Authors: Murat Uney, Bernard Mulgrew, Daniel E Clark

    Abstract: Multi-sensor state space models underpin fusion applications in networks of sensors. Estimation of latent parameters in these models has the potential to provide highly desirable capabilities such as network self-calibration. Conventional solutions to the problem pose difficulties in scaling with the number of sensors due to the joint multi-sensor filtering involved when evaluating the parameter l…

    Submitted 2 January, 2018; v1 submitted 2 August, 2017; originally announced August 2017.

    Comments: accepted with minor revisions, IEEE Transactions on Signal and Information Processing Over Networks

  29. arXiv:1611.02989  [pdf, ps, other]

    cs.IT

    Bayesian data assimilation based on a family of outer measures

    Authors: Jeremie Houssineau, Daniel E. Clark

    Abstract: A flexible representation of uncertainty that remains within the standard framework of probabilistic measure theory is presented along with a study of its properties. This representation relies on a specific type of outer measure that is based on the measure of a supremum, hence combining additive and highly sub-additive components. It is shown that this type of outer measure enables the introduct…

    Submitted 9 November, 2016; originally announced November 2016.

    MSC Class: 60A10; 62C10

  30. arXiv:1508.01843  [pdf, other]

    cs.SI

    Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter

    Authors: Eric M. Clark, Chris A. Jones, Jake Ryland Williams, Allison N. Kurti, Michell Craig Nortotsky, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Background: Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. Methods: All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012…

    Submitted 5 March, 2016; v1 submitted 7 August, 2015; originally announced August 2015.

  31. arXiv:1505.06750  [pdf, other]

    physics.soc-ph cs.CL

    Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics

    Authors: P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, K. Megerdoomian, M. T. McMahon, B. F. Tivnan, C. M. Danforth

    Abstract: We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists. In particular, we show that the English…

    Submitted 28 May, 2015; v1 submitted 25 May, 2015; originally announced May 2015.

    Comments: 5 pages, 2 figures, 1 table. Expanded version of reply appearing in PNAS 2015

  32. arXiv:1505.04342  [pdf, other]

    cs.CL

    Sifting Robotic from Organic Text: A Natural Language Approach for Detecting Automation on Twitter

    Authors: Eric M. Clark, Jake Ryland Williams, Chris A. Jones, Richard A. Galbraith, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities can range from the benevol…

    Submitted 14 June, 2016; v1 submitted 16 May, 2015; originally announced May 2015.

  33. arXiv:1503.02120  [pdf, other]

    cs.CL cs.IT stat.ML

    Identifying missing dictionary entries with frequency-conserving context models

    Authors: Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in tex…

    Submitted 28 July, 2015; v1 submitted 6 March, 2015; originally announced March 2015.

    Comments: 16 pages, 6 figures, and 7 tables

  34. arXiv:1406.5181  [pdf, other]

    cs.CL physics.soc-ph

    Zipf's law holds for phrases, not words

    Authors: Jake Ryland Williams, Paul R. Lessard, Suma Desu, Eric Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically…

    Submitted 4 March, 2015; v1 submitted 19 June, 2014; originally announced June 2014.

    Comments: Manuscript: 6 pages, 3 figures; Supplementary Information: 8 pages, 18 tables

  35. arXiv:1406.3855  [pdf, other]

    physics.soc-ph cs.CL cs.SI

    Human language reveals a universal positivity bias

    Authors: Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian F. Tivnan, Christopher M. Danforth

    Abstract: Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias i…

    Submitted 15 June, 2014; originally announced June 2014.

    Comments: Manuscript: 7 pages, 4 figures; Supplementary Material: 49 pages, 43 figures, 6 tables. Online appendices available at http://www.uvm.edu/storylab/share/papers/dodds2014a/