Search | arXiv e-print repository

An NLP approach to quantify dynamic salience of predefined topics in a text corpus

Authors: A. Bock, A. Palladino, S. Smith-Heisters, I. Boardman, E. Pellegrini, E. J. Bienenstock, A. Valenti

Abstract: The proliferation of news media available online simultaneously presents a valuable resource and significant challenge to analysts aiming to profile and understand social and cultural trends in a geographic location of interest. While an abundance of news reports documenting significant events, trends, and responses provides a more democratized picture of the social characteristics of a location, making sense of an entire corpus to extract significant trends is a steep challenge for any one analyst or team. Here, we present an approach using natural language processing techniques that seeks to quantify how a set of pre-defined topics of interest change over time across a large corpus of text. We found that, given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to those topics and have usage patterns that deviate from a normal baseline. Emergence, disappearance, or significant variations in n-gram usage present a ground-up picture of a topic's dynamic salience within a corpus of interest.

Submitted 16 August, 2021; originally announced August 2021.

Comments: This paper was presented at the 2021 International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS), 9 July 2021
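The abstract above describes ranking n-grams whose usage deviates from a normal baseline but does not say how the deviation is measured. The sketch below is one possible reading, not the authors' method: it assumes the corpus is already split into time windows of tokens, treats all but the last window as the baseline, and scores each n-gram by a z-score of its relative frequency. The function names, the z-score choice, and the smoothing constant `eps` are all illustrative assumptions.

```python
from collections import Counter
from statistics import mean, stdev


def ngrams(tokens, n):
    """Return the list of overlapping n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_deviating_ngrams(windows, n=1, eps=1e-9):
    """Rank n-grams by how far their usage in the last window departs
    from their usage in the earlier (baseline) windows.

    windows: list of token lists, one per time period, oldest first.
    Returns (ngram, z_score) pairs sorted by absolute z-score, largest first.
    The z-score is an assumed stand-in for the deviation measure.
    """
    if len(windows) < 3:
        raise ValueError("need at least two baseline windows plus one current window")

    # Relative frequency of every n-gram within each window.
    rel_freqs = []
    for tokens in windows:
        grams = ngrams(tokens, n)
        total = len(grams) or 1  # avoid division by zero on empty windows
        rel_freqs.append({g: c / total for g, c in Counter(grams).items()})

    baseline, current = rel_freqs[:-1], rel_freqs[-1]
    vocabulary = set().union(*rel_freqs)

    scores = {}
    for gram in vocabulary:
        history = [freqs.get(gram, 0.0) for freqs in baseline]
        mu, sigma = mean(history), stdev(history)
        # eps keeps the score finite when the baseline never varies.
        scores[gram] = (current.get(gram, 0.0) - mu) / (sigma + eps)

    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    toy_windows = [
        "a b c d".split(),
        "a b c d".split(),
        "a b c e".split(),  # "e" emerges, "d" disappears
    ]
    for gram, score in rank_deviating_ngrams(toy_windows)[:2]:
        print(gram, score)
```

In this toy input the emerging term and the disappearing term receive the largest absolute scores, which mirrors the emergence and disappearance cases the abstract mentions. A real pipeline would also need the topic-mapping step, which this sketch leaves out.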

arXiv:1903.11478 [pdf]

Information Fusion to Estimate Resilience of Dense Urban Neighborhoods

Authors: Anthony Palladino, Elisa J. Bienenstock, Bradley M. West, Jake R. Nelson, Tony H. Grubesic

Abstract: Diverse sociocultural influences in rapidly growing dense urban areas may induce strain on civil services and reduce the resilience of those areas to exogenous and endogenous shocks. We present a novel approach with foundations in computer and social sciences, to estimate the resilience of dense urban areas at finer spatiotemporal scales compared to the state-of-the-art. We fuse multi-modal data sources to estimate resilience indicators from social science theory and leverage a structured ontology for factor combinations to enhance explainability. Estimates of destabilizing areas can improve the decision-making capabilities of civil governments by identifying critical areas needing increased social services.

Submitted 27 March, 2019; originally announced March 2019.

arXiv:0802.4363 [pdf, ps, other]

doi 10.3390/entropy-e10020071

Estimating the entropy of binary time series: Methodology, some theory and a simulation study

Authors: Y. Gao, I. Kontoyiannis, E. Bienenstock

Abstract: Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator. Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency.

Submitted 29 February, 2008; originally announced February 2008.

Comments: 34 pages, 3 figures
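To make the plug-in method named in the abstract above concrete, here is a minimal sketch of the standard plug-in (empirical block entropy) estimator for a binary sequence. It is a generic textbook form, not code from the paper: the function name, the use of overlapping words, and the `word_length` parameter are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def plugin_entropy_rate(bits, word_length):
    """Plug-in estimate of the entropy rate, in bits per symbol.

    Counts overlapping words of `word_length` symbols, computes the Shannon
    entropy of their empirical distribution, and divides by the word length.
    """
    if word_length < 1 or len(bits) < word_length:
        raise ValueError("word_length must be between 1 and len(bits)")

    # Empirical distribution over overlapping words of the given length.
    words = [tuple(bits[i:i + word_length])
             for i in range(len(bits) - word_length + 1)]
    counts = Counter(words)
    total = len(words)

    block_entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return block_entropy / word_length


if __name__ == "__main__":
    alternating = [0, 1] * 50  # deterministic, so the true entropy rate is 0
    for w in (1, 2, 4, 8):
        print(w, plugin_entropy_rate(alternating, w))
```

On the alternating sequence the estimate shrinks as the word length grows, showing how short words miss longer-range structure. The number of possible words grows as 2^word_length, so long words are both costly to count and poorly sampled on short recordings.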

arXiv:0710.4117 [pdf, ps, other]

From the entropy to the statistical structure of spike trains

Authors: Yun Gao, Ioannis Kontoyiannis, Elie Bienenstock

Abstract: We use statistical estimates of the entropy rate of spike train data in order to make inferences about the underlying structure of the spike train itself. We first examine a number of different parametric and nonparametric estimators (some known and some new), including the "plug-in" method, several versions of Lempel-Ziv-based compression algorithms, a maximum likelihood estimator tailored to renewal processes, and the natural estimator derived from the Context-Tree Weighting method (CTW). The theoretical properties of these estimators are examined, several new theoretical results are developed, and all estimators are systematically applied to various types of synthetic data and under different conditions. Our main focus is on the performance of these entropy estimators on the (binary) spike trains of 28 neurons recorded simultaneously for a one-hour period from the primary motor and dorsal premotor cortices of a monkey. We show how the entropy estimates can be used to test for the existence of long-term structure in the data, and we construct a hypothesis test for whether the renewal process model is appropriate for these spike trains. Further, by applying the CTW algorithm we derive the maximum a posteriori (MAP) tree model of our empirical data, and comment on the underlying structure it reveals.

Submitted 27 March, 2008; v1 submitted 22 October, 2007; originally announced October 2007.

Journal ref: In Proceedings of the 2006 International Symposium on Information Theory, Seattle, WA, July 2006

arXiv:cs/0202034 [pdf, ps, other]

Covariance Plasticity and Regulated Criticality

Authors: Elie Bienenstock, Daniel Lehmann

Abstract: We propose that a regulation mechanism based on Hebbian covariance plasticity may cause the brain to operate near criticality. We analyze the effect of such a regulation on the dynamics of a network with excitatory and inhibitory neurons and uniform connectivity within and across the two populations. We show that, under broad conditions, the system converges to a critical state lying at the common boundary of three regions in parameter space; these correspond to three modes of behavior: high activity, low activity, oscillation.

Submitted 20 February, 2002; originally announced February 2002.

Comments: 35 pages, 8 figures

Report number: Center for Neural Computation, Hebrew University, Jerusalem TR-95-1

ACM Class: I.2.6

Journal ref: Advances in Complex Systems, 1(4) (1998) pp. 361-384

Showing 1–5 of 5 results for author: Bienenstock, E