-
An NLP approach to quantify dynamic salience of predefined topics in a text corpus
Authors:
A. Bock,
A. Palladino,
S. Smith-Heisters,
I. Boardman,
E. Pellegrini,
E. J. Bienenstock,
A. Valenti
Abstract:
The proliferation of news media available online presents both a valuable resource and a significant challenge to analysts aiming to profile and understand social and cultural trends in a geographic location of interest. While an abundance of news reports documenting significant events, trends, and responses provides a more democratized picture of the social characteristics of a location, making sense of an entire corpus to extract significant trends is a steep challenge for any one analyst or team. Here, we present an approach using natural language processing techniques that seeks to quantify how a set of predefined topics of interest changes over time across a large corpus of text. We found that, given a predefined topic, we can identify and rank sets of terms, or n-grams, that map to that topic and have usage patterns that deviate from a normal baseline. Emergence, disappearance, or significant variations in n-gram usage present a ground-up picture of a topic's dynamic salience within a corpus of interest.
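The idea of ranking n-grams by how far their usage deviates from a baseline can be illustrated with a small sketch. The abstract does not say which statistic the authors use, so the z-score against earlier time windows below is an assumption, as are the function names, the whitespace tokenizer, and the `min_std` floor. The step that maps n-grams to a predefined topic is omitted.

```python
from collections import Counter
from statistics import mean, pstdev


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(documents, n):
    """Relative frequency of each n-gram across one time window of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc.lower().split(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def rank_deviations(baseline_windows, current_window, n=2, min_std=1e-6):
    """Rank n-grams by |z-score| of current usage against baseline windows.

    Assumption: a z-score is only one possible deviation measure. min_std is
    a hypothetical floor that avoids division by zero when baseline usage is
    constant.
    """
    if not baseline_windows:
        raise ValueError("at least one baseline window is required")
    baseline = [relative_frequencies(w, n) for w in baseline_windows]
    current = relative_frequencies(current_window, n)
    vocabulary = set(current).union(*baseline)
    scores = {}
    for gram in vocabulary:
        history = [freqs.get(gram, 0.0) for freqs in baseline]
        sigma = max(pstdev(history), min_std)
        scores[gram] = (current.get(gram, 0.0) - mean(history)) / sigma
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


baseline = [["water supply stable"], ["water supply stable"]]
current = ["water shortage reported"]
ranked = rank_deviations(baseline, current, n=2)
```

In this toy input, n-grams that newly appear in the current window receive positive scores and n-grams that vanish receive negative scores, which corresponds to the emergence and disappearance cases mentioned in the abstract.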
Submitted 16 August, 2021;
originally announced August 2021.
-
Information Fusion to Estimate Resilience of Dense Urban Neighborhoods
Authors:
Anthony Palladino,
Elisa J. Bienenstock,
Bradley M. West,
Jake R. Nelson,
Tony H. Grubesic
Abstract:
Diverse sociocultural influences in rapidly growing dense urban areas may induce strain on civil services and reduce the resilience of those areas to exogenous and endogenous shocks. We present a novel approach, with foundations in computer and social sciences, to estimate the resilience of dense urban areas at finer spatiotemporal scales than the state of the art. We fuse multi-modal data sources to estimate resilience indicators from social science theory and leverage a structured ontology for factor combinations to enhance explainability. Estimates of destabilizing areas can improve the decision-making capabilities of civil governments by identifying critical areas needing increased social services.
Submitted 27 March, 2019;
originally announced March 2019.
-
Estimating the entropy of binary time series: Methodology, some theory and a simulation study
Authors:
Y. Gao,
I. Kontoyiannis,
E. Bienenstock
Abstract:
Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters.
Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator.
Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency.
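As a point of reference for the plug-in method named above, here is a minimal sketch of the standard plug-in (maximum-likelihood) estimator for a binary sequence: take the empirical distribution of overlapping words of a fixed length, compute its Shannon entropy, and divide by the word length. The function name and the `word_length` parameter are illustrative assumptions, and this is not the authors' implementation; the LZ-based, CTW, and renewal estimators are not shown.

```python
import math
from collections import Counter


def plugin_entropy_rate(bits, word_length):
    """Plug-in entropy-rate estimate in bits per symbol.

    Uses the empirical distribution of overlapping words of the given
    length (an illustrative choice; non-overlapping words are also common).
    """
    n_words = len(bits) - word_length + 1
    if word_length < 1 or n_words < 1:
        raise ValueError("word_length must be between 1 and len(bits)")
    counts = Counter(tuple(bits[i:i + word_length]) for i in range(n_words))
    # Shannon entropy of the empirical word distribution.
    block_entropy = -sum(
        (c / n_words) * math.log2(c / n_words) for c in counts.values()
    )
    return block_entropy / word_length


constant = [0] * 100
alternating = [0, 1] * 500
estimate_constant = plugin_entropy_rate(constant, 3)
estimate_alternating = plugin_entropy_rate(alternating, 1)
```

With word length 1, the alternating sequence has equally frequent symbols, so the estimate is 1 bit per symbol even though the sequence is fully predictable; a longer word length lowers the estimate. The choice of word length therefore matters a great deal for this estimator.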
Submitted 29 February, 2008;
originally announced February 2008.
-
From the entropy to the statistical structure of spike trains
Authors:
Yun Gao,
Ioannis Kontoyiannis,
Elie Bienenstock
Abstract:
We use statistical estimates of the entropy rate of spike train data in order to make inferences about the underlying structure of the spike train itself. We first examine a number of different parametric and nonparametric estimators (some known and some new), including the "plug-in" method, several versions of Lempel-Ziv-based compression algorithms, a maximum likelihood estimator tailored to renewal processes, and the natural estimator derived from the Context-Tree Weighting method (CTW). The theoretical properties of these estimators are examined, several new theoretical results are developed, and all estimators are systematically applied to various types of synthetic data and under different conditions.
Our main focus is on the performance of these entropy estimators on the (binary) spike trains of 28 neurons recorded simultaneously for a one-hour period from the primary motor and dorsal premotor cortices of a monkey. We show how the entropy estimates can be used to test for the existence of long-term structure in the data, and we construct a hypothesis test for whether the renewal process model is appropriate for these spike trains. Further, by applying the CTW algorithm we derive the maximum a posteriori (MAP) tree model of our empirical data, and comment on the underlying structure it reveals.
Submitted 27 March, 2008; v1 submitted 22 October, 2007;
originally announced October 2007.
-
Covariance Plasticity and Regulated Criticality
Authors:
Elie Bienenstock,
Daniel Lehmann
Abstract:
We propose that a regulation mechanism based on Hebbian covariance plasticity may cause the brain to operate near criticality. We analyze the effect of such a regulation on the dynamics of a network with excitatory and inhibitory neurons and uniform connectivity within and across the two populations. We show that, under broad conditions, the system converges to a critical state lying at the common boundary of three regions in parameter space; these correspond to three modes of behavior: high activity, low activity, oscillation.
Submitted 20 February, 2002;
originally announced February 2002.