-
Identifying Geographic Clusters: A Network Analytic Approach
Authors:
Roberto Catini,
Dmytro Karamshuk,
Orion Penner,
Massimo Riccaboni
Abstract:
In recent years there has been a growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, geographical clustering of human activity has often been studied by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time invariant and it does not allow one to differentiate between different activities. Our goal in this paper is to present a new methodology for identifying clusters that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top performing research institutions are consistently identified at different geographic scales.
Submitted 18 May, 2015;
originally announced May 2015.
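For readers unfamiliar with k-shell decomposition, the generic peeling procedure can be sketched as follows. This is only an illustration of the standard technique on a toy undirected graph, not the authors' pipeline; the function name `k_shell` and the example graph are assumptions made for this sketch.

```python
def k_shell(adjacency):
    """Assign each node of an undirected graph its k-shell index.

    adjacency: dict mapping node -> set of neighbouring nodes (hypothetical input format).
    """
    degree = {v: len(nb) for v, nb in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via an earlier entry
            remaining.discard(v)
            shell[v] = k
            # Removing v lowers the degree of its surviving neighbours.
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


# Toy graph: a triangle (a, b, c) with one pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
shells = k_shell(toy_graph)
```

In the toy graph the pendant node falls in the outer shell while the triangle forms the denser core, which is the kind of core-periphery separation the abstract relies on.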
-
Inequality and cumulative advantage in science careers: a case study of high-impact journals
Authors:
Alexander M. Petersen,
Orion Penner
Abstract:
Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences, we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. We analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record. We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication. This pattern highlights the difficulty of repeatedly publishing high-impact research and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.
Submitted 21 November, 2014;
originally announced November 2014.
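As a reminder of what a Gini coefficient measures, here is a minimal sketch using the standard sorted-rank formula. The function name `gini` and the sample numbers are made up for illustration; the abstract does not state which estimator the authors used.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with x sorted in ascending order and ranks i starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical citation totals for a small cohort of researchers.
equal_cohort = [5, 5, 5, 5]
skewed_cohort = [0, 0, 0, 10]
```

With this estimator a cohort in which everyone has the same count scores 0, and a cohort of n researchers in which one person holds everything scores (n - 1) / n.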
-
Exploiting citation networks for large-scale author name disambiguation
Authors:
Christian Schulz,
Amin Mazloumian,
Alexander M Petersen,
Orion Penner,
Dirk Helbing
Abstract:
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
Submitted 9 December, 2014; v1 submitted 23 January, 2014;
originally announced January 2014.
-
On the Predictability of Future Impact in Science
Authors:
Orion Penner,
Raj Kumar Pan,
Alexander M. Petersen,
Kimmo Kaski,
Santo Fortunato
Abstract:
Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate further investigation is required before they can be used in recruiting decisions.
Submitted 29 October, 2013; v1 submitted 1 June, 2013;
originally announced June 2013.
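The autocorrelation argument rests on the h-index being cumulative: because citation counts never shrink, h at a later time can never fall below h at an earlier time. A minimal sketch of the h-index definition, with invented yearly citation snapshots for a single hypothetical career, makes this concrete. The names `h_index`, `snapshots` and `trajectory` are assumptions for this example only.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts at four successive career ages.
# Counts only grow and papers are only added, as in a real record.
snapshots = [
    [1, 0, 0],
    [3, 2, 0, 0],
    [6, 4, 2, 1],
    [9, 7, 4, 3, 1],
]
trajectory = [h_index(s) for s in snapshots]
```

Since each value in `trajectory` already contains the previous one, a model that predicts the future h-index from the current h-index gets part of its apparent accuracy for free, which is the overestimation the abstract describes.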
-
The case for caution in predicting scientists' future impact
Authors:
Orion Penner,
Raj K. Pan,
Alexander M. Petersen,
Santo Fortunato
Abstract:
We stress-test the career predictability model proposed by Acuna et al. [Nature 489, 201-202 (2012)] by applying their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. The Acuna model claims to predict h(t+Δt), a scientist's h-index Δt years into the future, using a linear combination of 5 cumulative career measures taken at career age t. Here we investigate how the "predictability" depends on the aggregation of career data across multiple age cohorts. We confirm that the Acuna model does a respectable job of predicting h(t+Δt) up to roughly 6 years into the future when aggregating all age cohorts together. However, when calculated using subsets of specific age cohorts (e.g. using data for only t=3), we find that the model's predictive power significantly decreases, especially when applied to early career years. For young careers, the model does a much worse job of predicting future impact and hence exposes a serious limitation. The limitation is particularly concerning as early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.
Submitted 2 April, 2013;
originally announced April 2013.
-
Reputation and Impact in Academic Careers
Authors:
Alexander M. Petersen,
Santo Fortunato,
Raj K. Pan,
Kimmo Kaski,
Orion Penner,
Armando Rungi,
Massimo Riccaboni,
H. Eugene Stanley,
Fabio Pammolli
Abstract:
Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate $\Delta c$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations $C_{i}$ of each scientist as his/her reputation measure. We find a citation crossover $c_{\times}$ which distinguishes the strength of the reputation effect. For publications with $c < c_{\times}$, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_{i}$. However, the reputation effect becomes negligible for highly cited publications, meaning that for $c \geq c_{\times}$ the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
Submitted 7 October, 2014; v1 submitted 28 March, 2013;
originally announced March 2013.
-
Is Europe Evolving Toward an Integrated Research Area?
Authors:
Alessandro Chessa,
Andrea Morescalchi,
Fabio Pammolli,
Orion Penner,
Alexander M. Petersen,
Massimo Riccaboni
Abstract:
An integrated European Research Area (ERA) is a critical component for a more competitive and open European R&D system. However, the impact of EU-specific integration policies aimed at overcoming innovation barriers associated with national borders is not well understood. Here we analyze 2.4 x 10^6 patent applications filed with the European Patent Office (EPO) over the 25-year period 1986-2010 along with a sample of 2.6 x 10^5 records from the ISI Web of Science to quantitatively measure the role of borders in international R&D collaboration and mobility. From these data we construct five different networks for each year analyzed: (i) the patent co-inventor network, (ii) the publication co-author network, (iii) the co-applicant patent network, (iv) the patent citation network, and (v) the patent mobility network. We use methods from network science and econometrics to perform a comparative analysis across time and between EU and non-EU countries to determine the "treatment effect" resulting from EU integration policies. Using non-EU countries as a control set, we provide quantitative evidence that, despite decades of efforts to build a European Research Area, there has been little integration above global trends in patenting and publication. This analysis provides concrete evidence that Europe remains a collection of national innovation systems.
Submitted 13 February, 2013;
originally announced February 2013.