The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice

Authors: Alexander M. Petersen, Felber Arroyave, Fabio Pammolli

Abstract: Measuring the rate of innovation in academia and industry is fundamental to monitoring the efficiency and competitiveness of the knowledge economy. To this end, a disruption index (CD) was recently developed and applied to publication and patent citation networks (Wu et al., Nature 2019; Park et al., Nature 2023). Here we show that CD systematically decreases over time due to secular growth in research and patent production, following two distinct mechanisms unrelated to innovation -- one behavioral and the other structural. Whereas the behavioral explanation reflects shifts associated with techno-social factors (e.g. self-citation practices), the structural explanation follows from 'citation inflation' (CI), an inextricable feature of real citation networks attributable to increasing reference list lengths, which causes CD to systematically decrease. We demonstrate this causal link by way of mathematical deduction, computational simulation, multi-variate regression, and quasi-experimental comparison of the disruptiveness of PNAS versus PNAS Plus articles, which differ only in their lengths. Accordingly, we analyze CD data available in the SciSciNet database and find that disruptiveness incrementally increased from 2005-2015, and that the negative relationship between disruption and team-size is remarkably small in overall magnitude effect size, and shifts from negative to positive for team size ≥ 8 coauthors.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures; SI: 4 figures and 4 tables

arXiv:2306.01949 [pdf, other]

The disruption index is biased by citation inflation

Authors: Alexander M. Petersen, Felber Arroyave, Fabio Pammolli

Abstract: A recent analysis of scientific publication and patent citation networks by Park et al. (Nature, 2023) suggests that publications and patents are becoming less disruptive over time. Here we show that the reported decrease in disruptiveness is an artifact of systematic shifts in the structure of citation networks unrelated to innovation system capacity. Instead, the decline is attributable to 'citation inflation', an unavoidable characteristic of real citation networks that manifests as a systematic time-dependent bias and renders cross-temporal analysis challenging. One driver of citation inflation is the ever-increasing length of reference lists over time, which in turn increases the density of links in citation networks, and causes the disruption index to converge to 0. A second driver is attributable to shifts in the construction of reference lists, which is increasingly impacted by self-citations that increase the rate of triadic closure in citation networks, and thus confounds efforts to measure disruption, which is itself a measure of triadic closure. Combined, these two systematic shifts render the disruption index temporally biased, and unsuitable for cross-temporal analysis. The impact of this systematic bias further stymies efforts to correlate disruption to other measures that are also time-dependent, such as team size and citation counts.
In order to demonstrate this fundamental measurement problem, we present three complementary lines of critique (deductive, empirical and computational modeling), and also make available an ensemble of synthetic citation networks that can be used to test alternative citation-based indices for systematic bias.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 10 pages, 6 figures
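
The abstract above does not spell out the disruption index. As a minimal sketch, assuming the commonly used definition CD = (n_i - n_j) / (n_i + n_j + n_k) from the literature the abstract cites, the toy example below shows how extra papers that cite only the focal paper's references (n_k) pull CD toward 0. The function name `cd_index`, the paper labels, and the tiny network are hypothetical illustrations, not the authors' code or data.

```python
def cd_index(focal, references, citing):
    """Toy disruption index under the assumed definition
    CD = (n_i - n_j) / (n_i + n_j + n_k).

    focal      -- id of the focal paper
    references -- ids cited by the focal paper
    citing     -- dict mapping each later paper to the set of ids it cites
    """
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in citing.values():
        cites_focal = focal in cited
        cites_refs = bool(refs & set(cited))
        if cites_focal and not cites_refs:
            n_i += 1      # cites the focal paper only
        elif cites_focal and cites_refs:
            n_j += 1      # cites the focal paper and its references
        elif cites_refs:
            n_k += 1      # cites the references but not the focal paper
    denom = n_i + n_j + n_k
    return (n_i - n_j) / denom if denom else 0.0


# Hypothetical network: focal paper "F" cites "R1" and "R2".
refs = {"R1", "R2"}
citing = {"A": {"F"}, "B": {"F", "R1"}, "C": {"R2"}, "D": {"F"}}
baseline = cd_index("F", refs, citing)

# Same network plus three later papers that cite only the references.
denser = dict(citing, E={"R1"}, G={"R1", "R2"}, H={"R2"})
diluted = cd_index("F", refs, denser)

print(baseline, diluted)
```

In this toy case the numerator is unchanged while the denominator grows, so the index shrinks in magnitude. This is only an illustration of the structural argument, not the authors' derivation.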

arXiv:2205.11632 [pdf, other]

doi 10.1142/S0219525921500168

Evolution of biomedical innovation quantified via billions of distinct article-level MeSH keyword combinations

Authors: Alexander M. Petersen

Abstract: We develop a systematic approach to measuring combinatorial innovation in the biomedical sciences based upon the comprehensive ontology of Medical Subject Headings (MeSH). This approach leverages an expert-defined knowledge ontology that features both breadth (27,875 MeSH analyzed across 25 million articles indexed by PubMed from 1902 onwards) and depth (we differentiate between Major and Minor MeSH terms to identify differences in the knowledge network representation constructed from primary research topics only). With this level of uniform resolution we differentiate between three different modes of innovation contributing to the combinatorial knowledge network: (i) conceptual innovation associated with the emergence of new concepts and entities (measured as the entry of new MeSH); and (ii) recombinant innovation, associated with the emergence of new combinations, which itself consists of two types: peripheral (i.e., combinations involving new knowledge) and core (combinations comprised of pre-existing knowledge only). Another relevant question we seek to address is whether examining triplet and quartet combinations, in addition to the more traditional dyadic or pairwise combinations, provides evidence of any new phenomena associated with higher-order combinations. Analysis of the size, growth, and coverage of combinatorial innovation yields results that are largely independent of the combination order, thereby suggesting that the common dyadic approach is sufficient to capture essential phenomena.
Our main results are twofold: (a) despite the persistent addition of new MeSH terms, the network is densifying over time, meaning that scholars are increasingly exploring and realizing the vast space of all knowledge combinations; and (b) conceptual innovation is increasingly concentrated within single research articles, a harbinger of the recent paradigm shift towards convergence science.

Submitted 23 May, 2022; originally announced May 2022.

Comments: 11 pages, 4 figures

Journal ref: Advances in Complex Systems 24, 2150016 (2022)

arXiv:2103.11547 [pdf, other]

doi 10.1057/s41599-021-00869-9

Grand challenges and emergent modes of convergence science

Authors: Alexander M. Petersen, Mohammed E. Ahmed, Ioannis Pavlidis

Abstract: To address complex problems, scholars are increasingly faced with challenges of integrating diverse knowledge domains. We analyzed the evolution of this convergence paradigm in the broad ecosystem of brain science, which provides a real-time testbed for evaluating two modes of cross-domain integration - subject area exploration via expansive learning and cross-disciplinary collaboration among domain experts. We show that research involving both modes features a 16% citation premium relative to a mono-disciplinary baseline. Further comparison of research integrating neighboring versus distant research domains shows that the cross-disciplinary mode is essential for integrating across relatively large disciplinary distances. Yet we find research utilizing cross-domain subject area exploration alone - a convergence shortcut - to be growing in prevalence at roughly 3% per year, significantly faster than the alternative cross-disciplinary mode, despite being less effective at integrating domains and markedly less impactful. By measuring shifts in the prevalence and impact of different convergence modes in the 5-year intervals before and after 2013, our results indicate that these counterproductive patterns may relate to competitive pressures associated with global Human Brain flagship funding initiatives. Without additional policy guidance, such Grand Challenge flagships may unintentionally incentivize such convergence shortcuts, thereby undercutting the advantages of cross-disciplinary teams in tackling challenges calling on convergence.

Submitted 21 March, 2021; originally announced March 2021.

Comments: 15 pages, 5 figures; Supplementary Information: 25 pages, 12 Figures and 5 Tables

Journal ref: Humanities and Social Sciences Communications, 8, 194 (2021)

arXiv:2103.10641 [pdf, other]

doi 10.1142/S0219525923500030

Biomedical Convergence Facilitated by the Emergence of Technological and Informatic Capabilities

Authors: Dong Yang, Ioannis Pavlidis, Alexander M. Petersen

Abstract: We analyzed Medical Subject Headings (MeSH) from 21.6 million research articles indexed by PubMed to map this vast space of entities and their relations, providing insights into the origins and future of biomedical convergence. Detailed analysis of MeSH co-occurrence networks identifies three robust knowledge clusters: the vast universe of microscopic biological entities and structures; systems, disease and diagnostics; and emergent biological and social phenomena underlying the complex problems driving the health, behavioral and brain science frontiers. These domains integrated from the 1990s onward by way of technological and informatic capabilities that introduced highly controllable, scalable and permutable research processes and invaluable imaging techniques for illuminating fundamental structure-function-behavior questions. Article-level analysis confirms a positive relationship between team size and topical diversity, and shows convergence to be increasing in prominence but with recent saturation. Together, our results invite additional policy support for cross-disciplinary team assembly to harness transdisciplinary convergence.

Submitted 19 March, 2021; originally announced March 2021.

Comments: 12 pages, 4 figures; 8 pages of Supplementary Information

Journal ref: Advances in Complex Systems 26(1), 2023. Article 2350003

arXiv:1701.04906 [pdf, other]

doi 10.1016/j.joi.2019.100974

Quantifying the distribution of editorial power and manuscript decision bias at the mega-journal PLOS ONE

Authors: Alexander M. Petersen

Abstract: We analyzed the longitudinal activity of nearly 7,000 editors at the mega-journal PLOS ONE over the 10-year period 2006-2015. Using the article-editor associations, we develop editor-specific measures of power, activity, article acceptance time, citation impact, and editorial renumeration (an analogue to self-citation). We observe remarkably high levels of power inequality among the PLOS ONE editors, with the top-10 editors responsible for 3,366 articles -- corresponding to 2.4% of the 141,986 articles we analyzed. Such high inequality levels suggest the presence of unintended incentives, which may reinforce unethical behavior in the form of decision-level biases at the editorial level. Our results indicate that editors may become apathetic in judging the quality of articles and susceptible to modes of power-driven misconduct. We used the longitudinal dimension of editor activity to develop two panel regression models which test and verify the presence of editor-level bias. In the first model we analyzed the citation impact of articles, and in the second model we modeled the decision time between an article being submitted and ultimately accepted by the editor. We focused on two variables that represent social factors that capture potential conflicts-of-interest: (i) we accounted for the social ties between editors and authors by developing a measure of repeat authorship among an editor's article set, and (ii) we accounted for the rate of citations directed towards the editor's own publications in the reference list of each article he/she oversaw.
Our results indicate that these two factors play a significant role in the editorial decision process. Moreover, these two effects appear to increase with editor age, which is consistent with behavioral studies concerning the evolution of misbehavior and response to temptation in power-driven environments.

Submitted 17 January, 2017; originally announced January 2017.

Comments: 25 pages, 5 figures, 1 Table

Journal ref: Journal of Informetrics, 13, 100974 (2019)

arXiv:1607.05606 [pdf, other]

doi 10.1016/j.joi.2018.06.005

The Memory of Science: Inflation, Myopia, and the Knowledge Network

Authors: Raj K. Pan, Alexander M. Petersen, Fabio Pammolli, Santo Fortunato

Abstract: Science is a growing system, exhibiting ~4% annual growth in publications and ~1.8% annual growth in the number of references per publication. Combined, these trends correspond to a 12-year doubling period in the total supply of references, thereby challenging traditional methods of evaluating scientific production, from researchers to institutions. Against this background, we analyzed a citation network comprised of 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a temporal analysis of the 'attention economy' in science. Unlike previous studies, we analyzed the entire probability distribution of reference ages - the time difference between a citing and cited paper - thereby capturing previously overlooked trends. Over this half-century period we observe a narrowing range of attention - both classic and recent literature are being cited increasingly less, pointing to the important role of socio-technical processes. To better understand the impact of exponential growth on the underlying knowledge network we develop a network-based model, featuring the redirection of scientific attention via publications' reference lists, and validate the model against several empirical benchmarks. We then use the model to test the causal impact of real paradigm shifts, thereby providing guidance for science policy analysis.
In particular, we show how perturbations to the growth rate of scientific output affect the reference age distribution and the functionality of the vast science citation network as an aid for the search & retrieval of knowledge. In order to account for the inflation of science, our study points to the need for a systemic overhaul of the counting methods used to evaluate citation impact - especially in the case of evaluating science careers, which can span several decades and thus several doubling periods.

Submitted 19 July, 2016; originally announced July 2016.

Comments: 17 pages, 8 figures, Supplementary Material available at http://physics.bu.edu/~amp17/webpage_files/MyPapers/pppf_Arxiv_July2016_SI.pdf

Journal ref: Journal of Informetrics 12, 656-678 (2018)
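
The 12-year doubling period quoted in the abstract above can be checked with a short calculation. The sketch below assumes the two growth rates compound multiplicatively (total references = publications × references per publication); the function name `doubling_time` is a hypothetical helper, not something taken from the paper.

```python
import math


def doubling_time(*annual_rates):
    """Years for a quantity to double when several independent
    annual growth rates compound multiplicatively."""
    # log(1 + r) is the continuous growth rate of each factor;
    # the rates of multiplied factors add.
    combined = sum(math.log1p(rate) for rate in annual_rates)
    return math.log(2) / combined


# ~4% growth in publications, ~1.8% growth in references per publication
years = doubling_time(0.04, 0.018)
print(f"doubling period: about {years:.0f} years")
```

Under this assumption the result comes out at roughly 12 years, consistent with the figure stated in the abstract.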

arXiv:1512.07250 [pdf, other]

doi 10.1016/j.respol.2015.12.004

A Triple Helix Model of Medical Innovation: Supply, Demand, and Technological Capabilities in terms of Medical Subject Headings

Authors: Alexander M. Petersen, Daniele Rotolo, Loet Leydesdorff

Abstract: We develop a model of innovation that enables us to trace the interplay among three key dimensions of the innovation process: (i) demand of and (ii) supply for innovation, and (iii) technological capabilities available to generate innovation in the forms of products, processes, and services. Building on triple helix research, we use entropy statistics to elaborate an indicator of mutual information among these dimensions that can provide indication of reduction of uncertainty. To do so, we focus on the medical context, where uncertainty poses significant challenges to the governance of innovation. We use the Medical Subject Headings (MeSH) of MEDLINE/PubMed to identify publications classified within the categories "Diseases" (C), "Drugs and Chemicals" (D), "Analytic, Diagnostic, and Therapeutic Techniques and Equipment" (E) and use these as knowledge representations of demand, supply, and technological capabilities, respectively. Three case-studies of medical research areas are used as representative 'entry perspectives' of the medical innovation process. These are: (i) human papilloma virus, (ii) RNA interference, and (iii) magnetic resonance imaging. We find statistically significant periods of synergy among demand, supply, and technological capabilities (C-D-E) that point to three-dimensional interactions as a fundamental perspective for the understanding and governance of the uncertainty associated with medical innovation.
Among the pairwise configurations in these contexts, the demand-technological capabilities (C-E) provided the strongest link, followed by the supply-demand (D-C) and the supply-technological capabilities (D-E) channels.

Submitted 4 January, 2016; v1 submitted 22 December, 2015; originally announced December 2015.

Comments: Accepted for publication in Research Policy (in press)

Journal ref: Research Policy 45(3), 666-681 (2016)

arXiv:1509.01804 [pdf, other]

doi 10.1073/pnas.1501444112

Quantifying the impact of weak, strong, and super ties in scientific careers

Authors: Alexander Michael Petersen

Abstract: Scientists are frequently faced with the important decision to start or terminate a creative partnership. This process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of scientific collaboration, we analyzed 473 collaboration profiles using an ego-centric perspective which accounts for researcher-specific characteristics and provides insight into a range of topics, from career achievement and sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify the frequency distributions of collaboration duration and tie-strength, showing that collaboration networks are dominated by weak ties characterized by high turnover rates. We use analytic extreme-value thresholds to identify a new class of indispensable 'super ties', the strongest of which commonly exhibit >50% publication overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group experience.
We find that super ties contribute to above-average productivity and a 17% citation increase per publication, thus identifying these partnerships - the analog of life partners - as a major factor in science career development.

Submitted 6 September, 2015; originally announced September 2015.

Comments: 13 pages, 5 figures, 1 Table

Journal ref: Proceedings of the National Academy of Sciences 112, E4671-E4680 (2015)

arXiv:1411.5945 [pdf, other]

doi 10.1140/epjds/s13688-014-0024-y

Inequality and cumulative advantage in science careers: a case study of high-impact journals

Authors: Alexander M. Petersen, Orion Penner

Abstract: Analyzing a large data set of publications drawn from the most competitive journals in the natural and social sciences we show that research careers exhibit the broad distributions of individual achievement characteristic of systems in which cumulative advantage plays a key role. While most researchers are personally aware of the competition implicit in the publication process, little is known about the levels of inequality at the level of individual researchers. We analyzed both productivity and impact measures for a large set of researchers publishing in high-impact journals. For each researcher cohort we calculated Gini inequality coefficients, with average Gini values around 0.48 for total publications and 0.73 for total citations. For perspective, these observed values are well in excess of the inequality levels observed for personal income in developing countries. Investigating possible sources of this inequality, we identify two potential mechanisms that act at the level of the individual that may play defining roles in the emergence of the broad productivity and impact distributions found in science. First, we show that the average time interval between a researcher's successive publications in top journals decreases with each subsequent publication. Second, after controlling for the time dependent features of citation distributions, we compare the citation impact of subsequent publications within a researcher's publication record.
We find that as researchers continue to publish in top journals, there is more likely to be a decreasing trend in the relative citation impact with each subsequent publication. This pattern highlights the difficulty of repeatedly publishing high-impact research and the intriguing possibility that confirmation bias plays a role in the evaluation of scientific careers.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 2-page summary of the long published version which is available open access here at http://www.epjdatascience.com/content/3/1/24

Journal ref: EPJ Data Science 3, 24 (2014)
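
For readers unfamiliar with the Gini coefficient reported in the abstract above (around 0.48 for publications and 0.73 for citations), the following is a minimal sketch of one standard way to compute it from a list of per-researcher totals. The abstract does not say which estimator the authors used, so the formula choice, the function name `gini`, and the sample cohort are assumptions for illustration only.

```python
def gini(values):
    """Gini coefficient of non-negative values, using the
    sorted-rank formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no inequality defined for empty or all-zero input
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical citation totals for a small cohort of researchers.
cohort_citations = [3, 8, 15, 40, 120, 900]
print(round(gini(cohort_citations), 2))
```

A value of 0 means every researcher has the same total, and values near 1 mean a few researchers hold almost everything.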

arXiv:1404.0191 [pdf, other]

doi 10.1007/s11948-014-9562-8

A quantitative perspective on ethics in large team science

Authors: Alexander M. Petersen, Ioannis Pavlidis, Ioanna Semendeferi

Abstract: The gradual crowding out of singleton and small team science by large team endeavors is challenging key features of research culture. It is therefore important for the future of scientific practice to reflect upon the individual scientist's ethical responsibilities within teams. To facilitate this reflection we show labor force trends in the US revealing a skewed growth in academic ranks and increased levels of competition for promotion within the system; we analyze teaming trends across disciplines and national borders demonstrating why it is becoming difficult to distribute credit and to avoid conflicts of interest; and we use more than a century of Nobel prize data to show how science is outgrowing its old institutions of singleton awards. Of particular concern within the large team environment is the weakening of the mentor-mentee relation, which undermines the cultivation of virtue ethics across scientific generations. These trends and emerging organizational complexities call for a universal set of behavioral norms that transcend team heterogeneity and hierarchy. To this end, our expository analysis provides a survey of ethical issues in team settings to inform science ethics education and science policy.

Submitted 2 July, 2014; v1 submitted 1 April, 2014; originally announced April 2014.

Comments: 13 pages, 5 figures, 1 table. Keywords: team ethics; team management; team evaluation; science of science

Journal ref: Science & Engineering Ethics 20, 923-945 (2014)

arXiv:1401.6157 [pdf, other]

doi 10.1140/epjds/s13688-014-0011-3

Exploiting citation networks for large-scale author name disambiguation

Authors: Christian Schulz, Amin Mazloumian, Alexander M Petersen, Orion Penner, Dirk Helbing

Abstract: We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.

Submitted 9 December, 2014; v1 submitted 23 January, 2014; originally announced January 2014.

Comments: 14 pages, 5 figures

Journal ref: EPJ Data Science 2014, 3:11

arXiv:1308.5752 [pdf, other]

doi 10.1016/j.joi.2013.07.003

The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile

Authors: Alexander M. Petersen, Sauro Succi

Abstract: We present a simple generalization of Hirsch's h-index, Z = \sqrt{h^{2}+C}/\sqrt{5}, where C is the total number of citations. Z is aimed at correcting the potentially excessive penalty made by h on a scientist's highly cited papers, because for the majority of scientists analyzed, we find the excess citation fraction (C-h^{2})/C to be distributed closely around the value 0.75, meaning that 75 percent of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.

Submitted 27 August, 2013; originally announced August 2013.

Comments: 9 pages, 5 figures

Journal ref: J. of Informetrics 7, 823-832 (2013)
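
The Z-index formula in the abstract above, Z = \sqrt{h^{2}+C}/\sqrt{5}, is simple enough to sketch directly. The example below computes h, Z, and the excess citation fraction (C-h^{2})/C from a list of per-paper citation counts. The function names and the citation profile are hypothetical; only the formulas come from the abstract.

```python
import math


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def z_index(citations):
    """Z = sqrt(h^2 + C) / sqrt(5), with C the total citation count."""
    h = h_index(citations)
    total = sum(citations)
    return math.sqrt(h ** 2 + total) / math.sqrt(5)


def excess_fraction(citations):
    """Share of citations not captured by h^2, i.e. (C - h^2) / C."""
    h = h_index(citations)
    total = sum(citations)
    return (total - h ** 2) / total if total else 0.0


# Hypothetical citation profile (citations per paper).
profile = [50, 30, 20, 10, 6, 4, 3, 2, 1, 0]
print(h_index(profile), round(z_index(profile), 2), round(excess_fraction(profile), 2))
```

One consequence of the \sqrt{5} normalization: when the excess fraction is exactly 0.75, C = 4h^{2} and Z reduces to h, so Z departs from h only for profiles that deviate from that typical value.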

arXiv:1306.0114 [pdf, other]

doi 10.1038/srep03052

On the Predictability of Future Impact in Science

Authors: Orion Penner, Raj Kumar Pan, Alexander M. Petersen, Kimmo Kaski, Santo Fortunato

Abstract: Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle, but critical, flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate further investigation is required before they can be used in recruiting decisions.

Submitted 29 October, 2013; v1 submitted 1 June, 2013; originally announced June 2013.

Comments: Published version, 8 pages, 5 figures + Appendix

Journal ref: Scientific Reports 3, 3052 (2013)

arXiv:1304.0627 [pdf, other]

doi 10.1063/PT.3.1928

The case for caution in predicting scientists' future impact

Authors: Orion Penner, Raj K. Pan, Alexander M. Petersen, Santo Fortunato

Abstract: We stress-test the career predictability model proposed by Acuna et al. [Nature 489, 201-202 (2012)] by applying their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. The Acuna model claims to predict h(t+Δt), a scientist's h-index Δt years into the future, using a linear combination of 5 cumulative career measures taken at career age t. Here we investigate how the "predictability" depends on the aggregation of career data across multiple age cohorts. We confirm that the Acuna model does a respectable job of predicting h(t+Δt) up to roughly 6 years into the future when aggregating all age cohorts together. However, when calculated using subsets of specific age cohorts (e.g. using data for only t=3), we find that the model's predictive power significantly decreases, especially when applied to early career years. For young careers, the model does a much worse job of predicting future impact, and hence exposes a serious limitation. The limitation is particularly concerning as early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.

Submitted 2 April, 2013; originally announced April 2013.

Comments: 2 pages, 1 figure

Journal ref: Physics Today 66, 8-9 (2013)

arXiv:1303.7274 [pdf, other]

doi 10.1073/pnas.1323111111

Reputation and Impact in Academic Careers

Authors: Alexander M. Petersen, Santo Fortunato, Raj K. Pan, Kimmo Kaski, Orion Penner, Armando Rungi, Massimo Riccaboni, H. Eugene Stanley, Fabio Pammolli

Abstract: Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate $Δc$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly-cited scientists, using the total citations $C_{i}$ of each scientist as his/her reputation measure. We find a citation crossover $c_{\times}$ which distinguishes the strength of the reputation effect. For publications with $c < c_{\times}$, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_{i}$. However, the reputation effect becomes negligible for highly cited publications, meaning that for $c\geq c_{\times}$ the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.

Submitted 7 October, 2014; v1 submitted 28 March, 2013; originally announced March 2013.

Comments: Final published version of the main manuscript including additional analysis: 9 pages, 4 figures, 1 table, and full reference list, including those in the Supplementary Information. For the SI Appendix, see http://physics.bu.edu/~amp17/webpage_files/MyPapers/Reputation_SI.pdf

Journal ref: Proceedings of the National Academy of Sciences 111, 15316-15321 (2014)

arXiv:1302.3126 [pdf, other]

doi 10.1126/science.1227970

Is Europe Evolving Toward an Integrated Research Area?

Authors: Alessandro Chessa, Andrea Morescalchi, Fabio Pammolli, Orion Penner, Alexander M. Petersen, Massimo Riccaboni

Abstract: An integrated European Research Area (ERA) is a critical component for a more competitive and open European R&D system. However, the impact of EU-specific integration policies aimed at overcoming innovation barriers associated with national borders is not well understood. Here we analyze 2.4 x 10^6 patent applications filed with the European Patent Office (EPO) over the 25-year period 1986-2010 along with a sample of 2.6 x 10^5 records from the ISI Web of Science to quantitatively measure the role of borders in international R&D collaboration and mobility. From these data we construct five different networks for each year analyzed: (i) the patent co-inventor network, (ii) the publication co-author network, (iii) the co-applicant patent network, (iv) the patent citation network, and (v) the patent mobility network. We use methods from network science and econometrics to perform a comparative analysis across time and between EU and non-EU countries to determine the "treatment effect" resulting from EU integration policies. Using non-EU countries as a control set, we provide quantitative evidence that, despite decades of efforts to build a European Research Area, there has been little integration above global trends in patenting and publication. This analysis provides concrete evidence that Europe remains a collection of national innovation systems.

Submitted 13 February, 2013; originally announced February 2013.

Comments: 24 pages, 4 figures, 2 tables

Journal ref: Science 339, 650-651 (2013)

arXiv:1212.2616 [pdf, other]

doi 10.1038/srep00943

Languages cool as they expand: Allometric scaling and the decreasing need for new words

Authors: Alexander M. Petersen, Joel N. Tenenbaum, Shlomo Havlin, H. Eugene Stanley, Matjaz Perc

Abstract: We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which have a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.

Submitted 11 December, 2012; originally announced December 2012.

Comments: 9 two-column pages, 7 figures; accepted for publication in Scientific Reports

Journal ref: Sci. Rep. 2 (2012) 943

arXiv:1204.0752 [pdf, other]

doi 10.1073/pnas.1121429109

Persistence and Uncertainty in the Academic Career

Authors: Alexander M. Petersen, Massimo Riccaboni, H. Eugene Stanley, Fabio Pammolli

Abstract: Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative representation of how careers evolve over time. Since knowledge spillovers, cumulative advantage, competition, and collaboration are distinctive features of the academic profession, both the employment relationship and the procedures for assigning recognition and allocating funding should be designed to account for these factors. We study the annual production n_{i}(t) of a given scientist i by analyzing longitudinal career data for 200 leading scientists and 100 assistant professors from the physics community. We compare our results with 21,156 sports careers. Our empirical analysis of individual productivity dynamics shows that (i) there are increasing returns for the top individuals within the competitive cohort, and that (ii) the distribution of production growth is a leptokurtic "tent-shaped" distribution that is remarkably symmetric. Our methodology is general, and we speculate that similar features appear in other disciplines where academic publication is essential and collaboration is a key feature. We introduce a model of proportional growth which reproduces these two observations, and additionally accounts for the significantly right-skewed distributions of career longevity and achievement in science.
Using this theoretical model, we show that short-term contracts can amplify the effects of competition and uncertainty making careers more vulnerable to early termination, not necessarily due to lack of individual talent and persistence, but because of random negative production shocks. We show that fluctuations in scientific production are quantitatively related to a scientist's collaboration radius and team efficiency.

Submitted 3 April, 2012; originally announced April 2012.

Comments: 29 pages total: 8 main manuscript + 4 figs, 21 SI text + figs

Journal ref: Proceedings of the National Academy of Sciences USA 109, 5213 - 5218 (2012)

arXiv:1107.3707 [pdf, other]

doi 10.1038/srep00313

Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death

Authors: Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, H. Eugene Stanley

Abstract: We analyze the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800--2008 in order to gain insight into the coevolution of language and culture. We report language-independent patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes in the dynamics of language during periods of war show that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify cultural memory by analyzing the long-term correlations in the use of individual words using detrended fluctuation analysis.

Submitted 15 February, 2012; v1 submitted 19 July, 2011; originally announced July 2011.

Comments: Version 1: 31 pages, 17 figures, 3 tables. Version 2 is streamlined, eliminates substantial material and incorporates referee comments: 19 pages, 14 figures, 3 tables

Journal ref: Scientific Reports 2, 313 (2012)

arXiv:1103.2719 [pdf, other]

doi 10.1038/srep00181

Statistical regularities in the rank-citation profile of scientists

Authors: Alexander M. Petersen, H. Eugene Stanley, Sauro Succi

Abstract: Recent "science of science" research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate scientific production and impact of individual careers using the rank-citation profile c_{i}(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c_{i}(r) to a common distribution function that is parameterized by two scaling exponents. Since two scientists with equivalent Hirsch h-index can have significantly different c_{i}(r) profiles, our results demonstrate the utility of the β_{i} scaling parameter in conjunction with h_{i} for quantifying individual publication impact. We show that the total number of citations C_{i} tallied from a scientist's N_{i} papers scales as C_{i} \sim h_{i}^{1+β_{i}}. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.

Submitted 28 September, 2011; v1 submitted 14 March, 2011; originally announced March 2011.

Comments: 29 pages, 17 figures, 6 tables, Submitted; Modifications in V2 in response to referee comments

Journal ref: Scientific Reports 1, 181 (2011)

Showing 1–21 of 21 results for author: Petersen, A M