Gintropic Scaling of Scientometric Indexes
Authors:
Tamás Biró,
András Telcs,
Máté Józsa,
Zoltán Néda
Abstract:
The most frequently used indicators for the productivity and impact of scientists are the total number of publications ($N_{pub}$), the total number of citations ($N_{cit}$) and the Hirsch (h) index. Since the seminal paper of Hirsch in 2005, it has been widely debated whether the h index can be considered an indicator independent of $N_{pub}$ and $N_{cit}$. Exploiting the Paretian form of the distribution of citations for the papers authored by a researcher, here we discuss scaling relations between h, $N_{pub}$ and $N_{cit}$. The analysis incorporates the Gini index as an inequality measure of citation distributions and a recently proposed inequality kernel, gintropy (resembling the entropy kernel). We find a new upper bound for the h value as a function of the total number of citations, confirmed on massive data collected from Google Scholar. Our analysis also reveals that the individualized Gini index calculated for the citations received by the publications of an author peaks around 0.8, a value much higher than the one characteristic of the usual socio-economic inequalities.
Submitted 11 February, 2023;
originally announced February 2023.
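As an illustration of the indicators named in the abstract above, the following is a minimal sketch of how the h index and a per-author Gini index could be computed from a list of per-paper citation counts. The function names `h_index` and `gini_index` are hypothetical, and the discrete rank-based Gini formula used here is the standard textbook one, which is an assumption: the paper itself may work with a continuous (Paretian) form. Gintropy and the upper bound on h are not reproduced.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def gini_index(values):
    """Discrete Gini index of non-negative values (0 = equal, near 1 = concentrated).

    Assumption: standard rank-based formula
    G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n,
    with x_i sorted in ascending order and i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Hypothetical citation record of one author (made-up numbers).
    record = [120, 45, 30, 12, 9, 4, 2, 1, 0, 0]
    print("N_pub =", len(record))
    print("N_cit =", sum(record))
    print("h     =", h_index(record))
    print("Gini  =", round(gini_index(record), 3))
```

Sorting once per call keeps both functions simple; for large author databases one would likely vectorize this, but that is outside what the abstract describes.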
Manifold-adaptive dimension estimation revisited
Authors:
Zsigmond Benkő,
Marcell Stippinger,
Roberta Rehus,
Attila Bencze,
Dániel Fabó,
Boglárka Hajnal,
Loránd Erőss,
András Telcs,
Zoltán Somogyvári
Abstract:
Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates, assuming the local manifold density is uniform. Based on the probability density function, we propose to use the median of local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics: the mode and the mean. Additionally, from the probability density function, we derive the maximum likelihood formula for global intrinsic dimensionality under the i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula, calibrated on hypercube datasets. We compare the performance of the corrected-median-FSA estimator with kNN estimators: maximum likelihood (ML, Levina-Bickel) and two implementations of DANCo (R and MATLAB). We show that the corrected-median-FSA estimator beats the ML estimator and is on equal footing with DANCo for standard synthetic benchmarks according to mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in the neural dynamics during resting state and during epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for being seizure onset zones.
Submitted 10 August, 2020; v1 submitted 7 August, 2020;
originally announced August 2020.
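The median-of-local-estimates idea in the abstract above can be sketched as follows. This is an illustration only, assuming the commonly cited form of the local FSA estimate, $d_k(x) = \ln 2 / \ln(r_{2k}(x) / r_k(x))$, where $r_k(x)$ is the distance from $x$ to its k-th nearest neighbor. The function names `local_fsa` and `median_fsa` are hypothetical, the neighbor search is brute force, and the exponential edge and finite-sample correction is omitted because its calibrated constants are not given in the abstract.

```python
import math
import random
import statistics


def local_fsa(points, k):
    """Local FSA estimates, one per point where the distance ratio is usable.

    Assumed form: d_k(x) = ln(2) / ln(r_2k(x) / r_k(x)).
    """
    estimates = []
    for i, p in enumerate(points):
        # Brute-force distances to all other points, O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k, r_2k = dists[k - 1], dists[2 * k - 1]
        # Skip degenerate neighborhoods (duplicates or tied distances).
        if r_k > 0 and r_2k > r_k:
            estimates.append(math.log(2) / math.log(r_2k / r_k))
    return estimates


def median_fsa(points, k=5):
    """Global intrinsic dimension as the median of local FSA estimates (uncorrected)."""
    if len(points) <= 2 * k:
        raise ValueError("need more than 2*k points")
    return statistics.median(local_fsa(points, k))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic check: points sampled uniformly from a 2-D unit square.
    square = [(random.random(), random.random()) for _ in range(500)]
    print("estimated dimension of square:", median_fsa(square, k=5))
```

For datasets larger than a few thousand points, a k-d tree or similar index would replace the brute-force neighbor search; the estimator itself would stay the same.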