-
Statistical signatures of abstraction in deep neural networks
Authors:
Carlo Orientale Caputo,
Matteo Marsili
Abstract:
We study how abstract representations emerge in a Deep Belief Network (DBN) trained on benchmark datasets. Our analysis targets the principles of learning in the early stages of information processing, starting from the "primordial soup" of the under-sampling regime. As the data is processed by deeper and deeper layers, features are detected and removed, transferring more and more "context-invariant" information to deeper layers. We show that the representation approaches a universal model -- the Hierarchical Feature Model (HFM) -- determined by the principle of maximal relevance. Relevance quantifies the uncertainty on the model of the data, thus suggesting that "meaning" -- i.e. syntactic information -- is that part of the data which is not yet captured by a model. Our analysis shows that shallow layers are well described by pairwise Ising models, which provide a representation of the data in terms of generic, low order features. We also show that plasticity increases with depth, in a similar way as it does in the brain. These findings suggest that DBNs are capable of extracting a hierarchy of features from the data which is consistent with the principle of maximal relevance.
Submitted 1 July, 2024;
originally announced July 2024.
-
Is stochastic thermodynamics the key to understanding the energy costs of computation?
Authors:
David Wolpert,
Jan Korbel,
Christopher Lynn,
Farita Tasnim,
Joshua Grochow,
Gülce Kardeş,
James Aimone,
Vijay Balasubramanian,
Eric de Giuli,
David Doty,
Nahuel Freitas,
Matteo Marsili,
Thomas E. Ouldridge,
Andrea Richa,
Paul Riechers,
Édgar Roldán,
Brenda Rubenstein,
Zoltan Toroczkai,
Joseph Paradiso
Abstract:
The relationship between the thermodynamic and computational characteristics of dynamical physical systems has been a major theoretical interest since at least the 19th century, and has been of increasing practical importance as the energetic cost of digital devices has exploded over the last half century. One of the most important thermodynamic features of real-world computers is that they operate very far from thermal equilibrium, in finite time, with many quickly (co-)evolving degrees of freedom. Such computers also must almost always obey multiple physical constraints on how they work. For example, all modern digital computers are periodic processes, governed by a global clock. Another example is that many computers are modular, hierarchical systems, with strong restrictions on the connectivity of their subsystems. These properties hold both for naturally occurring computers, like brains or eukaryotic cells, and for digital systems. These features of real-world computers are absent in 20th century analyses of the thermodynamics of computational processes, which focused on quasi-statically slow processes. However, the field of stochastic thermodynamics has been developed in the last few decades, and it provides the formal tools for analyzing systems that have exactly these features of real-world computers. We argue here that these tools, together with other tools currently being developed in stochastic thermodynamics, may help us understand at a far deeper level just how the fundamental physical properties of dynamic systems are related to the computation that they perform.
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Application of spin glass ideas in social sciences, economics and finance
Authors:
Jean-Philippe Bouchaud,
Matteo Marsili,
Jean-Pierre Nadal
Abstract:
Classical economics has developed an arsenal of methods, based on the idea of representative agents, to come up with precise numbers for next year's GDP, inflation and exchange rates, among (many) other things. Few, however, will disagree with the fact that the economy is a complex system, with a large number of strongly heterogeneous, interacting units of different types (firms, banks, households, public institutions) and different sizes.
Now, the main issue in economics is precisely the emergent organization, cooperation and coordination of such a motley crowd of micro-units. Treating them as a unique "representative" firm or household clearly risks throwing the baby out with the bathwater. As we have learnt from statistical physics, understanding and characterizing such emergent properties can be difficult. Because of feedback loops of different signs, heterogeneities and non-linearities, the macro-properties are often hard to anticipate. In particular, these situations generically lead to a very large number of possible equilibria, or even the lack thereof.
Spin-glasses and other disordered systems give a concrete example of such difficulties. In order to tackle these complex situations, new theoretical and numerical tools have been invented in the last 50 years, including of course the replica method and replica symmetry breaking, and the cavity method, both static and dynamic. In this chapter we review the application of such ideas and methods in economics and social sciences. Of particular interest are the proliferation (and fragility) of equilibria, the analogue of satisfiability phase transitions in games and random economies, and condensation (or concentration) effects in opinion, wealth, etc.
Submitted 28 June, 2023;
originally announced June 2023.
-
Defects drive the tribocharging strength of PTFE
Authors:
A. Ciniero,
G. Fatti,
M. Marsili,
D. Dini,
M. C. Righi
Abstract:
If polytetrafluoroethylene (PTFE), commonly known as Teflon, is put into contact and rubbed against another material, almost surely it will be more effective than its counterpart in collecting negative charges. This simple, basic property is captured by the so-called triboelectric series, where PTFE ranks extremely high, and which qualitatively orders materials in terms of their ability to electrostatically charge upon contact and rubbing. However, while classifying materials, the series does not provide an explanation of their triboelectric strength, besides a loose correlation with the workfunction. Indeed, despite being an extremely familiar process, known for centuries, tribocharging is still elusive and not fully understood. In this work we employ density functional theory to look for the origin of PTFE tribocharging strength. We study how charge transfers when pristine or defective PTFE is put in contact with different clean and oxidised metals. Our results show the important role played by defects in enhancing charge transfer. Interestingly and unexpectedly, our results show that negatively charged chains are more stable than neutral ones, if slightly bent. Indeed, deformations can be easily promoted in polymers such as PTFE, especially in tribological contacts. These results suggest that, in designing materials in view of their triboelectric properties, the characteristics of their defects could be a performance-determining factor.
Submitted 27 April, 2023;
originally announced April 2023.
-
Multiscale Relevance of Natural Images
Authors:
Samy Lakhal,
Alexandre Darmon,
Iacopo Mastromatteo,
Matteo Marsili,
Michael Benzaquen
Abstract:
We use an agnostic information-theoretic approach to investigate the statistical properties of natural images. We introduce the Multiscale Relevance (MSR) measure to assess the robustness of images to compression at all scales. Starting in a controlled environment, we characterize the MSR of synthetic random textures as a function of image roughness H and other relevant parameters. We then extend the analysis to natural images and find striking similarities with critical (H = 0) random textures. We show that the MSR is more robust and informative of image content than classical methods such as power spectrum analysis. Finally, we compare the MSR with classical measures for the calibration of common procedures such as color mapping and denoising. Overall, the MSR approach appears to be a good candidate for advanced image analysis and image processing, while providing a good level of physical interpretability.
Submitted 22 March, 2023;
originally announced March 2023.
-
A Bayesian theory of market impact
Authors:
Louis Saddier,
Matteo Marsili
Abstract:
The available liquidity at any time in financial markets falls largely short of the typical size of the orders that institutional investors would trade. In order to reduce the impact on prices due to the execution of large orders, traders in financial markets split large orders into a series of smaller ones, which are executed sequentially. The resulting sequence of trades is called a meta-order. Empirical studies have revealed a non-trivial set of statistical laws on how meta-orders affect prices, which include i) the square-root behaviour of the expected price variation with the total volume traded, ii) its crossover to a linear regime for small volumes, and iii) a reversion of average prices towards their initial value, after the sequence of trades is over. Here we recover this phenomenology within a minimal theoretical framework where the market sets prices by incorporating all information on the direction and speed of trade of the meta-order in a Bayesian manner. The simplicity of this derivation lends further support to the robustness and universality of market impact laws. In particular, it suggests that the square-root impact law originates from the over-estimation of order flows originating from meta-orders.
Submitted 21 May, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
arXiv:2211.08241
[pdf]
physics.optics
cond-mat.mes-hall
cond-mat.mtrl-sci
physics.app-ph
quant-ph
Advances in ultrafast plasmonics
Authors:
Alemayehu Nana Koya,
Marco Romanelli,
Joel Kuttruff,
Nils Henriksson,
Andrei Stefancu,
Gustavo Grinblat,
Aitor De Andres,
Fritz Schnur,
Mirko Vanzan,
Margherita Marsili,
Mahfujur Rahaman,
Alba Viejo Rodríguez,
Tilaike Tapani,
Haifeng Lin,
Bereket Dalga Dana,
Jingquan Lin,
Grégory Barbillon,
Remo Proietti Zaccaria,
Daniele Brida,
Deep Jariwala,
László Veisz,
Emiliano Cortes,
Stefano Corni,
Denis Garoli,
Nicolò Maccaferri
Abstract:
In the past twenty years, we have reached a broad understanding of many light-driven phenomena in nanoscale systems. The temporal dynamics of the excited states are instead quite challenging to explore, and, at the same time, crucial to study for understanding the origin of fundamental physical and chemical processes. In this review we examine the current state and prospects of ultrafast phenomena driven by plasmons both from a fundamental and applied point of view. This research area is referred to as ultrafast plasmonics and represents an outstanding playground to tailor and control fast optical and electronic processes at the nanoscale, such as ultrafast optical switching, single photon emission and strong coupling interactions to tailor photochemical reactions. Here, we provide an overview of the field, and describe the methodologies to monitor and control nanoscale phenomena with plasmons at ultrafast timescales in terms of both modeling and experimental characterization. Various directions are showcased, including recent advances in ultrafast plasmon-driven chemistry and multi-functional plasmonics, in which charge, spin, and lattice degrees of freedom are exploited to provide active control of the optical and electronic properties of nanoscale materials. As the focus shifts to the development of practical devices, such as all-optical transistors, we also emphasize new materials and applications in ultrafast plasmonics and highlight recent developments in the relativistic realm. The latter is a promising research field with potential applications in fusion research or particle and light sources providing properties such as attosecond duration.
Submitted 15 November, 2022;
originally announced November 2022.
-
A simple probabilistic neural network for machine understanding
Authors:
Rongrong Xie,
Matteo Marsili
Abstract:
We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding. Here understanding is intended as mapping data to an already existing representation which encodes an {\em a priori} organisation of the feature space. We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined. We show that, when hidden units are binary variables, these two principles identify a unique model -- the Hierarchical Feature Model (HFM) -- which is fully solvable and provides a natural interpretation in terms of features. We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data, the possibility to control the level of compression and the ability to support functions that go beyond generalisation. We explore the behaviour of the model with extensive numerical experiments and argue that models where the internal representation is fixed reproduce a learning modality which is qualitatively different from that of traditional models such as Restricted Boltzmann Machines.
Submitted 6 December, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Quantifying Relevance in Learning and Inference
Authors:
Matteo Marsili,
Yasser Roudi
Abstract:
Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet, the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data is high-dimensional and scarce, and prior information on "true" models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of "relevance". The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains on the generative model of the data. This allows us to define maximally informative samples, on one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines, that contain the maximal amount of information about the unknown generative process, at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: Maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance distinguishes the regime of noisy representations from that of lossy compression. These are separated by a special point characterised by Zipf's law statistics. This identifies samples obeying Zipf's law as the most compressed loss-less representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests in an exponential degeneracy of energy levels, that leads to unusual thermodynamic properties.
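The abstract above does not spell out formulas, so the following is only a minimal sketch under an assumed convention: for a sample of N observations, resolution is taken as the entropy of the empirical state frequencies k_s/N, and relevance as the entropy of the frequency-of-frequencies distribution k*m_k/N, where m_k is the number of states observed exactly k times. The function name `resolution_and_relevance` is hypothetical and chosen here for illustration.

```python
from collections import Counter
from math import log


def resolution_and_relevance(sample):
    """Return (resolution, relevance) in nats for a list of hashable states.

    Assumed definitions (not stated in the abstract itself):
      resolution = -sum_s (k_s/N) * log(k_s/N)
      relevance  = -sum_k (k*m_k/N) * log(k*m_k/N)
    with k_s the count of state s and m_k the number of states seen k times.
    """
    n = len(sample)
    k_s = Counter(sample)          # how often each state occurs
    m_k = Counter(k_s.values())    # how many states share each count k
    resolution = -sum((k / n) * log(k / n) for k in k_s.values())
    relevance = -sum((k * m / n) * log(k * m / n) for k, m in m_k.items())
    return resolution, relevance


if __name__ == "__main__":
    # All states distinct: maximal resolution, but counts carry no information.
    print(resolution_and_relevance(list("abcd")))
    # Distinct counts per state: relevance coincides with resolution.
    print(resolution_and_relevance(list("aaabbc")))
```

In this sketch a sample in which every state appears once has zero relevance, while a sample in which every state has a different count has relevance equal to its resolution; intermediate samples fall between these two cases.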
Submitted 1 February, 2022;
originally announced February 2022.
-
A random energy approach to deep learning
Authors:
Rongrong Xie,
Matteo Marsili
Abstract:
We study a generic ensemble of deep belief networks which is parametrized by the distribution of energy levels of the hidden states of each layer. We show that, within a random energy approach, statistical dependence can propagate from the visible to deep layers only if each layer is tuned close to the critical point during learning. As a consequence, efficiently trained learning machines are characterised by a broad distribution of energy levels. The analysis of Deep Belief Networks and Restricted Boltzmann Machines on different datasets confirms these conclusions.
Submitted 17 December, 2021;
originally announced December 2021.
-
The rise and fall of hubs in Self-Organized Critical learning networks
Authors:
Anjan Roy,
Serena Di Santo,
Matteo Marsili
Abstract:
Information processing networks are the result of local rewiring rules. In many instances, such rules promote links where the activity at the two end nodes is positively correlated. The conceptual problem we address is what network architecture prevails under such rules and how the resulting network, in turn, constrains the dynamics. We focus on a simple toy model that captures the interplay between link self-reinforcement and a Self-Organised Critical dynamics in a simple way. Our main finding is that, under these conditions, a core of densely connected nodes forms spontaneously. Moreover, we show that the appearance of such a clustered state can be dynamically regulated by a fatigue mechanism, eventually giving rise to non-trivial avalanche exponents.
Submitted 27 April, 2021;
originally announced April 2021.
-
Spinorial formulation of the GW-BSE equations and spin properties of excitons in 2D Transition Metal Dichalcogenides
Authors:
Margherita Marsili,
Alejandro Molina-Sánchez,
Maurizia Palummo,
Davide Sangalli,
Andrea Marini
Abstract:
In many paradigmatic materials, like Transition Metal Dichalcogenides, the role played by the spin degrees of freedom is as important as the one played by the electron-electron interaction. Thus an accurate treatment of the two effects and of their interaction is necessary for an accurate and predictive study of the optical and electronic properties of these materials. Although the GW-BSE approach correctly accounts for electronic correlations, the spin-orbit coupling effect is often neglected or treated perturbatively. Recently, spinorial formulations of GW-BSE have become available in different flavours in material-science codes. Still, an accurate validation and comparison of different approaches is missing. In this work we go through the derivation of the non-collinear GW-BSE approach. The scheme is applied to transition metal dichalcogenides, comparing the perturbative and full spinorial approaches. Our calculations reveal that dark-bright exciton splittings are generally improved when the spin-orbit coupling is included non-perturbatively. The exchange-driven intravalley mixing between the A and B exciton is found to be extremely important in the case of MoSe$_2$. We finally define the excitonic spin and use it to sharply analyze the spinorial properties of Transition Metal Dichalcogenides excitonic states.
Submitted 3 March, 2021;
originally announced March 2021.
-
Information thermodynamics of financial markets: the Glosten-Milgrom model
Authors:
Léo Touzo,
Matteo Marsili,
Don Zagier
Abstract:
The Glosten-Milgrom model describes a single asset market, where informed traders interact with a market maker, in the presence of noise traders. We derive an analogy between this financial model and a Szilárd information engine by {\em i)} showing that the optimal work extraction protocol in the latter coincides with the pricing strategy of the market maker in the former and {\em ii)} defining a market analogue of the physical temperature from the analysis of the distribution of market orders. Then we show that the expected gain of informed traders is bounded above by the product of this market temperature with the amount of information that informed traders have, in exact analogy with the corresponding formula for the maximal expected amount of work that can be extracted from a cycle of the information engine. This suggests that recent ideas from information thermodynamics may shed light on financial markets, and lead to generalised inequalities, in the spirit of the extended second law of thermodynamics.
Submitted 24 January, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Statistical Inference of Minimally Complex Models
Authors:
Clélia de Mulatier,
Paolo P. Mazza,
Matteo Marsili
Abstract:
Finding the model that best describes a high dimensional dataset is a daunting task. For binary data, we show that this becomes feasible when restricting the search to a family of simple models, that we call Minimally Complex Models (MCMs). These are spin models, with interactions of arbitrary order, that are composed of independent components of minimal complexity (Beretta et al., 2018). They tend to be simple in information theoretic terms, which means that they are well-fitted to specific types of data, and are therefore easy to falsify. We show that Bayesian model selection restricted to these models is computationally feasible and has many other advantages. First, their evidence, which trades off goodness-of-fit against model complexity, can be computed easily without any parameter fitting. This allows selecting the best MCM among all, even though the number of models is astronomically large. Furthermore, MCMs can be inferred and sampled from without any computational effort. Finally, model selection among MCMs is invariant with respect to changes in the representation of the data. MCMs portray the structure of dependencies among variables in a simple way, as illustrated in several examples, and thus provide robust predictions on dependencies in the data. MCMs contain interactions of any order between variables, and thus may reveal the presence of interactions of order higher than pairwise.
Submitted 27 September, 2021; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Characterising authors on the extent of their paper acceptance: A case study of the Journal of High Energy Physics
Authors:
Rima Hazra,
Aryan,
Hardik Aggarwal,
Matteo Marsili,
Animesh Mukherjee
Abstract:
New researchers are usually very curious about the recipe that could accelerate the chances of their paper getting accepted in a reputed forum (journal/conference). In search of such a recipe, we investigate the profile and peer review text of authors whose papers almost always get accepted at a venue (Journal of High Energy Physics in our current work). We find that authors with a high acceptance rate are likely to have a high number of citations, a high $h$-index, a higher number of collaborators, etc. We notice that they receive relatively lengthy and positive reviews for their papers. In addition, we also construct three networks -- co-reviewer, co-citation and collaboration network -- and study the network-centric features and intra- and inter-category edge interactions. We find that the authors with a high acceptance rate are more `central' in these networks; the volume of intra- and inter-category interactions is also drastically different for the authors with a high acceptance rate compared to the other authors. Finally, using the above set of features, we train standard machine learning models (random forest, XGBoost) and obtain very high class-wise precision and recall. In a follow-up discussion we also narrate how, apart from the author characteristics, the peer-review system might itself have a role in propelling the distinction among the different categories, which could lead to potential discrimination and unfairness and which calls for further investigation by the system admins.
Submitted 11 June, 2020;
originally announced June 2020.
-
Optimal Work Extraction and the Minimum Description Length Principle
Authors:
Léo Touzo,
Matteo Marsili,
Neri Merhav,
Édgar Roldán
Abstract:
We discuss work extraction from classical information engines (e.g., Szilárd) with $N$ particles, $q$ partitions, and initial arbitrary non-equilibrium states. In particular, we focus on their {\em optimal} behaviour, which includes the measurement of a set of quantities $Φ$ with a feedback protocol that extracts the maximal average amount of work. We show that the optimal non-equilibrium state to which the engine should be driven before the measurement is given by the normalised maximum-likelihood probability distribution of a statistical model that admits $Φ$ as sufficient statistics. Furthermore, we show that the minimax universal code redundancy $\mathcal{R}^*$ associated with this model provides an upper bound to the work that the demon can extract on average from the cycle, in units of $k_{\rm B}T$. We also find that, in the limit of $N$ large, the maximum average extracted work cannot exceed $H[Φ]/2$, i.e. one half times the Shannon entropy of the measurement. Our results establish a connection between optimal work extraction in stochastic thermodynamics and optimal universal data compression, providing design principles for optimal information engines. In particular, they suggest that: (i) optimal coding is thermodynamically efficient, and (ii) it is essential to drive the system into a critical state in order to achieve optimal performance.
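As a purely numerical illustration of the large-$N$ statement above, the sketch below evaluates $H[Φ]/2$ for a given distribution of measurement outcomes, reading the result in units of $k_{\rm B}T$ with the entropy in nats. The helper names `shannon_entropy` and `large_n_work_bound` are hypothetical, and the example distribution is made up; nothing here reproduces the paper's derivation.

```python
from math import log


def shannon_entropy(probs):
    """Shannon entropy in nats of a discrete distribution (zeros are skipped)."""
    return -sum(p * log(p) for p in probs if p > 0)


def large_n_work_bound(probs):
    """Upper bound H/2 on the average extracted work, in units of k_B T.

    Assumes `probs` is the distribution of the measured quantity and that
    the large-N regime quoted in the abstract applies.
    """
    return 0.5 * shannon_entropy(probs)


if __name__ == "__main__":
    # Illustrative measurement with four equally likely outcomes.
    outcomes = [0.25, 0.25, 0.25, 0.25]
    print(large_n_work_bound(outcomes))  # equals log(2) for this uniform case
```

For four equally likely outcomes the entropy is log 4 nats, so the bound evaluates to log 2 in units of $k_{\rm B}T$.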
Submitted 30 July, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Estimating the impact of preventive quarantine with reverse epidemiology
Authors:
Jacopo Grilli,
Matteo Marsili,
Guido Sanguinetti
Abstract:
The impact of mitigation or control measures on an epidemic can be estimated by fitting the parameters of a compartmental model to empirical data, and running the model forward with modified parameters that account for a specific measure. This approach has several drawbacks, stemming from biases or lack of availability of data and instability of parameter estimates. Here we take the opposite approach, which we call reverse epidemiology. Given the data, we reconstruct backward in time an ensemble of networks of contacts, and we assess the impact of measures on that specific realization of the contagion process. This approach is robust because it only depends on parameters that describe the evolution of the disease within one individual (e.g. latency time) and not on parameters that describe the spread of the epidemic in a population. Using this method, we assess the impact of preventive quarantine on the ongoing outbreak of Covid-19 in Italy. This gives an estimate of how many infections could have been avoided had preventive quarantine been enforced at a given time.
Submitted 7 April, 2020;
originally announced April 2020.
-
Koopmans Meets Bethe-Salpeter: Excitonic Optical Spectra without GW
Authors:
Joshua Elliott,
Nicola Colonna,
Margherita Marsili,
Nicola Marzari,
Paolo Umari
Abstract:
The Bethe-Salpeter Equation (BSE) can be applied to compute from first-principles optical spectra that include the effects of screened electron-hole interactions. As input, BSE calculations require single-particle states, quasiparticle energy levels and the screened Coulomb interaction, which are typically obtained with many-body perturbation theory, whose cost limits the scope of possible applications. This work tries to address this practical limitation, instead deriving spectral energies from Koopmans-compliant functionals and introducing a new methodology for handling the screened Coulomb interaction. The explicit calculation of the $W$ matrix is bypassed via a direct minimization scheme applied on top of a maximally localised Wannier function basis. We validate and benchmark this approach by computing the low-lying excited states of the molecules in Thiel's set, and the optical absorption spectrum of a $\text{C}_{60}$ fullerene. The results show the same trends as quantum chemical methods and are in excellent agreement with previous simulations carried out at the TD-DFT or $G_{0}W_{0}$-\text{BSE} level. Conveniently, the new framework reduces the parameter space controlling the accuracy of the calculation, thereby simplifying the simulation of charge-neutral excitations, offering the potential to expand the applicability of first-principles spectroscopies to larger systems of applied interest.
Submitted 31 December, 2019;
originally announced December 2019.
-
Thermodynamic Computing
Authors:
Tom Conte,
Erik DeBenedictis,
Natesh Ganesh,
Todd Hylton,
John Paul Strachan,
R. Stanley Williams,
Alexander Alemi,
Lee Altenberg,
Gavin Crooks,
James Crutchfield,
Lidia del Rio,
Josh Deutsch,
Michael DeWeese,
Khari Douglas,
Massimiliano Esposito,
Michael Frank,
Robert Fry,
Peter Harsha,
Mark Hill,
Christopher Kello,
Jeff Krichmar,
Suhas Kumar,
Shih-Chii Liu,
Seth Lloyd,
Matteo Marsili, et al. (14 additional authors not shown)
Abstract:
The hardware and software foundations laid in the first half of the 20th Century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which underpins much of the standard of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hardware, devices have become so small that we are struggling to eliminate the effects of thermodynamic fluctuations, which are unavoidable at the nanometer scale. In terms of software, our ability to imagine and program effective computational abstractions and implementations is clearly challenged in complex domains. In terms of systems, currently five percent of the power generated in the US is used to run computing systems - this astonishing figure is neither ecologically sustainable nor economically scalable. Economically, the cost of building next-generation semiconductor fabrication plants has soared past $10 billion. All of these difficulties - device scaling, software complexity, adaptability, energy consumption, and fabrication economics - indicate that the current computing paradigm has matured and that continued improvements along this path will be limited. If technological progress is to continue and corresponding social and economic benefits are to continue to accrue, computing must become much more capable, energy efficient, and affordable. We propose that progress in computing can continue under a united, physically grounded, computational paradigm centered on thermodynamics. Herein we propose a research agenda to extend these thermodynamic foundations into complex, non-equilibrium, self-organizing systems and apply them holistically to future computing systems that will harness nature's innate computational capacity. We call this type of computing "Thermodynamic Computing" or TC.
Submitted 14 November, 2019; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Optimal work extraction and mutual information in a generalized Szilárd engine
Authors:
Juyong Song,
Susanne Still,
Rafael Díaz Hernández Rojas,
Isaac Pérez Castillo,
Matteo Marsili
Abstract:
A 1929 Gedankenexperiment proposed by Szilárd, often referred to as "Szilárd's engine", has served as a foundation for computing fundamental thermodynamic bounds to information processing. While Szilárd's original box could be partitioned into two halves and contained one gas molecule, we calculate here the maximal average work that can be extracted in a system with $N$ particles and $q$ partitions, given an observer that counts the molecules in each partition, and given a work extraction mechanism that is limited to pressure equalization. We find that the average extracted work is proportional to the mutual information between the one-particle position and the vector containing the counts of how many particles are in each partition. We optimize this quantity over the initial locations of the dividing walls, and find that there exists a critical number of particles $N^{\star}(q)$ below which the extracted work is maximized by a symmetric configuration of the $q$ partitions, and above which the optimal partitioning is asymmetric. Overall, the average extracted work is maximized for a number of particles $\hat{N}(q)<N^{\star}(q)$, with a symmetric partition. We calculate asymptotic values for $N\rightarrow \infty$.
Submitted 18 May, 2021; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Maximal Relevance and Optimal Learning Machines
Authors:
O Duranthon,
M Marsili,
R Xie
Abstract:
We show that the mutual information between the representation of a learning machine and the hidden features that it extracts from data is bounded from below by the relevance, which is the entropy of the model's energy distribution. Models with maximal relevance -- that we call Optimal Learning Machines (OLM) -- are hence expected to extract maximally informative representations. We explore this principle in a range of models. For fully connected Ising models, we show that {\em i)} OLM are characterised by inhomogeneous distributions of couplings, and that {\em ii)} their learning performance is affected by sub-extensive features that are elusive to a thermodynamic treatment. On specific learning tasks, we find that likelihood maximisation is achieved by models with maximal relevance. Training of Restricted Boltzmann Machines on the MNIST benchmark shows that learning is associated with a broadening of the spectrum of energy levels and that the internal representation of the hidden layer approaches the maximal relevance that can be achieved in a finite dataset. Finally, we discuss a Gaussian learning machine that clarifies that learning hidden features is conceptually different from parameter estimation.
Submitted 27 January, 2021; v1 submitted 27 September, 2019;
originally announced September 2019.
-
The peculiar statistical mechanics of Optimal Learning Machines
Authors:
Matteo Marsili
Abstract:
Optimal Learning Machines (OLM) are systems that extract maximally informative representations of the environment they are in contact with, or of the data they are presented with. It has recently been suggested that these systems are characterised by an exponential distribution of energy levels. In order to understand the peculiar properties of OLM within a broader framework, I consider an ensemble of optimisation problems over functions of many variables, part of which describe a sub-system while the rest account for its interaction with a random environment. The number of states of the sub-system with a given value of the objective function obeys a stretched exponential distribution, with exponent $γ$, and the interaction part is drawn at random from the same distribution, independently for each configuration of the whole system. Systems with $γ=1$ then correspond to OLM, and we find that they sit at the boundary between two regions with markedly different properties. For all $γ>0$ the system exhibits a freezing phase transition. The transition is discontinuous for $γ<1$ and it is continuous for $γ>1$. The region $γ>1$ corresponds to learnable energy landscapes and the behaviour of the sub-system becomes predictable as the size of the environment exceeds a critical threshold. For $γ<1$, instead, the energy landscape is unlearnable and the behaviour of the system becomes more and more unpredictable as the size of the environment increases. Sub-systems with $γ=1$ (OLM) feature a behaviour which is independent of the relative size of the environment. This is consistent with the expectation that efficient representations should be largely independent of the level of detail of the description of the environment.
Submitted 22 July, 2019; v1 submitted 19 April, 2019;
originally announced April 2019.
-
On the complexity of logistic regression models
Authors:
Nicola Bulso,
Matteo Marsili,
Yasser Roudi
Abstract:
We investigate the complexity of logistic regression models, which is defined by counting the number of indistinguishable distributions that the model can represent (Balasubramanian, 1997). We find that the complexity of logistic models with binary inputs depends not only on the number of parameters but also, in a non-trivial way, on the distribution of inputs, which standard treatments of complexity do not address. In particular, we observe that correlations among inputs induce effective dependencies among parameters, thus constraining the model and, consequently, reducing its complexity. We derive simple relations for the upper and lower bounds of the complexity. Furthermore, we show analytically that defining the model parameters on a finite support rather than the entire axis decreases the complexity in a manner that critically depends on the size of the domain. Based on our findings, we propose a novel model selection criterion which takes into account the entropy of the input distribution. We test our proposal on the problem of selecting the input variables of a logistic regression model in a Bayesian Model Selection framework. In our numerical tests, we find that, while the reconstruction errors of standard model selection approaches (AIC, BIC, $\ell_1$ regularization) strongly depend on the sparsity of the ground truth, the reconstruction error of our method is always close to the minimum in all conditions of sparsity, data size and strength of input correlations. Finally, we observe that, when considering categorical instead of binary inputs, in a simple and mathematically tractable case, the contribution of the alphabet size to the complexity is very small compared to that of parameter space dimension. We further explore the issue by analysing the dataset of the "13 keys to the White House", which is a method for forecasting the outcomes of US presidential elections.
Submitted 1 March, 2019;
originally announced March 2019.
-
Many-body perturbation theory calculations using the yambo code
Authors:
D. Sangalli,
A. Ferretti,
H. Miranda,
C. Attaccalite,
I. Marri,
E. Cannuccia,
P. Melo,
M. Marsili,
F. Paleari,
A. Marrazzo,
G. Prandini,
P. Bonfà,
M. O. Atambo,
F. Affinito,
M. Palummo,
A. Molina-Sánchez,
C. Hogan,
M. Grüning,
D. Varsano,
A. Marini
Abstract:
yambo is an open source project aimed at studying excited state properties of condensed matter systems from first principles using many-body methods. As input, yambo requires ground state electronic structure data as computed by density functional theory codes such as quantum-espresso and abinit. yambo's capabilities include the calculation of linear response quantities (both independent-particle and including electron-hole interactions), quasi-particle corrections based on the GW formalism, optical absorption, and other spectroscopic quantities. Here we describe recent developments ranging from the inclusion of important but oft-neglected physical effects such as electron-phonon interactions to the implementation of a real-time propagation scheme for simulating linear and non-linear optical properties. Improvements to numerical algorithms and the user interface are outlined. Particular emphasis is given to the new and efficient parallel structure that makes it possible to exploit modern high performance computing architectures. Finally, we demonstrate the possibility to automate workflows by interfacing with the yambopy and AiiDA software tools.
Submitted 7 June, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Lost in Diversification
Authors:
Marco Bardoscia,
Daniele d'Arienzo,
Matteo Marsili,
Valerio Volpati
Abstract:
As financial instruments grow in complexity, more and more information is neglected by risk optimization practices. This brings down a curtain of opacity on the origination of risk, which has been one of the main culprits in the 2007-2008 global financial crisis. We discuss how the loss of transparency may be quantified in bits, using information theoretic concepts. We find that {\em i)} financial transformations imply large information losses, {\em ii)} portfolios are more information sensitive than individual stocks only if fundamental analysis is sufficiently informative on the co-movement of assets, {\em iii)} securitisation, in the relevant range of parameters, yields assets that are less information sensitive than the original stocks and {\em iv)} when diversification (or securitisation) is at its best (i.e. when assets are uncorrelated) information losses are maximal. We also address the issue of whether pricing schemes can be introduced to deal with information losses. This is relevant for the transmission of incentives to gather information on the risk origination side. Within a simple mean variance scheme, we find that market incentives are not generally sufficient to make information harvesting sustainable.
Submitted 28 January, 2019;
originally announced January 2019.
-
Minimum Description Length codes are critical
Authors:
Ryan John Cubero,
Matteo Marsili,
Yasser Roudi
Abstract:
In the Minimum Description Length (MDL) principle, learning from the data is equivalent to an optimal coding problem. We show that the codes that achieve optimal compression in MDL are critical in a very precise sense. First, when they are taken as generative models of samples, they generate samples with broad empirical distributions and with a high value of the relevance, defined as the entropy of the empirical frequencies. These results are derived for different statistical models (Dirichlet model, independent and pairwise dependent spin models, and restricted Boltzmann machines). Second, MDL codes sit precisely at a second order phase transition point where the symmetry between the sampled outcomes is spontaneously broken. The order parameter controlling the phase transition is the coding cost of the samples. The phase transition is a manifestation of the optimality of MDL codes, and it arises because codes that achieve a higher compression do not exist. These results suggest a clear interpretation of the widespread occurrence of statistical criticality as a characterization of samples which are maximally informative on the underlying generative process.
Submitted 2 October, 2018; v1 submitted 3 September, 2018;
originally announced September 2018.
-
Statistical Criticality arises in Most Informative Representations
Authors:
Ryan John Cubero,
Junghyo Jo,
Matteo Marsili,
Yasser Roudi,
Juyong Song
Abstract:
We show that statistical criticality, i.e. the occurrence of power law frequency distributions, arises in samples that are maximally informative about the underlying generating process. In order to reach this conclusion, we first identify the frequency with which different outcomes occur in a sample, as the variable carrying useful information on the generative process. The entropy of the frequency, that we call relevance, provides an upper bound to the number of informative bits. This differs from the entropy of the data, that we take as a measure of resolution. Samples that maximise relevance at a given resolution - that we call maximally informative samples - exhibit statistical criticality. In particular, Zipf's law arises at the optimal trade-off between resolution (i.e. compression) and relevance. As a byproduct, we derive a bound of the maximal number of parameters that can be estimated from a dataset, in the absence of prior knowledge on the generative model.
Furthermore, we relate criticality to the statistical properties of the representation of the data generating process. We show that, as a consequence of the concentration property of the Asymptotic Equipartition Property, representations that are maximally informative about the data generating process are characterised by an exponential distribution of energy levels. This arises from a principle of minimal entropy, that is conjugate of the maximum entropy principle in statistical mechanics. This explains why statistical criticality requires no parameter fine tuning in maximally informative samples.
Submitted 8 July, 2019; v1 submitted 1 August, 2018;
originally announced August 2018.
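The two quantities named in the abstract above can be made concrete with a small sketch. It assumes the sample is a plain list of hashable outcomes, writes $k_s$ for the number of times outcome $s$ occurs and $m_k$ for the number of distinct outcomes that occur exactly $k$ times, and takes resolution as the entropy of the data and relevance as the entropy of the frequencies, as the abstract states. The function name `resolution_and_relevance` and the symbols $k_s$, $m_k$ are illustrative choices and do not appear in the listing.

```python
from collections import Counter
from math import log2


def resolution_and_relevance(sample):
    """Return (resolution, relevance) in bits for a list of hashable outcomes.

    resolution = entropy of the empirical distribution of outcomes
    relevance  = entropy of the distribution of observed frequencies
    Hypothetical helper for illustration only.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")

    # k_s: how many times each distinct outcome s appears in the sample
    counts = Counter(sample)
    # m_k: how many distinct outcomes appear exactly k times
    multiplicity = Counter(counts.values())

    # Entropy of the data: each outcome s has empirical probability k_s / n
    resolution = -sum((k / n) * log2(k / n) for k in counts.values())
    # Entropy of the frequency: a random sample point has frequency k
    # with probability k * m_k / n
    relevance = -sum(
        (k * m / n) * log2(k * m / n) for k, m in multiplicity.items()
    )
    return resolution, relevance


if __name__ == "__main__":
    h_s, h_k = resolution_and_relevance(list("aaaabbcd"))
    print(f"resolution = {h_s:.3f} bits, relevance = {h_k:.3f} bits")
```

In this sketch a sample in which every outcome is distinct has maximal resolution but zero relevance, and so does a sample consisting of a single repeated outcome with zero resolution; informative samples sit between these two extremes.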
-
Multiscale relevance and informative encoding in neuronal spike trains
Authors:
Ryan John Cubero,
Matteo Marsili,
Yasser Roudi
Abstract:
Neuronal responses to complex stimuli and tasks can encompass a wide range of time scales. Understanding these responses requires measures that characterize how the information on these response patterns is represented across multiple temporal resolutions. In this paper we propose a metric -- which we call multiscale relevance (MSR) -- to capture the dynamical variability of the activity of single neurons across different time scales. The MSR is a non-parametric, fully featureless indicator in that it uses only the time stamps of the firing activity without resorting to any a priori covariate or invoking any specific structure in the tuning curve for neural activity. When applied to neural data from the mEC and from the ADn and PoS regions of freely-behaving rodents, we found that neurons having low MSR tend to have low mutual information and low firing sparsity across the correlates that are believed to be encoded by the region of the brain where the recordings were made. In addition, neurons with high MSR contain significant information on spatial navigation and allow decoding of spatial position or head direction as efficiently as those neurons whose firing activity has high mutual information with the covariate to be decoded, and significantly better than the set of neurons with high local variations in their interspike intervals. Given these results, we propose that the MSR can be used as a measure to rank and select neurons for their information content without the need to appeal to any a priori covariate.
Submitted 20 December, 2019; v1 submitted 28 February, 2018;
originally announced February 2018.
-
On Maximum Entropy and Inference
Authors:
Luigi Gresele,
Matteo Marsili
Abstract:
Maximum Entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data, that affords predictions on all other (dependent) variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics) directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is not set by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method showing its ability to recover the correct model in a few prototype cases and discuss its application on a real dataset.
Submitted 25 November, 2017;
originally announced January 2018.
-
Resolution and Relevance Trade-offs in Deep Learning
Authors:
Juyong Song,
Matteo Marsili,
Junghyo Jo
Abstract:
Deep learning has been successfully applied to various tasks, but its underlying mechanism remains unclear. Neural networks associate similar inputs in the visible layer to the same state of hidden variables in deep layers. The fraction of inputs that are associated to the same state is a natural measure of similarity and is simply related to the cost in bits required to represent these inputs. The degeneracy of states with the same information cost provides instead a natural measure of noise and is simply related to the entropy of the frequency of states, that we call relevance. Representations with minimal noise, at a given level of similarity (resolution), are those that maximise the relevance. A signature of such efficient representations is that frequency distributions follow power laws. We show, in extensive numerical experiments, that deep neural networks extract a hierarchy of efficient representations from data, because they i) achieve low levels of noise (i.e. high relevance) and ii) exhibit power law distributions. We also find that the layer that is most efficient to reliably generate patterns of training data is the one for which relevance and resolution are traded at the same price, which implies that the frequency distribution follows Zipf's law.
Submitted 19 March, 2018; v1 submitted 31 October, 2017;
originally announced October 2017.
-
Advanced capabilities for materials modelling with Quantum ESPRESSO
Authors:
P. Giannozzi,
O. Andreussi,
T. Brumme,
O. Bunau,
M. Buongiorno Nardelli,
M. Calandra,
R. Car,
C. Cavazzoni,
D. Ceresoli,
M. Cococcioni,
N. Colonna,
I. Carnimeo,
A. Dal Corso,
S. de Gironcoli,
P. Delugas,
R. A. DiStasio Jr.,
A. Ferretti,
A. Floris,
G. Fratesi,
G. Fugallo,
R. Gebauer,
U. Gerstmann,
F. Giustino,
T. Gorni,
J. Jia, et al. (25 additional authors not shown)
Abstract:
Quantum ESPRESSO is an integrated suite of open-source computer codes for quantum simulations of materials using state-of-the-art electronic-structure techniques, based on density-functional theory, density-functional perturbation theory, and many-body perturbation theory, within the plane-wave pseudo-potential and projector-augmented-wave approaches. Quantum ESPRESSO owes its popularity to the wide variety of properties and processes it can simulate, to its performance on an increasingly broad array of hardware architectures, and to a community of researchers that rely on its capabilities as a core open-source development platform to implement their ideas. In this paper we describe recent extensions and improvements, covering new methodologies and property calculators, improved parallelization, code modularization, and extended interoperability both within the distribution and with external software.
Submitted 28 September, 2017;
originally announced September 2017.
-
Influence of Reviewer Interaction Network on Long-term Citations: A Case Study of the Scientific Peer-Review System of the Journal of High Energy Physics
Authors:
Sandipan Sikdar,
Matteo Marsili,
Niloy Ganguly,
Animesh Mukherjee
Abstract:
A `peer-review system', in the context of judging research contributions, is one of the prime steps undertaken to ensure the quality of the submissions received; a significant portion of the publishing budget is spent by publication houses on the successful completion of peer review. Nevertheless, the scientific community is largely reaching a consensus that the peer-review system, although indispensable, is flawed. A very pertinent question therefore is "could this system be improved?". In this paper, we attempt to present an answer to this question by considering a massive dataset of around $29k$ papers with roughly $70k$ distinct review reports together consisting of $12m$ lines of review text from the Journal of High Energy Physics (JHEP) between 1997 and 2015. Specifically, we introduce a novel \textit{reviewer-reviewer interaction network} (an edge exists between two reviewers if they were assigned by the same editor) and show that, surprisingly, the simple structural properties of this network such as degree, clustering coefficient, centrality (closeness, betweenness etc.) serve as strong predictors of the long-term citations (i.e., the overall scientific impact) of a submitted paper. These features, when plugged into a regression model, alone achieve a high $R^2$ of 0.79 and a low $RMSE$ of 0.496 in predicting the long-term citations. In addition, we design a set of supporting features built from the basic characteristics of the submitted papers, the authors and the referees (e.g., the popularity of the submitting author, the acceptance rate history of a referee, the linguistic properties of the text of the review reports etc.), which further results in an overall improvement, with $R^2$ of 0.81 and $RMSE$ of 0.46.
Submitted 2 May, 2017;
originally announced May 2017.
-
Photo-Induced Bandgap Renormalization Governs the Ultrafast Response of Single-Layer MoS2
Authors:
Eva A. A. Pogna,
Margherita Marsili,
Domenico De Fazio,
Stefano Dal Conte,
Cristian Manzoni,
Davide Sangalli,
Duhee Yoon,
Antonio Lombardo,
Andrea C. Ferrari,
Andrea Marini,
Giulio Cerullo,
Deborah Prezzi
Abstract:
Transition metal dichalcogenides (TMDs) are emerging as promising two-dimensional (2d) semiconductors for optoelectronic and flexible devices. However, a microscopic explanation of their photophysics -- of pivotal importance for the understanding and optimization of device operation -- is still lacking. Here we use femtosecond transient absorption spectroscopy, with pump pulse tunability and broadband probing, to monitor the relaxation dynamics of single-layer MoS2 over the entire visible range, upon photoexcitation of different excitonic transitions. We find that, irrespective of excitation photon energy, the transient absorption spectrum shows the simultaneous bleaching of all excitonic transitions and corresponding red-shifted photoinduced absorption bands. First-principles modeling of the ultrafast optical response reveals that a transient bandgap renormalization, caused by the presence of photo-excited carriers, is primarily responsible for the observed features. Our results demonstrate the strong impact of many-body effects on the transient optical response of TMDs even in the low-excitation-density regime.
Submitted 20 April, 2017;
originally announced April 2017.
-
The Stochastic complexity of spin models: Are pairwise models really simple?
Authors:
Alberto Beretta,
Claudia Battistin,
Clélia de Mulatier,
Iacopo Mastromatteo,
Matteo Marsili
Abstract:
Models can be simple for different reasons: because they yield a simple and computationally efficient interpretation of a generic dataset (e.g. in terms of pairwise dependences) - as in statistical learning - or because they capture the essential ingredients of a specific phenomenon - as e.g. in physics - leading to non-trivial falsifiable predictions. In information theory and Bayesian inference, the simplicity of a model is precisely quantified by the stochastic complexity, which measures the number of bits needed to encode its parameters. In order to understand what simple models look like, we study the stochastic complexity of spin models with interactions of arbitrary order. We highlight the existence of invariances with respect to bijections within the space of operators, which allow us to partition the space of all models into equivalence classes, in which models share the same complexity. We thus find that the complexity (or simplicity) of a model is not determined by the order of the interactions, but rather by their mutual arrangements. Models where statistical dependencies are localized on non-overlapping groups of few variables (and that afford predictions on independencies that are easy to falsify) are simple. On the contrary, fully connected pairwise models, which are often used in statistical learning, appear to be highly complex, because of their extended set of interactions.
Submitted 11 April, 2018; v1 submitted 24 February, 2017;
originally announced February 2017.
-
Translating ceRNA susceptibilities into correlation functions
Authors:
Araks Martirosyan,
Matteo Marsili,
Andrea De Martino
Abstract:
Competition to bind microRNAs induces an effective positive crosstalk between their targets, therefore known as `competing endogenous RNAs' or ceRNAs. While such an effect is known to play a significant role in specific conditions, estimating its strength from data and, experimentally, in physiological conditions appears to be far from simple. Here we show that the susceptibility of ceRNAs to different types of perturbations affecting their competitors (and hence their tendency to crosstalk) can be encoded in quantities as intuitive and as simple to measure as correlation functions. We confirm this scenario by extensive numerical simulations and validate it by re-analyzing PTEN's crosstalk pattern from the TCGA breast cancer dataset. These results clarify the links between different quantities used to estimate the intensity of ceRNA crosstalk and provide new keys to analyze transcriptional datasets and effectively probe ceRNA networks in silico.
Submitted 19 January, 2017;
originally announced January 2017.
-
The missing assets and the size of Shadow Banking: an update
Authors:
Davide Fiaschi,
Imre Kondor,
Matteo Marsili,
Valerio Volpati
Abstract:
In a recent paper, using data from Forbes Global 2000, we have observed that the upper tail of the firm size distribution (by assets) falls off much faster than a Pareto distribution. The missing mass was suggested as an indicator of the size of the Shadow Banking (SB) sector. This short note provides the latest figures of the missing assets for 2013, 2014 and 2015. In 2013 and 2014 the dynamics of the missing assets continued being strongly correlated with estimates of the size of the SB sector of the Financial Stability Board. In 2015 we find a sharp decrease in the size of missing assets, suggesting that the SB sector is deflating.
Submitted 8 November, 2016;
originally announced November 2016.
-
Improving randomness characterization through Bayesian model selection
Authors:
Rafael Díaz Hernández Rojas,
Aldo Solís,
Alí M. Angulo Martínez,
Alfred B. U'Ren,
Jorge G. Hirsch,
Matteo Marsili,
Isaac Pérez Castillo
Abstract:
Nowadays random number generation plays an essential role in technology with important applications in areas ranging from cryptography, which lies at the core of current communication protocols, to Monte Carlo methods, and other probabilistic algorithms. In this context, a crucial scientific endeavour is to develop effective methods that allow the characterization of random number generators. However, commonly employed methods either lack formality (e.g. the NIST test suite), or are inapplicable in principle (e.g. the characterization derived from the Algorithmic Theory of Information (ATI)). In this letter we present a novel method based on Bayesian model selection, which is both rigorous and effective, for characterizing randomness in a bit sequence. We derive analytic expressions for a model's likelihood which is then used to compute its posterior probability distribution. Our method proves to be more rigorous than NIST's suite and the Borel-Normality criterion and its implementation is straightforward. We have applied our method to an experimental device based on the process of spontaneous parametric downconversion, implemented in our laboratory, to confirm that it behaves as a genuine quantum random number generator (QRNG). As our approach relies on Bayesian inference, which entails model generalizability, our scheme transcends individual sequence analysis, leading to a characterization of the source of the random sequences itself.
Submitted 12 June, 2017; v1 submitted 17 August, 2016;
originally announced August 2016.
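As a rough illustration of the kind of Bayesian model selection described in the abstract above, the sketch below compares two candidate models for a bit sequence: a parameter-free fair source and an i.i.d. source with unknown bias. This is not the authors' model family or code. The two-model comparison, the Jeffreys prior Beta(1/2, 1/2) on the bias, and the function names `log_evidence_fair`, `log_evidence_biased` and `posterior_fair` are all assumptions made for the example.

```python
import math


def log_evidence_fair(bits):
    """Log evidence of the parameter-free model: i.i.d. bits with P(1) = 1/2."""
    return -len(bits) * math.log(2.0)


def log_evidence_biased(bits):
    """Log evidence of an i.i.d. Bernoulli model with unknown bias,
    integrated analytically over a Jeffreys prior Beta(1/2, 1/2) (an assumption)."""
    n = len(bits)
    k = sum(bits)  # number of ones

    def log_beta(a, b):
        # log of the Beta function B(a, b), computed via log-gamma
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    return log_beta(k + 0.5, n - k + 0.5) - log_beta(0.5, 0.5)


def posterior_fair(bits, prior_fair=0.5):
    """Posterior probability of the fair model among the two candidates."""
    log_f = log_evidence_fair(bits) + math.log(prior_fair)
    log_b = log_evidence_biased(bits) + math.log(1.0 - prior_fair)
    m = max(log_f, log_b)  # subtract the maximum to avoid underflow
    w_f = math.exp(log_f - m)
    w_b = math.exp(log_b - m)
    return w_f / (w_f + w_b)
```

Both candidate models here treat the bits as independent, so this sketch only detects bias in the frequency of ones. Detecting correlations between bits would require richer candidate models than the two assumed here.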
-
Anomalies in the peer-review system: A case study of the journal of High Energy Physics
Authors:
Sandipan Sikdar,
Matteo Marsili,
Niloy Ganguly,
Animesh Mukherjee
Abstract:
The peer-review system has long been relied upon for bringing quality research to the notice of the scientific community and for preventing flawed research from entering the literature. The need for the peer-review system has often been debated, as in numerous cases it has failed in its task, and in most of these cases the editors and the reviewers were thought to be responsible for not being able to correctly judge the quality of the work. This raises the question "Can the peer-review system be improved?" Since editors and reviewers are the most important pillars of a reviewing system, in this work we attempt to address a related question: given the editing/reviewing history of the editors or reviewers, "can we identify the under-performing ones?", with the citations received by the edited/reviewed papers being used as a proxy for quantifying performance. We term such reviewers and editors anomalous, and we believe that identifying and removing them will improve the performance of the peer-review system. Using a massive dataset from the Journal of High Energy Physics (JHEP) consisting of 29k papers submitted between 1997 and 2015, with 95 editors and 4035 reviewers and their review history, we identify several factors which point to anomalous behavior of referees and editors. In fact, the anomalous editors and reviewers account for 26.8% and 14.5% of the total editors and reviewers respectively, and for most of these anomalous reviewers the performance degrades alarmingly over time.
Submitted 17 August, 2016;
originally announced August 2016.
-
Large scale GW-BSE calculations with $N^3$ scaling: excitonic effects in dye sensitised solar cells
Authors:
Margherita Marsili,
Edoardo Mosconi,
Filippo De Angelis,
Paolo Umari
Abstract:
Excitonic effects due to electron-hole coupling play a fundamental role in renormalising energy levels in dye sensitised and organic solar cells, determining the driving force for electron extraction. We show that first-principles calculations based on many-body perturbation theory within the GW-BSE approach provide a quantitative picture of interfacial excited state energetics in organic dye-sensitized TiO2, delivering a general rule for evaluating relevant energy levels. To perform GW-BSE calculations in such large systems we introduce a new scheme based on maximally localized Wannier functions. With this method the overall scaling of GW-BSE calculations is reduced from $O(N^4)$ to $O(N^3)$.
Submitted 29 August, 2016; v1 submitted 17 March, 2016;
originally announced March 2016.
-
Sparse model selection in the highly under-sampled regime
Authors:
Nicola Bulso,
Matteo Marsili,
Yasser Roudi
Abstract:
We propose a method for recovering the structure of a sparse undirected graphical model when very few samples are available. The method decides on the presence or absence of bonds between pairs of variables by considering one pair at a time and using a closed-form formula, analytically derived by calculating the posterior probability of every possible model explaining a two-body system using Jeffreys prior. The approach does not rely on the optimisation of any cost function and consequently is much faster than existing algorithms. Despite this time and computational advantage, numerical results show that for several sparse topologies the algorithm is comparable to the best existing algorithms, and is more accurate in the presence of hidden variables. We apply this approach to the analysis of US stock market data and to neural data, in order to show its efficiency in recovering robust statistical dependencies in real data with non-stationary correlations in time and space.
Submitted 2 January, 2017; v1 submitted 2 March, 2016;
originally announced March 2016.
-
When does inequality freeze an economy?
Authors:
João Pedro Jerico,
François P. Landes,
Matteo Marsili,
Isaac Pérez Castillo,
Valerio Volpati
Abstract:
Inequality and its consequences are the subject of intense recent debate. Using a simplified model of the economy, we address the relation between inequality and liquidity, the latter understood as the frequency of economic exchanges. Assuming a Pareto distribution of wealth for the agents, that is consistent with empirical findings, we find an inverse relation between wealth inequality and overall liquidity. We show that an increase in the inequality of wealth results in an even sharper concentration of the liquid financial resources. This leads to a congestion of the flow of goods and the arrest of the economy when the Pareto exponent reaches one.
Submitted 21 April, 2016; v1 submitted 23 February, 2016;
originally announced February 2016.
-
Statistical mechanics of complex economies
Authors:
Marco Bardoscia,
Giacomo Livan,
Matteo Marsili
Abstract:
In the pursuit of ever increasing efficiency and growth, our economies have evolved to remarkable degrees of complexity, with nested production processes feeding each other in order to create products of greater sophistication from less sophisticated ones, down to raw materials. The engine of such an expansion has been competitive markets that, according to General Equilibrium Theory (GET), achieve efficient allocations under specific conditions. We study large random economies within the GET framework, as templates of complex economies, and we find that a non-trivial phase transition occurs: the economy freezes in a state where all production processes collapse when either the number of primary goods or the number of available technologies falls below a critical threshold. As in other examples of phase transitions in large random systems, this is an unintended consequence of the growth in complexity. Our findings suggest that the Industrial Revolution can be regarded as a sharp transition between different phases, but also imply that well developed economies can collapse if too many intermediate goods are introduced.
Submitted 4 April, 2017; v1 submitted 30 November, 2015;
originally announced November 2015.
-
Trade-offs in delayed information transmission in biochemical networks
Authors:
Francesca Mancini,
Matteo Marsili,
Aleksandra M. Walczak
Abstract:
In order to transmit biochemical signals, biological regulatory systems dissipate energy with concomitant entropy production. Additionally, signaling often takes place in challenging environmental conditions. In a simple model regulatory circuit given by an input and a delayed output, we explore the trade-offs between information transmission and the system's energetic efficiency. We determine the maximally informative network, given a fixed amount of entropy production and delayed response, exploring both the case with and without feedback. We find that feedback allows the circuit to overcome energy constraints and transmit close to the maximum available information even in the dissipationless limit. Negative feedback loops, characteristic of shock responses, are optimal at high dissipation. Close to equilibrium positive feedback loops, known for their stability, become more informative. Asking how the signaling network should be constructed to best function in the worst possible environment, rather than an optimally tuned one or in steady state, we discover that at large dissipation the same universal motif is optimal in all of these conditions.
Submitted 14 April, 2015;
originally announced April 2015.
-
Identifying relevant positions in proteins by Critical Variable Selection
Authors:
Silvia Grigolon,
Silvio Franz,
Matteo Marsili
Abstract:
Evolution in its course found a variety of solutions to the same optimisation problem. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely. In this paper, we present a new method aimed at extracting sites encoding structural and functional properties from a set of protein primary sequences, namely a Multiple Sequence Alignment. The method, called Critical Variable Selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the Response Regulator Receiver and to the Voltage Sensor Domain of Ion Channels, we show that this procedure recovers not only information encoded in single site statistics and pairwise correlations but it also captures dependencies going beyond pairwise correlations. The method proposed here is complementary to Statistical Coupling Analysis, in that the most relevant sites predicted by the two methods markedly differ. We find robust and consistent results for datasets as small as a few hundred sequences, that reveal a hidden hierarchy of sites that is consistent with present knowledge on biologically relevant sites and evolutionary dynamics. This suggests that Critical Variable Selection is able to identify in a Multiple Sequence Alignment a core of sites encoding functional and structural information.
Submitted 19 January, 2016; v1 submitted 12 March, 2015;
originally announced March 2015.
-
Contour map of estimation error for Expected Shortfall
Authors:
Imre Kondor,
Fabio Caccioli,
Gábor Papp,
Matteo Marsili
Abstract:
The contour map of estimation error of Expected Shortfall (ES) is constructed. It allows one to quantitatively determine the sample size (the length of the time series) required by the optimization under ES of large institutional portfolios for a given size of the portfolio, at a given confidence level and a given estimation error.
Submitted 22 February, 2015;
originally announced February 2015.
-
Criticality of mostly informative samples: A Bayesian model selection approach
Authors:
Ariel Haimovici,
Matteo Marsili
Abstract:
We discuss a Bayesian model selection approach to high dimensional data in the deep under sampling regime. The data is based on a representation of the possible discrete states $s$, as defined by the observer, and it consists of $M$ observations of the state. This approach shows that, for a given sample size $M$, not all states observed in the sample can be distinguished. Rather, only a partition of the sampled states $s$ can be resolved. Such partition defines an {\em emergent} classification $q_s$ of the states that becomes finer and finer as the sample size increases, through a process of {\em symmetry breaking} between states. This allows us to distinguish between the $resolution$ of a given representation of the observer defined states $s$, which is given by the entropy of $s$, and its $relevance$ which is defined by the entropy of the partition $q_s$. Relevance has a non-monotonic dependence on resolution, for a given sample size. In addition, we characterise most relevant samples and we show that they exhibit power law frequency distributions, generally taken as signatures of "criticality". This suggests that "criticality" reflects the relevance of a given representation of the states of a complex system, and does not necessarily require a specific mechanism of self-organisation to a critical point.
Submitted 2 September, 2015; v1 submitted 1 February, 2015;
originally announced February 2015.
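The two entropies named in the abstract above can be sketched numerically. The abstract does not spell out how the partition $q_s$ is built, so the sketch assumes that states are grouped by the number of times $k_s$ they occur in the sample, meaning that states seen equally often are not distinguished. Under that assumption, resolution is the entropy of the empirical distribution of $s$ and relevance is the entropy of the distribution of observed frequencies. The function name `resolution_and_relevance` and the use of natural logarithms are choices made for this example.

```python
import math
from collections import Counter


def resolution_and_relevance(sample):
    """Return (resolution, relevance) in nats for a list of observed states.

    Assumption: the partition q_s groups states by their observed count k_s.
    """
    M = len(sample)
    k_s = Counter(sample)        # k_s: number of times each state s is observed
    m_k = Counter(k_s.values())  # m_k: number of states observed exactly k times

    # Resolution: entropy of the empirical distribution k_s / M over states.
    resolution = -sum((k / M) * math.log(k / M) for k in k_s.values())

    # Relevance: entropy of the distribution k * m_k / M over frequency classes.
    relevance = -sum((k * m / M) * math.log(k * m / M) for k, m in m_k.items())
    return resolution, relevance
```

For example, in the sample `list("aabc")` the states `b` and `c` are each seen once and fall into the same class, so the relevance is lower than the resolution.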
-
Phenotypic constraints promote latent versatility and carbon efficiency in metabolic networks
Authors:
Marco Bardoscia,
Matteo Marsili,
Areejit Samal
Abstract:
System-level properties of metabolic networks may be the direct product of natural selection or arise as a by-product of selection on other properties. Here we study the effect of direct selective pressure for growth or viability in particular environments on two properties of metabolic networks: latent versatility to function in additional environments and carbon usage efficiency. Using a Markov Chain Monte Carlo (MCMC) sampling based on Flux Balance Analysis (FBA), we sample from a known biochemical universe random viable metabolic networks that differ in the number of directly constrained environments. We find that the latent versatility of sampled metabolic networks increases with the number of directly constrained environments and with the size of the networks. We then show that the average carbon wastage of sampled metabolic networks across the constrained environments decreases with the number of directly constrained environments and with the size of the networks. Our work expands the growing body of evidence about nonadaptive origins of key functional properties of biological networks.
Submitted 31 July, 2015; v1 submitted 20 August, 2014;
originally announced August 2014.
-
$L_p$ regularized portfolio optimization
Authors:
Fabio Caccioli,
Imre Kondor,
Matteo Marsili,
Susanne Still
Abstract:
Investors who optimize their portfolios under any of the coherent risk measures are naturally led to regularized portfolio optimization when they take into account the impact their trades make on the market. We show here that the impact function determines which regularizer is used. We also show that any regularizer based on the norm $L_p$ with $p>1$ makes the sensitivity of coherent risk measures to estimation error disappear, while regularizers with $p<1$ do not. The $L_1$ norm represents a border case: its "soft" implementation does not remove the instability, but rather shifts its locus, whereas its "hard" implementation (equivalent to a ban on short selling) eliminates it. We demonstrate these effects on the important special case of Expected Shortfall (ES) that is on its way to becoming the next global regulatory market risk measure.
Submitted 15 April, 2014;
originally announced April 2014.
-
Condensation phenomena in fat-tailed distributions: a characterization by means of an order parameter
Authors:
Mario Filiasi,
Elia Zarinelli,
Erik Vesselli,
Matteo Marsili
Abstract:
Condensation phenomena are ubiquitous in nature and are found in condensed matter, disordered systems, networks, finance, etc. In the present work we investigate one of the best frameworks in which condensation phenomena take place, namely, the sum of independent and fat-tailed distributed random variables. For large deviations of the sum, this system undergoes a phase transition and shifts from a democratic phase to a condensed phase, where a single variable (the condensate) carries a finite fraction of the sum. This phenomenon yields the failure of the standard results of the Large Deviation Theory. In this work we exploit the Density Functional Method to overcome the limitation of the Large Deviation Theory and characterize the condensation transition in terms of an order parameter, i.e. the Inverse Participation Ratio (IPR). This procedure leads us to investigate the system in the large-deviation regime where both the sum and the IPR are constrained, observing new phase transitions. As a sample application, the case of condensation phenomena in financial time-series is briefly discussed.
Submitted 2 February, 2015; v1 submitted 30 September, 2013;
originally announced September 2013.
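The order parameter mentioned in the abstract above can be illustrated with a short sketch. It assumes the commonly used definition of the Inverse Participation Ratio for non-negative variables, $\mathrm{IPR} = \sum_i x_i^2 / (\sum_i x_i)^2$. The abstract does not give the formula, so this normalisation and the function name `inverse_participation_ratio` are assumptions.

```python
def inverse_participation_ratio(values):
    """IPR = sum(x_i^2) / (sum(x_i))^2 for a list of non-negative numbers.

    Assumed definition: equals 1/N when all N values are equal ("democratic")
    and approaches 1 when a single value dominates the sum ("condensed").
    """
    total = sum(values)
    if total == 0:
        raise ValueError("the sum of the values must be non-zero")
    return sum(v * v for v in values) / (total * total)
```

With this definition, four equal values give an IPR of 1/4, while a sum carried entirely by one variable gives an IPR of 1. This matches the abstract's picture of a condensate carrying a finite fraction of the sum.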
-
The Interrupted Power Law and The Size of Shadow Banking
Authors:
Davide Fiaschi,
Imre Kondor,
Matteo Marsili,
Valerio Volpati
Abstract:
Using public data (Forbes Global 2000) we show that the asset sizes for the largest global firms follow a Pareto distribution in an intermediate range, that is ``interrupted'' by a sharp cut-off in its upper tail, where it is totally dominated by financial firms. This flattening of the distribution contrasts with a large body of empirical literature which finds a Pareto distribution for firm sizes both across countries and over time. Pareto distributions are generally traced back to a mechanism of proportional random growth, based on a regime of constant returns to scale. This makes our findings of an ``interrupted'' Pareto distribution all the more puzzling, because we provide evidence that financial firms in our sample should operate in such a regime. We claim that the missing mass from the upper tail of the asset size distribution is a consequence of shadow banking activity and that it provides an (upper) estimate of the size of the shadow banking system. This estimate -- which we propose as a shadow banking index -- compares well with estimates of the Financial Stability Board until 2009, but it shows a sharper rise in shadow banking activity after 2010. Finally, we propose a proportional random growth model that reproduces the observed distribution, thereby providing a quantitative estimate of the intensity of shadow banking activity.
Submitted 4 April, 2014; v1 submitted 9 September, 2013;
originally announced September 2013.