-
Snowmass2021 Cosmic Frontier White Paper: Rubin Observatory after LSST
Authors:
Bob Blum,
Seth W. Digel,
Alex Drlica-Wagner,
Salman Habib,
Katrin Heitmann,
Mustapha Ishak,
Saurabh W. Jha,
Steven M. Kahn,
Rachel Mandelbaum,
Phil Marshall,
Jeffrey A. Newman,
Aaron Roodman,
Christopher W. Stubbs
Abstract:
The Vera C. Rubin Observatory will begin the Legacy Survey of Space and Time (LSST) in 2024, spanning an area of 18,000 square degrees in six bands, with more than 800 observations of each field over ten years. The unprecedented data set will enable great advances in the study of the formation and evolution of structure and exploration of physics of the dark universe. The observations will hold clues about the cause for the accelerated expansion of the universe and possibly the nature of dark matter. During the next decade, LSST will be able to confirm or dispute if tensions seen today in cosmological data are due to new physics. New and unexpected phenomena could confirm or disrupt our current understanding of the universe. Findings from LSST will guide the path forward post-LSST. The Rubin Observatory will still be a uniquely powerful facility even then, capable of revealing further insights into the physics of the dark universe. These could be obtained via innovative observing strategies, e.g., targeting new probes at shorter timescales than with LSST, or via modest instrumental changes, e.g., new filters, or through an entirely new instrument for the focal plane. This White Paper highlights some of the opportunities in each scenario from Rubin observations after LSST.
Submitted 14 March, 2022;
originally announced March 2022.
-
Ultrafast sensing of photoconductivity decay using microwave resonators
Authors:
B. Gyüre-Garami,
B. Blum,
O. Sági,
A. Bojtor,
S. Kollarics,
G. Csősz,
B. G. Márkus,
J. Volk,
F. Simon
Abstract:
Microwave reflectance probed photoconductivity (or $μ$-PCD) measurement represents a contactless and non-invasive method to characterize impurity content in semiconductors. Major drawbacks of the method include a difficult separation of reflectance due to dielectric and conduction effects and that the $μ$-PCD signal is prohibitively weak for highly conducting samples. Both of these limitations could be tackled with the use of microwave resonators due to the well-known sensitivity of resonator parameters to minute changes in the material properties combined with a null measurement. A general misconception is that time resolution of resonator measurements is limited beyond their bandwidth by the readout electronics response time. While it is true for conventional resonator measurements, such as those employing a frequency sweep, we present a time-resolved resonator parameter readout method which overcomes these limitations and allows measurement of complex material parameters and to enhance $μ$-PCD signals with the ultimate time resolution limit being the resonator time constant. This is achieved by detecting the transient response of microwave resonators on the timescale of a few 100 ns \emph{during} the $μ$-PCD decay signal. The method employs a high-stability oscillator working with a fixed frequency which results in a stable and highly accurate measurement.
Submitted 4 December, 2019; v1 submitted 30 September, 2019;
originally announced September 2019.
-
An Artificial Intelligence-Based System for Nutrient Intake Assessment of Hospitalised Patients
Authors:
Ya Lu,
Thomai Stathopoulou,
Maria F. Vasiloglou,
Stergios Christodoulidis,
Beat Blum,
Thomas Walser,
Vinzenz Meier,
Zeno Stanga,
Stavroula G. Mougiakakou
Abstract:
Regular nutrient intake monitoring in hospitalised patients plays a critical role in reducing the risk of disease-related malnutrition (DRM). Although several methods to estimate nutrient intake have been developed, there is still a clear demand for a more reliable and fully automated technique, as this could improve the data accuracy and reduce both the participant burden and the health costs. In this paper, we propose a novel system based on artificial intelligence to accurately estimate nutrient intake, by simply processing RGB depth image pairs captured before and after a meal consumption. For the development and evaluation of the system, a dedicated and new database of images and recipes of 322 meals was assembled, coupled to data annotation using innovative strategies. With this database, a system was developed that employed a novel multi-task neural network and an algorithm for 3D surface construction. This allowed sequential semantic food segmentation and estimation of the volume of the consumed food, and permitted fully automatic estimation of nutrient intake for each food type with a 15% estimation error.
Submitted 12 June, 2019; v1 submitted 7 June, 2019;
originally announced June 2019.
-
Goodness-of-fit statistics for approximate Bayesian computation
Authors:
Louisiane Lemaire,
Flora Jay,
I-Hung Lee,
Katalin Csilléry,
Michael G. B. Blum
Abstract:
Approximate Bayesian computation is a statistical framework that uses numerical simulations to calibrate and compare models. Instead of computing likelihood functions, Approximate Bayesian computation relies on numerical simulations, which makes it applicable to complex models in ecology and evolution. As usual for statistical modeling, evaluating goodness-of-fit is a fundamental step for Approximate Bayesian Computation. Here, we introduce a goodness-of-fit approach based on hypothesis-testing. We introduce two test statistics based on the mean distance between numerical summaries of the data and simulated ones. One test statistic relies on summaries simulated with the prior predictive distribution whereas the other one relies on simulations from the posterior predictive distribution. For different coalescent models, we find that the statistics are well calibrated, meaning that the type I error can be controlled. However, the statistical power of the two statistics is extremely variable across models ranging from 20% to 100%. The difference of power between the two statistics is negligible in models of demographic inference but substantial in an additional and purely statistical example. When analyzing resequencing data to evaluate models of human demography, the two statistics confirm that an out-of-Africa bottleneck cannot be rejected for Asiatic and European data. We also consider two speciation models in the context of a butterfly species complex. One goodness-of-fit statistic indicates a poor fit for both models, and the numerical summaries causing the poor fit were identified using posterior predictive checks. Statistical tests for goodness-of-fit should foster evaluation of model fit in Approximate Bayesian Computation. The test statistic based on simulations from the prior predictive distribution is implemented in the gfit function of the R abc package.
Submitted 15 January, 2016;
originally announced January 2016.
-
Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes data
Authors:
Nicolas Duforet-Frebourg,
Keurcien Luu,
Guillaume Laval,
Eric Bazin,
Michael G. B. Blum
Abstract:
To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis. We show that the common Fst index of genetic differentiation between populations can be viewed as a proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) after removal of recently admixed individuals resulting in 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million obtained with a low-coverage sequencing depth (3X). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.
Submitted 18 November, 2015; v1 submitted 8 April, 2015;
originally announced April 2015.
-
Genome scans for detecting footprints of local adaptation using a Bayesian factor model
Authors:
N. Duforet-Frebourg,
E. Bazin,
M. G. B. Blum
Abstract:
A central part of population genomics consists of finding genomic regions implicated in local adaptation. Population genomic analyses are based on genotyping numerous molecular markers and looking for outlier loci in terms of patterns of genetic differentiation. One of the most common approaches for selection scan is based on statistics that measure population differentiation such as $F_{ST}$. However, there are important caveats with approaches related to $F_{ST}$ because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. As outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that the factor model can achieve a 2-fold or more reduction of false discovery rate compared to the software BayeScan or compared to a $F_{ST}$ approach. We analyze the data of the Human Genome Diversity Panel to provide an example of how factor models can be used to detect local adaptation with a large number of SNPs. The Bayesian factor model is implemented in the open-source PCAdapt software.
Submitted 29 July, 2014; v1 submitted 21 February, 2014;
originally announced February 2014.
-
Effects of Fluctuating Energy Input on the Small Scales in Turbulence
Authors:
Chen-Chi Chien,
Daniel B. Blum,
Greg A. Voth
Abstract:
In the standard cascade picture of 3D turbulent fluid flows, energy is input at a constant rate at large scales. Energy is then transferred to smaller scales by an intermittent process that has been the focus of a vast literature. However, the energy input at large scales is not constant in most real turbulent flows. We explore the signatures of these fluctuations of large scale energy input on small scale turbulence statistics. Measurements were made in a flow between oscillating grids, with Re up to 271, in which temporal variations in the large scale energy input can be introduced by modulating the oscillating grid frequency. We find that the Kolmogorov constant for second order longitudinal structure functions depends on the magnitude of the fluctuations in the large scale energy input. We can quantitatively predict the measured change with a model based on Kolmogorov's refined similarity theory. The effects of fluctuations of the energy input can also be observed using structure functions conditioned on the instantaneous large scale velocity. A linear parameterization using the curvature of the conditional structure functions provides a fairly good match with the measured changes in the Kolmogorov constant. Conditional structure functions are found to provide a more sensitive measure of the presence of fluctuations in the large scale energy input than inertial range scaling coefficients.
Submitted 29 June, 2013;
originally announced July 2013.
-
Diagnostic tools of approximate Bayesian computation using the coverage property
Authors:
D. Prangle,
M. G. B. Blum,
G. Popovic,
S. A. Sisson
Abstract:
Approximate Bayesian computation (ABC) is an approach for sampling from an approximate posterior distribution in the presence of a computationally intractable likelihood function. A common implementation is based on simulating model, parameter and dataset triples, (m,θ,y), from the prior, and then accepting as samples from the approximate posterior, those pairs (m,θ) for which y, or a summary of y, is "close" to the observed data. Closeness is typically determined through a distance measure and a kernel scale parameter, ε. Appropriate choice of ε is important to producing a good quality approximation. This paper proposes diagnostic tools for the choice of ε based on assessing the coverage property, which asserts that credible intervals have the correct coverage levels. We provide theoretical results on coverage for both model and parameter inference, and adapt these into diagnostics for the ABC context. We re-analyse a study on human demographic history to determine whether the adopted posterior approximation was appropriate. R code implementing the proposed methodology is freely available in the package "abc."
Submitted 14 January, 2013;
originally announced January 2013.
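The accept/reject scheme summarized in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation or the R "abc" package: it drops the model index m and keeps only a single scalar parameter θ, and the function name `abc_rejection`, the uniform prior, the Gaussian simulator, the sample-mean summary, and the value of `eps` are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, eps, n_draws, rng):
    """Basic ABC rejection sampler (hypothetical helper, single model only).

    Draws theta from the prior, simulates a data set y, and keeps theta when
    the summary of y lies within eps of the summary of the observed data.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)           # theta ~ prior
        y = simulator(theta, rng)            # y ~ model(theta)
        if abs(summary(y) - s_obs) <= eps:   # "closeness" with scale eps
            accepted.append(theta)
    return accepted


# Toy example (assumed, not from the paper): infer the mean of a unit-variance
# Gaussian from 50 observations, using the sample mean as the summary.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    eps=0.2,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

A smaller `eps` tightens the approximation but lowers the acceptance rate, which is the trade-off that diagnostics for choosing ε are meant to inform.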
-
Non-stationary patterns of isolation-by-distance: inferring measures of local genetic differentiation with Bayesian kriging
Authors:
Nicolas Duforet-Frebourg,
Michael G. B. Blum
Abstract:
Patterns of isolation-by-distance arise when population differentiation increases with increasing geographic distances. Patterns of isolation-by-distance are usually caused by local spatial dispersal, which explains why differences of allele frequencies between populations accumulate with distance. However, spatial variations of demographic parameters such as migration rate or population density can generate non-stationary patterns of isolation-by-distance where the rate at which genetic differentiation accumulates varies across space. To characterize non-stationary patterns of isolation-by-distance, we infer local genetic differentiation based on Bayesian kriging. Local genetic differentiation for a sampled population is defined as the average genetic differentiation between the sampled population and fictive neighboring populations. To avoid defining populations in advance, the method can also be applied at the scale of individuals making it relevant for landscape genetics. Inference of local genetic differentiation relies on a matrix of pairwise similarity or dissimilarity between populations or individuals such as matrices of FST between pairs of populations. Simulation studies show that maps of local genetic differentiation can reveal barriers to gene flow but also other patterns such as continuous variations of gene flow across habitat. The potential of the method is illustrated with 2 data sets: genome-wide SNP data for human Swedish populations and AFLP markers for alpine plant species. The software LocalDiff implementing the method is available at http://membres-timc.imag.fr/Michael.Blum/LocalDiff.html
Submitted 7 January, 2014; v1 submitted 24 September, 2012;
originally announced September 2012.
-
The study of ground-level ozone in Kiev and its impact on public health
Authors:
A. V. Shavrina,
I. A. Mikulskaya,
S. I. Kiforenko,
V. A. Sheminova,
A. A. Veles,
O. B. Blum
Abstract:
Ground-level ozone in Kiev for an episode of its high concentration in August 2000 was simulated with the model of the urban air pollution UAM-V (Urban Airshed Model). The study of total ozone over Kiev and its concentration changes with height in the troposphere is made on the basis of ground-based observations with the infrared Fourier spectrometer at the Main Astronomical Observatory of National Academy of Sciences of Ukraine as a part of the ESA-NIVR-KNMI no 2907. In 2008 the satellite Aura-OMI data OMO3PR on the atmosphere ozone profiles became available. Beginning in 2005, these data include the ozone concentration in the lower layer of the atmosphere and can be used for the evaluation of the ground-level ozone concentrations in all cities of Ukraine. Some statistical investigation of ozone air pollution in Kiev and medical statistics data on respiratory system was carried out with the application of the "Statistica" package. The regression analysis, prognostic regression simulation, and retrospective prognosis of the epidemiological situation with respect to respiratory system pathologies in Kiev during 2000-2007 were performed.
Submitted 23 April, 2012; v1 submitted 9 April, 2012;
originally announced April 2012.
-
A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation
Authors:
M. G. B. Blum,
M. A. Nunes,
D. Prangle,
S. A. Sisson
Abstract:
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
Submitted 11 June, 2013; v1 submitted 16 February, 2012;
originally announced February 2012.
-
A Continuation Method for Nash Equilibria in Structured Games
Authors:
B. Blum,
D. Koller,
C. R. Shelton
Abstract:
Structured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorithms are derived from a continuation method for normal-form and extensive-form games due to Govindan and Wilson; they follow a trajectory through a space of perturbed games and their equilibria, exploiting game structure through fast computation of the Jacobian of the payoff function. They are theoretically guaranteed to find at least one equilibrium of the game, and may find more. Our approach provides the first efficient algorithm for computing exact equilibria in graphical games with arbitrary topology, and the first algorithm to exploit fine-grained structural properties of MAIDs. Experimental results are presented demonstrating the effectiveness of the algorithms and comparing them to predecessors. The running time of the graphical game algorithm is similar to, and often better than, the running time of previous approximate algorithms. The algorithm for MAIDs can effectively solve games that are much larger than those solvable by previous methods.
Submitted 29 September, 2011;
originally announced October 2011.
-
Effects of non-universal large scales on conditional structure functions in turbulence
Authors:
Daniel B. Blum,
Surendra Kunwar,
James Johnson,
Greg A. Voth
Abstract:
We report measurements of conditional Eulerian and Lagrangian structure functions in order to assess the effects of non-universal properties of the large scales on the small scales in turbulence. We study a 1m $\times$ 1m $\times$ 1.5m flow between oscillating grids which produces $R_λ=285$ while containing regions of nearly homogeneous and highly inhomogeneous turbulence. Large data sets of three-dimensional tracer particle velocities have been collected using stereoscopic high speed cameras with real-time image compression technology. Eulerian and Lagrangian structure functions are measured in both homogeneous and inhomogeneous regions of the flow. We condition the structure functions on the instantaneous large scale velocity or on the grid phase. At all scales, the structure functions depend strongly on the large scale velocity, but are independent of the grid phase. We see clear signatures of inhomogeneity near the oscillating grids, but even in the homogeneous region in the center we see a surprisingly strong dependence on the large scale velocity that remains at all scales. Previous work has shown that similar correlations extend to very high Reynolds numbers. Comprehensive measurements of these effects in a laboratory flow provide a powerful tool for assessing the effects of shear, inhomogeneity and intermittency of the large scales on the small scales in turbulence.
Submitted 1 December, 2009; v1 submitted 5 August, 2009;
originally announced August 2009.
-
HIV with contact-tracing: a case study in Approximate Bayesian Computation
Authors:
Michael G. B. Blum,
Viet Chi Tran
Abstract:
Missing data is a recurrent issue in epidemiology where the infection process may be partially observed. Approximate Bayesian Computation, an alternative to data imputation methods such as Markov Chain Monte Carlo integration, is proposed for making inference in epidemiological models. It is a likelihood-free method that relies exclusively on numerical simulations. ABC consists in computing a distance between simulated and observed summary statistics and weighting the simulations according to this distance. We propose an original extension of ABC to path-valued summary statistics, corresponding to the cumulated number of detections as a function of time. For a standard compartmental model with Susceptible, Infectious and Recovered individuals (SIR), we show that the posterior distributions obtained with ABC and MCMC are similar. In a refined SIR model well-suited to the HIV contact-tracing data in Cuba, we perform a comparison between ABC with full and binned detection times. For the Cuban data, we evaluate the efficiency of the detection system and predict the evolution of the HIV-AIDS disease. In particular, the percentage of undetected infectious individuals is found to be of the order of 40%.
Submitted 31 May, 2010; v1 submitted 6 October, 2008;
originally announced October 2008.
-
Non-linear regression models for Approximate Bayesian Computation
Authors:
M. G. B. Blum,
O. Francois
Abstract:
Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However the methods that use rejection suffer from the curse of dimensionality when the number of summary statistics is increased. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to the state-of-the-art approximate Bayesian methods, and achieves considerable reduction of the computational burden in two examples of inference in statistical genetics and in a queueing model.
Submitted 23 February, 2009; v1 submitted 24 September, 2008;
originally announced September 2008.
-
The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance
Authors:
Michael G. B. Blum,
Olivier François,
Svante Janson
Abstract:
For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform model of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees.
Submitted 14 February, 2007;
originally announced February 2007.
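For readers unfamiliar with the two balance statistics named in the abstract above, the following sketch computes them using their standard definitions: the Colless index sums, over internal nodes, the absolute difference between the leaf counts of the two subtrees, and the Sackin index sums the depths of all leaves. The nested-tuple tree encoding and the function name `balance_indices` are assumptions made for this illustration and do not come from the paper.

```python
def balance_indices(tree):
    """Return (n_leaves, colless, sackin) for a rooted binary tree.

    Assumed encoding: an internal node is a 2-tuple (left, right);
    anything that is not a tuple is treated as a leaf.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0
    left, right = tree
    n_left, colless_left, sackin_left = balance_indices(left)
    n_right, colless_right, sackin_right = balance_indices(right)
    n = n_left + n_right
    # Colless: add this node's imbalance |n_left - n_right|.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Sackin: every leaf below this node sits one level deeper.
    sackin = sackin_left + sackin_right + n
    return n, colless, sackin


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(balance_indices(balanced))      # (4, 0, 8)
print(balance_indices(caterpillar))   # (4, 3, 9)
```

The fully balanced four-leaf tree has the smallest value of both statistics, while the caterpillar tree has the largest, which is why both are used as measures of tree imbalance.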
-
A mean-field analysis of community structure in social and kin networks
Authors:
E. Durand,
M. G. B. Blum,
O. Francois
Abstract:
We provide a mean-field analysis of community structure of social and biological networks assuming that actors are able to evaluate some tree-derived distance to the other actors and tend to aggregate with the less distant. We show that such networks have small components, and give exact descriptions for the probability distribution of a typical community size and the number of communities. In particular, we show that the probability distribution of the community size is well-approximated by a power-law distribution with exponent two. We illustrate the robustness of the mean-field analysis by comparing its predictions on previously studied social networks and biological data.
Submitted 13 April, 2006;
originally announced April 2006.
-
The continuum limit in the quenched approximation
Authors:
C. Bernard,
T. Blum,
C. DeTar,
Steven Gottlieb,
Urs M. Heller,
J. Hetrick,
K. Rummukainen,
R. Sugar,
D. Toussaint,
M. Wingate
Abstract:
Previous work at $6/g^2=5.7$ with quenched staggered quarks is extended with new calculations at 5.85 and 6.15 on lattices up to $32^3\times 64$. These calculations allow a more detailed study of extrapolation in quark mass, finite volume and lattice spacing than has heretofore been possible. We discuss how closely the quenched spectrum approaches that of the real world.
Submitted 21 September, 1995;
originally announced September 1995.