Search | arXiv e-print repository

arXiv:2405.11950 [pdf, other]

WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles

Authors: Tabea M. G. Pakull, Hendrik Damm, Ahmad Idrissi-Yaghir, Henning Schäfer, Peter A. Horn, Christoph M. Friedrich

Abstract: This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization in the biomedical domain, aimed at making scientific publications accessible to non-specialists. Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries from complex scientific texts. The summarization performance was enhanced through various approaches, including instruction tuning, few-shot learning, and prompt variations tailored to incorporate specific context information. The experiments demonstrated that fine-tuning generally led to the best performance across most evaluated metrics. Few-shot learning notably improved the models' ability to generate relevant and factually accurate texts, particularly when using a well-crafted prompt. Additionally, a Dynamic Expert Selection (DES) mechanism to optimize the selection of text outputs based on readability and factuality metrics was developed. Out of 54 participants, the WisPerMed team reached 4th place, measured by readability, factuality, and relevance. Determined by the overall score, our approach improved upon the baseline by approx. 5.5 percentage points and was only approx. 1.5 percentage points behind the first place.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 4 pages, 6 figures, 3 tables, submitted to: BIONLP 2024 and Shared Tasks @ ACL 2024

arXiv:2405.10004 [pdf, other]

doi 10.1038/s41597-024-03496-6

ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

Authors: Johannes Rückert, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Cynthia S. Schmidt, Sven Koitka, Obioma Pelka, Asma Ben Abacha, Alba G. Seco de Herrera, Henning Müller, Peter A. Horn, Felix Nensa, Christoph M. Friedrich

Abstract: Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Objects in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and adds 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models, and evaluation of deep learning models for multi-task learning.

Submitted 18 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted for Scientific Data

arXiv:2404.05694 [pdf, other]

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Authors: Ahmad Idrissi-Yaghir, Amin Dada, Henning Schäfer, Kamyar Arzideh, Giulia Baldini, Jan Trienes, Max Hasin, Jeanette Bewersdorff, Cynthia S. Schmidt, Marie Bauer, Kaleb E. Smith, Jiang Bian, Yonghui Wu, Jörg Schlötterer, Torsten Zesch, Peter A. Horn, Christin Seifert, Felix Nensa, Jens Kleesiek, Christoph M. Friedrich

Abstract: Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.

Submitted 8 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2310.20398 [pdf, other]

doi 10.1016/j.jcp.2023.112596

A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks

Authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai

Abstract: Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the last few years, although few attempts have been made to use them to speed up the simulation of the motion of celestial bodies. We study the advantages and limitations of using Hamiltonian Neural Networks to replace computationally expensive parts of the numerical simulation. We compare the results of the numerical integration of a planetary system with asteroids with those obtained by a Hamiltonian Neural Network and a conventional Deep Neural Network, with special attention to understanding the challenges of this problem. Due to the non-linear nature of the gravitational equations of motion, errors in the integration propagate. To increase the robustness of a method that uses neural networks, we propose a hybrid integrator that evaluates the prediction of the network and replaces it with the numerical solution if considered inaccurate. Hamiltonian Neural Networks can make predictions that resemble the behavior of symplectic integrators but are challenging to train and in our case fail when the inputs differ by ~7 orders of magnitude. In contrast, Deep Neural Networks are easy to train but fail to conserve energy, leading to fast divergence from the reference solution. The hybrid integrator designed to include the neural networks increases the reliability of the method and prevents large energy errors without increasing the computing cost significantly. For this problem, the use of neural networks results in faster simulations when the number of asteroids is >70.

Submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted for publication in the Journal of Computational Physics
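A minimal sketch of the hybrid idea in the abstract above: a fast predictor proposes the next state, and a conventional integrator takes over when the proposal looks inaccurate. Everything here is an assumption for illustration. The system is a unit harmonic oscillator rather than a planetary system, `surrogate_step` is a forward Euler step standing in for a trained network, and the acceptance test on the relative energy change (`tol`) is a hypothetical criterion, since the abstract does not say how a prediction is judged inaccurate.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position q, momentum p)
Stepper = Callable[[State, float], State]


def energy(state: State) -> float:
    """Energy of a unit harmonic oscillator, H = p^2/2 + q^2/2."""
    q, p = state
    return 0.5 * p * p + 0.5 * q * q


def leapfrog_step(state: State, dt: float) -> State:
    """Symplectic kick-drift-kick step; the force on the oscillator is -q."""
    q, p = state
    p_half = p - 0.5 * dt * q
    q_new = q + dt * p_half
    p_new = p_half - 0.5 * dt * q_new
    return (q_new, p_new)


def surrogate_step(state: State, dt: float) -> State:
    """Hypothetical stand-in for a network prediction: one forward Euler step."""
    q, p = state
    return (q + dt * p, p - dt * q)


def hybrid_step(state: State, dt: float, predict: Stepper,
                fallback: Stepper, tol: float) -> Tuple[State, bool]:
    """Return (next_state, used_prediction).

    The prediction is kept only if it changes the energy by at most `tol`
    in relative terms; otherwise the numerical fallback is used instead.
    """
    candidate = predict(state, dt)
    e0 = energy(state)
    rel_err = abs(energy(candidate) - e0) / max(abs(e0), 1e-300)
    if rel_err <= tol:
        return candidate, True
    return fallback(state, dt), False
```

With `tol=1e-3`, a call such as `hybrid_step((1.0, 0.0), 0.1, surrogate_step, leapfrog_step, 1e-3)` rejects the Euler proposal (its relative energy change is `dt**2 = 0.01`) and falls back to leapfrog, while the same call with `dt=0.01` accepts it.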

arXiv:2303.01002 [pdf, other]

doi 10.1103/PhysRevE.108.054310

Nearest-neighbour directed random hyperbolic graphs

Authors: I. A. Kasyanov, P. van der Hoorn, D. Krioukov, M. V. Tamm

Abstract: Undirected hyperbolic graph models have been extensively used as models of scale-free small-world networks with high clustering coefficient. Here we present a simple directed hyperbolic model, where nodes randomly distributed on a hyperbolic disk are connected to a fixed number m of their nearest spatial neighbours. We also introduce a canonical version of this network (which we call "network with varied connection radius"), where the maximal length of an outgoing bond is space-dependent and is determined by fixing the average out-degree to m. We study local bond length, in-degree and reciprocity in these networks as a function of spatial coordinates of the nodes, and show that the network has a distinct core-periphery structure. We show that for small densities of nodes the overall in-degree has a truncated power law distribution. We demonstrate that reciprocity of the network can be regulated by adjusting an additional temperature-like parameter without changing other global properties of the network.

Submitted 26 November, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 26 pages, 12 figures

Journal ref: Phys. Rev. E 108, 054310 (2023)
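The construction described in the abstract above (each node links to its m nearest neighbours on a hyperbolic disk) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the nodes are assumed to be uniform with respect to hyperbolic area on a disk of curvature -1, and the function names, the brute-force neighbour search, and the parameters `n`, `radius`, and `m` are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # polar coordinates (r, theta)


def sample_hyperbolic_disk(n: int, radius: float, rng: random.Random) -> List[Point]:
    """Sample n points uniformly by hyperbolic area (assumed distribution).

    The area element is sinh(r) dr dtheta, so the radial CDF is
    (cosh(r) - 1) / (cosh(radius) - 1), which is inverted below.
    """
    points = []
    for _ in range(n):
        u = rng.random()
        r = math.acosh(1.0 + (math.cosh(radius) - 1.0) * u)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r, theta))
    return points


def hyperbolic_distance(a: Point, b: Point) -> float:
    """Hyperbolic law of cosines for two points given in polar coordinates."""
    (r1, t1), (r2, t2) = a, b
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, x))  # clamp guards against rounding below 1


def nearest_neighbour_digraph(points: List[Point], m: int) -> Dict[int, List[int]]:
    """Map each node index to the indices of its m nearest other nodes."""
    edges = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: hyperbolic_distance(p, points[j]))
        edges[i] = others[:m]
    return edges
```

Every node has out-degree exactly m by construction, while in-degrees vary with position. The O(n^2 log n) search is chosen for brevity and would not scale to large networks.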

arXiv:2209.13311 [pdf, other]

Optimal Placement of Base Stations in Border Surveillance using Limited Capacity Drones

Authors: S. Bereg, J. M. Díaz-Báñez, M. Haghpanah, P. Horn, M. A. Lopez, N. Marín, A. Ramírez-Vigueras, F. Rodríguez, O. Solé-Pi, A. Stevens, J. Urrutia

Abstract: Imagine an island modeled as a simple polygon $\mathcal P$ with $n$ vertices whose coastline we wish to monitor. We consider the problem of building the minimum number of refueling stations along the boundary of $\mathcal P$ in such a way that a drone can follow a polygonal route enclosing the island without running out of fuel. A drone can fly a maximum distance $d$ between consecutive stations and is restricted to move either along the boundary of $\mathcal P$ or its exterior (i.e., over water). We present an algorithm that, given $\mathcal P$, finds the locations for a set of refueling stations whose cardinality is at most the optimal plus one. The time complexity of this algorithm is $O(n^2 + \frac{L}{d} n)$, where $L$ is the length of $\mathcal P$. We also present an algorithm that returns an additive $ε$-approximation for the problem of minimizing the fuel capacity required for the drones when we are allowed to place $k$ base stations around the boundary of the island; this algorithm also finds the locations of these refueling stations. Finally, we propose a practical discretization heuristic which, under certain conditions, can be used to certify optimality of the results.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 26 pages

arXiv:2208.12864 [pdf, ps, other]

Ortho-unit polygons can be guarded with at most $\lfloor \frac{n-4}{8} \rfloor$ guards

Authors: J. M. Díaz-Báñez, P. Horn, M. A. Lopez, N. Marín, A. Ramírez-Vigueras, O. Solé-Pi, A. Stevens, J. Urrutia

Abstract: An orthogonal polygon is called an ortho-unit polygon if its vertices have integer coordinates, and all of its edges have length one. In this paper we prove that any ortho-unit polygon with $n \geq 12$ vertices can be guarded with at most $\lfloor \frac{n-4}{8} \rfloor$ guards.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 9 pages, 8 figures

MSC Class: 68

ACM Class: F.2.2
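As a small worked example of the bound quoted in the abstract above, the helper below evaluates $\lfloor \frac{n-4}{8} \rfloor$ for a given vertex count. The function name is hypothetical, and the code only evaluates the formula; it does not construct a guard placement.

```python
def guard_upper_bound(n: int) -> int:
    """Evaluate floor((n - 4) / 8), the stated bound for n >= 12 vertices."""
    if n < 12:
        raise ValueError("the bound is stated for polygons with n >= 12 vertices")
    return (n - 4) // 8
```

For instance, `guard_upper_bound(12)` gives 1 and `guard_upper_bound(100)` gives 12.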

arXiv:2204.08508 [pdf, other]

doi 10.1103/PhysRevE.106.054308

Entropy of labeled versus unlabeled networks

Authors: Jeremy Paton, Harrison Hartle, Huck Stepanyants, Pim van der Hoorn, Dmitri Krioukov

Abstract: The structure of a network is an unlabeled graph, yet graphs in most models of complex networks are labeled by meaningless random integers. Is the associated labeling noise always negligible, or can it overpower the network-structural signal? To address this question, we introduce and consider the sparse unlabeled versions of popular network models, and compare their entropy against the original labeled versions. We show that labeled and unlabeled Erdős-Rényi graphs are entropically equivalent, even though their degree distributions are very different. The labeled and unlabeled versions of the configuration model may have different prefactors in their leading entropy terms, although this remains conjectural. Our main results are upper and lower bounds for the entropy of labeled and unlabeled one-dimensional random geometric graphs. We show that their unlabeled entropy is negligible in comparison with the labeled entropy. This means that in sparse networks the entropy of meaningless labeling may dominate the entropy of the network structure. The main implication of this result is that the common practice of using exchangeable models to reason about real-world networks with distinguishable nodes may introduce uncontrolled aberrations into conclusions made about these networks, suggesting a need for a thorough reexamination of the statistical foundations and key results of network science.

Submitted 18 November, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

Journal ref: Phys. Rev. E 106, 054308 (2022)

arXiv:2008.01209 [pdf, other]

doi 10.1103/PhysRevResearch.3.013211

Ollivier-Ricci curvature convergence in random geometric graphs

Authors: Pim van der Hoorn, William J. Cunningham, Gabor Lippner, Carlo Trugenberger, Dmitri Krioukov

Abstract: Connections between continuous and discrete worlds tend to be elusive. One example is curvature. Even though there exist numerous nonequivalent definitions of graph curvature, none is known to converge in any limit to any traditional definition of curvature of a Riemannian manifold. Here we show that Ollivier curvature of random geometric graphs in any Riemannian manifold converges in the continuum limit to Ricci curvature of the underlying manifold, but only if the definition of Ollivier graph curvature is properly generalized to apply to mesoscopic graph neighborhoods. This result establishes the first rigorous link between a definition of curvature applicable to networks and a traditional definition of curvature of smooth spaces.

Submitted 9 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Journal ref: Phys. Rev. Research 3, 013211 (2021)

arXiv:2007.00124 [pdf, other]

doi 10.1103/PhysRevResearch.2.043157

Weighted hypersoft configuration model

Authors: Ivan Voitalov, Pim van der Hoorn, Maksim Kitsak, Fragkiskos Papadopoulos, Dmitri Krioukov

Abstract: Maximum entropy null models of networks come in different flavors that depend on the type of constraints under which entropy is maximized. If the constraints are on degree sequences or distributions, we are dealing with configuration models. If the degree sequence is constrained exactly, the corresponding microcanonical ensemble of random graphs with a given degree sequence is the configuration model per se. If the degree sequence is constrained only on average, the corresponding grand-canonical ensemble of random graphs with a given expected degree sequence is the soft configuration model. If the degree sequence is not fixed at all but randomly drawn from a fixed distribution, the corresponding hypercanonical ensemble of random graphs with a given degree distribution is the hypersoft configuration model, a more adequate description of dynamic real-world networks in which degree sequences are never fixed but degree distributions often stay stable. Here, we introduce the hypersoft configuration model of weighted networks. The main contribution is a particular version of the model with power-law degree and strength distributions, and superlinear scaling of strengths with degrees, mimicking the properties of some real-world networks. As a byproduct, we generalize the notions of sparse graphons and their entropy to weighted networks.

Submitted 29 October, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

Comments: 26 pages, 10 figures

Journal ref: Phys. Rev. Research 2, 043157 (2020)

arXiv:2004.10917 [pdf, other]

doi 10.1016/j.dam.2021.09.021

Flexibility of Planar Graphs -- Sharpening the Tools to Get Lists of Size Four

Authors: Ilkyoo Choi, Felix Christian Clemen, Michael Ferrara, Paul Horn, Fuhong Ma, Tomáš Masařík

Abstract: A graph where each vertex $v$ has a list $L(v)$ of available colors is $L$-colorable if there is a proper coloring such that the color of $v$ is in $L(v)$ for each $v$. A graph is $k$-choosable if every assignment $L$ of at least $k$ colors to each vertex guarantees an $L$-coloring. Given a list assignment $L$, an $L$-request for a vertex $v$ is a color $c\in L(v)$. In this paper, we look at a variant of the widely studied class of precoloring extension problems from [Z. Dvořák, S. Norin, and L. Postle: List coloring with requests. J. Graph Theory 2019], wherein one must satisfy "enough", as opposed to all, of the requested set of precolors. A graph $G$ is $\varepsilon$-flexible for list size $k$ if for any $k$-list assignment $L$, and any set $S$ of $L$-requests, there is an $L$-coloring of $G$ satisfying an $\varepsilon$-fraction of the requests in $S$. It is conjectured that planar graphs are $\varepsilon$-flexible for list size $5$, yet it is proved only for list size $6$ and for certain subclasses of planar graphs. We give a stronger version of the main tool used in the proofs of the aforementioned results. By doing so, we improve upon a result by Masařík and show that planar graphs without $K_4^-$ are $\varepsilon$-flexible for list size $5$. We also prove that planar graphs without $4$-cycles and $3$-cycle distance at least 2 are $\varepsilon$-flexible for list size $4$. Finally, we introduce a new (slightly weaker) form of $\varepsilon$-flexibility where each vertex has exactly one request. In that setting, we provide a stronger tool and we demonstrate its usefulness to further extend the class of graphs that are $\varepsilon$-flexible for list size $5$.

Submitted 7 July, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: 18 pages, 4 figures

MSC Class: 05C15

Journal ref: Discrete Applied Mathematics 306 (2022) 120-132

arXiv:2003.14012 [pdf, other]

Problems with classification, hypothesis testing, and estimator convergence in the analysis of degree distributions in networks

Authors: Pim van der Hoorn, Ivan Voitalov, Remco van der Hofstad, Dmitri Krioukov

Abstract: In their recent work "Scale-free networks are rare", Broido and Clauset address the problem of the analysis of degree distributions in networks to classify them as scale-free at different strengths of "scale-freeness." Over the last two decades, a multitude of papers in network science have reported that the degree distributions in many real-world networks follow power laws. Such networks were then referred to as scale-free. However, due to a lack of a precise definition, the term has evolved to mean a range of different things, leading to confusion and contradictory claims regarding scale-freeness of a given network. Recognizing this problem, the authors of "Scale-free networks are rare" try to fix it. They attempt to develop a versatile and statistically principled approach to remove this scale-free ambiguity accumulated in network science literature. Although their paper presents a fair attempt to address this fundamental problem, we must bring attention to some important issues in it.

Submitted 31 March, 2020; originally announced March 2020.

arXiv:1811.02071 [pdf, other]

doi 10.1103/PhysRevResearch.1.033034

Scale-free Networks Well Done

Authors: Ivan Voitalov, Pim van der Hoorn, Remco van der Hofstad, Dmitri Krioukov

Abstract: We bring rigor to the vibrant activity of detecting power laws in empirical degree distributions in real-world networks. We first provide a rigorous definition of power-law distributions, equivalent to the definition of regularly varying distributions that are widely used in statistics and other fields. This definition allows the distribution to deviate from a pure power law arbitrarily but without affecting the power-law tail exponent. We then identify three estimators of these exponents that are proven to be statistically consistent -- that is, converging to the true value of the exponent for any regularly varying distribution -- and that satisfy some additional niceness requirements. In contrast to estimators that are currently popular in network science, the estimators considered here are based on fundamental results in extreme value theory, and so are the proofs of their consistency. Finally, we apply these estimators to a representative collection of synthetic and real-world data. According to their estimates, real-world scale-free networks are definitely not as rare as one would conclude based on the popular but unrealistic assumption that real-world data comes from power laws of pristine purity, void of noise and deviations.

Submitted 22 October, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Journal ref: Phys. Rev. Research 1, 033034 (2019)
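The abstract above does not name its three estimators. As an illustration of the kind of estimator that extreme value theory provides, the sketch below implements the classical Hill estimator, which is assumed here only as a well-known example and is not claimed to be the authors' implementation. The function name and the choice of `k` (the number of upper order statistics used) are hypothetical, and the returned value is the exponent alpha of the tail P(X > x) ~ x^(-alpha).

```python
import math
from typing import Sequence


def hill_tail_exponent(samples: Sequence[float], k: int) -> float:
    """Hill estimate of the tail exponent alpha from the k largest samples.

    xi = (1/k) * sum_{i=1..k} log(X_(i) / X_(k+1)), where X_(1) >= X_(2) >= ...
    are the samples in decreasing order, and alpha = 1 / xi.
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    threshold = xs[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the k+1 largest samples must be positive")
    xi = sum(math.log(x / threshold) for x in xs[:k]) / k
    if xi == 0.0:
        raise ValueError("the k largest samples are all equal to the threshold")
    return 1.0 / xi
```

On synthetic data drawn with `random.paretovariate(2.0)`, the estimate should land near 2 for a reasonable `k`. On real degree sequences the result depends strongly on `k`, which is one reason the choice of estimator and threshold matters.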

arXiv:1810.06055 [pdf, other]

A Simple Change Comparison Method for Image Sequences Based on Uncertainty Coefficient

Authors: Ruzhang Zhao, Yajun Fang, Berthold K. P. Horn

Abstract: For identification of change information in image sequences, most studies focus on change detection in one image sequence, while few studies have considered the change level comparison between two different image sequences. Moreover, most studies require the detection of image information in detail, for example, object detection. Based on the Uncertainty Coefficient (UC), this paper proposes an innovative method, CCUC, for change comparison between two image sequences. The proposed method is computationally efficient and simple to implement. The change comparison stems from video monitoring systems. The limited number of provided screens and a large number of monitoring cameras require the videos or image sequences to be ordered by change level. We demonstrate this new method by applying it to two publicly available image sequences. The results show that the method can distinguish the different change levels of the sequences.

Submitted 14 October, 2018; originally announced October 2018.

Comments: 5 pages, 5 figures, 2 tables, accepted as a conference paper at IEEE UV 2018, Boston, USA
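For readers unfamiliar with the uncertainty coefficient named in the abstract above, the sketch below computes it for two equal-length discrete sequences using the standard definition U(X|Y) = I(X;Y) / H(X). How CCUC applies this quantity to image sequences is not described in the abstract, so the function names and the idea of passing quantized pixel values as inputs are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def uncertainty_coefficient(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """U(X|Y) = I(X;Y) / H(X): the fraction of X's entropy explained by Y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    h_x = entropy(x)
    if h_x == 0.0:
        raise ValueError("U(X|Y) is undefined when X is constant")
    h_y = entropy(y)
    h_xy = entropy(list(zip(x, y)))  # joint entropy of the paired values
    mutual_information = h_x + h_y - h_xy
    return mutual_information / h_x
```

The value is 1 when `y` determines `x` completely and 0 when the two sequences are empirically independent; note that the coefficient is not symmetric in its arguments.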

arXiv:1807.05044 [pdf, other]

doi 10.1137/18M1201019

Random Walks on Simplicial Complexes and the normalized Hodge 1-Laplacian

Authors: Michael T. Schaub, Austin R. Benson, Paul Horn, Gabor Lippner, Ali Jadbabaie

Abstract: Focusing on coupling between edges, we generalize the relationship between the normalized graph Laplacian and random walks on graphs by devising an appropriate normalization for the Hodge Laplacian -- the generalization of the graph Laplacian for simplicial complexes -- and relate this to a random walk on edges. Importantly, these random walks are intimately connected to the topology of the simplicial complex, just as random walks on graphs are related to the topology of the graph. This serves as a foundational step towards incorporating Laplacian-based analytics for higher-order interactions. We demonstrate how to use these dynamics for data analytics that extract information about the edge-space of a simplicial complex that complements and extends graph-based analysis. Specifically, we use our normalized Hodge Laplacian to derive spectral embeddings for examining trajectory data of ocean drifters near Madagascar and also develop a generalization of personalized PageRank for the edge-space of simplicial complexes to analyze a book co-purchasing dataset.

Submitted 6 November, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: 38 pages, 11 figures, 1 table (abstract above shortened); to appear in SIAM Review, June 2020

Journal ref: SIAM Review 2020 62:2, 353-391

arXiv:1802.05623 [pdf, ps, other]

An $O(1)$-Approximation Algorithm for Dynamic Weighted Vertex Cover with Soft Capacity

Authors: Hao-Ting Wei, Wing-Kai Hon, Paul Horn, Chung-Shou Liao, Kunihiko Sadakane

Abstract: This study considers the (soft) capacitated vertex cover problem in a dynamic setting. This problem generalizes the dynamic model of the vertex cover problem, which has been intensively studied in recent years. Given a dynamically changing vertex-weighted graph $G=(V,E)$, which allows edge insertions and edge deletions, the goal is to design a data structure that maintains an approximate minimum vertex cover while satisfying the capacity constraint of each vertex. That is, when picking a copy of a vertex $v$ in the cover, the number of $v$'s incident edges covered by the copy is up to a given capacity of $v$. We extend Bhattacharya et al.'s work [SODA'15 and ICALP'15] to obtain a deterministic primal-dual algorithm for maintaining a constant-factor approximate minimum capacitated vertex cover with $O(\log n / ε)$ amortized update time, where $n$ is the number of vertices in the graph. The algorithm can be extended to (1) a more general model in which each edge is associated with a nonuniform and unsplittable demand, and (2) the more general capacitated set cover problem.

Submitted 20 February, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

arXiv:1705.10261 [pdf, other]

doi 10.1007/s10955-017-1887-7

Sparse Maximum-Entropy Random Graphs with a Given Power-Law Degree Distribution

Authors: Pim van der Hoorn, Gabor Lippner, Dmitri Krioukov

Abstract: Even though power-law or close-to-power-law degree distributions are ubiquitously observed in a great variety of large real networks, the mathematically satisfactory treatment of random power-law graphs satisfying basic statistical requirements of realism is still lacking. These requirements are: sparsity, exchangeability, projectivity, and unbiasedness. The last requirement states that entropy of the graph ensemble must be maximized under the degree distribution constraints. Here we prove that the hypersoft configuration model (HSCM), belonging to the class of random graphs with latent hyperparameters, also known as inhomogeneous random graphs or $W$-random graphs, is an ensemble of random power-law graphs that are sparse, unbiased, and either exchangeable or projective. The proof of their unbiasedness relies on generalized graphons, and on mapping the problem of maximization of the normalized Gibbs entropy of a random graph ensemble, to the graphon entropy maximization problem, showing that the two entropies converge to each other in the large-graph limit.

Submitted 10 October, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

MSC Class: 05C80 (Primary) 05C82; 54C70 (Secondary)

Journal ref: Journal of Statistical Physics, v.173(3), p.806-844, 2018

arXiv:1504.01535 [pdf, ps, other]

doi 10.1103/PhysRevE.92.022803

Phase transitions for scaling of structural correlations in directed networks

Authors: Pim van der Hoorn, Nelly Litvak

Abstract: Analysis of degree-degree dependencies in complex networks, and their impact on processes on networks, requires null models, i.e. models that generate uncorrelated scale-free networks. Most models to date, however, show structural negative dependencies caused by finite-size effects. We analyze the behavior of these structural negative degree-degree dependencies, using rank-based correlation measures, in the directed Erased Configuration Model. We obtain expressions for the scaling as a function of the exponents of the distributions. Moreover, we show that this scaling undergoes a phase transition, where one region exhibits scaling related to the natural cut-off of the network while another region has scaling similar to the structural cut-off for uncorrelated networks. By establishing the speed of convergence of these structural dependencies we are able to assess the statistical significance of degree-degree dependencies on finite complex networks when compared to networks generated by the directed Erased Configuration Model.

Submitted 1 July, 2015; v1 submitted 7 April, 2015; originally announced April 2015.

MSC Class: 62H20; 05C80

Journal ref: Phys. Rev. E 92, 022803 (2015)

arXiv:1308.3388 [pdf, ps, other]

Models of on-line social networks

Authors: Anthony Bonato, Noor Hadi, Paul Horn, Pawel Pralat, Changping Wang

Abstract: We present a deterministic model for on-line social networks (OSNs) based on transitivity and local knowledge in social interactions. In the Iterated Local Transitivity (ILT) model, at each time-step and for every existing node $x$, a new node appears which joins to the closed neighbour set of $x$. The ILT model provably satisfies a number of both local and global properties that were observed in OSNs and other real-world complex networks, such as a densification power law, decreasing average distance, and higher clustering than in random graphs with the same average degree. Experimental studies of social networks demonstrate poor expansion properties as a consequence of the existence of communities with a low number of inter-community edges. Bounds on the spectral gap for both the adjacency and normalized Laplacian matrices are proved for graphs arising from the ILT model, indicating such bad expansion properties. The cop and domination number are shown to remain the same as the graph from the initial time-step $G_0$, and the automorphism group of $G_0$ is a subgroup of the automorphism group of graphs generated at all later time-steps. A randomized version of the ILT model is presented, which exhibits a tuneable densification power law exponent, and maintains several properties of the deterministic model.

Submitted 15 August, 2013; originally announced August 2013.

arXiv:1210.4595 [pdf, other]

Two Layer 3D Floor Planning

Authors: Paul Horn, Gabor Lippner

Abstract: A 3D floor plan is a non-overlapping arrangement of blocks within a large box. Floor planning is a central notion in chip-design, and with recent advances in 3D integrated circuits, understanding 3D floor plans has become important. In this paper, we study so-called mosaic 3D floor plans where the interior blocks partition the host box under a topological equivalence. We give representations which give an upper bound on the number of general 3D floor plans, and further consider the number of two layer mosaic floor plans. We prove that the number of two layer mosaic floor plans is $n^{(1+o(1))n/3}$. This contrasts with previous work which has studied `corner free' mosaic floor plans, where the number is just exponential. The upper bound is by giving a representation, while the lower bound is a randomized construction.

Submitted 16 October, 2012; originally announced October 2012.

Comments: 14 pages, 10 figures

MSC Class: 05A16; 52C45

arXiv:1209.2088 [pdf, ps, other]

Spreading Processes and Large Components in Ordered, Directed Random Graphs

Authors: Paul Horn, Malik Magdon-Ismail

Abstract: Order the vertices of a directed random graph $v_1,\ldots,v_n$; edge $(v_i,v_j)$ for $i<j$ exists independently with probability $p$. This random graph model is related to certain spreading processes on networks. We consider the component reachable from $v_1$ and prove existence of a sharp threshold $p^*=\log n/n$ at which this reachable component transitions from $o(n)$ to $\Omega(n)$.

Submitted 11 September, 2012; v1 submitted 10 September, 2012; originally announced September 2012.

Comments: Working paper, under review

Showing 1–21 of 21 results for author: Horn, P