Search | arXiv e-print repository

arXiv:1906.04548 [pdf, other]

Spring-Electrical Models For Link Prediction

Authors: Yana Kashinskaya, Egor Samosvat, Akmal Artikov

Abstract: We propose a link prediction algorithm that is based on spring-electrical models. The idea to study these models came from the fact that spring-electrical models have been successfully used for network visualization. A good network visualization usually implies that nodes similar in terms of network topology, e.g., connected and/or belonging to one cluster, tend to be visualized close to each other. Therefore, we assumed that the Euclidean distance between nodes in the obtained network layout correlates with the probability of a link between them. We evaluate the proposed method against several popular baselines and demonstrate its flexibility by applying it to undirected, directed and bipartite networks.

Submitted 24 May, 2019; originally announced June 2019.

Comments: Accepted to WSDM 2019

arXiv:1711.05828 [pdf, other]

BoostJet: Towards Combining Statistical Aggregates with Neural Embeddings for Recommendations

Authors: Rhicheek Patra, Egor Samosvat, Michael Roizner, Andrei Mishchenko

Abstract: Recommenders have become widely popular in recent years because of their broader applicability in many e-commerce applications. These applications rely on recommenders for generating advertisements for various offers or providing content recommendations. However, the quality of the generated recommendations depends on user features (like demography, temporality), offer features (like popularity, price), and user-offer features (like implicit or explicit feedback). Current state-of-the-art recommenders do not explore such diverse features concurrently while generating the recommendations. In this paper, we first introduce the notion of Trackers which enables us to capture the above-mentioned features and thus incorporate users' online behaviour through statistical aggregates of different features (demography, temporality, popularity, price). We also show how to capture offer-to-offer relations, based on their consumption sequence, leveraging neural embeddings for offers in our Offer2Vec algorithm. We then introduce BoostJet, a novel recommender which integrates the Trackers along with the neural embeddings using MatrixNet, an efficient distributed implementation of gradient boosted decision tree, to improve the recommendation quality significantly. We provide an in-depth evaluation of BoostJet on Yandex's dataset, collecting online behaviour from tens of millions of online users, to demonstrate the practicality of BoostJet in terms of recommendation quality as well as scalability.

Submitted 7 December, 2017; v1 submitted 15 November, 2017; originally announced November 2017.

Comments: 9 pages, 9 figures

arXiv:1706.05655 [pdf, other]

Preferential placement for community structure formation

Authors: Aleksandr Dorodnykh, Liudmila Ostroumova Prokhorenkova, Egor Samosvat

Abstract: Various models have been recently proposed to reflect and predict different properties of complex networks. However, the community structure, which is one of the most important properties, is not well studied and modeled. In this paper, we suggest a principle called "preferential placement", which allows one to model a realistic clustering structure. We provide an extensive empirical analysis of the obtained structure as well as some theoretical results.

Submitted 3 February, 2019; v1 submitted 18 June, 2017; originally announced June 2017.

arXiv:1607.01742 [pdf, ps, other]

Generating maximally disassortative graphs with given degree distribution

Authors: Pim van der Hoorn, Liudmila Ostroumova Prokhorenkova, Egor Samosvat

Abstract: In this paper we consider the optimization problem of generating graphs with a prescribed degree distribution, such that the correlation between the degrees of connected nodes, as measured by Spearman's rho, is minimal. We provide an algorithm for solving this problem and obtain a complete characterization of the joint degree distribution in these maximally disassortative graphs, in terms of the size-biased degree distribution. As a result we get a lower bound for Spearman's rho on graphs with an arbitrary given degree distribution. We use this lower bound to show that for any fixed tail exponent, there exist scale-free degree sequences with this exponent such that the minimum value of Spearman's rho for all graphs with such degree sequences is arbitrarily close to zero. This implies that specifying only the tail behavior of the degree distribution, as is often done in the analysis of complex networks, gives no guarantees for the minimum value of Spearman's rho.

Submitted 6 July, 2016; originally announced July 2016.

MSC Class: 05C80 (Primary) 62H20 (Secondary)

arXiv:1509.04733 [pdf, other]

Factorization threshold models for scale-free networks generation

Authors: Akmal Artikov, Aleksandr Dorodnykh, Yana Kashinskaya, Egor Samosvat

Abstract: Many real networks such as the World Wide Web, financial, biological, citation and social networks have a power-law degree distribution. Networks with this feature are also called scale-free. Several models for producing scale-free networks have been proposed by now, and most of them are based on the preferential attachment approach. We offer a model with a different explanation of the scale-free property. The main idea is to approximate the network's adjacency matrix by the product of the matrices $V$ and $V^T$, where $V$ is the matrix of vertices' latent features. This approach is called matrix factorization and is successfully used in the link prediction problem. To create a generative model of scale-free networks we will sample latent features $V$ from some probability distribution and try to generate a network's adjacency matrix. Entries in the generated matrix are dot products of latent features, which are real numbers. In order to create an adjacency matrix, we approximate entries with the Boolean domain $\{0, 1\}$. We have incorporated the threshold parameter $θ$ into the model for discretization of a dot product. Actually, we have been influenced by the geographical threshold models, which were recently shown to give good results in scale-free network generation. The overview of our results is the following. First, we will describe our model formally. Second, we will tune the threshold $θ$ in order to generate sparse growing networks. Finally, we will show that our model produces scale-free networks with a fixed power-law exponent equal to two. In order to generate oriented networks with tunable power-law exponents and to obtain other model properties, we will offer different modifications of our model. Some of our results will be demonstrated using computer simulation.

Submitted 22 December, 2016; v1 submitted 15 September, 2015; originally announced September 2015.

arXiv:1410.1997 [pdf, other]

Global clustering coefficient in scale-free networks

Authors: Liudmila Ostroumova Prokhorenkova, Egor Samosvat

Abstract: In this paper, we analyze the behavior of the global clustering coefficient in scale free graphs. We are especially interested in the case of degree distribution with an infinite variance, since such degree distribution is usually observed in real-world networks of diverse nature. There are two common definitions of the clustering coefficient of a graph: global clustering and average local clustering. It is widely believed that in real networks both clustering coefficients tend to some positive constant as the networks grow. There are several models for which the average local clustering coefficient tends to a positive constant. On the other hand, there are no models of scale-free networks with an infinite variance of degree distribution and with a constant global clustering. In this paper we prove that if the degree distribution obeys the power law with an infinite variance, then the global clustering coefficient tends to zero with high probability as the size of a graph grows.

Submitted 8 June, 2015; v1 submitted 8 October, 2014; originally announced October 2014.

arXiv:1406.4308 [pdf, ps, other]

Recency-based preferential attachment models

Authors: Liudmila Ostroumova Prokhorenkova, Egor Samosvat

Abstract: Preferential attachment models were shown to be very effective in predicting such important properties of real-world networks as the power-law degree distribution, small diameter, etc. Many different models are based on the idea of preferential attachment: LCD, Buckley-Osthus, Holme-Kim, fitness, random Apollonian network, and many others. Although preferential attachment models reflect some important properties of real-world networks, they do not allow one to model the so-called recency property. The recency property reflects the fact that in many real networks vertices tend to connect to other vertices of similar age. This fact motivated us to introduce a new class of models - recency-based models. This class is a generalization of fitness models, which were suggested by Bianconi and Barabasi. Bianconi and Barabasi extended preferential attachment models with pages' inherent quality or fitness of vertices. When a new vertex is added to the graph, it is joined to some already existing vertices that are chosen with probabilities proportional to the product of their fitness and incoming degree. We generalize fitness models by adding a recency factor to the attractiveness function. This means that pages are gaining incoming links according to their attractiveness, which is determined by the incoming degree of the page (current popularity), its inherent quality (some page-specific constant) and age (new pages are gaining new links more rapidly). We analyze different properties of recency-based models. In particular, we show that some distributions of inherent quality lead to the power-law degree distribution.

Submitted 22 December, 2015; v1 submitted 17 June, 2014; originally announced June 2014.

MSC Class: 60C05 ACM Class: G.2.2

arXiv:1307.6080 [pdf, other]

Timely crawling of high-quality ephemeral new content

Authors: Damien Lefortier, Liudmila Ostroumova, Egor Samosvat, Pavel Serdyukov

Abstract: Nowadays, more and more people use the Web as their primary source of up-to-date information. In this context, fast crawling and indexing of newly created Web pages has become crucial for search engines, especially because user traffic to a significant fraction of these new pages (like news, blog and forum posts) grows really quickly right after they appear, but lasts only for several days. In this paper, we study the problem of timely finding and crawling of such ephemeral new pages (in terms of user interest). Traditional crawling policies do not give any particular priority to such pages and may thus crawl them not quickly enough, and even crawl already obsolete content. We thus propose a new metric, well thought out for this task, which takes into account the decrease of user interest for ephemeral pages over time. We show that most ephemeral new pages can be found at a relatively small set of content sources and present a procedure for finding such a set. Our idea is to periodically recrawl content sources and crawl newly created pages linked from them, focusing on high-quality (in terms of user interest) content. One of the main difficulties here is to divide resources between these two activities in an efficient way. We find the adaptive balance between crawls and recrawls by maximizing the proposed metric. Further, we incorporate search engine click logs to give our crawler an insight about the current user demands. Efficiency of our approach is finally demonstrated experimentally on real-world data.

Submitted 24 July, 2013; v1 submitted 23 July, 2013; originally announced July 2013.

arXiv:1209.4523 [pdf, other]

Evolution of the Media Web

Authors: Damien Lefortier, Liudmila Ostroumova, Egor Samosvat

Abstract: We present a detailed study of the part of the Web related to media content, i.e., the Media Web. Using publicly available data, we analyze the evolution of incoming and outgoing links from and to media pages. Based on our observations, we propose a new class of models for the appearance of new media content on the Web where different attractiveness functions of nodes are possible including ones taken from well-known preferential attachment and fitness models. We analyze these models theoretically and empirically and show which ones realistically predict both the incoming degree distribution and the so-called recency property of the Media Web, something that existing models did not do well. Finally we compare these models by estimating the likelihood of the real-world link graph from our data set given each model and obtain that models we introduce are significantly more likely than previously proposed ones. One of the most surprising results is that in the Media Web the probability for a post to be cited is determined, most likely, by its quality rather than by its current popularity.

Submitted 1 August, 2013; v1 submitted 20 September, 2012; originally announced September 2012.

arXiv:1205.3015 [pdf, other]

Generalized preferential attachment: tunable power-law degree distribution and clustering coefficient

Authors: Liudmila Ostroumova, Alexander Ryabchenko, Egor Samosvat

Abstract: We propose a wide class of preferential attachment models of random graphs, generalizing previous approaches. Graphs described by these models obey the power-law degree distribution, with the exponent that can be controlled in the models. Moreover, clustering coefficient of these graphs can also be controlled. We propose a concrete flexible model from our class and provide an efficient algorithm for generating graphs in this model. All our theoretical results are demonstrated in practice on examples of graphs obtained using this algorithm. Moreover, observations of generated graphs lead to future questions and hypotheses not yet justified by theory.

Submitted 19 May, 2015; v1 submitted 14 May, 2012; originally announced May 2012.

Showing 1–10 of 10 results for author: Samosvat, E