-
Where Do You Want To Invest? Predicting Startup Funding From Freely, Publicly Available Web Information
Authors:
Mariia Garkavenko,
Eric Gaussier,
Hamid Mirisaee,
Cédric Lagnier,
Agnès Guerraz
Abstract:
We consider in this paper the problem of predicting the ability of a startup to attract investments using freely, publicly available data. Information about startups on the web usually comes either as unstructured data from news, social networks, and websites or as structured data from commercial databases, such as Crunchbase. The possibility of predicting the success of a startup from structured databases has been studied in the literature, and it has been shown that initial public offerings (IPOs), mergers and acquisitions (M&A), as well as funding events can be predicted with various machine learning techniques. In such studies, heterogeneous information from the web and social networks is usually used as a complement to the information coming from databases. However, building and maintaining such databases demands tremendous human effort. We thus study here whether one can rely solely on readily available sources of information, such as the website of a startup, its social media activity, and its presence on the web, to predict its funding events. As illustrated in our experiments, the method we propose yields results comparable to those of methods that also make use of structured data available in private databases.
Submitted 13 April, 2022;
originally announced April 2022.
-
Terminology-based Text Embedding for Computing Document Similarities on Technical Content
Authors:
Hamid Mirisaee,
Eric Gaussier,
Cedric Lagnier,
Agnes Guerraz
Abstract:
We propose in this paper a new, hybrid document embedding approach to address the problem of document similarity with respect to technical content. To do so, we employ state-of-the-art graph techniques to first extract the keyphrases (composite keywords) of documents and then use them to score the sentences. Using the ranked sentences, we propose two approaches to embed documents and show their performance with respect to two baselines. With domain expert annotations, we illustrate that the proposed methods can find more relevant documents and outperform the baselines by up to 27% in terms of NDCG.
Submitted 1 July, 2019; v1 submitted 5 June, 2019;
originally announced June 2019.
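For readers unfamiliar with the NDCG metric cited in the abstract above, the following is a minimal sketch of one common formulation (linear gain with a log2 rank discount). The function names `dcg` and `ndcg`, the optional cutoff `k`, and the choice of gain function are assumptions made for illustration; the abstract does not state which NDCG variant the authors used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each relevance is divided by log2(rank + 1),
    with ranks starting at 1 (assumed linear-gain variant)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """Normalized DCG of a ranked list of graded relevance judgments.

    `relevances` lists the relevance of each retrieved document in the order
    the system returned them; `k` is an optional (hypothetical) cutoff.
    """
    ranked = list(relevances)
    ideal = sorted(ranked, reverse=True)  # best possible ordering
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    # By convention here, a list with no relevant documents scores 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that already places documents in decreasing order of relevance scores 1.0, and any other ordering of the same judgments scores lower.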
-
Learning Information Spread in Content Networks
Authors:
Cédric Lagnier,
Simon Bourigault,
Sylvain Lamprier,
Ludovic Denoyer,
Patrick Gallinari
Abstract:
We introduce a model for predicting the diffusion of content information on social media. Whereas propagation is usually modeled on discrete graph structures, we introduce here a continuous diffusion model, where nodes in a diffusion cascade are projected onto a latent space with the property that their proximity in this space reflects the temporal diffusion process. We focus on the task of predicting contaminated users for an initial information source and provide preliminary results on different datasets.
Submitted 2 February, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
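As a rough illustration of the latent-space idea in the abstract above, the sketch below ranks users by their Euclidean distance to a source user, on the assumption that closer users are contaminated earlier. This is not the authors' model: the embeddings are taken as given rather than learned, and the function name `predict_contamination_order`, the dictionary layout, and the use of Euclidean distance are all hypothetical choices made for the example.

```python
import math


def predict_contamination_order(embeddings, source):
    """Rank users by latent-space distance to the source user.

    `embeddings` maps a user id to its coordinates in an (assumed already
    learned) latent space; users closer to `source` are predicted to be
    reached earlier by the diffusion.
    """
    src = embeddings[source]
    distances = {
        user: math.dist(src, vec)  # Euclidean distance (Python 3.8+)
        for user, vec in embeddings.items()
        if user != source
    }
    return sorted(distances, key=distances.get)


# Toy usage with made-up 2-D coordinates.
toy_embeddings = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 4.0)}
predicted_order = predict_contamination_order(toy_embeddings, "a")
```

In this toy setting, user `b` lies closer to the source `a` than user `c` does, so `b` is ranked first.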