-
Predicting Survey Response with Quotation-based Modeling: A Case Study on Favorability towards the United States
Authors:
Alireza Amirshahi,
Nicolas Kirsch,
Jonathan Reymond,
Saleh Baghersalimi
Abstract:
The acquisition of survey responses is a crucial component in conducting research aimed at comprehending public opinion. However, survey data collection can be arduous, time-consuming, and expensive, with no assurance of an adequate response rate. In this paper, we propose a pioneering approach for predicting survey responses by examining quotations using machine learning. Our investigation focuse…
▽ More
The acquisition of survey responses is a crucial component in conducting research aimed at comprehending public opinion. However, survey data collection can be arduous, time-consuming, and expensive, with no assurance of an adequate response rate. In this paper, we propose a pioneering approach for predicting survey responses by examining quotations using machine learning. Our investigation focuses on evaluating the degree of favorability towards the United States, a topic of interest to many organizations and governments. We leverage a vast corpus of quotations from individuals across different nationalities and time periods to extract their level of favorability. We employ a combination of natural language processing techniques and machine learning algorithms to construct a predictive model for survey responses. We investigate two scenarios: first, when no surveys have been conducted in a country, and second when surveys have been conducted but in specific years and do not cover all the years. Our experimental results demonstrate that our proposed approach can predict survey responses with high accuracy. Furthermore, we provide an exhaustive analysis of the crucial features that contributed to the model's performance. This study has the potential to impact survey research in the field of data science by substantially decreasing the cost and time required to conduct surveys while simultaneously providing accurate predictions of public opinion.
△ Less
Submitted 27 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Map** the Space of Chemical Reactions Using Attention-Based Neural Networks
Authors:
Philippe Schwaller,
Daniel Probst,
Alain C. Vaucher,
Vishnu H. Nair,
David Kreutter,
Teodoro Laino,
Jean-Louis Reymond
Abstract:
Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of mole…
▽ More
Organic reactions are usually assigned to classes containing reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task. It requires the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center, and the distinction between reactants and reagents. This work shows that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints that capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas providing visual clustering and similarity searching.
△ Less
Submitted 9 December, 2020;
originally announced December 2020.
-
Visualization of Very Large High-Dimensional Data Sets as Minimum Spanning Trees
Authors:
Daniel Probst,
Jean-Louis Reymond
Abstract:
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem wit…
▽ More
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (http://tmap.gdb.tools). Visualizations based on TMAP are better suited than t-SNE or UMAP for the exploration and interpretation of large data sets due to their tree-like nature, increased local and global neighborhood and structure preservation, and the transparency of the methods the algorithm is based on. We apply TMAP to the most used chemistry data sets including databases of molecules such as ChEMBL, FDB17, the Natural Products Atlas, DSSTox, as well as to the MoleculeNet benchmark collection of data sets. We also show its broad applicability with further examples from biology, particle physics, and literature.
△ Less
Submitted 6 January, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.