-
NoPropaganda at SemEval-2020 Task 11: A Borrowed Approach to Sequence Tagging and Text Classification
Authors: Ilya Dimov, Vladislav Korzun, Ivan Smurov
Abstract:
This paper describes our contribution to SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We start with simple LSTM baselines and move to an autoregressive transformer decoder to predict long continuous propaganda spans for the first subtask. For the second subtask, propaganda technique classification, we adopt an approach from relation extraction, enveloping the spans mentioned above with special tokens. Our models achieve an F-score of 44.6% on the first subtask and a micro-averaged F-score of 58.2% on the second.
Submitted 25 July, 2020; originally announced July 2020.
-
arXiv:1910.13325
physics.data-an, cond-mat.mtrl-sci, cs.LG, physics.app-ph, physics.comp-ph, stat.ML
Fragment Graphical Variational AutoEncoding for Screening Molecules with Small Data
Authors: John Armitage, Leszek J. Spalek, Malgorzata Nguyen, Mark Nikolka, Ian E. Jacobs, Lorena Marañón, Iyad Nasrallah, Guillaume Schweicher, Ivan Dimov, Dimitrios Simatos, Iain McCulloch, Christian B. Nielsen, Gareth Conduit, Henning Sirringhaus
Abstract:
In the majority of molecular optimization tasks, predictive machine learning (ML) models are limited by the unavailability and cost of generating large experimental datasets for the specific task. To circumvent this limitation, ML models are trained on large theoretical datasets or on experimental indicators of molecular suitability that are either publicly available or inexpensive to acquire. These approaches produce a set of candidate molecules which have to be ranked using limited experimental data or expert knowledge. Under the assumption that structure is related to functionality, here we use a molecular fragment-based graphical autoencoder to generate unique structural fingerprints to efficiently search through the candidate set. We demonstrate that fragment-based graphical autoencoding reduces the error in predicting physical characteristics such as the solubility and partition coefficient in the small-data regime compared to other extended circular fingerprints and string-based approaches. We further demonstrate that this approach is capable of providing insight into real-world molecular optimization problems, such as searching for stabilization additives in organic semiconductors, by accurately predicting 92% of test molecules given 69 training examples. This task is a model example of black-box molecular optimization, as there is minimal theoretical and experimental knowledge with which to accurately predict the suitability of the additives.
Submitted 30 October, 2019; v1 submitted 21 October, 2019; originally announced October 2019.
-
On randomization of neural networks as a form of post-learning strategy
Authors: K. G. Kapanova, I. Dimov, J. M. Sellier
Abstract:
Today, artificial neural networks are applied in various fields, including engineering, data analysis, and robotics. While they represent a successful tool for a variety of relevant applications, mathematically speaking they are still far from being conclusive. In particular, they suffer from being unable to find the best possible configuration during the training process (the local minimum problem). In this paper, we focus on this issue and suggest a simple but effective post-learning strategy that allows the search for an improved set of weights at a relatively small extra computational cost. To this end, we introduce a novel technique based on an analogy with quantum effects occurring in nature as a way to mitigate (and sometimes overcome) this problem. Several numerical experiments are presented to validate the approach.
Submitted 26 November, 2015; originally announced November 2015.