-
Toloka Visual Question Answering Benchmark
Authors:
Dmitry Ustalov,
Nikita Pavlichenko,
Sergey Koshelev,
Daniil Likhobaba,
Alisa Smirnova
Abstract:
In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset allowing comparing performance of machine learning systems against human level of expertise in the grounding visual question answering task. In this task, given an image and a textual question, one has to draw the bounding box around the object correctly responding to that question. Every image-question pair con…
▽ More
In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset allowing comparing performance of machine learning systems against human level of expertise in the grounding visual question answering task. In this task, given an image and a textual question, one has to draw the bounding box around the object correctly responding to that question. Every image-question pair contains the response, with only one correct response per image. Our dataset contains 45,199 pairs of images and questions in English, provided with ground truth bounding boxes, split into train and two test subsets. Besides describing the dataset and releasing it under a CC BY license, we conducted a series of experiments on open source zero-shot baseline models and organized a multi-phase competition at WSDM Cup that attracted 48 participants worldwide. However, by the time of paper submission, no machine learning model outperformed the non-expert crowdsourcing baseline according to the intersection over union evaluation score.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
Identifying topology of leaky photonic lattices with machine learning
Authors:
Ekaterina O. Smolina,
Lev A. Smirnov,
Daniel Leykam,
Franco Nori,
Daria A. Smirnova
Abstract:
We show how machine learning techniques can be applied for the classification of topological phases in leaky photonic lattices using limited measurement data. We propose an approach based solely on bulk intensity measurements, thus exempt from the need for complicated phase retrieval procedures. In particular, we design a fully connected neural network that accurately determines topological proper…
▽ More
We show how machine learning techniques can be applied for the classification of topological phases in leaky photonic lattices using limited measurement data. We propose an approach based solely on bulk intensity measurements, thus exempt from the need for complicated phase retrieval procedures. In particular, we design a fully connected neural network that accurately determines topological properties from the output intensity distribution in dimerized waveguide arrays with leaky channels, after propagation of a spatially localized initial excitation at a finite distance, in a setting that closely emulates realistic experimental conditions.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Itsy Bitsy SpiderNet: Fully Connected Residual Network for Fraud Detection
Authors:
Sergey Afanasiev,
Anastasiya Smirnova,
Diana Kotereva
Abstract:
With the development of high technology, the scope of fraud is increasing, resulting in annual losses of billions of dollars worldwide. The preventive protection measures become obsolete and vulnerable over time, so effective detective tools are needed. In this paper, we propose a convolutional neural network architecture SpiderNet designed to solve fraud detection problems. We noticed that the pr…
▽ More
With the development of high technology, the scope of fraud is increasing, resulting in annual losses of billions of dollars worldwide. The preventive protection measures become obsolete and vulnerable over time, so effective detective tools are needed. In this paper, we propose a convolutional neural network architecture SpiderNet designed to solve fraud detection problems. We noticed that the principles of pooling and convolutional layers in neural networks are very similar to the way antifraud analysts work when conducting investigations. Moreover, the skip-connections used in neural networks make the usage of features of various power in antifraud models possible. Our experiments have shown that SpiderNet provides better quality compared to Random Forest and adapted for antifraud modeling problems 1D-CNN, 1D-DenseNet, F-DenseNet neural networks. We also propose new approaches for fraud feature engineering called B-tests and W-tests, which generalize the concepts of Benford's Law for fraud anomalies detection. Our results showed that B-tests and W-tests give a significant increase to the quality of our anti-fraud models. The SpiderNet code is available at https://github.com/aasmirnova24/SpiderNet
△ Less
Submitted 28 September, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Interactive Duplicate Search in Software Documentation
Authors:
D. V. Luciv,
D. V. Koznov,
A. A. Shelikhovskii,
K. Yu. Romanovsky,
G. A. Chernishev,
A. N. Terekhov,
D. A. Grigoriev,
A. N. Smirnova,
D. V. Borovkov,
A. I. Vasenina
Abstract:
Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is of importance. Solving it makes planned reuse…
▽ More
Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is of importance. Solving it makes planned reuse possible, as well as creating and using templates for unification and automatic generation of documentation. In this paper, we present an interactive process for duplicate detection that involves the user in order to conduct meaningful search. It includes a new formal definition of a near duplicate, a pattern-based, and the proof of its completeness. Moreover, we demonstrate the results of experimenting on a collection of documents of several industrial projects.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.