-
Modular Synthesis of Efficient Quantum Uncomputation
Authors:
Hristo Venev,
Timon Gehr,
Dimitar Dimitrov,
Martin Vechev
Abstract:
A key challenge of quantum programming is uncomputation: the reversible deallocation of qubits. And while there has been much recent progress on automating uncomputation, state-of-the-art methods are insufficient for handling today's expressive quantum programming languages. A core reason is that they operate on primitive quantum circuits, while quantum programs express computations beyond circuits, for instance, they can capture families of circuits defined recursively in terms of uncomputation and adjoints.
In this paper, we introduce the first modular automatic approach to synthesize correct and efficient uncomputation for expressive quantum programs. Our method is based on two core technical contributions: (i) an intermediate representation (IR) that can capture expressive quantum programs and comes with support for uncomputation, and (ii) modular algorithms over that IR for synthesizing uncomputation and adjoints.
We have built a complete end-to-end implementation of our method, including an implementation of the IR and the synthesis algorithms, as well as a translation from an expressive fragment of the Silq programming language to our IR and circuit generation from the IR. Our experimental evaluation demonstrates that we can handle programs beyond the capabilities of existing uncomputation approaches, while being competitive on the benchmarks they can handle. More broadly, we show that it is possible to benefit from the greater expressivity and safety offered by high-level quantum languages without sacrificing efficiency.
Submitted 20 June, 2024;
originally announced June 2024.
-
DAGER: Exact Gradient Inversion for Large Language Models
Authors:
Ivo Petrov,
Dimitar I. Dimitrov,
Maximilian Baader,
Mark Niklas Müller,
Martin Vechev
Abstract:
Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data. However, prior work has shown that the data can actually be recovered by the server using so-called gradient inversion attacks. While these attacks perform well when applied on images, they are limited in the text domain and only permit approximate reconstruction of small batches and short input sequences. In this work, we propose DAGER, the first algorithm to recover whole batches of input text exactly. DAGER leverages the low-rank structure of self-attention layer gradients and the discrete nature of token embeddings to efficiently check if a given token sequence is part of the client data. We use this check to exactly recover full batches in the honest-but-curious setting without any prior on the data for both encoder- and decoder-based architectures using exhaustive heuristic search and a greedy approach, respectively. We provide an efficient GPU implementation of DAGER and show experimentally that it recovers full batches of size up to 128 on large language models (LLMs), beating prior attacks in speed (20x at same batch size), scalability (10x larger batches), and reconstruction quality (ROUGE-1/2 > 0.99).
Submitted 24 May, 2024;
originally announced May 2024.
-
Your Transformer is Secretly Linear
Authors:
Anton Razzhigaev,
Matvey Mikhalchuk,
Elizaveta Goncharova,
Nikolai Gerasimenko,
Ivan Oseledets,
Denis Dimitrov,
Andrey Kuznetsov
Abstract:
This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm of the transformer layer. Our experiments show that removing or linearly approximating some of the most linear blocks of transformers does not significantly affect the loss or model performance. Moreover, in our pretraining experiments on smaller models we introduce a cosine-similarity-based regularization, aimed at reducing layer linearity. This regularization improves performance metrics on benchmarks like Tiny Stories and SuperGLUE and also successfully decreases the linearity of the models. This study challenges the existing understanding of transformer architectures, suggesting that their operation may be more linear than previously assumed.
Submitted 19 May, 2024;
originally announced May 2024.
-
OmniFusion Technical Report
Authors:
Elizaveta Goncharova,
Anton Razzhigaev,
Matvey Mikhalchuk,
Maxim Kurkin,
Irina Abdullaeva,
Matvey Skripkin,
Ivan Oseledets,
Denis Dimitrov,
Andrey Kuznetsov
Abstract:
Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLMs). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various CLIP ViT-based encoders (SigLIP, InternVIT, etc.), and their fusing approach, image encoding method (whole image or tiles encoding) and two 7B LLMs (the proprietary one and open-source Mistral). Experiments on 8 visual-language benchmarks show the top score for the best OmniFusion setup in terms of different VQA tasks in comparison with open-source LLaVA-like solutions: VizWiz, Pope, MM-Vet, ScienceQA, MMBench, TextVQA, VQAv2, MMMU. We also propose a variety of situations where OmniFusion provides highly detailed answers in different domains: housekeeping, sightseeing, culture, medicine, handwritten and scanned equations recognition, etc. The Mistral-based OmniFusion model is an open-source solution with weights, training and inference scripts available at https://github.com/AIRI-Institute/OmniFusion.
Submitted 9 April, 2024;
originally announced April 2024.
-
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models
Authors:
Stephan Linzbach,
Dimitar Dimitrov,
Laura Kallmeyer,
Kilian Evang,
Hajira Jabeen,
Stefan Dietze
Abstract:
Pre-trained Language Models (PLMs) are known to contain various kinds of knowledge. One method to infer relational knowledge is through the use of cloze-style prompts, where a model is tasked to predict missing subjects or objects. Typically, designing these prompts is a tedious task because small differences in syntax or semantics can have a substantial impact on knowledge retrieval performance. Simultaneously, evaluating the impact of either prompt syntax or information is challenging due to their interdependence. We designed CONPARE-LAMA - a dedicated probe, consisting of 34 million distinct prompts that facilitate comparison across minimal paraphrases. These paraphrases follow a unified meta-template enabling the controlled variation of syntax and semantics across arbitrary relations. CONPARE-LAMA enables insights into the independent impact of either syntactical form or semantic information of paraphrases on the knowledge retrieval performance of PLMs. Extensive knowledge retrieval experiments using our probe reveal that prompts following clausal syntax have several desirable properties in comparison to appositive syntax: i) they are more useful when querying PLMs with a combination of supplementary information, ii) knowledge is more consistently recalled across different combinations of supplementary information, and iii) they decrease response uncertainty when retrieving known facts. In addition, range information can boost knowledge retrieval performance more than domain information, even though domain information is more reliably helpful across syntactic forms.
Submitted 2 April, 2024;
originally announced April 2024.
-
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Authors:
Rocktim Jyoti Das,
Simeon Emilov Hristov,
Haonan Li,
Dimitar Iliyanov Dimitrov,
Ivan Koychev,
Preslav Nakov
Abstract:
We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc. EXAMS-V includes a variety of multimodal features such as text, images, tables, figures, diagrams, maps, scientific symbols, and equations. The questions come in 11 languages from 7 language families. Unlike existing benchmarks, EXAMS-V is uniquely curated by gathering school exam questions from various countries, with a variety of education systems. This distinctive approach calls for intricate reasoning across diverse languages and relies on region-specific knowledge. Solving the problems in the dataset requires advanced perception and joint reasoning over the text and the visual content of the image. Our evaluation results demonstrate that this is a challenging dataset, which is difficult even for advanced vision-text models such as GPT-4V and Gemini; this underscores the inherent complexity of the dataset and its significance as a future benchmark.
Submitted 15 March, 2024;
originally announced March 2024.
-
SPEAR: Exact Gradient Inversion of Batches in Federated Learning
Authors:
Dimitar I. Dimitrov,
Maximilian Baader,
Mark Niklas Müller,
Martin Vechev
Abstract:
Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for a batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b >1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.
Submitted 3 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
MERA: A Comprehensive LLM Evaluation in Russian
Authors:
Alena Fenogenova,
Artem Chervyakov,
Nikita Martynov,
Anastasia Kozlova,
Maria Tikhonova,
Albina Akhmetgareeva,
Anton Emelyanov,
Denis Shevelev,
Pavel Lebedev,
Leonid Sinev,
Ulyana Isaeva,
Katerina Kolomeytseva,
Daniil Moskovskiy,
Elizaveta Goncharova,
Nikita Savushkin,
Polina Mikhailova,
Denis Dimitrov,
Alexander Panchenko,
Sergei Markov
Abstract:
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.
Submitted 12 January, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
Kandinsky 3.0 Technical Report
Authors:
Vladimir Arkhipkin,
Andrei Filatov,
Viacheslav Vasilev,
Anastasia Maltseva,
Said Azizov,
Igor Pavlov,
Julia Agafonova,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress to achieve higher quality and realism of image generation. In this report we describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction. We focus on the key components that, as we have identified as a result of a large number of experiments, had the most significant impact on improving the quality of our model compared to the others. We also describe extensions and applications of our model, including super resolution, inpainting, image editing, image-to-video generation, and a distilled version of Kandinsky 3.0 - Kandinsky 3.1, which does inference in 4 steps of the reverse process and 20 times faster without visual quality decrease. By side-by-side human preferences comparison, Kandinsky becomes better in text understanding and works better on specific domains. The code is available at https://github.com/ai-forever/Kandinsky-3
Submitted 28 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
Authors:
Vladimir Arkhipkin,
Zein Shaheen,
Viacheslav Vasilev,
Elizaveta Dakhova,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models achieved high-quality results over the last few years. However, video synthesis methods have only recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model. The first stage concerns keyframe synthesis to figure the storyline of a video, while the second one is devoted to interpolation frame generation to make movements of the scene and objects smooth. We compare several temporal conditioning approaches for keyframe generation. The results show the advantage of using separate temporal blocks over temporal layers in terms of metrics reflecting video generation quality aspects and human preference. The design of our interpolation model significantly reduces computational costs compared to other masked frame interpolation approaches. Furthermore, we evaluate different configurations of MoVQ-based video decoding scheme to improve consistency and achieve higher PSNR, SSIM, MSE, and LPIPS scores. Finally, we compare our pipeline with existing solutions and achieve top-2 scores overall and top-1 among open-source solutions: CLIPSIM = 0.2976 and FVD = 433.054. Project page: https://ai-forever.github.io/kandinsky-video/
Submitted 20 December, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
Authors:
Anton Razzhigaev,
Matvey Mikhalchuk,
Elizaveta Goncharova,
Ivan Oseledets,
Denis Dimitrov,
Andrey Kuznetsov
Abstract:
In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is then followed by a compression phase towards the end of training with a decrease in dimensionality, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
Submitted 26 February, 2024; v1 submitted 10 November, 2023;
originally announced November 2023.
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Authors:
Anton Razzhigaev,
Arseniy Shakhmatov,
Anastasia Maltseva,
Vladimir Arkhipkin,
Igor Pavlov,
Ilya Ryabov,
Angelina Kuts,
Alexander Panchenko,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky1, a novel exploration of latent diffusion architecture, combining the principles of the image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.
Submitted 5 October, 2023;
originally announced October 2023.
-
Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles
Authors:
Georgi Pachov,
Dimitar Dimitrov,
Ivan Koychev,
Preslav Nakov
Abstract:
The widespread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered dataset, using data from multiple languages. Finally, the three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.
Submitted 13 September, 2023;
originally announced September 2023.
-
Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning
Authors:
Kostadin Garov,
Dimitar I. Dimitrov,
Nikola Jovanović,
Martin Vechev
Abstract:
Malicious server (MS) attacks have enabled the scaling of data stealing in federated learning to large batch sizes and secure aggregation, settings previously considered private. However, many concerns regarding the client-side detectability of MS attacks were raised, questioning their practicality. In this work, for the first time, we thoroughly study client-side detectability. We first demonstrate that all prior MS attacks are detectable by principled checks, and formulate a necessary set of requirements that a practical MS attack must satisfy. Next, we propose SEER, a novel attack framework that satisfies these requirements. The key insight of SEER is the use of a secret decoder, jointly trained with the shared model. We show that SEER can steal user data from gradients of realistic networks, even for large batch sizes of up to 512 and under secure aggregation. Our work is a promising step towards assessing the true vulnerability of federated learning in real-world settings.
Submitted 15 April, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition
Authors:
Igor Markov,
Sergey Nesteruk,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
Information surrounds people in modern life. Text is a very efficient type of information that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For competitive performance, the training set must contain many samples that replicate real-world cases. While there are many high-quality datasets for English text recognition, there are no available datasets for the Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process.
Submitted 29 March, 2023;
originally announced March 2023.
-
FARE: Provably Fair Representation Learning with Practical Certificates
Authors:
Nikola Jovanović,
Mislav Balunović,
Dimitar I. Dimitrov,
Martin Vechev
Abstract:
Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provides assurance in a practical scenario. Creating such FRL methods is an important challenge that remains unsolved. In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees, while still permitting favorable accuracy-fairness tradeoffs for suitable instantiations, such as one we propose based on fair trees. To produce a practical certificate, we develop and apply a statistical procedure that computes a finite sample high-confidence upper bound on the unfairness of any downstream classifier trained on FARE embeddings. In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.
Submitted 8 June, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
TabLeak: Tabular Data Leakage in Federated Learning
Authors:
Mark Vero,
Mislav Balunović,
Dimitar I. Dimitrov,
Martin Vechev
Abstract:
While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.
Submitted 7 July, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI
Authors:
Semen Budennyy,
Vladimir Lazarev,
Nikita Zakharenko,
Alexey Korovin,
Olga Plosskaya,
Denis Dimitrov,
Vladimir Arkhipkin,
Ivan Oseledets,
Ivan Barsola,
Ilya Egorov,
Aleksandra Kosterina,
Leonid Zhukov
Abstract:
The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package, eco2AI, to help data scientists and researchers track energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on the accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage the research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of an AI-based greenhouse gas sequestration cycle with both Sustainable AI and Green AI pathways.
Submitted 3 August, 2022; v1 submitted 31 July, 2022;
originally announced August 2022.
-
Data Leakage in Federated Averaging
Authors:
Dimitar I. Dimitrov,
Mislav Balunović,
Nikola Konstantinov,
Martin Vechev
Abstract:
Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder as: (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches are used, and (iii) labels and network weights vary simultaneously across client steps. In this work, we propose a new optimization-based attack which successfully attacks FedAvg by addressing the above challenges. First, we solve the optimization problem using automatic differentiation that forces a simulation of the client's update that generates the unobserved parameters for the recovered labels and inputs to match the received client update. Second, we address the large number of batches by relating images from different epochs with a permutation invariant prior. Third, we recover the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that on average we successfully recover >45% of the client's images from realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5 images, compared to only <10% using the baseline. Our findings show many real-world federated learning implementations based on FedAvg are vulnerable.
Submitted 1 November, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Detecting and Understanding Harmful Memes: A Survey
Authors:
Shivam Sharma,
Firoj Alam,
Md. Shad Akhtar,
Dimitar Dimitrov,
Giovanni Da San Martino,
Hamed Firooz,
Alon Halevy,
Fabrizio Silvestri,
Preslav Nakov,
Tanmoy Chakraborty
Abstract:
The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes, which are of particular interest due to their viral nature. With this in mind, here we offer a comprehensive survey with a focus on harmful memes. Based on a systematic analysis of recent literature, we first propose a new typology of harmful memes, and then we highlight and summarize the relevant state of the art. One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets. We further find that existing datasets mostly capture multi-class scenarios, which are not inclusive of the affective spectrum that memes can represent. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual, blending different cultures. We conclude by highlighting several challenges related to multimodal semiotics, technological constraints, and non-trivial social engagement, and we present several open-ended aspects such as delineating online harm and empirically examining related frameworks and assistive interventions, which we believe will motivate and drive future research.
Submitted 29 May, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Group Control for Procedural Rules: Parameterized Complexity and Consecutive Domains
Authors:
Yongjie Yang,
Dinko Dimitrov
Abstract:
We consider Group Control by Adding Individuals (GCAI) in the setting of group identification for two procedural rules -- the consensus-start-respecting rule and the liberal-start-respecting rule. It is known that GCAI for both rules are NP-hard, but whether they are fixed-parameter tractable with respect to the number of distinguished individuals remained open. We resolve both open problems in the affirmative. In addition, we strengthen the NP-hardness of GCAI by showing that, with respect to the natural parameter the number of added individuals, GCAI for both rules are W[2]-hard. Notably, the W[2]-hardness for the liberal-start-respecting rule holds even when restricted to a very special case where the qualifications of individuals satisfy the so-called consecutive ones property. However, for the consensus-start-respecting rule, the problem becomes polynomial-time solvable in this special case. We also study a dual restriction where the disqualifications of individuals fulfill the consecutive ones property, and show that under this restriction GCAI for both rules turn out to be polynomial-time solvable. Our reductions for showing W[2]-hardness also imply several lower bounds concerning kernelization and exact algorithms.
Submitted 26 January, 2023; v1 submitted 31 March, 2022;
originally announced March 2022.
-
RuCLIP -- new models and experiments: a technical report
Authors:
Alex Shonenkov,
Andrey Kuznetsov,
Denis Dimitrov,
Tatyana Shavrina,
Daniil Chesakov,
Anastasia Maltseva,
Alena Fenogenova,
Igor Pavlov,
Anton Emelyanov,
Sergey Markov,
Daria Bakshandaeva,
Vera Shybaeva,
Andrey Chertok
Abstract:
In this report we propose six new implementations of the ruCLIP model trained on our 240M pairs. The accuracy results are compared with the original CLIP model with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. We briefly describe the implementations and concentrate on the conducted experiments. An inference execution time comparison is also presented.
Submitted 22 February, 2022;
originally announced February 2022.
-
Survey on Large Scale Neural Network Training
Authors:
Julia Gusak,
Daria Cherniuk,
Alena Shilova,
Alexander Katrutsa,
Daniel Bershatsky,
Xunyi Zhao,
Lionel Eyraud-Dubois,
Oleg Shlyazhko,
Denis Dimitrov,
Ivan Oseledets,
Olivier Beaumont
Abstract:
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with a single or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations.
Submitted 21 February, 2022;
originally announced February 2022.
-
LAMP: Extracting Text from Gradients with Language Model Priors
Authors:
Mislav Balunović,
Dimitar I. Dimitrov,
Nikola Jovanović,
Martin Vechev
Abstract:
Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients. Our attack is based on two key insights: (i) modeling prior text probability with an auxiliary language model, guiding the search towards more natural text, and (ii) alternating continuous and discrete optimization, which minimizes reconstruction loss on embeddings, while avoiding local minima by applying discrete text transformations. Our experiments demonstrate that LAMP is significantly more effective than prior work: it reconstructs 5x more bigrams and 23% longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.
Submitted 19 October, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
A new face swap method for image and video domains: a technical report
Authors:
Daniil Chesakov,
Anastasia Maltseva,
Alexander Groshev,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
Deep fake technology has become a hot field of research in the last few years. Researchers investigate sophisticated Generative Adversarial Networks (GAN), autoencoders, and other approaches to establish precise and robust algorithms for face swapping. Achieved results show that the deep fake unsupervised synthesis task has problems in terms of the visual quality of generated data. These problems usually lead to high fake detection accuracy when an expert analyzes them. The first problem is that existing image-to-image approaches do not consider video domain specificity, and frame-by-frame processing leads to face jittering and other clearly visible distortions. Another problem is the generated data resolution, which is low for many existing methods due to high computational complexity. The third problem appears when the source face has larger proportions (like bigger cheeks), and after replacement it becomes visible on the face border. Our main goal was to develop an approach that could solve these problems and outperform existing solutions on a number of key metrics. We introduce a new face swap pipeline that is based on the FaceShifter architecture and fixes the problems stated above. A new eye loss function, a super-resolution block, and Gaussian-based face mask generation lead to improvements in quality, which is confirmed during evaluation.
Submitted 7 February, 2022;
originally announced February 2022.
-
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Authors:
Georgii Novikov,
Daniel Bershatsky,
Julia Gusak,
Alex Shonenkov,
Denis Dimitrov,
Ivan Oseledets
Abstract:
Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and each such operation induces additional memory costs which -- as we show -- can be significantly reduced by quantization of the gradients. We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. We show that such an approximation can be achieved by computing the optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. The drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and the same convergence on several open benchmarks.
Submitted 2 February, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Handwritten text generation and strikethrough characters augmentation
Authors:
Alex Shonenkov,
Denis Karachev,
Max Novopoltsev,
Mark Potanin,
Denis Dimitrov,
Andrey Chertok
Abstract:
We introduce two data augmentation techniques, which, used with a Resnet-BiLSTM-CTC network, significantly reduce Word Error Rate (WER) and Character Error Rate (CER) beyond best-reported results on handwriting text recognition (HTR) tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), which proved to be very effective in HTR tasks. StackMix uses a weakly-supervised framework to get character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to HTR. Extensive experiments on ten handwritten text datasets show that HandWritten Blots augmentation and StackMix significantly improve the quality of HTR models.
Submitted 14 December, 2021;
originally announced December 2021.
-
Emojich -- zero-shot emoji generation using Russian language: a technical report
Authors:
Alex Shonenkov,
Daria Bakshandaeva,
Denis Dimitrov,
Aleksandr Nikolich
Abstract:
This technical report presents a text-to-image neural network, "Emojich", that generates emojis using captions in the Russian language as a condition. We aim to keep the generalization ability of the large pretrained model ruDALL-E Malevich (XL), with 1.3B parameters, at the fine-tuning stage, while giving a special style to the generated images. We present some engineering methods, the code implementation, all hyper-parameters for reproducing the results, and a Telegram bot where everyone can create their own customized sets of stickers. Some newly generated emojis obtained by the "Emojich" model are also demonstrated.
Submitted 12 January, 2022; v1 submitted 4 December, 2021;
originally announced December 2021.
-
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Authors:
Daria Bakshandaeva,
Denis Dimitrov,
Vladimir Arkhipkin,
Alex Shonenkov,
Mark Potanin,
Denis Karachev,
Andrey Kuznetsov,
Anton Voronov,
Vera Davydova,
Elena Tutubalina,
Aleksandr Petiushko
Abstract:
Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition aimed at creating a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The Fusion Brain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created datasets for each task to test the participants' submissions on it. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 pairs of images and texts. We also propose a multimodal and multitask architecture as a baseline solution; at its center is a frozen foundation model, and it has been trained in Fusion mode along with Single-task mode. The proposed Fusion approach proves to be competitive and more energy-efficient compared to the task-specific one.
Submitted 28 December, 2022; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Bayesian Framework for Gradient Leakage
Authors:
Mislav Balunović,
Dimitar I. Dimitrov,
Robin Staab,
Martin Vechev
Abstract:
Federated learning is an established method for training machine learning models without sharing training data. However, recent work has shown that it cannot guarantee data privacy as shared gradients can still leak sensitive information. To formalize the problem of gradient leakage, we propose a theoretical framework that enables, for the first time, analysis of the Bayes optimal adversary phrased as an optimization problem. We demonstrate that existing leakage attacks can be seen as approximations of this optimal adversary with different assumptions on the probability distributions of the input data and gradients. Our experiments confirm the effectiveness of the Bayes optimal adversary when it has knowledge of the underlying distribution. Further, our experimental evaluation shows that several existing heuristic defenses are not effective against stronger attacks, especially early in the training process. Thus, our findings indicate that the construction of more effective defenses and their evaluation remains an open problem.
Submitted 17 March, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Detecting Harmful Memes and Their Targets
Authors:
Shraman Pramanick,
Dimitar Dimitrov,
Rituparna Mukherjee,
Shivam Sharma,
Md. Shad Akhtar,
Preslav Nakov,
Tanmoy Chakraborty
Abstract:
Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent days have witnessed a proliferation of harmful memes targeted to abuse various social entities. As most harmful memes are highly satirical and abstruse without appropriate contexts, off-the-shelf multimodal models may not be adequate to understand their underlying semantics. In this work, we propose two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target. To this end, we present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19. Each meme went through a rigorous two-stage annotation process. In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to: individual, organization, community, or society/general public/other. The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks. We further discuss the limitations of these models and we argue that more research is needed to address these problems.
Submitted 24 September, 2021;
originally announced October 2021.
-
Detecting Propaganda Techniques in Memes
Authors:
Dimitar Dimitrov,
Bishr Bin Ali,
Shaden Shaar,
Firoj Alam,
Fabrizio Silvestri,
Hamed Firooz,
Preslav Nakov,
Giovanni Da San Martino
Abstract:
Propaganda can be defined as a form of communication that aims to influence the opinions or the actions of people towards a specific goal; this is achieved by means of well-defined rhetorical and psychological devices. Propaganda, in the form we know it today, can be dated back to the beginning of the 17th century. However, it is with the advent of the Internet and social media that it has started to spread on a much larger scale than before, thus becoming a major societal and political issue. Nowadays, a large fraction of propaganda in social media is multimodal, mixing textual with visual content. With this in mind, here we propose a new multi-label multimodal task: detecting the type of propaganda techniques used in memes. We further create and release a new corpus of 950 memes, carefully annotated with 22 propaganda techniques, which can appear in the text, in the image, or in both. Our analysis of the corpus shows that understanding both modalities together is essential for detecting these techniques. This is further confirmed in our experiments with several state-of-the-art multimodal models.
Submitted 7 August, 2021;
originally announced September 2021.
-
MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets
Authors:
Shraman Pramanick,
Shivam Sharma,
Dimitar Dimitrov,
Md Shad Akhtar,
Preslav Nakov,
Tanmoy Chakraborty
Abstract:
Internet memes have become powerful means to transmit political, psychological, and socio-cultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abuse. Detecting such memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes such as hate speech and propaganda, there has been little work on harm in general. Here, we aim to bridge this gap. We focus on two tasks: (i) detecting harmful memes, and (ii) identifying the social entities they target. We further extend the recently released HarMeme dataset, which covered COVID-19, with additional memes and a new topic: US politics. To solve these tasks, we propose MOMENTA (MultimOdal framework for detecting harmful MemEs aNd Their tArgets), a novel multimodal deep neural network that uses global and local perspectives to detect harmful memes. MOMENTA systematically analyzes the local and the global perspective of the input meme (in both modalities) and relates it to the background context. MOMENTA is interpretable and generalizable, and our experiments show that it outperforms several strong rivaling approaches.
Submitted 22 September, 2021; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Shared Certificates for Neural Network Verification
Authors:
Marc Fischer,
Christian Sprecher,
Dimitar I. Dimitrov,
Gagandeep Singh,
Martin Vechev
Abstract:
Existing neural network verifiers compute a proof that each input is handled correctly under a given perturbation by propagating a symbolic abstraction of reachable values at each layer. This process is repeated from scratch independently for each input (e.g., image) and perturbation (e.g., rotation), leading to an expensive overall proof effort when handling an entire dataset. In this work, we introduce a new method for reducing this verification cost without losing precision based on a key insight that abstractions obtained at intermediate layers for different inputs and perturbations can overlap or contain each other. Leveraging our insight, we introduce the general concept of shared certificates, enabling proof effort reuse across multiple inputs to reduce overall verification costs. We perform an extensive experimental evaluation to demonstrate the effectiveness of shared certificates in reducing the verification cost on a range of datasets and attack specifications on image classifiers including the popular patch and geometric perturbations. We release our implementation at https://github.com/eth-sri/proof-sharing.
Submitted 23 November, 2023; v1 submitted 1 September, 2021;
originally announced September 2021.
-
StackMix and Blot Augmentations for Handwritten Text Recognition
Authors:
Alex Shonenkov,
Denis Karachev,
Maxim Novopoltsev,
Mark Potanin,
Denis Dimitrov
Abstract:
This paper proposes a handwritten text recognition (HTR) system that outperforms current state-of-the-art methods. The comparison was carried out on three of the most frequently used datasets in HTR tasks, namely Bentham, IAM, and Saint Gall. In addition, the results on two recently presented datasets, Peter the Great's manuscripts and HKR Dataset, are provided. The paper describes the architecture of the neural network and two ways of increasing the volume of training data: augmentation that simulates strikethrough text (HandWritten Blots) and a new text generation method (StackMix), which proved to be very effective in HTR tasks. StackMix can also be applied to the standalone task of generating handwritten text based on printed text.
Submitted 26 August, 2021;
originally announced August 2021.
-
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
Authors:
Dimitar Dimitrov,
Bishr Bin Ali,
Shaden Shaar,
Firoj Alam,
Fabrizio Silvestri,
Hamed Firooz,
Preslav Nakov,
Giovanni Da San Martino
Abstract:
We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems. The task focused on memes and had three subtasks: (i) detecting the techniques in the text, (ii) detecting the text spans where the techniques are used, and (iii) detecting techniques in the entire meme, i.e., both in the text and in the image. It was a popular task, attracting 71 registrations, and 22 teams that eventually made an official submission on the test set. The evaluation results for the third subtask confirmed the importance of both modalities, the text and the image. Moreover, some teams reported benefits when not just combining the two modalities, e.g., by using early or late fusion, but rather modeling the interaction between them in a joint model.
Submitted 25 April, 2021;
originally announced May 2021.
-
A Survey on Multimodal Disinformation Detection
Authors:
Firoj Alam,
Stefano Cresci,
Tanmoy Chakraborty,
Fabrizio Silvestri,
Dimiter Dimitrov,
Giovanni Da San Martino,
Shaden Shaar,
Hamed Firooz,
Preslav Nakov
Abstract:
Recent years have witnessed the proliferation of offensive content online such as fake news, propaganda, misinformation, and disinformation. While initially this was mostly about textual content, over time images and videos gained popularity, as they are much easier to consume, attract more attention, and spread further than text. As a result, researchers started leveraging different modalities and combinations thereof to tackle online multimodal offensive content. In this study, we offer a survey of the state of the art in multimodal disinformation detection, covering various combinations of modalities: text, images, speech, video, social media network structure, and temporal information. Moreover, while some studies focused on factuality, others investigated how harmful the content is. While these two components in the definition of disinformation, (i) factuality and (ii) harmfulness, are equally important, they are typically studied in isolation. Thus, we argue for the need to tackle disinformation detection by taking into account multiple modalities as well as both factuality and harmfulness in the same framework. Finally, we discuss current challenges and future research directions.
Submitted 28 September, 2022; v1 submitted 13 March, 2021;
originally announced March 2021.
-
Digital Peter: Dataset, Competition and Handwriting Recognition Methods
Authors:
Mark Potanin,
Denis Dimitrov,
Alex Shonenkov,
Vladimir Bataev,
Denis Karachev,
Maxim Novopoltsev
Abstract:
This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts initial images of documents into the lines. The new dataset may be useful for researchers to train handwriting text recognition models as a benchmark for comparing different models. It consists of 9 694 images and text files corresponding to lines in historical documents. The open machine learning competition Digital Peter was held based on the considered dataset. The baseline solution for this competition as well as more advanced methods on handwritten text recognition are described in the article. Full dataset and all code are publicly available.
Submitted 27 August, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Provably Robust Adversarial Examples
Authors:
Dimitar I. Dimitrov,
Gagandeep Singh,
Timon Gehr,
Martin Vechev
Abstract:
We introduce the concept of provably robust adversarial examples for deep neural networks - connected input regions constructed from standard adversarial examples which are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE for generating these regions in a scalable manner which works by iteratively refining the region initially obtained via sampling until a refined region is certified to be adversarial with existing state-of-the-art verifiers. At each step, a novel optimization procedure is applied to maximize the region's volume under the constraint that the convex relaxation of the network behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it successfully finds large provably robust regions including ones containing $\approx 10^{573}$ adversarial examples for pixel intensity and $\approx 10^{599}$ for geometric perturbations. The provability enables our robust examples to be significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.
Submitted 17 March, 2022; v1 submitted 23 July, 2020;
originally announced July 2020.
-
TweetsCOV19 -- A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic
Authors:
Dimitar Dimitrov,
Erdal Baran,
Pavlos Fafalios,
Ran Yu,
Xiaofei Zhu,
Matthäus Zloch,
Stefan Dietze
Abstract:
Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigating measures and their societal impact. Understanding such discourse, its evolution, and interdependencies with real-world events or (mis)information can foster valuable insights. On the other hand, such corpora are crucial facilitators for computational methods addressing tasks such as sentiment analysis, event detection, or entity recognition. However, obtaining, archiving, and semantically annotating large amounts of tweets is costly. In this paper, we describe TweetsCOV19, a publicly available knowledge base of currently more than 8 million tweets, spanning October 2019 - April 2020. Metadata about the tweets as well as extracted entities, hashtags, user mentions, sentiments, and URLs are exposed using established RDF/S vocabularies, providing an unprecedented knowledge base for a range of knowledge discovery tasks. Next to a description of the dataset and its extraction and annotation process, we present an initial analysis and use cases of the corpus.
Submitted 15 August, 2020; v1 submitted 25 June, 2020;
originally announced June 2020.
-
A Mixed Initiative Semantic Web Framework for Process Composition
Authors:
Jinghai Rao,
Dimitar Dimitrov,
Paul Hofmann,
Norman Sadeh
Abstract:
Semantic Web technologies offer the prospect of significantly reducing the amount of effort required to integrate existing enterprise functionality in support of new composite processes; whether within a given organization or across multiple ones. A significant body of work in this area has aimed to fully automate this process, while assuming that all functionality has already been encapsulated in the form of semantic web services with rich and accurate annotations. In this article, we argue that this assumption is often unrealistic. Instead, we describe a mixed initiative framework for semantic web service discovery and composition that aims at flexibly interleaving human decision making and automated functionality in environments where annotations may be incomplete and even inconsistent.
Submitted 3 June, 2020;
originally announced June 2020.
-
A Markerless Deep Learning-based 6 Degrees of Freedom Pose Estimation for with Mobile Robots using RGB Data
Authors:
Linh Kästner,
Daniel Dimitrov,
Jens Lambrecht
Abstract:
Augmented Reality has been subject to various integration efforts within industries due to its ability to enhance human-machine interaction and understanding. Neural networks have achieved remarkable results in areas of computer vision, which bear great potential to assist and facilitate an enhanced Augmented Reality experience. However, most neural networks are computationally intensive and demand huge processing power, and are thus not suitable for deployment on Augmented Reality devices. In this work we propose a method to deploy state-of-the-art neural networks for real-time 3D object localization on Augmented Reality devices. As a result, we provide a more automated method of calibrating AR devices with mobile robotic systems. To accelerate the calibration process and enhance user experience, we focus on fast 2D detection approaches, which extract the 3D pose of the object quickly and accurately using only 2D input. The results are implemented in an Augmented Reality application for intuitive robot control and sensor data visualization. For the 6D annotation of 2D images, we developed an annotation tool, which is, to our knowledge, the first open source tool to be available. We achieve feasible results which are generally applicable to any AR device, thus making this work promising for further research in combining highly demanding neural networks with Internet of Things devices.
Submitted 16 January, 2020;
originally announced January 2020.
-
Query for Architecture, Click through Military: Comparing the Roles of Search and Navigation on Wikipedia
Authors:
Dimitar Dimitrov,
Florian Lemmerich,
Fabian Flöck,
Markus Strohmaier
Abstract:
As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics, namely (i) searchshare -- the relative amount of views an article received by search --, and (ii) resistance -- the ability of an article to relay traffic to other Wikipedia articles -- to characterize articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a "dead end" for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.
Submitted 10 May, 2018;
originally announced May 2018.
-
Forbidden branches in trees with minimal atom-bond connectivity index
Authors:
Darko Dimitrov,
Zhibin Du,
Carlos M. da Fonseca
Abstract:
The atom-bond connectivity (ABC) index has been, in recent years, one of the most actively studied vertex-degree-based graph invariants in chemical graph theory. For a given graph $G$, the ABC index is defined as $\sum_{uv\in E(G)}\sqrt{\frac{d(u)+d(v)-2}{d(u)d(v)}}$, where $d(u)$ is the degree of vertex $u$ in $G$ and $E(G)$ denotes the set of edges of $G$. In this paper we present some new structural properties of trees with a minimal ABC index (also referred to as minimal-ABC trees), which is a step further towards understanding their complete characterization. We show that a minimal-ABC tree cannot simultaneously contain a $B_4$-branch and $B_1$- or $B_2$-branches.
Submitted 27 June, 2017;
originally announced June 2017.
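Note: the ABC index defined in the abstract above can be evaluated directly from an edge list. The following is a minimal sketch, not code from the paper; the function name `abc_index` and the edge-list input format are assumptions made for illustration, and the graph is assumed to be simple (no loops, no repeated edges) with at least one edge of degree sum above 2 where a nonzero term is expected.

```python
import math
from collections import Counter


def abc_index(edges):
    """Atom-bond connectivity index of a simple graph given as (u, v) edge pairs.

    Hypothetical helper for illustration: sums
    sqrt((d(u) + d(v) - 2) / (d(u) * d(v))) over all edges uv.
    """
    # Vertex degrees are obtained by counting edge endpoints.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # One term per edge, exactly as in the definition.
    return sum(
        math.sqrt((degree[u] + degree[v] - 2) / (degree[u] * degree[v]))
        for u, v in edges
    )


if __name__ == "__main__":
    path3 = [("a", "b"), ("b", "c")]        # path on three vertices
    star5 = [(0, i) for i in range(1, 5)]   # star with four leaves
    print(abc_index(path3))
    print(abc_index(star5))
```

For the path on three vertices each edge contributes $\sqrt{1/2}$, so the index is $\sqrt{2}$; for a star with $n-1$ leaves the index is $\sqrt{(n-1)(n-2)}$.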
-
On structural properties of trees with minimal atom-bond connectivity index IV: Solving a conjecture about the pendent paths of length three
Authors:
Darko Dimitrov
Abstract:
The atom-bond connectivity (ABC) index is one of the most investigated degree-based molecular structure descriptors with a variety of chemical applications. It is known that among all connected graphs, the trees minimize the ABC index. However, a full characterization of trees with a minimal ABC index is still an open problem. One of the properties proved so far is that a tree with a minimal ABC index may have at most one pendent path of length $3$, with the conjecture that this cannot be the case if the order of the tree is larger than $1178$. Here, we provide an affirmative answer to a strengthened version of that conjecture, showing that a tree with a minimal ABC index cannot contain a pendent path of length $3$ if its order is larger than $415$.
Submitted 26 June, 2017;
originally announced June 2017.
-
The Complexity of Shelflisting
Authors:
Yongjie Yang,
Dinko Dimitrov
Abstract:
Optimal shelflisting invites profit maximization to become sensitive to the ways in which purchasing decisions are order-dependent. We study the computational complexity of the corresponding product arrangement problem when consumers are either rational maximizers, use a satisficing procedure, or apply successive choice. The complexity results we report are shown to crucially depend on the size of the top cycle in consumers' preferences over products and on the direction in which alternatives on the shelf are encountered.
Submitted 12 November, 2016;
originally announced November 2016.
-
What Makes a Link Successful on Wikipedia?
Authors:
Dimitar Dimitrov,
Philipp Singer,
Florian Lemmerich,
Markus Strohmaier
Abstract:
While a plethora of hypertext links exist on the Web, only a small amount of them are regularly clicked. Starting from this observation, we set out to study large-scale click data from Wikipedia in order to understand what makes a link successful. We systematically analyze effects of link properties on the popularity of links. By utilizing mixed-effects hurdle models supplemented with descriptive insights, we find evidence of user preference towards links leading to the periphery of the network, towards links leading to semantically similar articles, and towards links in the top and left-side of the screen. We integrate these findings as Bayesian priors into a navigational Markov chain model and by doing so successfully improve the model fits. We further adapt and improve the well-known classic PageRank algorithm that assumes random navigation by accounting for observed navigational preferences of users in a weighted variation. This work facilitates understanding navigational click behavior and thus can contribute to improving link structures and algorithms utilizing these structures.
Submitted 20 February, 2017; v1 submitted 8 November, 2016;
originally announced November 2016.
-
Remarks on the maximum atom-bond connectivity index of graphs with given parameters
Authors:
Darko Dimitrov,
Barbara Ikica,
Riste Škrekovski
Abstract:
The atom-bond connectivity (ABC) index is a degree-based molecular structure descriptor that can be used for modelling thermodynamic properties of organic chemical compounds. Motivated by its potential applications, a series of investigations have been carried out in the past several years. In this note we first consider graphs with given edge-connectivity that attain the maximum ABC index. In particular, we give an affirmative answer to the conjecture about the structure of graphs with edge-connectivity equal to one that maximize the ABC index, which was recently raised by Zhang, Yang, Wang and Zhang. In addition, we provide supporting evidence for another conjecture posed by the same authors, which concerns graphs that maximize the ABC index among all graphs with chromatic number equal to some fixed $\chi \geq 3$. Specifically, we confirm this conjecture in the case where the order of the graph is divisible by $\chi$.
Submitted 8 October, 2016;
originally announced October 2016.
-
On the Irregularity of Some Molecular Structures
Authors:
Hosam Abdo,
Darko Dimitrov,
Wei Gao
Abstract:
Measures of the irregularity of chemical graphs could be helpful for QSAR/QSPR studies and for the descriptive purposes of biological and chemical properties, such as melting and boiling points, toxicity and resistance. Here we consider the following four established irregularity measures: the irregularity index by Albertson, the total irregularity, the variance of vertex degrees and the Collatz-Sinogowitz index.
Through the means of graph structural analysis and derivation, we study the above-mentioned irregularity measures of several chemical molecular graphs which frequently appear in chemical, medical and material engineering, as well as the nanotubes: $TUC_4 C_8(S)$, $TUC_4 C_8(R)$, Zig-Zag $TUHC_{6}$, $TUC_4$, Armchair $TUVC_{6}$, then dendrimers $T_{k,d}$ and the circumcoronene series of benzenoid $H_k$. In addition, the irregularities of Mycielski's constructions of cycle and path graphs are analyzed.
Submitted 18 August, 2016;
originally announced August 2016.
-
How Hard Is It to Control A Group?
Authors:
Yongjie Yang,
Dinko Dimitrov
Abstract:
We consider group identification models in which the aggregation of individual opinions concerning who is qualified in a given society determines the set of socially qualified persons. In this setting, we study the extent to which social qualification can be changed when societies expand, shrink, or partition themselves. The answers we provide are with respect to the computational complexity of the corresponding control problems and fully cover the class of consent aggregation rules introduced by Samet & Schmeidler (2003) as well as procedural rules for group identification. We obtain both polynomial-time solvability results and NP-hardness results. In addition, we also study these problems from the parameterized complexity perspective, and obtain some fixed-parameter tractability results.
Submitted 28 April, 2018; v1 submitted 10 June, 2016;
originally announced June 2016.