Search | arXiv e-print repository

KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

Authors: Oshri Naparstek, Roi Pony, Inbar Shapira, Foad Abo Dahood, Ophir Azulai, Yevgeny Yaroker, Nadav Rubinstein, Maksym Lysak, Peter Staar, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Elad Amrani, Idan Friedman, Orit Prince, Yevgeny Burshtein, Adi Raz Goldfarb, Udi Barzelay

Abstract: In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains. This effort has attracted substantial interest from both industry and academy, highlighting its significance in the current technological landscape. Most datasets in this area are primarily focused on Key Information Extraction (KIE), where… ▽ More In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains. This effort has attracted substantial interest from both industry and academy, highlighting its significance in the current technological landscape. Most datasets in this area are primarily focused on Key Information Extraction (KIE), where the extraction process revolves around extracting information using a specific, predefined set of keys. Unlike most existing datasets and benchmarks, our focus is on discovering key-value pairs (KVPs) without relying on predefined keys, navigating through an array of diverse templates and complex layouts. This task presents unique challenges, primarily due to the absence of comprehensive datasets and benchmarks tailored for non-predetermined KVP extraction. To address this gap, we introduce KVP10k , a new dataset and benchmark specifically designed for KVP extraction. The dataset contains 10707 richly annotated images. In our benchmark, we also introduce a new challenging task that combines elements of KIE as well as KVP in a single task. KVP10k sets itself apart with its extensive diversity in data and richly detailed annotations, paving the way for advancements in the field of information extraction from complex business documents. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: accepted ICDAR2024

arXiv:2404.10363 [pdf, other]

A Survey on Data-Driven Fault Diagnostic Techniques for Marine Diesel Engines

Authors: Ayah Youssef, Hassan Noura, Abderrahim El Amrani, El Mostafa El Adel, Mustapha Ouladsine

Abstract: Fault diagnosis in marine diesel engines is vital for maritime safety and operational efficiency.These engines are integral to marine vessels, and their reliable performance is crucial for safenavigation. Swift identification and resolution of faults are essential to prevent breakdowns,enhance safety, and reduce the risk of catastrophic failures at sea. Proactive fault diagnosisfacilitates timely… ▽ More Fault diagnosis in marine diesel engines is vital for maritime safety and operational efficiency.These engines are integral to marine vessels, and their reliable performance is crucial for safenavigation. Swift identification and resolution of faults are essential to prevent breakdowns,enhance safety, and reduce the risk of catastrophic failures at sea. Proactive fault diagnosisfacilitates timely maintenance, minimizes downtime, and ensures the overall reliability andlongevity of marine diesel engines. This paper explores the importance of fault diagnosis,emphasizing subsystems, common faults, and recent advancements in data-driven approachesfor effective marine diesel engine maintenance △ Less

Submitted 16 April, 2024; originally announced April 2024.

Journal ref: 12th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, International Federation of Automatic Control (IFAC), Jun 2024, Farrera, Italy

arXiv:2402.18920 [pdf, other]

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

Authors: Dongliang Cao, Marvin Eisenberger, Nafie El Amrani, Daniel Cremers, Florian Bernard

Abstract: Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with clas… ▽ More Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with classical surface deformation models to map shapes in both spectral and spatial domains. On the one hand, by incorporating spatial maps, our method obtains more accurate and smooth point-wise correspondences compared to previous functional map methods for shape matching. On the other hand, by introducing spectral maps, our method gets rid of commonly used but computationally expensive geodesic distance constraints that are only valid for near-isometric shape deformations. Furthermore, we propose a novel test-time adaptation scheme to capture both pose-dominant and shape-dominant deformations. Using different challenging datasets, we demonstrate that our method outperforms previous state-of-the-art methods for both shape matching and interpolation, even compared to supervised approaches. △ Less

Submitted 27 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: accepted by CVPR2024

arXiv:2401.16339 [pdf]

doi 10.1016/j.compeleceng.2021.107257

SAT-CEP-monitor: An air quality monitoring software architecture combining complex event processing with satellite remote sensing

Authors: Badr-Eddine Boudriki Semlali, Chaker El Amrani, Guadalupe Ortiz, Juan Boubeta-Puig, Alfonso Garcia-de-Prado

Abstract: Air pollution is a major problem today that causes serious damage to human health. Urban areas are the most affected by the degradation of air quality caused by anthropogenic gas emissions. Although there are multiple proposals for air quality monitoring, in most cases, two limitations are imposed: the impossibility of processing data in Near Real-Time (NRT) for remote sensing approaches and the i… ▽ More Air pollution is a major problem today that causes serious damage to human health. Urban areas are the most affected by the degradation of air quality caused by anthropogenic gas emissions. Although there are multiple proposals for air quality monitoring, in most cases, two limitations are imposed: the impossibility of processing data in Near Real-Time (NRT) for remote sensing approaches and the impossibility of reaching areas of limited accessibility or low network coverage for ground data approaches. We propose a software architecture that efficiently combines complex event processing with remote sensing data from various satellite sensors to monitor air quality in NRT, giving support to decision-makers. We illustrate the proposed solution by calculating the air quality levels for several areas of Morocco and Spain, extracting and processing satellite information in NRT. This study also validates the air quality measured by ground stations and satellite sensor data. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Journal ref: Comput.Electr.Eng. 93:107257 (2021)

arXiv:2205.08249 [pdf, other]

doi 10.1145/3394171.3413612

Learnable Optimal Sequential Grou** for Video Scene Detection

Authors: Daniel Rotman, Yevgeny Yaroker, Elad Amrani, Udi Barzelay, Rami Ben-Ari

Abstract: Video scene detection is the task of dividing videos into temporal semantic chapters. This is an important preliminary step before attempting to analyze heterogeneous video content. Recently, Optimal Sequential Grou** (OSG) was proposed as a powerful unsupervised solution to solve a formulation of the video scene detection problem. In this work, we extend the capabilities of OSG to the learning… ▽ More Video scene detection is the task of dividing videos into temporal semantic chapters. This is an important preliminary step before attempting to analyze heterogeneous video content. Recently, Optimal Sequential Grou** (OSG) was proposed as a powerful unsupervised solution to solve a formulation of the video scene detection problem. In this work, we extend the capabilities of OSG to the learning regime. By giving the capability to both learn from examples and leverage a robust optimization formulation, we can boost performance and enhance the versatility of the technology. We present a comprehensive analysis of incorporating OSG into deep learning neural networks under various configurations. These configurations include learning an embedding in a straight-forward manner, a tailored loss designed to guide the solution of OSG, and an integrated model where the learning is performed through the OSG pipeline. With thorough evaluation and analysis, we assess the benefits and behavior of the various configurations, and show that our learnable OSG approach exhibits desirable behavior and enhanced performance compared to the state of the art. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Journal ref: In Proceedings of the 28th ACM International Conference on Multimedia, pp. 1958-1966. 2020

arXiv:2112.02300 [pdf, other]

Unsupervised Domain Generalization by Learning a Bridge Across Domains

Authors: Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

Abstract: The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system. In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively new and very practical Unsupervised Domain Generaliz… ▽ More The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system. In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively new and very practical Unsupervised Domain Generalization (UDG) setup of having no training supervision in neither source nor target domains. Our approach is based on self-supervised learning of a Bridge Across Domains (BrAD) - an auxiliary bridge domain accompanied by a set of semantics preserving visual (image-to-image) map**s to BrAD from each of the training domains. The BrAD and map**s to it are learned jointly (end-to-end) with a contrastive self-supervised representation model that semantically aligns each of the domains to its BrAD-projection, and hence implicitly drives all the domains (seen or unseen) to semantically align to each other. In this work, we show how using an edge-regularized BrAD our approach achieves significant gains across multiple benchmarks and a range of tasks, including UDG, Few-shot UDA, and unsupervised generalization across multi-domain datasets (including generalization to unseen domains and classes). △ Less

Submitted 17 May, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

arXiv:2103.10994 [pdf, other]

Self-Supervised Classification Network

Authors: Elad Amrani, Leonid Karlinsky, Alex Bronstein

Abstract: We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a… ▽ More We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a mathematically motivated variant of the cross-entropy loss that has a uniform prior asserted on the predicted labels. In our theoretical analysis, we prove that degenerate solutions are not in the set of optimal solutions of our approach. Self-Classifier is simple to implement and scalable. Unlike other popular unsupervised classification and contrastive representation learning approaches, it does not require any form of pre-training, expectation-maximization, pseudo-labeling, external clustering, a second network, stop-gradient operation, or negative pairs. Despite its simplicity, our approach sets a new state of the art for unsupervised classification of ImageNet; and even achieves comparable to state-of-the-art results for unsupervised representation learning. Code is available at https://github.com/elad-amrani/self-classifier. △ Less

Submitted 12 July, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: ECCV 2022 camera-ready with supplementary

arXiv:2003.03186 [pdf, other]

Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning

Authors: Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein

Abstract: One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods often choose to ignore… ▽ More One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods often choose to ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval. Furthermore, we provide a theoretical probabilistic error bound substantiating our empirical results and analyze failure cases. Code: https://github.com/elad-amrani/ssml. △ Less

Submitted 10 December, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Accepted to AAAI 2021

ACM Class: I.2.10; I.4; I.5

arXiv:1905.11137 [pdf, other]

Learning to Detect and Retrieve Objects from Unlabeled Videos

Authors: Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein

Abstract: Learning an object detector or retrieval requires a large data set with manual annotations. Such data sets are expensive and time consuming to create and therefore difficult to obtain on a large scale. In this work, we propose to exploit the natural correlation in narrations and the visual presence of objects in video, to learn an object detector and retrieval without any manual labeling involved.… ▽ More Learning an object detector or retrieval requires a large data set with manual annotations. Such data sets are expensive and time consuming to create and therefore difficult to obtain on a large scale. In this work, we propose to exploit the natural correlation in narrations and the visual presence of objects in video, to learn an object detector and retrieval without any manual labeling involved. We pose the problem as weakly supervised learning with noisy labels, and propose a novel object detection paradigm under these constraints. We handle the background rejection by using contrastive samples and confront the high level of label noise with a new clustering score. Our evaluation is based on a set of 11 manually annotated objects in over 5000 frames. We show comparison to a weakly-supervised approach as baseline and provide a strongly labeled upper bound. △ Less

Submitted 19 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: ICCV 2019 Workshop on Multi-modal Video Analysis and Moments in Time Challenge

ACM Class: I.2.10; I.4; I.5

Showing 1–9 of 9 results for author: Amrani, E