-
Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs
Authors:
Lokesh Mishra,
Sohayl Dhibi,
Yusik Kim,
Cesar Berrospi Ramis,
Shubham Gupta,
Michele Dolfi,
Peter Staar
Abstract:
Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in table structure as well as content. We propose Statements, a novel domain-agnostic data structure for extracting quantitative facts and related information. We propose translating tables to statements as a new supervised deep-learning universal information extraction task. We introduce SemTabNet, a dataset of over 100K annotated tables. Investigating a family of T5-based Statement Extraction Models, our best model generates statements which are 82% similar to the ground truth (compared to a baseline of 21%). We demonstrate the advantages of statements by applying our model to over 2700 tables from ESG reports. The homogeneous nature of statements permits exploratory data analysis on expansive information found in large collections of ESG reports.
Submitted 27 June, 2024;
originally announced June 2024.
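The idea of turning a table into homogeneous statements can be illustrated with a minimal sketch. This is not the authors' method (the abstract describes trained T5-based models) and not the actual Statements schema: the `Statement` fields, the `table_to_statements` helper, and the toy table values below are all hypothetical, chosen only to show why one uniform record per quantitative fact makes later analysis easier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical flat record for one quantitative fact (not the paper's schema)."""
    subject: str   # row label, e.g. the KPI being reported
    prop: str      # column label, e.g. the reporting year
    value: float   # numeric value parsed from the cell
    unit: str      # unit taken from the row's unit column


def table_to_statements(header, rows, unit_column="Unit"):
    """Flatten a simple one-header table into one Statement per numeric cell.

    Rule-based stand-in for illustration only; real ESG tables vary far more
    in structure, which is the difficulty the abstract points to.
    """
    unit_idx = header.index(unit_column)
    statements = []
    for row in rows:
        subject, unit = row[0], row[unit_idx]
        for idx, cell in enumerate(row):
            if idx in (0, unit_idx):
                continue  # skip the label and unit columns
            try:
                value = float(cell.replace(",", ""))
            except ValueError:
                continue  # skip empty or non-numeric cells
            statements.append(Statement(subject, header[idx], value, unit))
    return statements


# Made-up toy table, for illustration only.
header = ["KPI", "Unit", "2021", "2022"]
rows = [
    ["Scope 1 emissions", "tCO2e", "1,200", "1,050"],
    ["Water consumption", "m3", "n/a", "8,400"],
]
statements = table_to_statements(header, rows)
```

Because every record has the same fields, records from differently laid-out tables could be filtered or aggregated together, for example by `subject` or `unit`.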
-
ESG Accountability Made Easy: DocQA at Your Service
Authors:
Lokesh Mishra,
Cesar Berrospi,
Kasper Dinkla,
Diego Antognini,
Francesco Fusco,
Benedikt Bothur,
Maksym Lysak,
Nikolaos Livathinos,
Ahmed Nassar,
Panagiotis Vagenas,
Lucas Morin,
Christoph Auer,
Michele Dolfi,
Peter Staar
Abstract:
We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates technologies from several AI disciplines: document conversion to a machine-readable format (via computer vision), finding relevant data (via natural language processing), and formulating an eloquent response (via large language models). Users can explore over 10,000 Environmental, Social, and Governance (ESG) disclosure reports from over 2000 corporations. The Deep Search platform can be accessed at: https://ds4sd.github.io.
Submitted 30 November, 2023;
originally announced November 2023.
-
Hybrid Model using Feature Extraction and Non-linear SVM for Brain Tumor Classification
Authors:
Lalita Mishra,
Shekhar Verma,
Shirshu Varma
Abstract:
Accurate classification of brain tumors from magnetic resonance imaging (MRI) is essential for better and timely treatment of patients. In this paper, we propose a hybrid model that uses VGG together with a non-linear SVM (soft and hard) to classify brain tumors: glioma versus pituitary, and tumorous versus non-tumorous. The VGG-SVM model is trained on two different datasets of two classes each; thus, we perform binary classification. The VGG models are trained with the PyTorch Python library to obtain the highest testing accuracy of tumor classification. The method has three steps: in the first step, we normalize and resize the images; the second step consists of feature extraction through variants of the VGG model; the third step classifies brain tumors using a non-linear SVM (soft and hard). We obtained 98.18% accuracy for the first dataset (D1) and 99.78% for the second dataset (D2) using VGG19. The classification accuracies for non-linear SVM are 95.50% and 97.98% with linear and RBF kernels and 97.95% for soft SVM with RBF kernel on D1, and 96.75% and 98.60% with linear and RBF kernels and 98.38% for soft SVM with RBF kernel on D2. Results indicate that the hybrid VGG-SVM model, especially VGG19 with SVM, is able to outperform existing techniques and achieve high accuracy.
Submitted 6 December, 2022;
originally announced December 2022.