-
Deep Learning for Video-Based Assessment of Endotracheal Intubation Skills
Authors:
Jean-Paul Ainam,
Erim Yanik,
Rahul Rahul,
Taylor Kunkes,
Lora Cavuoto,
Brian Clemency,
Kaori Tanaka,
Matthew Hackett,
Jack Norfleet,
Suvranu De
Abstract:
Endotracheal intubation (ETI) is an emergency procedure performed in civilian and combat casualty care settings to establish an airway. Objective and automated assessment of ETI skills is essential for the training and certification of healthcare providers. However, the current approach is based on manual feedback by an expert, which is subjective, time- and resource-intensive, and prone to poor inter-rater reliability and halo effects. This work proposes a framework to evaluate ETI skills using single- and multi-view videos. The framework consists of two stages. First, a 2D convolutional autoencoder (AE) and a pre-trained self-supervised network extract features from videos. Second, a 1D convolutional network enhanced with a cross-view attention module takes the features from the AE as input and outputs predictions for skill evaluation. The ETI datasets were collected in two phases. In the first phase, ETI is performed by two subject cohorts: Experts and Novices. In the second phase, novice subjects perform ETI under time pressure, and the outcome is either Successful or Unsuccessful. A third dataset of videos from a single head-mounted camera for Experts and Novices is also analyzed. The study achieved an accuracy of 100% in identifying Expert/Novice trials in the initial phase. In the second phase, the model showed 85% accuracy in classifying Successful/Unsuccessful procedures. Using head-mounted cameras alone, the model showed 96% accuracy on Expert/Novice classification while maintaining 85% accuracy in classifying Successful and Unsuccessful trials. In addition, GradCAMs are presented to explain the differences between Expert and Novice behavior and between Successful and Unsuccessful trials. The approach offers a reliable and objective method for automated assessment of ETI skills.
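The GradCAM explanations mentioned above follow the general Grad-CAM recipe: weight each feature map by its average gradient with respect to the class score, sum, and keep the positive part. The sketch below applies that recipe along the time axis of a 1D convolutional layer. It assumes the activations and gradients have already been obtained from an autodiff framework; the function name and the choice of layer are illustrative, not the authors' code.

```python
import numpy as np

def grad_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Temporal Grad-CAM for one 1D convolutional layer (illustrative sketch).

    activations: (K, T) feature maps A^k of the chosen layer
    gradients:   (K, T) gradients of the class score y_c w.r.t. A^k
    Returns a (T,) heatmap scaled to [0, 1].
    """
    # Channel weights alpha_k: gradients averaged over the time axis
    alpha = gradients.mean(axis=1)                                  # (K,)
    # Weighted sum of the feature maps; ReLU keeps only positive evidence
    cam = np.maximum((alpha[:, None] * activations).sum(axis=0), 0.0)
    peak = cam.max()
    # Normalise so the most influential time step has value 1
    return cam / peak if peak > 0 else cam

A = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
G = np.array([[1.0, 1.0, 1.0], [-3.0, -3.0, -3.0]])
heatmap = grad_cam_1d(A, G)
```

In this toy input the second channel has a negative weight, so it suppresses the middle time step, which ends up with zero relevance.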
Submitted 17 April, 2024;
originally announced April 2024.
-
Video-based Formative and Summative Assessment of Surgical Tasks using Deep Learning
Authors:
Erim Yanik,
Uwe Kruger,
Xavier Intes,
Rahul Rahul,
Suvranu De
Abstract:
To ensure satisfactory clinical outcomes, surgical skill assessment must be objective, time-efficient, and preferably automated; none of these is currently achievable. Video-based assessment (VBA) is being deployed in intraoperative and simulation settings to evaluate technical skill execution. However, VBA remains manual, time-intensive, and prone to subjective interpretation and poor inter-rater reliability. Herein, we propose a deep learning (DL) model that can automatically and objectively provide a high-stakes summative assessment of surgical skill execution based on video feeds, and a low-stakes formative assessment to guide surgical skill acquisition. Formative assessment is generated using heatmaps of visual features that correlate with surgical performance. Hence, the DL model paves the way to quantitative and reproducible evaluation of surgical tasks from videos, with the potential for broad dissemination in surgical training, certification, and credentialing.
Submitted 17 March, 2022;
originally announced March 2022.
-
Text2Chart: A Multi-Staged Chart Generator from Natural Language Text
Authors:
Md. Mahinur Rashid,
Hasin Kawsar Jahan,
Annysha Huzzat,
Riyasaat Ahmed Rahul,
Tamim Bin Zakir,
Farhana Meem,
Md. Saddam Hossain Mukta,
Swakkhar Shatabda
Abstract:
Generating scientific visualizations from analytical natural language text is a challenging task. In this paper, we propose Text2Chart, a multi-staged chart generation method. Text2Chart takes natural language text as input and produces visualizations as two-dimensional charts. Text2Chart approaches the problem in three stages. First, it identifies the axis elements of a chart, known as x and y entities, from the given text. Then it finds a mapping between the x-entities and their corresponding y-entities. Next, it predicts a chart type suitable for the given text: bar, line, or pie. Together, these three stages generate a visualization from the given analytical text. We have also constructed a dataset for this problem. Experiments show that Text2Chart achieves the best performance with BERT-based encodings and LSTM models in the first stage to label x and y entities, a Random Forest classifier for the mapping stage, and fastText embeddings with LSTM for chart type prediction. In our experiments, all stages show satisfactory results and effectiveness in forming charts from analytical text, achieving commendable overall performance.
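To make the three-stage decomposition concrete, here is a minimal sketch of the data flow. Each learned component is replaced by a hypothetical rule-based stand-in: numbers are tagged as y entities and capitalized words as x entities (instead of BERT+LSTM), each x is paired with the next y (instead of a Random Forest), and the chart type is picked by keywords (instead of fastText+LSTM). All function names and rules here are assumptions for illustration only.

```python
import re

NUMBER = r"\d+(?:\.\d+)?%?"

def tag_entities(text):
    """Stage 1 (toy stand-in): label tokens as X, Y or O."""
    tagged = []
    for tok in re.findall(r"[A-Za-z]+|" + NUMBER, text):
        if re.fullmatch(NUMBER, tok):
            tagged.append((tok, "Y"))
        elif tok[0].isupper():
            tagged.append((tok, "X"))
        else:
            tagged.append((tok, "O"))
    return tagged

def map_entities(tagged):
    """Stage 2 (toy stand-in): pair each x entity with the next y entity."""
    pairs, current_x = [], None
    for tok, tag in tagged:
        if tag == "X":
            current_x = tok
        elif tag == "Y" and current_x is not None:
            pairs.append((current_x, float(tok.rstrip("%"))))
            current_x = None
    return pairs

def choose_chart_type(text, tagged):
    """Stage 3 (toy stand-in): percentages -> pie, trend words -> line, else bar."""
    if any(tok.endswith("%") for tok, tag in tagged if tag == "Y"):
        return "pie"
    if re.search(r"\b(increased|decreased|trend)\b", text.lower()):
        return "line"
    return "bar"

def text2chart(text):
    tagged = tag_entities(text)
    return {"type": choose_chart_type(text, tagged), "data": map_entities(tagged)}

chart = text2chart("Apple sold 10 units, Samsung sold 12 units and Nokia sold 3 units.")
```

Keeping the stages as separate functions mirrors the pipeline design: each stage can be swapped for a trained model without changing the others.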
Submitted 9 April, 2021;
originally announced April 2021.
-
Cognitive IoT based Health Monitoring Scheme using Non-Orthogonal Multiple Access
Authors:
Ashiqur Rahman Rahul,
Saifur Rahman Sabuj,
Majumder Fazle Haider,
Shakil Ahmed
Abstract:
Addressing limited spectrum capacity and its efficient utilization has become essential to support the growing number of Internet of Things (IoT) devices. In medical infrastructure, it is imperative that medical devices can communicate with the base station. Communication over the wireless medium must therefore provide optimized throughput (data rate) with efficient energy usage, so that the responsible staff can give precise medical feedback. Consequently, wireless communication must operate at higher frequencies with greater bandwidth and low latency. Cognitive Radio (CR) is traditionally a viable choice: it identifies and utilizes vacant spectrum, thus maximizing the primary user's capacity and achieving spectral efficiency. To ensure such outcomes, Non-Orthogonal Multiple Access (NOMA) techniques have proven to be an effective solution for serving an increasing number of devices without degrading performance, especially when communication shifts toward higher frequency bands such as the mmWave band. In this chapter, an IoT-based CR network for uplink communication that employs NOMA techniques is proposed to achieve optimal throughput and energy efficiency in medical infrastructure. Numerical results show that the effective throughput of a High Reliable Communication (HRC) device and a Moderate Reliable Communication (MRC) device improves by over 83.13% and 73.95%, respectively, and their corresponding energy efficiency also improves substantially (83.11% and 73.96%, respectively). Likewise, in the interference case, both throughput and energy efficiency improve by approximately 93% or more for all devices.
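The throughput and energy-efficiency metrics above can be illustrated with the textbook uplink NOMA model. In this model the base station applies successive interference cancellation (SIC): it decodes devices in descending order of received power and treats not-yet-decoded devices as interference. The sketch assumes flat-fading channel gains, a single noise power, and energy efficiency defined as rate divided by transmit power. It does not reproduce the chapter's HRC/MRC configuration, mmWave channel model, or cognitive-radio spectrum sensing.

```python
import math

def uplink_noma_rates(p, g, noise, bandwidth=1.0):
    """Per-device rates for uplink NOMA with SIC (textbook model, illustrative).

    p: transmit powers, g: channel gains, noise: receiver noise power.
    With bandwidth=1.0 the rates are spectral efficiencies in bit/s/Hz.
    """
    received = [pi * gi for pi, gi in zip(p, g)]
    # Decode the strongest received signal first
    order = sorted(range(len(p)), key=lambda i: received[i], reverse=True)
    rates = [0.0] * len(p)
    for rank, i in enumerate(order):
        # Devices decoded later still interfere with device i
        interference = sum(received[j] for j in order[rank + 1:])
        sinr = received[i] / (interference + noise)
        rates[i] = bandwidth * math.log2(1 + sinr)
    return rates

def energy_efficiency(rates, p):
    """Rate delivered per unit of transmit power for each device."""
    return [r / pi for r, pi in zip(rates, p)]

rates = uplink_noma_rates(p=[1.0, 1.0], g=[3.0, 1.0], noise=1.0)
```

For these inputs the rates telescope to a sum of log2(1 + (3 + 1)/1) = log2(5). This illustrates the known property that SIC decoding achieves the uplink sum capacity.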
Submitted 21 July, 2020;
originally announced July 2020.
-
TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images
Authors:
Shubham Paliwal,
Vishwanath D,
Rohit Rahul,
Monika Sharma,
Lovekesh Vig
Abstract:
With the widespread use of mobile phones and scanners to photograph and upload documents, the need to extract the information trapped in unstructured document images such as retail receipts, insurance claim forms, and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables, and extracting data from tabular sub-images presents a unique set of challenges. These include accurately detecting the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge, since it involves more fine-grained recognition of table structure (rows and columns). Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.
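The final step above, rule-based row extraction once table and column regions are known, can be sketched as follows. The rules here are simple assumptions and not TableNet's actual rule set: words inside the table are clustered into rows by the vertical centre of their bounding box. Each word is then placed in the column span (taken to come from the predicted column mask) that contains its horizontal centre. The `Word` type and the `y_tol` tolerance are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def assign_column(word, columns):
    """Index of the column span (x_start, x_end) containing the word's centre, or None."""
    cx = (word.x0 + word.x1) / 2
    for idx, (start, end) in enumerate(columns):
        if start <= cx <= end:
            return idx
    return None

def extract_rows(words, columns, y_tol=5.0):
    """Group words into rows by vertical centre, then fill cells left to right."""
    groups = []  # (reference y-centre, words in that row)
    for w in sorted(words, key=lambda w: (w.y0 + w.y1) / 2):
        cy = (w.y0 + w.y1) / 2
        if groups and abs(cy - groups[-1][0]) <= y_tol:
            groups[-1][1].append(w)
        else:
            groups.append((cy, [w]))
    table = []
    for _, row_words in groups:
        cells = [""] * len(columns)
        for w in sorted(row_words, key=lambda w: w.x0):
            col = assign_column(w, columns)
            if col is not None:
                cells[col] = (cells[col] + " " + w.text).strip()
        table.append(cells)
    return table

words = [Word("Item", 0, 0, 30, 10), Word("Price", 70, 0, 110, 10),
         Word("Tea", 0, 20, 20, 30), Word("2.50", 70, 21, 100, 31)]
rows = extract_rows(words, columns=[(0, 50), (60, 120)])
```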
Submitted 6 January, 2020;
originally announced January 2020.
-
One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis
Authors:
Vishal Sunder,
Ashwin Srinivasan,
Lovekesh Vig,
Gautam Shroff,
Rohit Rahul
Abstract:
Our interest in this paper is in meeting a rapidly growing industrial demand for information extraction from images of documents such as invoices, bills, receipts, etc. In practice, users are able to provide a very small number of example images labeled with the information that needs to be extracted. We adopt a novel two-level neuro-deductive approach where (a) we use pre-trained deep neural networks to populate a relational database with facts about each document image; and (b) we use a form of deductive reasoning, related to meta-interpretive learning of transition systems, to learn extraction programs. Given task-specific transitions defined using the entities and relations identified by the neural detectors, and a small number of instances (usually 1, sometimes 2) of images and the desired outputs, a resource-bounded meta-interpreter constructs proofs for the instance(s) via logical deduction; a set of logic programs that extract each desired entity is easily synthesized from such proofs. In most cases, a single training example together with a noisy clone of itself suffices to learn a program-set that generalizes well on test documents, at which time the value of each entity is determined by a majority vote across its program-set. We demonstrate our two-level neuro-deductive approach on publicly available datasets ("Patent" and "Doctor's Bills") and also describe its use in a real-life industrial problem.
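The test-time step described above, determining each entity's value by a majority vote across its program-set, can be sketched as below. In the paper the programs are synthesized logic programs. Here they are stood in for by plain Python callables over a hypothetical document representation, and a program that fails on a document simply abstains.

```python
from collections import Counter

def extract_entity(document, programs):
    """Run every program for one entity and return the majority value.

    Programs that raise an exception or return None abstain. Ties go to the
    value encountered first (Counter.most_common preserves first-seen order).
    """
    votes = []
    for program in programs:
        try:
            value = program(document)
        except Exception:
            continue
        if value is not None:
            votes.append(value)
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]

# Hypothetical facts a neural detector might have produced for one invoice
doc = {"right_of": {"Total:": "42.00"}, "below": {"Total:": "17"},
       "numbers": ["17", "42.00"]}
programs = [
    lambda d: d["right_of"]["Total:"],   # value to the right of the label
    lambda d: d["below"]["Total:"],      # value below the label
    lambda d: d["numbers"][-1],          # last number on the page
    lambda d: d["above"]["Total:"],      # fails on this document (KeyError)
]
total = extract_entity(doc, programs)
```

Voting over several programs makes the extraction tolerant of any single program that generalizes poorly to a new layout.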
Submitted 6 June, 2019;
originally announced June 2019.
-
Automatic Information Extraction from Piping and Instrumentation Diagrams
Authors:
Rohit Rahul,
Shubham Paliwal,
Monika Sharma,
Lovekesh Vig
Abstract:
One of the most common modes of representing engineering schematics is Piping and Instrumentation diagrams (P&IDs), which describe the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&IDs have been manually generated, scanned, and stored as image files. These files need to be digitized for inventory management and updating, and for easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real-world P&IDs. Real-world P&IDs come in several different resolutions and often contain noisy textual information. Extracting instrumentation information from these diagrams involves accurately detecting symbols that frequently have minute visual differences between them. Identifying pipelines that may converge and diverge at different points in the image is a further cause for concern. For these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&IDs. However, with the advent of deep learning and its spectacular successes in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets that combines traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and to detect symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data structure that captures the structure of the piping schematics. We evaluated the proposed method on a real-world dataset of P&ID sheets obtained from an oil firm and obtained promising results.
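The abstract says only that extracted pipeline information populates a "tree-like data structure". One plausible shape for such a structure is sketched below: each node holds a pipeline code, its inlet and outlet, the symbols associated with it, and its branch pipelines. All field names, symbol kinds, and tags are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class Component:
    kind: str   # detected symbol class, e.g. "valve" (hypothetical)
    tag: str    # text label read next to the symbol

@dataclass
class PipelineNode:
    code: str                                   # pipeline code read from the sheet
    inlet: Optional[str] = None
    outlet: Optional[str] = None
    components: List[Component] = field(default_factory=list)
    branches: List["PipelineNode"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "PipelineNode"]]:
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.branches:
            yield from child.walk(depth + 1)

    def find_component(self, tag: str) -> Optional[str]:
        """Return the code of the first pipeline carrying a component with this tag."""
        for _, node in self.walk():
            if any(c.tag == tag for c in node.components):
                return node.code
        return None

root = PipelineNode(
    "P-100", inlet="Tank A", components=[Component("valve", "V-1")],
    branches=[PipelineNode("P-101", outlet="Pump B",
                           components=[Component("gauge", "PI-7")])],
)
```

A tree fits diverging pipelines naturally, because each branch becomes a child node. Pipelines that converge again would need a graph instead.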
Submitted 28 January, 2019;
originally announced January 2019.
-
Reading Industrial Inspection Sheets by Inferring Visual Relations
Authors:
Rohit Rahul,
Arindam Chowdhury,
Animesh,
Samarth Mittal,
Lovekesh Vig
Abstract:
The traditional mode of recording faults in heavy factory equipment has been via hand-marked inspection sheets, wherein a machine engineer manually marks the faulty machine regions on a paper outline of the machine. Over the years, millions of such inspection sheets have been recorded, and the data within these sheets has remained inaccessible. However, with industries going digital and waking up to the potential value of fault data for machine health monitoring, there is an increased impetus toward digitization of these hand-marked inspection records. To target this digitization, we propose a novel visual pipeline combining state-of-the-art deep learning models with domain knowledge and low-level vision techniques, followed by inference of visual relationships. Our framework is robust to the presence of both static and non-static background in the document, variability in the machine template diagrams, the unstructured shape of the graphical objects to be identified, and variability in the strokes of handwritten text. The proposed pipeline incorporates a capsule and spatial transformer network-based classifier for accurate text reading and a customized CTPN network for text detection, in addition to hybrid techniques for arrow detection and dialogue cloud removal. We have tested our approach on a real-world dataset of 50 inspection sheets for large containers and boilers. The results are visually appealing, and the pipeline achieved an accuracy of 87.1% for text detection and 94.6% for text reading.
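The final "inference of visual relationships" step links each handwritten note to the machine region it refers to. A simple geometric heuristic for this is sketched below, as an assumption rather than the authors' method. Each text box is attached to the arrow whose tail is nearest to it, and the arrow's head is then located inside one of the template's machine regions. The input formats (text centres, arrow tail/head points, rectangular regions) are hypothetical stand-ins for the outputs of the text-detection and arrow-detection stages.

```python
import math

def nearest(point, candidates):
    """Index of the candidate point closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))

def link_text_to_regions(texts, arrows, regions):
    """Associate each text box with a machine region via its nearest arrow.

    texts:   {label: (x, y)} centre of each recognised text box
    arrows:  [(tail_xy, head_xy), ...] detected arrows
    regions: {name: (x0, y0, x1, y1)} machine zones on the template
    Returns {label: region name, or None if the arrow head hits no region}.
    """
    tails = [tail for tail, _ in arrows]
    result = {}
    for label, centre in texts.items():
        _, head = arrows[nearest(centre, tails)]
        result[label] = next(
            (name for name, (x0, y0, x1, y1) in regions.items()
             if x0 <= head[0] <= x1 and y0 <= head[1] <= y1),
            None,
        )
    return result

links = link_text_to_regions(
    texts={"crack": (0, 0), "dent": (100, 0)},
    arrows=[((5, 0), (50, 50)), ((95, 0), (150, 50))],
    regions={"boiler": (40, 40, 60, 60), "pipe": (140, 40, 160, 60)},
)
```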
Submitted 11 December, 2018;
originally announced December 2018.
-
Deep Reader: Information extraction from Document images via relation extraction and Natural Language
Authors:
Vishwanath D,
Rohit Rahul,
Gunjan Sehgal,
Swati,
Arindam Chowdhury,
Monika Sharma,
Lovekesh Vig,
Gautam Shroff,
Ashwin Srinivasan
Abstract:
Recent advancements in computer vision with state-of-the-art neural networks have given a boost to Optical Character Recognition (OCR) accuracy. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes, and headings, and retrieving the corresponding structured representation for the document, remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise-based end-to-end framework called DeepReader, which facilitates information extraction from document images via identification of visual entities and population of a meta relational model across different entities in the document image. The model schema allows for an easy-to-understand abstraction of the entities detected by the deep vision models and the relationships between them. DeepReader has a suite of state-of-the-art vision algorithms that are applied to recognize handwritten and printed text, eliminate noisy effects, identify the type of document, and detect visual entities such as tables, lines, and boxes. DeepReader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between entities (words, text boxes, lines, etc.) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural language interface is added on top of the relationship schema so that a non-technical user can fetch information with minimal effort by specifying queries in natural language. In this paper, we also demonstrate many different capabilities of DeepReader and report results on a real-world use case.
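The idea of extracting fields "by writing SQL queries on top of the relationship tables" can be illustrated with Python's built-in `sqlite3` module. The single-table schema below (word text, text-line id, left x-coordinate) is a hypothetical simplification, not DeepReader's actual schema. It is enough to express a typical key/value query: the first word to the right of a label on the same line.

```python
import sqlite3

def build_db(words):
    """words: iterable of (text, line_id, x0) tuples, e.g. from OCR plus line detection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (text TEXT, line_id INTEGER, x0 REAL)")
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", words)
    return conn

def field_right_of(conn, label):
    """Return the first word to the right of `label` on the same text line, or None."""
    row = conn.execute(
        """
        SELECT w2.text
        FROM words AS w1
        JOIN words AS w2 ON w2.line_id = w1.line_id AND w2.x0 > w1.x0
        WHERE w1.text = ?
        ORDER BY w2.x0
        LIMIT 1
        """,
        (label,),
    ).fetchone()
    return row[0] if row else None

conn = build_db([("Invoice", 1, 10.0), ("No.", 1, 80.0), ("INV-7", 1, 120.0),
                 ("Total", 2, 10.0), ("99.50", 2, 120.0)])
total = field_right_of(conn, "Total")
```

A natural language layer, as described above, would translate a request such as "what is the total?" into a parameterized query of this kind.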
Submitted 14 December, 2018; v1 submitted 11 December, 2018;
originally announced December 2018.