Search | arXiv e-print repository

Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis

Authors: Wenjing Shi, Phuong Tran, Sylvia Celedón-Pattichis, Marios S. Pattichis

Abstract: The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Thus, students can move around freely causing issues with strong pose variation, move out and re-enter the camera scene, or face away from the camera. We formulate the problem of assessing student participation into two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life videos, is used for evaluating the performance of our proposed method for student group detection. The proposed method of using multiple image representations is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system is shown to perform exceptionally well, missing a student in just one out of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 out of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.

Submitted 14 April, 2024; originally announced May 2024.

arXiv:2402.00261 [pdf, other]

Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse Maps

Authors: Rebecca Pattichis, Marios S. Pattichis

Abstract: There is strong interest in developing mathematical methods that can be used to understand complex neural networks used in image analysis. In this paper, we introduce techniques from Linear Algebra to model neural network layers as maps between signal spaces. First, we demonstrate how signal spaces can be used to visualize weight spaces and convolutional layer kernels. We also demonstrate how residual vector spaces can be used to further visualize information lost at each layer. Second, we introduce the concept of invertible networks and an algorithm for computing input images that yield specific outputs. We demonstrate our approach on two invertible networks and ResNet18.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2312.05352 [pdf, other]

A Review of Machine Learning Methods Applied to Video Analysis Systems

Authors: Marios S. Pattichis, Venkatesh Jatla, Alvaro E. Ulloa Cerna

Abstract: The paper provides a survey of the development of machine-learning techniques for video analysis. The survey provides a summary of the most popular deep learning methods used for human activity recognition. We discuss how popular architectures perform on standard datasets and highlight the differences from real-life datasets dominated by multiple activities performed by multiple participants over long periods. For real-life datasets, we describe the use of low-parameter models (with 200X or 1,000X fewer parameters) that are trained to detect a single activity after the relevant objects have been successfully detected. Our survey then turns to a summary of machine learning methods that are specifically developed for working with a small number of labeled video samples. Our goal here is to describe modern techniques that are specifically designed so as to minimize the amount of ground truth that is needed for training and testing video analysis systems. We provide summaries of the development of self-supervised learning, semi-supervised learning, active learning, and zero-shot learning for applications in video analysis. For each method, we provide representative examples.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2207.00672 [pdf, other]

doi 10.1109/SSIAI49293.2020.9094589

The Importance of the Instantaneous Phase for classification using Convolutional Neural Networks

Authors: Luis Sanchez Tapia, Marios S. Pattichis, Sylvia Celedon-Pattichis, Carlos Lopez Leiva

Abstract: Large-scale training of Convolutional Neural Networks (CNN) is extremely demanding in terms of computational resources. For specific applications, the standard use of transfer learning also tends to require far more resources than may be needed. This work examines the impact of using AM-FM representations as input images for CNN classification applications. A comparison was made between combinations of AM-FM components and grayscale images as inputs for reduced and complete networks. The results showed that only the phase component produced significant predictions within a simple network. Neither the IA nor the grayscale images were able to induce any learning in the system. Furthermore, the FM results were 7x faster during training and used 123x fewer parameters compared to the state-of-the-art MobileNetV2 architecture, while maintaining comparable performance (AUC of 0.78 vs 0.79).

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2201.01380 [pdf, other]

doi 10.1109/TIP.2019.2944057

Image Processing Methods for Coronal Hole Segmentation, Matching, and Map Classification

Authors: V. Jatla, M. S. Pattichis, C. N. Arge

Abstract: The paper presents the results from a multi-year effort to develop and validate image processing methods for selecting the best physical models based on solar image observations. The approach consists of selecting the physical models based on their agreement with coronal holes extracted from the images. Ultimately, the goal is to use physical models to predict geomagnetic storms. We decompose the problem into three subproblems: (i) coronal hole segmentation based on physical constraints, (ii) matching clusters of coronal holes between different maps, and (iii) physical map classification. For segmenting coronal holes, we develop a multi-modal method that uses segmentation maps from three different methods to initialize a level-set method that evolves the initial coronal hole segmentation to the magnetic boundary. Then, we introduce a new method based on Linear Programming for matching clusters of coronal holes. The final matching is then performed using Random Forests. The methods were carefully validated using consensus maps derived from multiple readers, manual clustering, manual map classification, and method validation for 50 maps. The proposed multi-modal segmentation method significantly outperformed SegNet, U-net, Henney-Harvey, and FCN by providing accurate boundary detection. Overall, the method gave a 95.5% map classification accuracy.

Submitted 4 January, 2022; originally announced January 2022.

Journal ref: IEEE Transactions on Image Processing 29 (2019): 1641-1653

arXiv:2112.13463 [pdf, other]

Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data

Authors: Luis Sanchez Tapia, Antonio Gomez, Mario Esparza, Venkatesh Jatla, Marios Pattichis, Sylvia Celedón-Pattichis, Carlos LópezLeiva

Abstract: Speech recognition is very challenging in student learning environments that are characterized by significant cross-talk and background noise. To address this problem, we present a bilingual speech recognition system that uses an interactive video analysis system to estimate the 3D speaker geometry for realistic audio simulations. We demonstrate the use of our system in generating a complex audio dataset that contains significant cross-talk and background noise and approximates real-life classroom recordings. We then test our proposed system with real-life recordings. In terms of the distance of the speakers from the microphone, our interactive video analysis system obtained a better average error rate of 10.83% compared to 33.12% for a baseline approach. Our proposed system gave an accuracy of 27.92% that is 1.5% better than Google Speech-to-text on the same dataset. In terms of 9 important keywords, our approach gave an average sensitivity of 38% compared to 24% for Google Speech-to-text, while both methods maintained high average specificity (90% and 92%).

Submitted 26 December, 2021; originally announced December 2021.

Comments: 11 pages, 6 figures

Journal ref: The 19th International Conference on Computer Analysis of Images and Patterns (CAIP), 2021

arXiv:2112.13150 [pdf, other]

doi 10.1109/TIP.2017.2678799

Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures

Authors: Cesar Carranza, Daniel Llamocca, Marios Pattichis

Abstract: The manuscript describes fast and scalable architectures and associated algorithms for computing convolutions and cross-correlations. The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain. This is accomplished through the use of the Discrete Periodic Radon Transform (DPRT) for general kernels and the use of SVD-LU decompositions for low-rank kernels. The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices. Based on different types of available resources, for $P\times P$ blocks, 2D convolutions and cross-correlations can be computed in just $O(P)$ clock cycles up to $O(P^2)$ clock cycles. Thus, there is a trade-off between performance and required numbers and types of resources. We provide implementations of the proposed architectures using modern programmable devices (Virtex-7 and Zynq-SOC). Based on the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.

Submitted 24 December, 2021; originally announced December 2021.

Comments: The paper develops the fastest known methods for computing 2D convolutions in hardware

Journal ref: IEEE Transactions on Image Processing 26.5 (2017): 2230-2245
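
The low-rank idea mentioned in the abstract above can be illustrated in software. The sketch below is not the paper's SVD-LU hardware architecture; it uses a plain SVD (an assumption made for illustration) to split a kernel into rank-1 terms, so that each term reduces to 1D convolutions along rows followed by 1D convolutions along columns. The function names `conv2d_direct` and `conv2d_lowrank` are hypothetical.

```python
import numpy as np


def conv2d_direct(image, kernel):
    """Full 2D convolution by shifting and adding one scaled copy per kernel tap."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += kernel[i, j] * image
    return out


def conv2d_lowrank(image, kernel, rank):
    """Sum of `rank` separable passes built from the SVD of the kernel.

    Exact when `rank` is at least the true rank of the kernel,
    otherwise a low-rank approximation.
    """
    U, s, Vt = np.linalg.svd(kernel)
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for r in range(rank):
        col = U[:, r] * s[r]   # 1D kernel applied down each column
        row = Vt[r, :]         # 1D kernel applied along each row
        rows_done = np.apply_along_axis(np.convolve, 1, image, row)
        out += np.apply_along_axis(np.convolve, 0, rows_done, col)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((6, 7))
    sobel = np.outer([1, 2, 1], [1, 0, -1])  # a rank-1 kernel
    print(np.allclose(conv2d_lowrank(img, sobel, 1), conv2d_direct(img, sobel)))
```

For a rank-1 kernel such as the Sobel example, a single row pass and a single column pass reproduce the direct 2D result, which is why low-rank kernels are cheaper to evaluate.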

arXiv:2112.13149 [pdf, other]

doi 10.1109/TIP.2015.2501725

Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform

Authors: Cesar Carranza, Daniel Llamocca, Marios Pattichis

Abstract: The Discrete Periodic Radon Transform (DPRT) has been extensively used in applications that involve image reconstructions from projections. This manuscript introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources, and (iv) fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an $N\times N$ image ($N$ prime), the proposed approach can compute up to $N^{2}$ additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a $251\times 251$ image, using approximately $25\%$ fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just $2N+\left\lceil \log_{2}N\right\rceil+1$ and $2N+3\left\lceil \log_{2}N\right\rceil+B+2$ cycles respectively, where $B$ is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation and provides a trade-off between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHDL and validated using an FPGA implementation.

Submitted 24 December, 2021; originally announced December 2021.

Comments: This paper has been published as follows: C. Carranza, D. Llamocca, and M. Pattichis. "Fast and scalable computation of the forward and inverse discrete periodic radon transform", IEEE Transactions on Image Processing, 25(1):119-133, Jan 2016

Journal ref: IEEE Transactions on Image Processing, 25(1):119-133, Jan 2016
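
As a software illustration of the transform named in the abstract above, the following sketch computes a forward and inverse DPRT for an $N\times N$ integer image with $N$ prime. It assumes one common index convention (projection $m$ sums $f(i, \langle d + m i\rangle_N)$ over $i$, plus one extra projection of row sums); the paper's own convention and its adder-tree hardware architecture may differ. The function names `dprt` and `idprt` are hypothetical.

```python
import numpy as np


def dprt(f):
    """Forward DPRT of an N x N integer image (N prime): N + 1 projections of length N."""
    N = f.shape[0]
    R = np.zeros((N + 1, N), dtype=np.int64)
    for m in range(N):
        for d in range(N):
            # Sum along the periodic "line" with slope m and offset d.
            R[m, d] = sum(int(f[i, (d + m * i) % N]) for i in range(N))
    R[N, :] = f.sum(axis=1)  # extra projection: row sums
    return R


def idprt(R):
    """Inverse DPRT under the same convention; exact for integer images."""
    N = R.shape[1]
    S = int(R[N, :].sum())  # total image sum (every projection sums to S)
    f = np.zeros((N, N), dtype=np.int64)
    for i in range(N):
        for j in range(N):
            acc = sum(int(R[m, (j - m * i) % N]) for m in range(N))
            # acc = N * f[i, j] + S - (row sum of row i), so the division is exact.
            f[i, j] = (acc - S + int(R[N, i])) // N
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(7, 7))
    print(np.array_equal(idprt(dprt(img)), img))
```

The inverse works because, for prime $N$, the offsets $\langle m(i'-i)\rangle_N$ visit every column exactly once whenever $i' \neq i$, so all rows other than row $i$ contribute only their row sums.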

arXiv:2112.12217 [pdf, other]

Person Detection in Collaborative Group Learning Environments Using Multiple Representations

Authors: Wenjing Shi, Marios S. Pattichis, Sylvia Celedón-Pattichis, Carlos LópezLeiva

Abstract: We introduce the problem of detecting a group of students from classroom videos. The problem requires the detection of students from different angles and the separation of the group from other groups in long videos (one to one and a half hours). We use multiple image representations to solve the problem. We use FM components to separate each group from background groups, AM-FM components for detecting the back-of-the-head, and YOLO for face detection. We use classroom videos from four different groups to validate our approach. Our use of multiple representations is shown to be significantly more accurate than the use of YOLO alone.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2110.13269 [pdf, other]

Facial Recognition in Collaborative Learning Videos

Authors: Phuong Tran, Marios Pattichis, Sylvia Celedón-Pattichis, Carlos LópezLeiva

Abstract: Face recognition in collaborative learning videos presents many challenges. In collaborative learning videos, students sit around a typical table at different positions relative to the recording camera, come and go, move around, and get partially or fully occluded. Furthermore, the videos tend to be very long, requiring the development of fast and accurate methods. We develop a dynamic system for recognizing participants in collaborative learning systems. We address occlusion and recognition failures by using past information about the face detection history. We address the need for detecting faces from different poses and the need for speed by associating each participant with a collection of prototype faces computed through sampling or K-means clustering. Our results show that the proposed system is both fast and accurate. We also compare our system against a baseline system that uses InsightFace [2] and the original training video segments. We achieved an average accuracy of 86.2% compared to 70.8% for the baseline system. On average, our recognition was 28.1 times faster than the baseline system.

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.07646 [pdf, other]

Talking Detection In Collaborative Learning Environments

Authors: Wenjing Shi, Marios S. Pattichis, Sylvia Celedón-Pattichis, Carlos LópezLeiva

Abstract: We study the problem of detecting talking activities in collaborative learning videos. Our approach uses head detection and projections of the log-magnitude of optical flow vectors to reduce the problem to a simple classification of small projection images without the need for training complex, 3-D activity classification systems. The small projection images are then easily classified using a simple majority vote of standard classifiers. For talking detection, our proposed approach is shown to significantly outperform single activity systems. We have an overall accuracy of 59% compared to 42% for Temporal Segment Network (TSN) and 45% for Convolutional 3D (C3D). In addition, our method is able to detect multiple talking instances from multiple speakers, while also detecting the speakers themselves.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.07070 [pdf, other]

Fast Hand Detection in Collaborative Learning Environments

Authors: Sravani Teeparthi, Venkatesh Jatla, Marios S. Pattichis, Sylvia Celedon Pattichis, Carlos LopezLeiva

Abstract: Long-term object detection requires the integration of frame-based results over several seconds. For non-deformable objects, long-term detection is often addressed using object detection followed by video tracking. Unfortunately, tracking is inapplicable to objects that undergo dramatic changes in appearance from frame to frame. As a related example, we study hand detection over long video recordings in collaborative learning environments. More specifically, we develop long-term hand detection methods that can deal with partial occlusions and dramatic changes in appearance. Our approach integrates object detection, followed by time projections, clustering, and small region removal to provide effective hand detection over long videos. The hand detector achieved an average precision (AP) of 72% at 0.5 intersection over union (IoU). The detection results were improved to 81% by using our optimized approach for data augmentation. The method runs at 4.7x real-time with an AP of 81% at 0.5 IoU. Our method reduced the number of false-positive hand detections by 80% by improving IoU ratios from 0.2 to 0.5. The overall hand detection system runs at 4x real-time.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2105.08191 [pdf]

doi 10.1109/ACCESS.2021.3077313

Adaptive Video Encoding For Different Video Codecs

Authors: Gangadharan Esakki, Andreas Panayides, Venkatesh Jatla, Marios Pattichis

Abstract: By 2022, we expect video traffic to reach 82% of the total internet traffic. Undoubtedly, the abundance of video-driven applications will likely lead internet video traffic percentage to a further increase in the near future, enabled by associated advances in video devices' capabilities. In response to this ever-growing demand, the Alliance for Open Media (AOM) and the Joint Video Experts Team (JVET) have demonstrated strong and renewed interest in developing new video codecs. In the fast-changing video codecs' landscape, there is thus a genuine need to develop adaptive methods that can be universally applied to different codecs. In this study, we formulate video encoding as a multi-objective optimization process where video quality (as a function of VMAF and PSNR), bitrate demands, and encoding rate (in encoded frames per second) are jointly optimized, going beyond the standard video encoding approaches that focus on rate control targeting specific bandwidths. More specifically, we create a dense video encoding space (offline) and then employ regression to generate forward prediction models for each one of the afore-described optimization objectives, using only Pareto-optimal points. We demonstrate our adaptive video encoding approach that leverages the generated forward prediction models that qualify for real-time adaptation using different codecs (e.g., SVT-AV1 and x265) for a variety of video datasets and resolutions. To motivate our approach and establish the promise for future fast VVC encoders, we also perform a comparative performance evaluation using both subjective and objective metrics and report on bitrate savings among all possible pairs between VVC, SVT-AV1, x265, and VP9 codecs.

Submitted 17 May, 2021; originally announced May 2021.

Comments: Video codecs, Video signal processing, Video coding, Video compression, Video quality, Video streaming, Adaptive video streaming, Versatile Video Coding, AV1, HEVC

Journal ref: IEEE Access 2021
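
The abstract above relies on keeping only Pareto-optimal points from a dense encoding space. A minimal sketch of that filtering step follows, assuming each encoding configuration is summarized by a hypothetical `(quality, bitrate, fps)` tuple in which quality and frames per second are maximized and bitrate is minimized. The sample values are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple

# Hypothetical summary of one encoding configuration:
# (quality score, bitrate in kbps, encoding rate in frames per second)
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (simple O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up sample configurations, for illustration only.
    samples = [
        (90.0, 1000.0, 30.0),
        (85.0, 1200.0, 25.0),  # worse than the first on all three objectives
        (95.0, 1500.0, 20.0),  # best quality, so it stays on the front
    ]
    print(pareto_front(samples))
```

Regression models like those described in the abstract would then be fitted only to the surviving points; that fitting step is not shown here.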

arXiv:1911.04048 [pdf, other]

Multidataset Independent Subspace Analysis with Application to Multimodal Fusion

Authors: Rogers F. Silva, Sergey M. Plis, Tulay Adali, Marios S. Pattichis, Vince D. Calhoun

Abstract: In the last two decades, unsupervised latent variable models---blind source separation (BSS) especially---have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems. To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion. Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model. We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data.

Submitted 10 November, 2019; originally announced November 2019.

Comments: For associated code, see https://github.com/rsilva8/MISA . For associated data, see https://github.com/rsilva8/MISA-data . Submitted to IEEE Transactions on Image Processing on Nov/7/2019: 13 pages, 8 figures. Supplement: 16 pages, 5 figures.

ACM Class: G.1.6; G.2.1; G.3; H.1.1; J.3; I.5.1; I.2.6

arXiv:1903.12613 [pdf, other]

doi 10.1007/s11207-019-1402-1

Estimating Total Open Heliospheric Magnetic Flux

Authors: S. Wallace, C. N. Arge, M. Pattichis, R. A. Hock-Mysliwiec, C. J. Henney

Abstract: Over the solar-activity cycle, there are extended periods where significant discrepancies occur between the spacecraft-observed total (unsigned) open magnetic flux and that determined from coronal models. In this article, the total open heliospheric magnetic flux is computed using two different methods and then compared with results obtained from in-situ interplanetary magnetic-field observations. The first method uses two different types of photospheric magnetic-field maps as input to the Wang Sheeley Arge (WSA) model: i) traditional Carrington or diachronic maps, and ii) Air Force Data Assimilative Photospheric Flux Transport model synchronic maps. The second method uses observationally derived helium and extreme-ultraviolet coronal-hole maps overlaid on the same magnetic-field maps in order to compute total open magnetic flux. The diachronic and synchronic maps are both constructed using magnetograms from the same source, namely the National Solar Observatory Kitt Peak Vacuum Telescope and Vector Spectromagnetograph. The results of this work show that the total open flux obtained from observationally derived coronal holes agrees remarkably well with that derived from WSA, especially near solar minimum. This suggests that, on average, coronal models capture well the observed large-scale coronal-hole structure over most of the solar cycle. Both methods show considerable deviations from total open flux deduced from spacecraft data, especially near solar maximum, pointing to something other than poorly determined coronal-hole area specification as the source of these discrepancies.

Submitted 29 March, 2019; originally announced March 2019.

Comments: 20 pages, 6 figures

Journal ref: Solar Phys (2019) 294:19

arXiv:1901.08125 [pdf, other]

Interpretable Neural Networks for Predicting Mortality Risk using Multi-modal Electronic Health Records

Authors: Alvaro E. Ulloa Cerna, Marios Pattichis, David P. vanMaanen, Linyuan Jing, Aalpen A. Patel, Joshua V. Stough, Christopher M. Haggerty, Brandon K. Fornwalt

Abstract: We present an interpretable neural network for predicting an important clinical outcome (1-year mortality) from multi-modal Electronic Health Record (EHR) data. Our approach builds on prior multi-modal machine learning models by now enabling visualization of how individual factors contribute to the overall outcome risk, assuming other factors remain constant, which was previously impossible. We demonstrate the value of this approach using a large multi-modal clinical dataset including both EHR data and 31,278 echocardiographic videos of the heart from 26,793 patients. We generated separate models for (i) clinical data only (CD) (e.g. age, sex, diagnoses and laboratory values), (ii) numeric variables derived from the videos, which we call echocardiography-derived measures (EDM), and (iii) CD+EDM+raw videos (pixel data). The interpretable multi-modal model maintained performance compared to non-interpretable models (Random Forest, XGBoost), and also performed significantly better than a model using a single modality (average AUC=0.82). Clinically relevant insights and multi-modal variable importance rankings were also facilitated by the new model, which have previously been impossible.

Submitted 23 January, 2019; originally announced January 2019.

Comments: Submitted to IEEE JBHI

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2019

arXiv:1811.10553 [pdf]

A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart

Authors: Alvaro Ulloa, Linyuan Jing, Christopher W Good, David P vanMaanen, Sushravya Raghunath, Jonathan D Suever, Christopher D Nevius, Gregory J Wehner, Dustin Hartzel, Joseph B Leader, Amro Alsaid, Aalpen A Patel, H Lester Kirchner, Marios S Pattichis, Christopher M Haggerty, Brandon K Fornwalt

Abstract: Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions.

Submitted 14 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: We updated results with improved performance after dropout bug in tensorflow v1.12. We also added learning curves showing promise in video model with more samples

Showing 1–17 of 17 results for author: Pattichis, M