-
Automatic Infectious Disease Classification Analysis with Concept Discovery
Authors:
Elena Sizikova,
Joshua Vendrow,
Xu Cao,
Rachel Grotheer,
Jamie Haddock,
Lara Kassab,
Alona Kryshchenko,
Thomas Merkh,
R. W. M. A. Madushani,
Kenny Moise,
Annie Ulichney,
Huy V. Vo,
Chuntian Wang,
Megan Coffee,
Kathryn Leonard,
Deanna Needell
Abstract:
Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmiss…
▽ More
Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmission and improve clinical outcomes. In order to understand and trust neural network predictions, analysis of learned representations is necessary. In this work, we argue that automatic discovery of concepts, i.e., human interpretable attributes, allows for a deep understanding of learned information in medical image analysis tasks, generalizing beyond the training labels or protocols. We provide an overview of existing concept discovery approaches in medical image and computer vision communities, and evaluate representative methods on tuberculosis (TB) prediction and monkeypox prediction tasks. Finally, we propose NMFx, a general NMF formulation of interpretability by concept discovery that works in a unified way in unsupervised, weakly supervised, and supervised scenarios.
△ Less
Submitted 14 November, 2022; v1 submitted 28 August, 2022;
originally announced September 2022.
-
Semi-supervised Nonnegative Matrix Factorization for Document Classification
Authors:
Jamie Haddock,
Lara Kassab,
Sixian Li,
Alona Kryshchenko,
Rachel Grotheer,
Elena Sizikova,
Chuntian Wang,
Thomas Merkh,
RWMA Madushani,
Miju Ahn,
Deanna Needell,
Kathryn Leonard
Abstract:
We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates f…
▽ More
We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters).
△ Less
Submitted 28 February, 2022;
originally announced March 2022.
-
Semi-supervised NMF Models for Topic Modeling in Learning Tasks
Authors:
Jamie Haddock,
Lara Kassab,
Sixian Li,
Alona Kryshchenko,
Rachel Grotheer,
Elena Sizikova,
Chuntian Wang,
Thomas Merkh,
R. W. M. A. Madushani,
Miju Ahn,
Deanna Needell,
Kathryn Leonard
Abstract:
We propose several new models for semi-supervised nonnegative matrix factorization (SSNMF) and provide motivation for SSNMF models as maximum likelihood estimators given specific distributions of uncertainty. We present multiplicative updates training methods for each new model, and demonstrate the application of these models to classification, although they are flexible to other supervised learni…
▽ More
We propose several new models for semi-supervised nonnegative matrix factorization (SSNMF) and provide motivation for SSNMF models as maximum likelihood estimators given specific distributions of uncertainty. We present multiplicative updates training methods for each new model, and demonstrate the application of these models to classification, although they are flexible to other supervised learning tasks. We illustrate the promise of these models and training methods on both synthetic and real data, and achieve high classification accuracy on the 20 Newsgroups dataset.
△ Less
Submitted 15 October, 2020;
originally announced October 2020.
-
Sparseness-constrained Nonnegative Tensor Factorization for Detecting Topics at Different Time Scales
Authors:
Lara Kassab,
Alona Kryshchenko,
Hanbaek Lyu,
Denali Molitor,
Deanna Needell,
Elizaveta Rebrova,
Jiahong Yuan
Abstract:
Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we first show that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) is able to discover topics of variabl…
▽ More
Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we first show that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) is able to discover topics of variable persistence automatically. Then, we propose sparseness-constrained NCPD (S-NCPD) and its online variant in order to actively control the length of the learned topics effectively and efficiently. Further, we propose quantitative ways to measure the topic length and demonstrate the ability of S-NCPD (as well as its online variant) to discover short and long-lasting temporal topics in a controlled manner in semi-synthetic and real-world data including news headlines. We also demonstrate that the online variant of S-NCPD reduces the reconstruction error more rapidly than S-NCPD.
△ Less
Submitted 31 August, 2023; v1 submitted 4 October, 2020;
originally announced October 2020.
-
COVID-19 Literature Topic-Based Search via Hierarchical NMF
Authors:
Rachel Grotheer,
Yihuan Huang,
Pengyu Li,
Elizaveta Rebrova,
Deanna Needell,
Longxiu Huang,
Alona Kryshchenko,
Xia Li,
Kyung Ha,
Oleksandr Kryshchenko
Abstract:
A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics…
▽ More
A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure.
△ Less
Submitted 7 September, 2020;
originally announced September 2020.
-
Combining 3D Model Contour Energy and Keypoints for Object Tracking
Authors:
Bogdan Bugaev,
Anton Kryshchenko,
Roman Belov
Abstract:
We present a new combined approach for monocular model-based 3D tracking. A preliminary object pose is estimated by using a keypoint-based technique. The pose is then refined by optimizing the contour energy function. The energy determines the degree of correspondence between the contour of the model projection and the image edges. It is calculated based on both the intensity and orientation of th…
▽ More
We present a new combined approach for monocular model-based 3D tracking. A preliminary object pose is estimated by using a keypoint-based technique. The pose is then refined by optimizing the contour energy function. The energy determines the degree of correspondence between the contour of the model projection and the image edges. It is calculated based on both the intensity and orientation of the raw image gradient. For optimization, we propose a technique and search area constraints that allow overcoming the local optima and taking into account information obtained through keypoint-based pose estimation. Owing to its combined nature, our method eliminates numerous issues of keypoint-based and edge-based approaches. We demonstrate the efficiency of our method by comparing it with state-of-the-art methods on a public benchmark dataset that includes videos with various lighting conditions, movement patterns, and speed.
△ Less
Submitted 4 February, 2020;
originally announced February 2020.
-
On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition
Authors:
Miju Ahn,
Nicole Eikmeier,
Jamie Haddock,
Lara Kassab,
Alona Kryshchenko,
Kathryn Leonard,
Deanna Needell,
R. W. M. A. Madushani,
Elena Sizikova,
Chuntian Wang
Abstract:
There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of t…
▽ More
There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, information contained in the temporal dimension of the data is often neglected or underutilized. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAPAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors, thereby preserving the temporal information. The viability of NNCPD is demonstrated through application to both synthetic and real data, where significantly improved results are obtained compared to those of typical NMF-based methods. The advantages of NNCPD over such approaches are studied and discussed. To the best of our knowledge, this is the first time that NNCPD has been utilized for the purpose of dynamic topic modeling, and our findings will be transformative for both applications and further developments.
△ Less
Submitted 14 October, 2020; v1 submitted 2 January, 2020;
originally announced January 2020.