-
RaDe-GS: Rasterizing Depth in Gaussian Splatting
Authors:
Baowen Zhang,
Chuan Fang,
Rakesh Shrestha,
Yixun Liang,
Xiaoxiao Long,
** Tan
Abstract:
Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. It achieves a Chamfer distance error comparable to NeuraLangelo on the DTU dataset and maintains similar computational efficiency as the original 3D GS methods. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods.
Submitted 24 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Authors:
Robik Shrestha,
Yang Zou,
Qiuyu Chen,
Zhiheng Li,
Yusheng Xie,
Siqi Deng
Abstract:
Existing text-to-image generative models reflect or even amplify societal biases ingrained in their training data. This is especially concerning for human image generation where models are biased against certain demographic groups. Existing attempts to rectify this issue are hindered by the inherent limitations of the pre-trained models and fail to substantially improve demographic diversity. In this work, we introduce Fair Retrieval Augmented Generation (FairRAG), a novel framework that conditions pre-trained generative models on reference images retrieved from an external image database to improve fairness in human generation. FairRAG enables conditioning through a lightweight linear module that projects reference images into the textual space. To enhance fairness, FairRAG applies simple-yet-effective debiasing strategies, providing images from diverse demographic groups during the generative process. Extensive experiments demonstrate that FairRAG outperforms existing methods in terms of demographic diversity, image-text alignment, and image fidelity while incurring minimal computational overhead during inference.
Submitted 5 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
DomainLab: A modular Python package for domain generalization in deep learning
Authors:
Xudong Sun,
Carla Feistner,
Alexej Gossmann,
George Schwarz,
Rao Muhammad Umer,
Lisa Beer,
Patrick Rockenschaub,
Rahul Babu Shrestha,
Armin Gruber,
Nutan Chen,
Sayedali Shetab Boushehri,
Florian Buettner,
Carsten Marr
Abstract:
Poor generalization performance caused by distribution shifts in unseen domains often hinders the trustworthy deployment of deep neural networks. Many domain generalization techniques address this problem by adding domain-invariant regularization loss terms during training. However, there is a lack of modular software that allows users to combine the advantages of different methods with minimal effort for reproducibility. DomainLab is a modular Python package for training user-specified neural networks with composable regularization loss terms. Its decoupled design allows the separation of neural networks from regularization loss construction. Hierarchical combinations of neural networks, different domain generalization methods, and associated hyperparameters can all be specified together with other experimental setup in a single configuration file. In addition, DomainLab offers powerful benchmarking functionality to evaluate the generalization performance of neural networks on out-of-distribution data. The package supports running the specified benchmark on an HPC cluster or on a standalone machine. The package is well tested with over 95 percent coverage and well documented. From the user perspective, it is closed to modification but open to extension. The package is under the MIT license, and its source code, tutorial and documentation can be found at https://github.com/marrlab/DomainLab.
Submitted 21 March, 2024;
originally announced March 2024.
-
Comparing the willingness to share for human-generated vs. AI-generated fake news
Authors:
Amirsiavosh Bashardoust,
Stefan Feuerriegel,
Yash Raj Shrestha
Abstract:
Generative artificial intelligence (AI) presents large risks for society when it is used to create fake news. A crucial factor for fake news to go viral on social media is that users share such content. Here, we aim to shed light on the sharing behavior of users across human-generated vs. AI-generated fake news. Specifically, we study: (1) What is the perceived veracity of human-generated fake news vs. AI-generated fake news? (2) What is the user's willingness to share human-generated fake news vs. AI-generated fake news on social media? (3) What socio-economic characteristics let users fall for AI-generated fake news? To this end, we conducted a pre-registered, online experiment with $N = 988$ subjects and 20 fake news items from the COVID-19 pandemic generated by GPT-4 vs. humans. Our findings show that AI-generated fake news is perceived as less accurate than human-generated fake news, but both tend to be shared equally. Further, several socio-economic factors explain who falls for AI-generated fake news.
Submitted 11 February, 2024;
originally announced February 2024.
-
Residual Learning for Image Point Descriptors
Authors:
Rashik Shrestha,
Ajad Chhatkuli,
Menelaos Kanakis,
Luc Van Gool
Abstract:
Local image feature descriptors have had a tremendous impact on the development and application of computer vision methods. It is therefore unsurprising that significant efforts are being made for learning-based image point descriptors. However, the advantage of learned methods over handcrafted methods in real applications is subtle and more nuanced than expected. Moreover, handcrafted descriptors such as SIFT and SURF still perform better point localization in Structure-from-Motion (SfM) compared to many learned counterparts. In this paper, we propose a very simple and effective approach to learning local image descriptors by using a hand-crafted detector and descriptor. Specifically, we choose to learn only the descriptors, supported by handcrafted descriptors while discarding the point localization head. We optimize the final descriptor by leveraging the knowledge already present in the handcrafted descriptor. Such an approach of optimization allows us to discard learning knowledge already present in non-differentiable functions such as the hand-crafted descriptors and only learn the residual knowledge in the main network branch. This offers 50X convergence speed compared to the standard baseline architecture of SuperPoint while at inference the combined descriptor provides superior performance over the learned and hand-crafted descriptors. This is done with minor increase in the computations over the baseline learned descriptor. Our approach has potential applications in ensemble learning and learning with non-differentiable functions. We perform experiments in matching, camera localization and Structure-from-Motion in order to showcase the advantages of our approach.
Submitted 24 December, 2023;
originally announced December 2023.
-
CaLDiff: Camera Localization in NeRF via Pose Diffusion
Authors:
Rashik Shrestha,
Bishad Koju,
Abhigyan Bhusal,
Danda Pani Paudel,
François Rameau
Abstract:
With the widespread use of NeRF-based implicit 3D representation, the need for camera localization in the same representation becomes manifestly apparent. Doing so not only simplifies the localization process -- by avoiding an outside-the-NeRF-based localization -- but also has the potential to offer the benefit of enhanced localization. This paper studies the problem of localizing cameras in NeRF using a diffusion model for camera pose adjustment. More specifically, given a pre-trained NeRF model, we train a diffusion model that iteratively updates randomly initialized camera poses, conditioned upon the image to be localized. At test time, a new camera is localized in two steps: first, coarse localization using the proposed pose diffusion process, followed by local refinement steps of a pose inversion process in NeRF. In fact, the proposed camera localization by pose diffusion (CaLDiff) method also integrates the pose inversion steps within the diffusion process. Such integration offers significantly better localization, thanks to our downstream refinement-aware diffusion process. Our exhaustive experiments on challenging real-world data validate our method by providing significantly better results than the compared methods and the established baselines. Our source code will be made publicly available.
Submitted 23 December, 2023;
originally announced December 2023.
-
Conditional Image Generation with Pretrained Generative Model
Authors:
Rajesh Shrestha,
Bowen Xie
Abstract:
In recent years, diffusion models have gained popularity for their ability to generate higher-quality images in comparison to GAN models. However, like any other large generative models, these models require a huge amount of data, computational resources, and meticulous tuning for successful training. This poses a significant challenge, rendering it infeasible for most individuals. As a result, the research community has devised methods to leverage pre-trained unconditional diffusion models with additional guidance for the purpose of conditional image generation. These methods enable conditional image generation on diverse inputs and, most importantly, circumvent the need for training the diffusion model. In this paper, our objective is to reduce the time required and the computational overhead introduced by the addition of guidance in diffusion models, while maintaining comparable image quality. We propose a set of methods based on our empirical analysis, demonstrating a reduction in computation time by approximately threefold.
Submitted 20 December, 2023;
originally announced December 2023.
-
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
Authors:
Yunye Gong,
Robik Shrestha,
Jared Claypoole,
Michael Cogswell,
Arijit Ray,
Christopher Kanan,
Ajay Divakaran
Abstract:
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxonomy, a classic framework for learning assessment widely adopted in education research. Our data maps to a novel hierarchical graph representation which enables automatic data augmentation and novel measures characterizing model consistency. We perform graded evaluation and reliability analysis on recent multi-modal models. In comparison to low-level tasks, we observe decreased performance on tasks requiring advanced comprehension and cognitive skills with up to 38.0% drop in VQA accuracy. In comparison to earlier models, GPT-4V demonstrates improved accuracy over all comprehension levels and shows a tendency of bypassing visual inputs especially for higher-level tasks. Current models also show consistency patterns misaligned with human comprehension in various scenarios, demonstrating the need for improvement based on theoretically-grounded criteria.
Submitted 10 June, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Theoretical Analysis of the Radio Map Estimation Problem
Authors:
Daniel Romero,
Tien Ngoc Ha,
Raju Shrestha,
Massimo Franceschetti
Abstract:
Radio maps provide radio frequency metrics, such as the received signal strength, at every location of a geographic area. These maps, which are estimated using a set of measurements collected at multiple positions, find a wide range of applications in wireless communications, including the prediction of coverage holes, network planning, resource allocation, and path planning for mobile robots. Although a vast number of estimators have been proposed, the theoretical understanding of the radio map estimation (RME) problem has not been addressed. The present work aims at filling this gap along two directions. First, the complexity of the set of radio map functions is quantified by means of lower and upper bounds on their spatial variability, which offers valuable insight into the required spatial distribution of measurements and the estimators that can be used. Second, the reconstruction error for power maps in free space is upper bounded for three conventional spatial interpolators. The proximity coefficient, which is a decreasing function of the distance from the transmitters to the mapped region, is proposed to quantify the complexity of the RME problem. Numerical experiments assess the tightness of the obtained bounds and the validity of the main takeaways in complex environments.
Submitted 23 March, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Radio Map Estimation: Empirical Validation and Analysis
Authors:
Raju Shrestha,
Tien Ngoc Ha,
Pham Q. Viet,
Daniel Romero
Abstract:
Radio maps quantify magnitudes such as the received signal strength at every location of a geographical region. Although the estimation of radio maps has attracted widespread interest, the vast majority of works rely on simulated data and, therefore, cannot establish the effectiveness and relative performance of existing algorithms in practice. To fill this gap, this paper presents the first comprehensive and rigorous study of radio map estimation (RME) in the real world. The main features of the RME problem are analyzed and the capabilities of existing estimators are compared using large measurement datasets collected in this work. By studying four performance metrics, recent theoretical findings are empirically corroborated and a large number of conclusions are drawn. Remarkably, the estimation error is seen to be reasonably small even with few measurements, which establishes the viability of RME in practice. Besides, from extensive comparisons, it is concluded that estimators based on deep neural networks necessitate large volumes of training data to exhibit a significant advantage over more traditional methods. Combining both types of schemes is seen to result in a novel estimator that features the best performance in most situations. The acquired datasets are made publicly available to enable further studies.
Submitted 22 January, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Data Augmentation through Pseudolabels in Automatic Region Based Coronary Artery Segmentation for Disease Diagnosis
Authors:
Sandesh Pokhrel,
Sanjay Bhandari,
Eduard Vazquez,
Yash Raj Shrestha,
Binod Bhattarai
Abstract:
Coronary Artery Diseases (CADs), though preventable, are one of the leading causes of death and disability. Diagnosis of these diseases is often difficult and resource intensive. Segmentation of arteries in angiographic images has evolved as a tool for assistance, helping clinicians in making accurate diagnoses. However, due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging. In this study, we introduce the idea of using pseudolabels as a data augmentation technique to improve the performance of the baseline YOLO model. This method increases the F1 score of the baseline by 9% in the validation dataset and by 3% in the test dataset.
Submitted 8 October, 2023;
originally announced October 2023.
-
ConvNeXtv2 Fusion with Mask R-CNN for Automatic Region Based Coronary Artery Stenosis Detection for Disease Diagnosis
Authors:
Sandesh Pokhrel,
Sanjay Bhandari,
Eduard Vazquez,
Yash Raj Shrestha,
Binod Bhattarai
Abstract:
Coronary Artery Diseases (CADs), although preventable, are one of the leading causes of mortality worldwide. Due to the onerous nature of diagnosis, tackling CADs has proved challenging. This study addresses the automation of the resource-intensive and time-consuming process of manually detecting stenotic lesions in coronary arteries in X-ray coronary angiography images. To overcome this challenge, we employ a specialized Mask R-CNN model with a ConvNeXt-V2 backbone, pre-trained for instance segmentation tasks. Our empirical findings affirm that the proposed model exhibits commendable performance in identifying stenotic lesions. Notably, our approach achieves an F1 score of 0.5353 in this demanding task, underscoring its effectiveness in streamlining this intensive process.
Submitted 7 October, 2023;
originally announced October 2023.
-
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
Authors:
Chuan Fang,
Yuan Dong,
Kunming Luo,
Xiaotao Hu,
Rakesh Shrestha,
** Tan
Abstract:
Text-driven 3D indoor scene generation is useful for gaming, the film industry, and AR/VR applications. However, existing methods cannot faithfully capture the room layout, nor do they allow flexible editing of individual objects in the room. To address these problems, we present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt. Moreover, Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items. Our key insight is to separate the modeling of layouts and appearance. Our proposed method consists of two stages: a Layout Generation Stage and an Appearance Generation Stage. The Layout Generation Stage trains a text-conditional diffusion model to learn the layout distribution with our holistic scene code parameterization. Next, the Appearance Generation Stage employs a fine-tuned ControlNet to produce a vivid panoramic image of the room guided by the 3D scene layout and text prompt. We thus achieve a high-quality 3D room generation with convincing layouts and lively textures. Benefiting from the scene code parameterization, we can easily edit the generated room model through our mask-guided editing module, without expensive edit-specific training. Extensive experiments on the Structured3D dataset demonstrate that our method outperforms existing methods in producing more reasonable, view-consistent, and editable 3D rooms from natural language prompts.
Submitted 1 July, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Development of a Feeding Assistive Robot Using a Six Degree of Freedom Robotic Arm
Authors:
Md Esharuzzaman Emu,
Samarjith Biswas,
Rajendra Shrestha
Abstract:
This project introduces a Feeding Assistive Robot tailored to individuals with physical disabilities, including those with limited arm function or hand control. The core component is a precise six-degree-of-freedom robotic arm, operated seamlessly through voice commands. Integration of an Arduino-based Braccio Arm, a distance sensor, and a Bluetooth module enables voice-controlled movements. The primary goal is to empower users to independently select and consume meals, whether at a dining table or in bed. The system's adaptability, responsiveness, and versatility in serving three different food items mark a significant advancement in enhancing the quality of life for individuals with physical challenges, promoting autonomy in daily activities.
Submitted 20 September, 2023;
originally announced September 2023.
-
Large Language Models for Difficulty Estimation of Foreign Language Content with Application to Language Learning
Authors:
Michalis Vlachos,
Mircea Lungu,
Yash Raj Shrestha,
Johannes-Rudolf David
Abstract:
We use large language models to help learners enhance proficiency in a foreign language. This is accomplished by identifying content on topics that the user is interested in and that closely aligns with the learner's proficiency level in that foreign language. Our work centers on French content, but our approach is readily transferable to other languages. Our solution offers several distinctive characteristics that differentiate it from existing language-learning solutions, such as a) the discovery of content across topics that the learner cares about, thus increasing motivation, b) a more precise estimation of the linguistic difficulty of the content than traditional readability measures, and c) the availability of both textual and video-based content. The linguistic complexity of video content is derived from the video captions. It is our aspiration that such technology will enable learners to remain engaged in the language-learning process by continuously adapting the topics and the difficulty of the content to align with the learners' evolving interests and learning objectives.
Submitted 10 September, 2023;
originally announced September 2023.
-
Distributionally Robust Optimization and Invariant Representation Learning for Addressing Subgroup Underrepresentation: Mechanisms and Limitations
Authors:
Nilesh Kumar,
Ruby Shrestha,
Zhiyuan Li,
Linwei Wang
Abstract:
Spurious correlation caused by subgroup underrepresentation has received increasing attention as a source of bias that can be perpetuated by deep neural networks (DNNs). Distributionally robust optimization has shown success in addressing this bias, although the underlying working mechanism mostly relies on upweighting under-performing samples as surrogates for those underrepresented in data. At the same time, while invariant representation learning has been a powerful choice for removing nuisance-sensitive features, it has been little considered in settings where spurious correlations are caused by significant underrepresentation of subgroups. In this paper, we take the first step to better understand and improve the mechanisms for debiasing spurious correlation due to subgroup underrepresentation in medical image classification. Through a comprehensive evaluation study, we first show that 1) generalized reweighting of under-performing samples can be problematic when bias is not the only cause for poor performance, while 2) naive invariant representation learning suffers from spurious correlations itself. We then present a novel approach that leverages robust optimization to facilitate the learning of invariant representations in the presence of spurious correlations. Finetuned classifiers utilizing such representations demonstrated improved abilities to reduce subgroup performance disparity, while maintaining high average and worst-group performance.
Submitted 11 August, 2023;
originally announced August 2023.
-
Deep-learning assisted detection and quantification of (oo)cysts of Giardia and Cryptosporidium on smartphone microscopy images
Authors:
Suprim Nakarmi,
Sanam Pudasaini,
Safal Thapaliya,
Pratima Upretee,
Retina Shrestha,
Basant Giri,
Bhanu Bhakta Neupane,
Bishesh Khanal
Abstract:
The consumption of microbial-contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives for the detection of Giardia and Cryptosporidium than traditional brightfield microscopes. However, the images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution for this limitation. We evaluate the performance of three state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, and you only look once (YOLOv8s) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts.
Submitted 11 April, 2023;
originally announced April 2023.
-
Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis
Authors:
Rajesh Shrestha
Abstract:
Natural Gradient Descent, a second-order optimization method motivated by information geometry, makes use of the Fisher Information Matrix instead of the Hessian that is typically used. However, in many cases, the Fisher Information Matrix is equivalent to the matrix of the Generalized Gauss-Newton method, and both approximate the Hessian. It is an appealing alternative to stochastic gradient descent, potentially leading to faster convergence. However, being a second-order method makes it infeasible to use directly in problems with a huge number of parameters and data. This is evident from the deep learning community having stuck with stochastic gradient descent since the beginning. In this paper, we look at the different perspectives on the natural gradient method, study the current developments in its efficient-scalable empirical approximations, and finally examine their performance with extensive experiments.
Submitted 5 March, 2023;
originally announced March 2023.
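For orientation, the update rule usually associated with natural gradient descent can be written as follows. This is the standard textbook form, not a formula taken from the paper; the symbols (parameters \theta, step size \eta, loss \mathcal{L}, model distribution p(y | x; \theta)) are notation assumed here for illustration.

```latex
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \, \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{x,\; y \sim p(y \mid x;\theta)}
\left[ \nabla_\theta \log p(y \mid x;\theta) \, \nabla_\theta \log p(y \mid x;\theta)^{\top} \right]
```

For a model with d parameters, F is a d-by-d matrix, so storing it takes O(d^2) memory and inverting it naively takes O(d^3) time, which is why the abstract describes direct use as infeasible for very large models.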
-
Tumult Analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy
Authors:
Skye Berghel,
Philip Bohannon,
Damien Desfontaines,
Charles Estes,
Sam Haney,
Luke Hartman,
Michael Hay,
Ashwin Machanavajjhala,
Tom Magerlein,
Gerome Miklau,
Amritha Pai,
William Sexton,
Ruchit Shrestha
Abstract:
In this short paper, we outline the design of Tumult Analytics, a Python framework for differential privacy used at institutions such as the U.S. Census Bureau, the Wikimedia Foundation, or the Internal Revenue Service.
Submitted 8 December, 2022;
originally announced December 2022.
-
Treatment classification of posterior capsular opacification (PCO) using automated ground truths
Authors:
Raisha Shrestha,
Waree Kongprawechnon,
Teesid Leelasawassuk,
Nattapon Wongcumchang,
Oliver Findl,
Nino Hirnschall
Abstract:
Determining the need for treatment of posterior capsular opacification (PCO), one of the most common complications of cataract surgery, is a difficult process due to its local unavailability and the fact that treatment is provided only after PCO occurs in the central visual axis. In this paper we propose a deep learning (DL)-based method to first segment PCO images and then classify the images into "treatment required" and "not yet required" cases in order to reduce frequent hospital visits. To train the model, we prepare a training image set with ground truths (GT) obtained from two strategies: (i) manual and (ii) automated. So, we have two models: (i) Model 1 (trained with the image set containing manual GT) and (ii) Model 2 (trained with the image set containing automated GT). Both models, when evaluated on the validation image set, gave a Dice coefficient greater than 0.8 and an intersection-over-union (IoU) score greater than 0.67 in our experiments. Comparison between the gold standard GT and the segmented results from our models gave a Dice coefficient greater than 0.7 and an IoU score greater than 0.6 for both models, showing that automated ground truths can also yield an efficient model. Comparison between our classification results and the clinical classification shows a 0.98 F2-score for outputs from both models.
Submitted 11 November, 2022;
originally announced November 2022.
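The two segmentation metrics quoted in the abstract above can be illustrated with a minimal sketch. It assumes binary masks flattened to sequences of 0/1; the function name `dice_and_iou`, the toy masks, and the convention for two empty masks are illustrative assumptions, not the authors' evaluation code.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-length binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is a
        # convention assumed for this sketch.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy example: two 6-pixel masks that overlap on 2 pixels.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two scores are linked by Dice = 2 * IoU / (1 + IoU), so an IoU of 0.67 corresponds to a Dice of about 0.80, which matches the pairing of thresholds reported in the abstract.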
-
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Authors:
Selim Fekih,
Nicolò Tamagnone,
Benjamin Minixhofer,
Ranjan Shrestha,
Ximena Contla,
Ewan Oglethorpe,
Navid Rekabsaz
Abstract:
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments on Pre-trained Language Models (PLM) to establish strong baselines for future research in this domain. The dataset is available at https://blog.thedeep.io/humset/.
Submitted 6 November, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Absolute Security in High-Frequency Wireless Links
Authors:
Alejandro Cohen,
Rafael G. L. D'Oliveira,
Chia-Yi Yeh,
Hichem Guerboukha,
Rabi Shrestha,
Zhaoji Fang,
Edward Knightly,
Muriel Médard,
Daniel M. Mittleman
Abstract:
Security against eavesdropping is one of the key concerns in the design of any communication system. Many common considerations of the security of a wireless communication channel rely on comparing the signal level measured by Bob (the intended receiver) to that accessible to Eve (an eavesdropper). Frameworks such as Wyner's wiretap model ensure the security of a link, in an average sense, when Bob's signal-to-noise ratio exceeds Eve's. Unfortunately, because these guarantees rely on statistical assumptions about noise, Eve can still occasionally succeed in decoding information. The goal of achieving exactly zero probability of intercept over an engineered region of the broadcast sector, which we term absolute security, remains elusive. Here, we describe the first architecture for a wireless link which provides absolute security. Our approach relies on the inherent properties of broadband and high-gain antennas, and is therefore ideally suited for implementation in millimeter-wave and terahertz wireless systems, where such antennas will generally be employed. We exploit spatial minima of the antenna pattern at different frequencies, the union of which defines a wide region where Eve is guaranteed to fail regardless of her computational capabilities, and regardless of the noise in the channels. Unlike conventional zero-forcing beamforming methods, we show that, for realistic assumptions about the antenna configuration and power budget, this absolute security guarantee can be achieved over most possible eavesdropper locations. Since we use relatively simple frequency-multiplexed coding, together with the underlying physics of a diffracting aperture, this idea is broadly applicable in many contexts.
Submitted 11 August, 2022;
originally announced August 2022.
-
Precision-based attacks and interval refining: how to break, then fix, differential privacy on finite computers
Authors:
Samuel Haney,
Damien Desfontaines,
Luke Hartman,
Ruchit Shrestha,
Michael Hay
Abstract:
Despite being raised as a problem over ten years ago, the imprecision of floating point arithmetic continues to cause privacy failures in the implementations of differentially private noise mechanisms. In this paper, we highlight a new class of vulnerabilities, which we call "precision-based attacks", and which affect several open source libraries. To address this vulnerability and implement differentially private mechanisms on floating-point space in a safe way, we propose a novel technique, called "interval refining". This technique has minimal error, provable privacy, and broad applicability. We use interval refining to design and implement a variant of the Laplace mechanism that is equivalent to sampling from the Laplace distribution and rounding to a float. We report on the performance of this approach, and discuss how interval refining can be used to implement other mechanisms safely, including the Gaussian mechanism and the exponential mechanism.
Submitted 27 July, 2022;
originally announced July 2022.
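For context, the sketch below shows the textbook Laplace mechanism implemented with ordinary floating-point inverse-CDF sampling. It is the kind of straightforward implementation whose floating-point behavior the abstract above cautions about, and it is not the interval-refining technique, whose details the abstract does not give. The function name, parameters, and sampling method are illustrative assumptions, and the code should not be relied on for real privacy guarantees.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return true_value plus Laplace(0, sensitivity / epsilon) noise.

    Naive floating-point version for illustration only: the set of doubles
    it can output is irregular, which is the general weakness that
    floating-point attacks on noise mechanisms exploit.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # Draw r from (0, 1) by resampling the endpoint 0.0, so that
    # u = r - 0.5 lies strictly inside (-0.5, 0.5) and log(0) cannot occur.
    r = rng.random()
    while r == 0.0:
        r = rng.random()
    u = r - 0.5
    # Inverse-CDF sampling: X = -scale * sign(u) * ln(1 - 2|u|).
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise


# Usage: release a count of 10 with sensitivity 1 and epsilon = 1.
rng = random.Random(0)  # seeded only to make the demo repeatable
noisy_count = laplace_mechanism(10.0, sensitivity=1.0, epsilon=1.0, rng=rng)
print(noisy_count)
```

A larger epsilon gives a smaller noise scale (sensitivity / epsilon) and therefore a more accurate but less private answer.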
-
Localizing Router Configuration Errors Using Minimal Correction Sets
Authors:
Aaron Gember-Jacobson,
Ruchit Shrestha,
Xiaolin Sun
Abstract:
Router configuration errors are unfortunately common and difficult to localize using current network verifiers. We introduce a novel configuration error localizer (CEL) that precisely identifies which configuration segments contribute to the violation of forwarding requirements. In particular, CEL generates a system of satisfiability modulo theories (SMT) constraints (which encode a network's configurations, control logic, and forwarding requirements) and uses a domain-specific minimal correction set (MCS) enumeration algorithm to identify problematic configuration segments. CEL efficiently locates several configuration errors in real university networks and identifies all routing-related and at least half of all ACL-related errors we introduce.
Submitted 22 April, 2022;
originally announced April 2022.
-
OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
Authors:
Robik Shrestha,
Kushal Kafle,
Christopher Kanan
Abstract:
Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we propose OccamNets, which are biased to favor simpler solutions by design. OccamNets have two inductive biases. First, they are biased to use as little network depth as needed for an individual example. Second, they are biased toward using fewer image locations for prediction. While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary. In experiments, OccamNets outperform or rival state-of-the-art methods run on architectures that do not incorporate these inductive biases. Furthermore, we demonstrate that when state-of-the-art debiasing methods are combined with OccamNets, results further improve.
Submitted 14 April, 2024; v1 submitted 5 April, 2022;
originally announced April 2022.
-
A Real World Dataset for Multi-view 3D Reconstruction
Authors:
Rakesh Shrestha,
Siqi Hu,
Minghao Gou,
Ziyuan Liu,
Ping Tan
Abstract:
We present a dataset of 998 3D models of everyday tabletop objects along with their 847,000 real-world RGB and depth images. Accurate annotations of camera poses and object poses for each image are performed in a semi-automated fashion to facilitate the use of the dataset for myriad 3D applications such as shape reconstruction, object pose estimation, and shape retrieval. We primarily focus on learned multi-view 3D reconstruction due to the lack of an appropriate real-world benchmark for the task and demonstrate that our dataset can fill that gap. The entire annotated dataset along with the source code for the annotation tools and evaluation baselines is available at http://www.ocrtoc.org/3d-reconstruction.html.
Submitted 8 August, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs
Authors:
Raju Shrestha,
Daniel Romero,
Sundeep Prabhakar Chepuri
Abstract:
Radio maps find numerous applications in wireless communications and mobile robotics tasks, including resource allocation, interference coordination, and mission planning. Although numerous techniques have been proposed to construct radio maps from spatially distributed measurements, the locations of such measurements are assumed predetermined beforehand. In contrast, this paper proposes spectrum surveying, where a mobile robot such as an unmanned aerial vehicle (UAV) collects measurements at a set of locations that are actively selected to obtain high-quality map estimates in a short surveying time. This is performed in two steps. First, two novel algorithms, a model-based online Bayesian estimator and a data-driven deep learning algorithm, are devised for updating a map estimate and an uncertainty metric that indicates the informativeness of measurements at each possible location. These algorithms offer complementary benefits and feature constant complexity per measurement. Second, the uncertainty metric is used to plan the trajectory of the UAV to gather measurements at the most informative locations. To overcome the combinatorial complexity of this problem, a dynamic programming approach is proposed to obtain lists of waypoints through areas of large uncertainty in linear time. Numerical experiments conducted on a realistic dataset confirm that the proposed scheme constructs accurate radio maps quickly.
Submitted 13 January, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Towards Automatic Bias Detection in Knowledge Graphs
Authors:
Daphna Keidar,
Mian Zhong,
Ce Zhang,
Yash Raj Shrestha,
Bibek Paudel
Abstract:
With the recent surge in social applications relying on knowledge graphs (KGs), the need for techniques to ensure fairness in KG-based methods is becoming increasingly evident. Previous works have demonstrated that KGs are prone to various social biases, and have proposed multiple methods for debiasing them. However, in such studies, the focus has been on debiasing techniques, while the relations to be debiased are specified manually by the user. As manual specification is itself susceptible to human cognitive bias, there is a need for a system capable of quantifying and exposing biases that can support more informed decisions on what to debias. To address this gap in the literature, we describe a framework for identifying biases present in knowledge graph embeddings, based on numerical bias metrics. We illustrate the framework with three different bias measures on the task of profession prediction, and it can be flexibly extended to further bias definitions and applications. The relations flagged as biased can then be handed to decision makers for judgement on subsequent debiasing.
Submitted 18 September, 2021;
originally announced September 2021.
-
Are Bias Mitigation Techniques for Deep Learning Effective?
Authors:
Robik Shrestha,
Kushal Kafle,
Christopher Kanan
Abstract:
A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms of bias, and systems have access to hidden knowledge or are tuned specifically to the test set. To address this, we introduce an improved evaluation protocol, sensible metrics, and a new dataset, which enables us to ask and answer critical questions about bias mitigation algorithms. We evaluate seven state-of-the-art algorithms using the same network architecture and hyperparameter selection policy across three benchmark datasets. We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources. We use Biased MNIST and a visual question answering (VQA) benchmark to assess robustness to hidden biases. Rather than only tuning to the test set distribution, we study robustness across different tuning distributions, which is critical because for many applications the test distribution may not be known during development. We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set. Based on our findings, we implore the community to adopt more rigorous assessment of future bias mitigation methods. All data, code, and results are publicly available at: https://github.com/erobic/bias-mitigators.
Submitted 23 April, 2024; v1 submitted 31 March, 2021;
originally announced April 2021.
-
Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems
Authors:
Usman Mahmood,
Robik Shrestha,
David D. B. Bates,
Lorenzo Mannelli,
Giuseppe Corrias,
Yusuf Erdi,
Christopher Kanan
Abstract:
Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
Submitted 4 March, 2021;
originally announced March 2021.
-
Morning or Evening? An Examination of Circadian Rhythms of CS1 Students
Authors:
Albina Zavgorodniaia,
Raj Shrestha,
Juho Leinonen,
Arto Hellas,
John Edwards
Abstract:
Circadian rhythms are the cycles of our internal clock that play a key role in governing when we sleep and when we are active. A related concept is chronotype, which is a person's natural tendency toward activity at certain times of day and typically governs when the individual is most alert and productive. In this work we investigate chronotypes in the setting of an Introductory Computer Programming (CS1) course. Using keystroke data collected from students we investigate the existence of chronotypes through unsupervised learning. The chronotypes we find align with those of typical populations reported in the literature and our results support correlations of certain chronotypes to academic achievement. We also find a lack of support for the still-popular stereotype of a computer programmer as a night owl. The analyses are conducted on data from two universities, one in the US and one in Europe, that use different teaching methods. In comparison of the two contexts, we look into programming assignment design and administration that may promote better programming practices among students in terms of procrastination and effort.
Submitted 1 March, 2021;
originally announced March 2021.
-
Augmenting Organizational Decision-Making with Deep Learning Algorithms: Principles, Promises, and Challenges
Authors:
Yash Raj Shrestha,
Vaibhav Krishna,
Georg von Krogh
Abstract:
The current expansion of theory and research on artificial intelligence in management and organization studies has revitalized the theory and research on decision-making in organizations. In particular, recent advances in deep learning (DL) algorithms promise benefits for decision-making within organizations, such as assisting employees with information processing, thereby augmenting their analytical capabilities and perhaps helping their transition to more creative work.
Submitted 2 November, 2020;
originally announced November 2020.
-
MeshMVS: Multi-View Stereo Guided Mesh Reconstruction
Authors:
Rakesh Shrestha,
Zhiwen Fan,
Qingkun Su,
Zuozhuo Dai,
Siyu Zhu,
Ping Tan
Abstract:
Deep learning-based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process. These color image semantics only implicitly encode 3D information, potentially limiting the accuracy of the generated shapes. In this paper we propose a multi-view mesh generation method which incorporates geometry information explicitly by using the features from intermediate depth representations of multi-view stereo and regularizing the 3D shapes against these depth images. First, our system predicts a coarse 3D volume from the color images by probabilistically merging voxel occupancy grids from the prediction of individual views. Then the depth images from multi-view stereo, along with the rendered depth images of the coarse shape, are used as a contrastive input whose features guide the refinement of the coarse shape through a series of graph convolution networks. Notably, we achieve results superior to state-of-the-art multi-view shape generation methods, with a 34% decrease in Chamfer distance to ground truth and a 14% increase in F1-score on the ShapeNet dataset. Our source code is available at https://git.io/Jmalg
Submitted 11 April, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
IoT Based Smart Home using Blynk Framework
Authors:
Bharat Bohara,
Sunil Maharjan,
Bibek Raj Shrestha
Abstract:
The project discussed in this paper is targeted at solving sundry problems faced by Nepalese people in their daily life. It is designed to control and monitor appliances via smartphone, using Wi-Fi as the communication protocol and a Raspberry Pi as a private server. All the appliances and sensors are connected to the internet via a NodeMCU microcontroller, which serves as the gateway to the internet. Even if the user goes offline, the system is designed to switch to an automated state, controlling the appliances automatically according to the sensor readings. Also, the data are logged to the server for future data mining. The core system of this project is adopted from the Blynk framework.
Submitted 27 July, 2020;
originally announced July 2020.
-
A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records
Authors:
Leopold Franz,
Yash Raj Shrestha,
Bibek Paudel
Abstract:
Augmentation of disease diagnosis and decision-making in healthcare with machine learning algorithms has gained much impetus in recent years. In particular, in the current epidemiological situation caused by the COVID-19 pandemic, swift and accurate prediction of disease diagnosis with machine learning algorithms could facilitate identification and care of vulnerable clusters of the population, such as those having multi-morbidity conditions. In order to build a useful disease diagnosis prediction system, advances in both data representation and machine learning architectures are imperative. First, with respect to data collection and representation, we face severe problems due to the multitude of formats and lack of coherency prevalent in Electronic Health Records (EHRs). This hinders the extraction of valuable information contained in EHRs. Currently, no universal global data standard has been established. As a useful solution, we develop and publish a Python package to transform a public health dataset into an easy-to-access universal format. This transformation to an international health data format makes it easy for researchers to combine EHR datasets with clinical datasets of diverse formats. Second, machine learning algorithms that predict multiple disease diagnosis categories simultaneously remain underdeveloped. We propose two novel model architectures in this regard: first, DeepObserver, which uses structured numerical data to predict the diagnosis categories, and second, ClinicalBERT_Multi, which incorporates rich information available in clinical notes via natural language processing methods and also provides interpretable visualizations to medical practitioners. We show that both models can predict multiple diagnoses simultaneously with high accuracy.
Submitted 23 June, 2020;
originally announced June 2020.
-
Adversarial Learning for Debiasing Knowledge Graph Embeddings
Authors:
Mario Arduini,
Lorenzo Noci,
Federico Pirovano,
Ce Zhang,
Yash Raj Shrestha,
Bibek Paudel
Abstract:
Knowledge Graphs (KG) are gaining increasing attention in both academia and industry. Despite their diverse benefits, recent research have identified social and cultural biases embedded in the representations learned from KGs. Such biases can have detrimental consequences on different population and minority groups as applications of KG begin to intersect and interact with social spheres. This pap…
▽ More
Knowledge Graphs (KG) are gaining increasing attention in both academia and industry. Despite their diverse benefits, recent research have identified social and cultural biases embedded in the representations learned from KGs. Such biases can have detrimental consequences on different population and minority groups as applications of KG begin to intersect and interact with social spheres. This paper aims at identifying and mitigating such biases in Knowledge Graph (KG) embeddings. As a first step, we explore popularity bias -- the relationship between node popularity and link prediction accuracy. In case of node2vec graph embeddings, we find that prediction accuracy of the embedding is negatively correlated with the degree of the node. However, in case of knowledge-graph embeddings (KGE), we observe an opposite trend. As a second step, we explore gender bias in KGE, and a careful examination of popular KGE algorithms suggest that sensitive attribute like the gender of a person can be predicted from the embedding. This implies that such biases in popular KGs is captured by the structural properties of the embedding. As a preliminary solution to debiasing KGs, we introduce a novel framework to filter out the sensitive attribute information from the KG embeddings, which we call FAN (Filtering Adversarial Network). We also suggest the applicability of FAN for debiasing other network embeddings which could be explored in future work.
Submitted 17 February, 2021; v1 submitted 29 June, 2020;
originally announced June 2020.
-
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Authors:
Damien Teney,
Kushal Kafle,
Robik Shrestha,
Ehsan Abbasnejad,
Christopher Kanan,
Anton van den Hengel
Abstract:
Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.
Submitted 19 May, 2020;
originally announced May 2020.
-
Visual Grounding Methods for VQA are Working for the Wrong Reasons!
Authors:
Robik Shrestha,
Kushal Kafle,
Christopher Kanan
Abstract:
Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to linguistic priors. For instance, we find that it is not actually necessary to provide proper, human-based cues; random, insensible cues also result in similar improvements. Based on this observation, we propose a simpler regularization scheme that does not require any external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.
Submitted 23 April, 2024; v1 submitted 12 April, 2020;
originally announced April 2020.
-
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Authors:
Tyler L. Hayes,
Kushal Kafle,
Robik Shrestha,
Manoj Acharya,
Christopher Kanan
Abstract:
People learn throughout life. However, incrementally updating conventional neural networks leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the brain consolidates memory. Replay involves fine-tuning a network on a mixture of new and old instances. While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images. Here, we propose REMIND, a brain-inspired approach that enables efficient replay with compressed representations. REMIND is trained in an online manner, meaning it learns one example at a time, which is closer to how humans learn. Under the same constraints, REMIND outperforms other methods for incremental class learning on the ImageNet ILSVRC-2012 dataset. We probe REMIND's robustness to data ordering schemes known to induce catastrophic forgetting. We demonstrate REMIND's generality by pioneering online learning for Visual Question Answering (VQA).
Submitted 13 July, 2020; v1 submitted 6 October, 2019;
originally announced October 2019.
-
Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Authors:
Kushal Kafle,
Robik Shrestha,
Brian Price,
Scott Cohen,
Christopher Kanan
Abstract:
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
Submitted 22 July, 2020; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Challenges and Prospects in Vision and Language Research
Authors:
Kushal Kafle,
Robik Shrestha,
Christopher Kanan
Abstract:
Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.
Submitted 24 May, 2019; v1 submitted 19 April, 2019;
originally announced April 2019.
-
Answer Them All! Toward Universal Visual Question Answering Models
Authors:
Robik Shrestha,
Kushal Kafle,
Christopher Kanan
Abstract:
Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.
Submitted 5 April, 2019; v1 submitted 1 March, 2019;
originally announced March 2019.
-
How Credible is the Prediction of a Party-Based Election?
Authors:
Jiong Guo,
Yash Raj Shrestha,
Yongjie Yang
Abstract:
In a party-based election system, the voters are grouped into parties and all voters of a party are assumed to vote according to the party preferences over the candidates. Hence, once the party preferences are declared the outcome of the election can be determined. However, in the actual election, the members of some "instable" parties often leave their own party to join other parties. We introduce two parameters to measure the credibility of the prediction based on party preferences: Min is the minimum number of voters leaving the instable parties such that the prediction is no longer true, while Max is the maximum number of voters leaving the instable parties such that the prediction remains valid. Concerning the complexity of computing Min and Max, we consider both positional scoring rules (Plurality, Veto, r-Approval and Borda) and Condorcet-consistent rules (Copeland and Maximin). We show that for all considered scoring rules, Min is polynomial-time computable, while it is NP-hard to compute Min for Copeland and Maximin. With the only exception of Borda, Max can be computed in polynomial time for other scoring rules. We have NP-hardness results for the computation of Max under Borda, Maximin and Copeland.
Submitted 9 April, 2014;
originally announced April 2014.
-
Parameterized Complexity of Edge Interdiction Problems
Authors:
Jiong Guo,
Yash Raj Shrestha
Abstract:
We study the parameterized complexity of interdiction problems in graphs. For an optimization problem on graphs, one can formulate an interdiction problem as a game consisting of two players, namely, an interdictor and an evader, who compete on an objective with opposing interests. In edge interdiction problems, every edge of the input graph has an interdiction cost associated with it and the interdictor interdicts the graph by modifying the edges in the graph, and the number of such modifications is constrained by the interdictor's budget. The evader then solves the given optimization problem on the modified graph. The action of the interdictor must impede the evader as much as possible. We focus on edge interdiction problems related to minimum spanning tree, maximum matching and shortest paths. These problems arise in different real-world scenarios. We derive several fixed-parameter tractability and W[1]-hardness results for these interdiction problems with respect to various parameters. Next, we show a close relation between interdiction problems and partial cover problems on bipartite graphs where the goal is not to cover all elements but to minimize/maximize the number of covered elements with a specific number of sets. Hereby, we investigate the parameterized complexity of several partial cover problems on bipartite graphs.
Submitted 11 January, 2014;
originally announced January 2014.