-
Differentially Private Data Generation with Missing Data
Authors:
Shubhankar Mohapatra,
Jianqiao Zong,
Florian Kerschbaum,
Xi He
Abstract:
Despite several works that succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of DP synthetic data with missing values and propose three effective adaptive strategies that significantly improve the utility of the synthetic data…
▽ More
Despite several works that succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the problems of DP synthetic data with missing values and propose three effective adaptive strategies that significantly improve the utility of the synthetic data on four real-world datasets with different types and levels of missing data and privacy requirements. We also identify the relationship between privacy impact for the complete ground truth data and incomplete data for these DP synthetic data generation algorithms. We model the missing mechanisms as a sampling process to obtain tighter upper bounds for the privacy guarantees to the ground truth data. Overall, this study contributes to a better understanding of the challenges and opportunities for using private synthetic data generation algorithms in the presence of missing data.
△ Less
Submitted 30 May, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Automated Ensemble-Based Segmentation of Adult Brain Tumors: A Novel Approach Using the BraTS AFRICA Challenge Data
Authors:
Chiranjeewee Prasad Koirala,
Sovesh Mohapatra,
Advait Gosai,
Gottfried Schlaug
Abstract:
Brain tumors, particularly glioblastoma, continue to challenge medical diagnostics and treatments globally. This paper explores the application of deep learning to multi-modality magnetic resonance imaging (MRI) data for enhanced brain tumor segmentation precision in the Sub-Saharan Africa patient population. We introduce an ensemble method that comprises eleven unique variations based on three co…
▽ More
Brain tumors, particularly glioblastoma, continue to challenge medical diagnostics and treatments globally. This paper explores the application of deep learning to multi-modality magnetic resonance imaging (MRI) data for enhanced brain tumor segmentation precision in the Sub-Saharan Africa patient population. We introduce an ensemble method that comprises eleven unique variations based on three core architectures: UNet3D, ONet3D, SphereNet3D and modified loss functions. The study emphasizes the need for both age- and population-based segmentation models, to fully account for the complexities in the brain. Our findings reveal that the ensemble approach, combining different architectures, outperforms single models, leading to improved evaluation metrics. Specifically, the results exhibit Dice scores of 0.82, 0.82, and 0.87 for enhancing tumor, tumor core, and whole tumor labels respectively. These results underline the potential of tailored deep learning techniques in precisely segmenting brain tumors and lay groundwork for future work to fine-tune models and assess performance across different brain regions.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Automated ensemble method for pediatric brain tumor segmentation
Authors:
Shashidhar Reddy Javaji,
Sovesh Mohapatra,
Advait Gosai,
Gottfried Schlaug
Abstract:
Brain tumors remain a critical global health challenge, necessitating advancements in diagnostic techniques and treatment methodologies. A tumor or its recurrence often needs to be identified in imaging studies and differentiated from normal brain tissue. In response to the growing need for age-specific segmentation models, particularly for pediatric patients, this study explores the deployment of…
▽ More
Brain tumors remain a critical global health challenge, necessitating advancements in diagnostic techniques and treatment methodologies. A tumor or its recurrence often needs to be identified in imaging studies and differentiated from normal brain tissue. In response to the growing need for age-specific segmentation models, particularly for pediatric patients, this study explores the deployment of deep learning techniques using magnetic resonance imaging (MRI) modalities. By introducing a novel ensemble approach using ONet and modified versions of UNet, coupled with innovative loss functions, this study achieves a precise segmentation model for the BraTS-PEDs 2023 Challenge. Data augmentation, including both single and composite transformations, ensures model robustness and accuracy across different scanning protocols. The ensemble strategy, integrating the ONet and UNet models, shows greater effectiveness in capturing specific features and modeling diverse aspects of the MRI images which result in lesion wise Dice scores of 0.52, 0.72 and 0.78 on unseen validation data and scores of 0.55, 0.70, 0.79 on final testing data for the "enhancing tumor", "tumor core" and "whole tumor" labels respectively. Visual comparisons further confirm the superiority of the ensemble method in accurate tumor region coverage. The results indicate that this advanced ensemble approach, building upon the unique strengths of individual models, offers promising prospects for enhanced diagnostic accuracy and effective treatment planning and monitoring for brain tumors in pediatric brains.
△ Less
Submitted 14 March, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving
Authors:
Sambit Mohapatra,
Senthil Yogamani,
Varun Ravi Kumar,
Stefan Milz,
Heinrich Gotzig,
Patrick Mäder
Abstract:
LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks like detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network f…
▽ More
LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks like detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. The work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds achieving 3ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
Meta-Analysis of Transfer Learning for Segmentation of Brain Lesions
Authors:
Sovesh Mohapatra,
Advait Gosai,
Anant Shinde,
Aleksei Rutkovskii,
Sirisha Nouduri,
Gottfried Schlaug
Abstract:
A major challenge in stroke research and stroke recovery predictions is the determination of a stroke lesion's extent and its impact on relevant brain systems. Manual segmentation of stroke lesions from 3D magnetic resonance (MR) imaging volumes, the current gold standard, is not only very time-consuming, but its accuracy highly depends on the operator's experience. As a result, there is a need fo…
▽ More
A major challenge in stroke research and stroke recovery predictions is the determination of a stroke lesion's extent and its impact on relevant brain systems. Manual segmentation of stroke lesions from 3D magnetic resonance (MR) imaging volumes, the current gold standard, is not only very time-consuming, but its accuracy highly depends on the operator's experience. As a result, there is a need for a fully automated segmentation method that can efficiently and objectively measure lesion extent and the impact of each lesion to predict impairment and recovery potential which might be beneficial for clinical, translational, and research settings. We have implemented and tested a fully automatic method for stroke lesion segmentation which was developed using eight different 2D-model architectures trained via transfer learning (TL) and mixed data approaches. Additionally, the final prediction was made using a novel ensemble method involving stacking and agreement window. Our novel method was evaluated in a novel in-house dataset containing 22 T1w brain MR images, which were challenging in various perspectives, but mostly because they included T1w MR images from the subacute (which typically less well defined T1 lesions) and chronic stroke phase (which typically means well defined T1-lesions). Cross-validation results indicate that our new method can efficiently and automatically segment lesions fast and with high accuracy compared to ground truth. In addition to segmentation, we provide lesion volume and weighted lesion load of relevant brain systems based on the lesions' overlap with a canonical structural motor system that stretches from the cortical motor region to the lowest end of the brain stem.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
SAM vs BET: A Comparative Study for Brain Extraction and Segmentation of Magnetic Resonance Images using Deep Learning
Authors:
Sovesh Mohapatra,
Advait Gosai,
Gottfried Schlaug
Abstract:
Brain extraction is a critical preprocessing step in various neuroimaging studies, particularly enabling accurate separation of brain from non-brain tissue and segmentation of relevant within-brain tissue compartments and structures using Magnetic Resonance Imaging (MRI) data. FSL's Brain Extraction Tool (BET), although considered the current gold standard for automatic brain extraction, presents…
▽ More
Brain extraction is a critical preprocessing step in various neuroimaging studies, particularly enabling accurate separation of brain from non-brain tissue and segmentation of relevant within-brain tissue compartments and structures using Magnetic Resonance Imaging (MRI) data. FSL's Brain Extraction Tool (BET), although considered the current gold standard for automatic brain extraction, presents limitations and can lead to errors such as over-extraction in brains with lesions affecting the outer parts of the brain, inaccurate differentiation between brain tissue and surrounding meninges, and susceptibility to image quality issues. Recent advances in computer vision research have led to the development of the Segment Anything Model (SAM) by Meta AI, which has demonstrated remarkable potential in zero-shot segmentation of objects in real-world scenarios. In the current paper, we present a comparative analysis of brain extraction techniques comparing SAM with a widely used and current gold standard technique called BET on a variety of brain scans with varying image qualities, MR sequences, and brain lesions affecting different brain regions. We find that SAM outperforms BET based on average Dice coefficient, IoU and accuracy metrics, particularly in cases where image quality is compromised by signal inhomogeneities, non-isotropic voxel resolutions, or the presence of brain lesions that are located near (or involve) the outer regions of the brain and the meninges. In addition, SAM has also unsurpassed segmentation properties allowing a fine grain separation of different issue compartments and different brain structures. These results suggest that SAM has the potential to emerge as a more accurate, robust and versatile tool for a broad range of brain extraction and segmentation applications.
△ Less
Submitted 19 April, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
From Robots to Books: An Introduction to Smart Applications of AI in Education (AIEd)
Authors:
Shubham Ojha,
Aditya Narendra,
Siddharth Mohapatra,
Ipsit Misra
Abstract:
The world around us has undergone a radical transformation due to rapid technological advancement in recent decades. The industry of the future generation is evolving, and artificial intelligence is the following change in the making popularly known as Industry 4.0. Indeed, experts predict that artificial intelligence(AI) will be the main force behind the following significant virtual shift in the…
▽ More
The world around us has undergone a radical transformation due to rapid technological advancement in recent decades. The industry of the future generation is evolving, and artificial intelligence is the following change in the making popularly known as Industry 4.0. Indeed, experts predict that artificial intelligence(AI) will be the main force behind the following significant virtual shift in the way we stay, converse, study, live, communicate and conduct business. All facets of our social connection are being transformed by this growing technology. One of the newest areas of educational technology is Artificial Intelligence in the field of Education(AIEd).This study emphasizes the different applications of artificial intelligence in education from both an industrial and academic standpoint. It highlights the most recent contextualized learning novel transformative evaluations and advancements in sophisticated tutoring systems. It analyses the AIEd's ethical component and the influence of the transition on people, particularly students and instructors as well. Finally, this article touches on AIEd's potential future research and practices. The goal of this study is to introduce the present-day applications to its intended audience.
△ Less
Submitted 11 January, 2023;
originally announced January 2023.
-
The (In)Effectiveness of Intermediate Task Training For Domain Adaptation and Cross-Lingual Transfer Learning
Authors:
Sovesh Mohapatra,
Somesh Mohapatra
Abstract:
Transfer learning from large language models (LLMs) has emerged as a powerful technique to enable knowledge-based fine-tuning for a number of tasks, adaptation of models for different domains and even languages. However, it remains an open question, if and when transfer learning will work, i.e. leading to positive or negative transfer. In this paper, we analyze the knowledge transfer across three…
▽ More
Transfer learning from large language models (LLMs) has emerged as a powerful technique to enable knowledge-based fine-tuning for a number of tasks, adaptation of models for different domains and even languages. However, it remains an open question, if and when transfer learning will work, i.e. leading to positive or negative transfer. In this paper, we analyze the knowledge transfer across three natural language processing (NLP) tasks - text classification, sentimental analysis, and sentence similarity, using three LLMs - BERT, RoBERTa, and XLNet - and analyzing their performance, by fine-tuning on target datasets for domain and cross-lingual adaptation tasks, with and without an intermediate task training on a larger dataset. Our experiments showed that fine-tuning without an intermediate task training can lead to a better performance for most tasks, while more generalized tasks might necessitate a preceding intermediate task training step. We hope that this work will act as a guide on transfer learning to NLP practitioners.
△ Less
Submitted 4 November, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Sentiment is all you need to win US Presidential elections
Authors:
Sovesh Mohapatra,
Somesh Mohapatra
Abstract:
Election speeches play an integral role in communicating the vision and mission of the candidates. From lofty promises to mud-slinging, the electoral candidate accounts for all. However, there remains an open question about what exactly wins over the voters. In this work, we used state-of-the-art natural language processing methods to study the speeches and sentiments of the Republican candidate,…
▽ More
Election speeches play an integral role in communicating the vision and mission of the candidates. From lofty promises to mud-slinging, the electoral candidate accounts for all. However, there remains an open question about what exactly wins over the voters. In this work, we used state-of-the-art natural language processing methods to study the speeches and sentiments of the Republican candidate, Donald Trump, and Democratic candidate, Joe Biden, fighting for the 2020 US Presidential election. Comparing the racial dichotomy of the United States, we analyze what led to the victory and defeat of the different candidates. We believe this work will inform the election campaigning strategy and provide a basis for communicating to diverse crowds.
△ Less
Submitted 15 October, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
SpikiLi: A Spiking Simulation of LiDAR based Real-time Object Detection for Autonomous Driving
Authors:
Sambit Mohapatra,
Thomas Mesquida,
Mona Hodaei,
Senthil Yogamani,
Heinrich Gotzig,
Patrick Mader
Abstract:
Spiking Neural Networks are a recent and new neural network design approach that promises tremendous improvements in power efficiency, computation efficiency, and processing latency. They do so by using asynchronous spike-based data flow, event-based signal generation, processing, and modifying the neuron model to resemble biological neurons closely. While some initial works have shown significant…
▽ More
Spiking Neural Networks are a recent and new neural network design approach that promises tremendous improvements in power efficiency, computation efficiency, and processing latency. They do so by using asynchronous spike-based data flow, event-based signal generation, processing, and modifying the neuron model to resemble biological neurons closely. While some initial works have shown significant initial evidence of applicability to common deep learning tasks, their applications in complex real-world tasks has been relatively low. In this work, we first illustrate the applicability of spiking neural networks to a complex deep learning task namely Lidar based 3D object detection for automated driving. Secondly, we make a step-by-step demonstration of simulating spiking behavior using a pre-trained convolutional neural network. We closely model essential aspects of spiking neural networks in simulation and achieve equivalent run-time and accuracy on a GPU. When the model is realized on a neuromorphic hardware, we expect to have significantly improved power efficiency.
△ Less
Submitted 6 June, 2022;
originally announced June 2022.
-
Intelligent Acoustic Module for Autonomous Vehicles using Fast Gated Recurrent approach
Authors:
Raghav Rawat,
Shreyash Gupta,
Shreyas Mohapatra,
Sujata Priyambada Mishra,
Sreesankar Rajagopal
Abstract:
This paper elucidates a model for acoustic single and multi-tone classification in resource constrained edge devices. The proposed model is of State-of-the-art Fast Accurate Stable Tiny Gated Recurrent Neural Network. This model has resulted in improved performance metrics and lower size compared to previous hypothesized methods by using lesser parameters with higher efficiency and employment of a…
▽ More
This paper elucidates a model for acoustic single and multi-tone classification in resource constrained edge devices. The proposed model is of State-of-the-art Fast Accurate Stable Tiny Gated Recurrent Neural Network. This model has resulted in improved performance metrics and lower size compared to previous hypothesized methods by using lesser parameters with higher efficiency and employment of a noise reduction algorithm. The model is implemented as an acoustic AI module, focused for the application of sound identification, localization, and deployment on AI systems like that of an autonomous car. Further, the inclusion of localization techniques carries the potential of adding a new dimension to the multi-tone classifiers present in autonomous vehicles, as its demand increases in urban cities and develo** countries in the future.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection
Authors:
Shubhankar Mohapatra,
Sa** Sasy,
Xi He,
Gautam Kamath,
Om Thakkar
Abstract:
Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private m…
▽ More
Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning rate and clip** norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.
△ Less
Submitted 8 November, 2021;
originally announced November 2021.
-
LiMoSeg: Real-time Bird's Eye View based LiDAR Motion Segmentation
Authors:
Sambit Mohapatra,
Mona Hodaei,
Senthil Yogamani,
Stefan Milz,
Heinrich Gotzig,
Martin Simon,
Hazem Rashed,
Patrick Maeder
Abstract:
Moving object detection and segmentation is an essential task in the Autonomous Driving pipeline. Detecting and isolating static and moving components of a vehicle's surroundings are particularly crucial in path planning and localization tasks. This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data. We use three successive scans of Li…
▽ More
Moving object detection and segmentation is an essential task in the Autonomous Driving pipeline. Detecting and isolating static and moving components of a vehicle's surroundings are particularly crucial in path planning and localization tasks. This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data. We use three successive scans of LiDAR data in 2D Bird's Eye View (BEV) representation to perform pixel-wise classification as static or moving. Furthermore, we propose a novel data augmentation technique to reduce the significant class imbalance between static and moving objects. We achieve this by artificially synthesizing moving objects by cutting and pasting static vehicles. We demonstrate a low latency of 8 ms on a commonly used automotive embedded platform, namely Nvidia Jetson Xavier. To the best of our knowledge, this is the first work directly performing motion segmentation in LiDAR BEV space. We provide quantitative results on the challenging SemanticKITTI dataset, and qualitative results are provided in https://youtu.be/2aJ-cL8b0LI.
△ Less
Submitted 22 January, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags
Authors:
Christian D. Newman,
Michael J. Decker,
Reem S. AlSuhaibani,
Anthony Peruma,
Satyajit Mohapatra,
Tejal Vishnoi,
Marcos Zampieri,
Mohamed W. Mkaouer,
Timothy J. Sheldon,
Emily Hill
Abstract:
This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE…
▽ More
This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names: function, class, attribute, parameter, and declaration statement at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to promote the future amelioration of these problems through further research. Our results show that the ensemble achieves 75\% accuracy at the identifier level and 84-86\% accuracy at the word level. This is an increase of +17\% points at the identifier level from the closest independent part-of-speech tagger.
△ Less
Submitted 1 September, 2021;
originally announced September 2021.
-
ICLR 2021 Challenge for Computational Geometry & Topology: Design and Results
Authors:
Nina Miolane,
Matteo Caorsi,
Umberto Lupo,
Marius Guerard,
Nicolas Guigui,
Johan Mathe,
Yann Cabanes,
Wojciech Reise,
Thomas Davies,
António Leitão,
Somesh Mohapatra,
Saiteja Utpala,
Shailja Shailja,
Gabriele Corso,
Guoxi Liu,
Federico Iuricich,
Andrei Manolache,
Mihaela Nistor,
Matei Bejan,
Armand Mihai Nicolicioiu,
Bogdan-Alexandru Luchian,
Mihai-Sorin Stupariu,
Florent Michel,
Khanh Dao Duc,
Bilal Abdulrahman
, et al. (8 additional authors not shown)
Abstract:
This paper presents the computational challenge on differential geometry and topology that happened within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA. The challenge attracted 16 teams…
▽ More
This paper presents the computational challenge on differential geometry and topology that happened within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA. The challenge attracted 16 teams in its two month duration. This paper describes the design of the challenge and summarizes its main findings.
△ Less
Submitted 25 August, 2021; v1 submitted 22 August, 2021;
originally announced August 2021.
-
BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving
Authors:
Sambit Mohapatra,
Senthil Yogamani,
Heinrich Gotzig,
Stefan Milz,
Patrick Mader
Abstract:
3D object detection based on LiDAR point clouds is a crucial module in autonomous driving particularly for long range sensing. Most of the research is focused on achieving higher accuracy and these models are not optimized for deployment on embedded systems from the perspective of latency and power efficiency. For high speed driving scenarios, latency is a crucial parameter as it provides more tim…
▽ More
3D object detection based on LiDAR point clouds is a crucial module in autonomous driving particularly for long range sensing. Most of the research is focused on achieving higher accuracy and these models are not optimized for deployment on embedded systems from the perspective of latency and power efficiency. For high speed driving scenarios, latency is a crucial parameter as it provides more time to react to dangerous situations. Typically a voxel or point-cloud based 3D convolution approach is utilized for this module. Firstly, they are inefficient on embedded platforms as they are not suitable for efficient parallelization. Secondly, they have a variable runtime due to level of sparsity of the scene which is against the determinism needed in a safety system. In this work, we aim to develop a very low latency algorithm with fixed runtime. We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction using binned classification in a simpler Bird's Eye View (BEV) 2D representation. The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation. The proposed model has a latency of 4 ms on the embedded Nvidia Xavier platform. The model is 5X faster than other top accuracy models with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on KITTI dataset.
△ Less
Submitted 10 July, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
GLAMOUR: Graph Learning over Macromolecule Representations
Authors:
Somesh Mohapatra,
Joyce An,
Rafael Gómez-Bombarelli
Abstract:
The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymers topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed GLAMOUR, a fram…
▽ More
The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymers topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed GLAMOUR, a framework for chemistry-informed graph representation of macromolecules that enables quantifying structural similarity, and interpretable supervised learning for macromolecules.
△ Less
Submitted 23 August, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Kamino: Constraint-Aware Differentially Private Data Synthesis
Authors:
Chang Ge,
Shubhankar Mohapatra,
Xi He,
Ihab F. Ilyas
Abstract:
Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic database instance that is similarly useful as the true data, while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate useful data based on applications, but…
▽ More
Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic database instance that is similarly useful as the true data, while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate useful data based on applications, but they fail in kee** one of the most fundamental data properties of the structured data -- the underlying correlations and dependencies among tuples and attributes (i.e., the structure of the data). This structure is often expressed as integrity and schema constraints, or with a probabilistic generative process. As a result, the synthesized data is not useful for any downstream tasks that require this structure to be preserved.
This work presents Kamino, a data synthesis system to ensure differential privacy and to preserve the structure and correlations present in the original dataset. Kamino takes as input of a database instance, along with its schema (including integrity constraints), and produces a synthetic database instance with differential privacy and structure preservation guarantees. We empirically show that while preserving the structure of the data, Kamino achieves comparable and even better usefulness in applications of training classification models and answering marginal queries than the state-of-the-art methods of differentially private data synthesis.
△ Less
Submitted 15 April, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
KryptoOracle: A Real-Time Cryptocurrency Price Prediction Platform Using Twitter Sentiments
Authors:
Shubhankar Mohapatra,
Nauman Ahmed,
Paulo Alencar
Abstract:
Cryptocurrencies, such as Bitcoin, are becoming increasingly popular, having been widely used as an exchange medium in areas such as financial transaction and asset transfer verification. However, there has been a lack of solutions that can support real-time price prediction to cope with high currency volatility, handle massive heterogeneous data volumes, including social media sentiments, while s…
▽ More
Cryptocurrencies, such as Bitcoin, are becoming increasingly popular, having been widely used as an exchange medium in areas such as financial transaction and asset transfer verification. However, there has been a lack of solutions that can support real-time price prediction to cope with high currency volatility, handle massive heterogeneous data volumes, including social media sentiments, while supporting fault tolerance and persistence in real time, and provide real-time adaptation of learning algorithms to cope with new price and sentiment data. In this paper we introduce KryptoOracle, a novel real-time and adaptive cryptocurrency price prediction platform based on Twitter sentiments. The integrative and modular platform is based on (i) a Spark-based architecture which handles the large volume of incoming data in a persistent and fault tolerant way; (ii) an approach that supports sentiment analysis which can respond to large amounts of natural language processing queries in real time; and (iii) a predictive method grounded on online learning in which a model adapts its weights to cope with new prices and sentiments. Besides providing an architectural design, the paper also describes the KryptoOracle platform implementation and experimental evaluation. Overall, the proposed platform can help accelerate decision-making, uncover new opportunities and provide more timely insights based on the available and ever-larger financial data volume and variety.
△ Less
Submitted 21 February, 2020;
originally announced March 2020.
-
Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods
Authors:
Anirudh Itagi,
Ritam Sil,
Saurav Mohapatra,
Subham Rout,
Bharath K P,
Karthik R,
Rajesh Kumar Muthu
Abstract:
Misclassification of medicine is perilous to the health of a patient, more so if the said patient is visually impaired or simply did not recognize the color, shape or type of medicine strip. This paper proposes a method for identification of medicine strips by 2-D cepstral analysis of their images followed by performing classification that has been done using the K-Nearest Neighbor (KNN), Support…
▽ More
Misclassification of medicine is perilous to the health of a patient, more so if the said patient is visually impaired or simply did not recognize the color, shape or type of medicine strip. This paper proposes a method for identification of medicine strips by 2-D cepstral analysis of their images followed by performing classification that has been done using the K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Logistic Regression (LR) Classifiers. The 2-D cepstral features extracted are extremely distinct to a medicine strip and consequently make identifying them exceptionally accurate. This paper also proposes the Color Gradient and Pill shape Feature (CGPF) extraction procedure and discusses the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm as well. The mentioned algorithms were implemented and their identification results have been compared.
△ Less
Submitted 3 February, 2020;
originally announced March 2020.
-
Triclustering of Gene Expression Microarray Data Using Coarse-Grained Parallel Genetic Algorithm
Authors:
Shubhankar Mohapatra,
Moumita Sarkar,
Anjali Mohapatra,
Bhawani Sankar Biswal
Abstract:
Microarray data analysis is one of the major area of research in the field computational biology. Numerous techniques like clustering, biclustering are often applied to microarray data to extract meaningful outcomes which play key roles in practical healthcare affairs like disease identification, drug discovery etc. But these techniques become obsolete when time as an another factor is considered…
▽ More
Microarray data analysis is one of the major area of research in the field computational biology. Numerous techniques like clustering, biclustering are often applied to microarray data to extract meaningful outcomes which play key roles in practical healthcare affairs like disease identification, drug discovery etc. But these techniques become obsolete when time as an another factor is considered for evaluation in such data. This problem motivates to use triclustering method on gene expression 3D microarray data. In this article, a new methodology based on coarse-grained parallel genetic approach is proposed to locate meaningful triclusters in gene expression data. The outcomes are quite impressive as they are more effective as compared to traditional state of the art genetic approaches previously applied for triclustering of 3D GCT microarray data.
△ Less
Submitted 31 August, 2019;
originally announced September 2019.
-
Hybrid Heterogeneous Routing Scheme for Improved Network Performance in WSNs for Animal Tracking
Authors:
Trupti. M. Behera,
Sushanta. K. Mohapatra,
Umesh. C. Samal,
Mohammad. S. Khan
Abstract:
Wireless Sensor Networks (WSNs) experiences several technical challenges such as limited energy, short transmission range, limited storage capacities, and limited computational capabilities. Moreover, the sensor nodes are deployed randomly and massively over an inaccessible or hostile region. Hence WSNs are vulnerable to adversaries and are usually operated in a dynamic and unreliable environment.…
▽ More
Wireless Sensor Networks (WSNs) experiences several technical challenges such as limited energy, short transmission range, limited storage capacities, and limited computational capabilities. Moreover, the sensor nodes are deployed randomly and massively over an inaccessible or hostile region. Hence WSNs are vulnerable to adversaries and are usually operated in a dynamic and unreliable environment. Animal tracking using wireless sensors is one such application of WSN where power management plays a vital role. In this paper, an energy-efficient hybrid routing method is proposed that divides the whole network into smaller regions based on sensor location and chooses the routing scheme accordingly. The sensor network consists of a base station (BS) located at a distant place outside the network, and a relay node is placed inside the network for direct communications from nodes nearer to it. The nodes are further divided into two categories based on the supplied energy; such that the ones located far away from BS and relay have higher energy than the nodes nearer to them. The network performance of the proposed method is compared with protocols like LEACH, SEP, and SNRP, considering parameters like stability period, throughput and energy consumption. Simulation result shows that the proposed method outperforms other methods with better network performance.
△ Less
Submitted 7 March, 2019;
originally announced March 2019.
-
Exploring Deep Spiking Neural Networks for Automated Driving Applications
Authors:
Sambit Mohapatra,
Heinrich Gotzig,
Senthil Yogamani,
Stefan Milz,
Raoul Zollner
Abstract:
Neural networks have become the standard model for various computer vision tasks in automated driving including semantic segmentation, moving object detection, depth estimation, visual odometry, etc. The main flavors of neural networks which are used commonly are convolutional (CNN) and recurrent (RNN). In spite of rapid progress in embedded processors, power consumption and cost is still a bottle…
▽ More
Neural networks have become the standard model for various computer vision tasks in automated driving including semantic segmentation, moving object detection, depth estimation, visual odometry, etc. The main flavors of neural networks which are used commonly are convolutional (CNN) and recurrent (RNN). In spite of rapid progress in embedded processors, power consumption and cost is still a bottleneck. Spiking Neural Networks (SNNs) are gradually progressing to achieve low-power event-driven hardware architecture which has a potential for high efficiency. In this paper, we explore the role of deep spiking neural networks (SNN) for automated driving applications. We provide an overview of progress on SNN and argue how it can be a good fit for automated driving applications.
△ Less
Submitted 11 January, 2019;
originally announced March 2019.
-
Residual Energy Based Cluster-head Selection in WSNs for IoT Application
Authors:
Trupti Mayee Behera,
Sushanta Kumar Mohapatra,
Umesh Chandra Samal,
Mohammad. S. Khan,
Mahmoud Daneshmand,
Amir H. Gandomi
Abstract:
Wireless sensor networks (WSN) groups specialized transducers that provide sensing services to Internet of Things (IoT) devices with limited energy and storage resources. Since replacement or recharging of batteries in sensor nodes is almost impossible, power consumption becomes one of the crucial design issues in WSN. Clustering algorithm plays an important role in power conservation for the ener…
▽ More
Wireless sensor networks (WSN) groups specialized transducers that provide sensing services to Internet of Things (IoT) devices with limited energy and storage resources. Since replacement or recharging of batteries in sensor nodes is almost impossible, power consumption becomes one of the crucial design issues in WSN. Clustering algorithm plays an important role in power conservation for the energy constrained network. Choosing a cluster head can appropriately balance the load in the network thereby reducing energy consumption and enhancing lifetime. The paper focuses on an efficient cluster head election scheme that rotates the cluster head position among the nodes with higher energy level as compared to other. The algorithm considers initial energy, residual energy and an optimum value of cluster heads to elect the next group of cluster heads for the network that suits for IoT applications such as environmental monitoring, smart cities, and systems. Simulation analysis shows the modified version performs better than the LEACH protocol by enhancing the throughput by 60%, lifetime by 66%, and residual energy by 64%.
△ Less
Submitted 4 February, 2019;
originally announced February 2019.
-
A new quadratic-time number-theoretic algorithm to solve matrix multiplication problem
Authors:
Shrohan Mohapatra
Abstract:
There have been several algorithms designed to optimise matrix multiplication. From schoolbook method with complexity $O(n^3)$ to advanced tensor-based tools with time complexity $O(n^{2.3728639})$ (lowest possible bound achieved), a lot of work has been done to reduce the steps used in the recursive version. Some group-theoretic and computer algebraic estimations also conjecture the existence of…
▽ More
There have been several algorithms designed to optimise matrix multiplication. From schoolbook method with complexity $O(n^3)$ to advanced tensor-based tools with time complexity $O(n^{2.3728639})$ (lowest possible bound achieved), a lot of work has been done to reduce the steps used in the recursive version. Some group-theoretic and computer algebraic estimations also conjecture the existence of an $O(n^2)$ algorithm. This article discusses a quadratic-time number-theoretic approach that converts large vectors in the operands to a single large entity and combines them to make the dot-product. For two $n \times n$ matrices, this dot-product is iteratively used for each such vector. Preprocessing and computation makes it a quadratic time algorithm with a considerable constant of proportionality. Special strategies for integers, floating point numbers and complex numbers are also discussed, with a theoretical estimation of time and space complexity.
△ Less
Submitted 28 January, 2019; v1 submitted 10 June, 2018;
originally announced June 2018.