Search | arXiv e-print repository

CodeGemma: Open Code Models Based on Gemma

Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, et al. (2 additional authors not shown)

Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

arXiv:2404.04692 [pdf, other]

Securing the Skies: An IRS-Assisted AoI-Aware Secure Multi-UAV System with Efficient Task Offloading

Authors: Poorvi Joshi, Alakesh Kalita, Mohan Gurusamy

Abstract: Unmanned Aerial Vehicles (UAVs) are integral in various sectors like agriculture, surveillance, and logistics, driven by advancements in 5G. However, existing research lacks a comprehensive approach addressing both data freshness and security concerns. In this paper, we address the intricate challenges of data freshness and security, especially in the context of eavesdropping and jamming in modern UAV networks. Our framework incorporates exponential AoI metrics and emphasizes secrecy rate to tackle eavesdropping and jamming threats. We introduce a transformer-enhanced Deep Reinforcement Learning (DRL) approach to optimize task offloading processes. Comparative analysis with existing algorithms showcases the superiority of our scheme, indicating its promising advancements in UAV network management.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 7 pages, 5 figures, to be published in IEEE 99th Vehicular Technology Conference (VTC2024-Spring)

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.05303 [pdf, other]

Reliable and Efficient Data Collection in UAV-based IoT Networks

Authors: Poorvi Joshi, Alakesh Kalita, Mohan Gurusamy

Abstract: Internet of Things (IoT) involves sensors for monitoring and wireless networks for efficient communication. However, resource-constrained IoT devices and limitations in existing wireless technologies hinder its full potential. Integrating Unmanned Aerial Vehicles (UAVs) into IoT networks can address some challenges by expanding their coverage, providing security, and bringing computing closer to IoT devices. Nevertheless, effective data collection in UAV-assisted IoT networks is hampered by factors including dynamic UAV behavior, environmental variables, connectivity instability, and security considerations. In this survey, we first explore UAV-based IoT networks, focusing on communication and networking aspects. Next, we cover various UAV-based data collection methods and their advantages and disadvantages, followed by a discussion on performance metrics for data collection. As this article primarily emphasizes reliable and efficient data collection in UAV-assisted IoT networks, we briefly discuss existing research on data accuracy and consistency, network connectivity, and data security and privacy to provide insights into reliable data collection. Additionally, we discuss efficient data collection strategies in UAV-based IoT networks, covering trajectory and path planning, collision avoidance, sensor network clustering, data aggregation, UAV swarm formations, and artificial intelligence for optimization. We also present two use cases of UAVs as a service for enhancing data collection reliability and efficiency. Finally, we discuss future challenges in data collection for UAV-assisted IoT networks.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 32 pages, 7 figures, 7 tables

arXiv:2310.08743 [pdf]

Development and Validation of a Deep Learning-Based Microsatellite Instability Predictor from Prostate Cancer Whole-Slide Images

Authors: Qiyuan Hu, Abbas A. Rizvi, Geoffery Schau, Kshitij Ingale, Yoni Muller, Rachel Baits, Sebastian Pretzer, Aïcha BenTaieb, Abigail Gordhamer, Roberto Nussenzveig, Adam Cole, Matthew O. Leavitt, Rohan P. Joshi, Nike Beaubier, Martin C. Stumpe, Kunal Nagpal

Abstract: Microsatellite instability-high (MSI-H) is a tumor agnostic biomarker for immune checkpoint inhibitor therapy. However, MSI status is not routinely tested in prostate cancer, in part due to low prevalence and assay cost. As such, prediction of MSI status from hematoxylin and eosin (H&E) stained whole-slide images (WSIs) could identify prostate cancer patients most likely to benefit from confirmatory testing and becoming eligible for immunotherapy. Prostate biopsies and surgical resections from de-identified records of consecutive prostate cancer patients referred to our institution were analyzed. Their MSI status was determined by next generation sequencing. Patients before a cutoff date were split into an algorithm development set (n=4015, MSI-H 1.8%) and a paired validation set (n=173, MSI-H 19.7%) that consisted of two serial sections from each sample, one stained and scanned internally and the other at an external site. Patients after the cutoff date formed the temporal validation set (n=1350, MSI-H 2.3%). Attention-based multiple instance learning models were trained to predict MSI-H from H&E WSIs. The MSI-H predictor achieved area under the receiver operating characteristic curve values of 0.78 (95% CI [0.69-0.86]), 0.72 (95% CI [0.63-0.81]), and 0.72 (95% CI [0.62-0.82]) on the internally prepared, externally prepared, and temporal validation sets, respectively. While MSI-H status is significantly correlated with Gleason score, the model remained predictive within each Gleason score subgroup. In summary, we developed and validated an AI-based MSI-H diagnostic model on a large real-world cohort of routine H&E slides, which effectively generalized to externally stained and scanned samples and a temporally independent validation cohort. This algorithm has the potential to direct prostate cancer patients toward immunotherapy and to identify MSI-H cases secondary to Lynch syndrome.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07682 [pdf]

Prediction of MET Overexpression in Non-Small Cell Lung Adenocarcinomas from Hematoxylin and Eosin Images

Authors: Kshitij Ingale, Sun Hae Hong, Josh S. K. Bell, Abbas Rizvi, Amy Welch, Lingdao Sha, Irvin Ho, Kunal Nagpal, Aicha BenTaieb, Rohan P Joshi, Martin C Stumpe

Abstract: MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC) and is the subject of active drug development. Challenges in identifying patients for these therapies include lack of access to validated testing, such as standardized immunohistochemistry (IHC) assessment, and consumption of valuable tissue for a single gene/protein assay. Development of pre-screening algorithms using routinely available digitized hematoxylin and eosin (H&E)-stained slides to predict MET overexpression could promote testing for those who will benefit most. While assessment of MET expression using IHC is currently not routinely performed in NSCLC, next-generation sequencing is common and in some cases includes RNA expression panel testing. In this work, we leveraged a large database of matched H&E slides and RNA expression data to train a weakly supervised model to predict MET RNA overexpression directly from H&E images. This model was evaluated on an independent holdout test set of 300 over-expressed and 289 normal patients, demonstrating an ROC-AUC of 0.70 (95th percentile interval: 0.66 - 0.74) with stable performance characteristics across different patient clinical variables and robust to synthetic noise on the test set. These results suggest that H&E-based predictive models could be useful to prioritize patients for confirmatory testing of MET protein or MET gene expression status.

Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.06881 [pdf]

doi 10.1093/bioadv/vbae043

CAFA-evaluator: A Python Tool for Benchmarking Ontological Classification Methods

Authors: Damiano Piovesan, Davide Zago, Parnal Joshi, M. Clara De Paolis Kaluza, Mahta Mehdiabadi, Rashika Ramola, Alexander Miguel Monzon, Walter Reade, Iddo Friedberg, Predrag Radivojac, Silvio C. E. Tosatto

Abstract: We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software.

Submitted 12 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 5 pages

arXiv:2307.13266 [pdf, other]

Federated Split Learning with Only Positive Labels for resource-constrained IoT environment

Authors: Praveen Joshi, Chandra Thapa, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli

Abstract: Distributed collaborative machine learning (DCML) is a promising method in the Internet of Things (IoT) domain for training deep learning models, as data is distributed across multiple devices. A key advantage of this approach is that it improves data privacy by removing the necessity for the centralized aggregation of raw data but also empowers IoT devices with low computational power. Among various techniques in a DCML framework, federated split learning, known as splitfed learning (SFL), is the most suitable for efficient training and testing when devices have limited computational capabilities. Nevertheless, when resource-constrained IoT devices have only positive labeled data, multiclass classification deep learning models in SFL fail to converge or provide suboptimal results. To overcome these challenges, we propose splitfed learning with positive labels (SFPL). SFPL applies a random shuffling function to the smashed data received from clients before supplying it to the server for model training. Additionally, SFPL incorporates the local batch normalization for the client-side model portion during the inference phase. Our results demonstrate that SFPL outperforms SFL: (i) by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, with the CIFAR-100 dataset, and (ii) by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, with CIFAR-10 dataset. Overall, this investigation underscores the efficacy of the proposed SFPL framework in DCML.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 11 pages, 3 figures

arXiv:2305.07118 [pdf, other]

Commitment over Gaussian Unfair Noisy Channels

Authors: Amitalok J. Budkuley, Pranav Joshi, Manideep Mamindlapally, Anuj Kumar Yadav

Abstract: Commitment is a key primitive which resides at the heart of several cryptographic protocols. Noisy channels can help realize information-theoretically secure commitment schemes; however, their imprecise statistical characterization can severely impair such schemes, especially their security guarantees. Keeping our focus on channel unreliability in this work, we study commitment over unreliable continuous alphabet channels called the Gaussian unfair noisy channels or Gaussian UNCs. We present the first results on the optimal throughput or commitment capacity of Gaussian UNCs. It is known that classical Gaussian channels have infinite commitment capacity, even under finite transmit power constraints. For unreliable Gaussian UNCs, we prove the surprising result that their commitment capacity may be finite, and in some cases, zero. When commitment is possible, we present achievable rate lower bounds by constructing positive-throughput protocols under given input power constraint, and (two-sided) channel elasticity at committer Alice and receiver Bob. Our achievability results establish an interesting fact - Gaussian UNCs with zero elasticity have infinite commitment capacity - which brings a completely new perspective to why classic Gaussian channels, i.e., Gaussian UNCs with zero elasticity, have infinite capacity. Finally, we precisely characterize the positive commitment capacity threshold for a Gaussian UNC in terms of the channel elasticity, when the transmit power tends to infinity.

Submitted 11 May, 2023; originally announced May 2023.

Comments: The paper follows alphabetical author order. AKY, MM, and PJ have equally contributed to this work

arXiv:2211.07514 [pdf, other]

CST5: Data Augmentation for Code-Switched Semantic Parsing

Authors: Anmol Agarwal, Jigar Gupta, Rahul Goel, Shyam Upadhyay, Pankaj Joshi, Rengarajan Aravamudhan

Abstract: Extending semantic parsers to code-switched input has been a challenging problem, primarily due to a lack of supervised training data. In this work, we introduce CST5, a new data augmentation technique that finetunes a T5 model using a small seed set ($\approx$100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high quality code-switched data, both intrinsically (per human evaluation) and extrinsically by comparing baseline models which are trained without data augmentation to models which are trained with augmented data. Empirically we observe that using CST5, one can achieve the same semantic parsing performance by using up to 20x less labeled data. To aid further research in this area, we are also releasing (a) Hinglish-TOP, the largest human annotated code-switched semantic parsing dataset to date, containing 10k human annotated Hindi-English (Hinglish) code-switched utterances, and (b) Over 170K CST5 generated code-switched utterances from the TOPv2 dataset. Human evaluation shows that both the human annotated data as well as the CST5 generated data is of good quality.

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2208.07855 [pdf]

doi 10.1007/s11042-022-13412-y

Deep Learning for Size and Microscope Feature Extraction and Classification in Oral Cancer: Enhanced Convolution Neural Network

Authors: Prakrit Joshi, Omar Hisham Alsadoon, Abeer Alsadoon, Nada AlSallami, Tarik A. Rashid, P. W. C. Prasad, Sami Haddad

Abstract: Background and Aim: Over-fitting issue has been the reason behind deep learning technology not being successfully implemented in oral cancer images classification. The aims of this research were reducing overfitting for accurately producing the required dimension reduction feature map through Deep Learning algorithm using Convolutional Neural Network. Methodology: The proposed system consists of Enhanced Convolutional Neural Network that uses an autoencoder technique to increase the efficiency of the feature extraction process and compresses information. In this technique, unpooling and deconvolution is done to generate the input data to minimize the difference between input and output data. Moreover, it extracts characteristic features from the input data set to regenerate input data from those features by learning a network to reduce overfitting. Results: Different accuracy and processing time value is achieved while using different sample image group of Confocal Laser Endomicroscopy (CLE) images. The results showed that the proposed solution is better than the current system. Moreover, the proposed system has improved the classification accuracy by 5 to 5.5% on average and reduced the average processing time by 20 to 30 milliseconds. Conclusion: The proposed system focuses on the accurate classification of oral cancer cells of different anatomical locations from the CLE images. Finally, this study enhances the accuracy and processing time using the autoencoder method that solves the overfitting problem.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 21 pages

Journal ref: Multimed Tools Appl., 2022

arXiv:2206.02794 [pdf, other]

Machine learning models for determination of weldbead shape parameters for gas metal arc welded T-joints -- A comparative study

Authors: R. Pradhan, A. P Joshi, M. R Sunny, A. Sarkar

Abstract: The shape of a weld bead is critical in assessing the quality of the welded joint. In particular, this has a major impact in the accuracy of the results obtained from a numerical analysis. This study focuses on the statistical design techniques and the artificial neural networks, to predict the weld bead shape parameters of shielded Gas Metal Arc Welded (GMAW) fillet joints. Extensive testing was carried out on low carbon mild steel plates of thicknesses ranging from 3mm to 10mm. Welding voltage, welding current, and moving heat source speed were considered as the welding parameters. Three types of multiple linear regression models (MLR) were created to establish an empirical equation for defining GMAW bead shape parameters considering interactive and higher order terms. Additionally, artificial neural network (ANN) models were created based on similar scheme, and the relevance of specific features was investigated using SHapley Additive exPlanations (SHAP). The results reveal that MLR-based approach performs better than the ANN based models in terms of predictability and error assessment. This study shows the usefulness of the predictive tools to aid numerical analysis of welding.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.09402 [pdf]

Predictive Maintenance using Machine Learning

Authors: Archit P. Kane, Ashutosh S. Kore, Advait N. Khandale, Sarish S. Nigade, Pranjali P. Joshi

Abstract: Predictive maintenance (PdM) is a concept, which is implemented to effectively manage maintenance plans of the assets by predicting their failures with data driven techniques. In these scenarios, data is collected over a certain period of time to monitor the state of equipment. The objective is to find some correlations and patterns that can help predict and ultimately prevent failures. Equipment in manufacturing industry are often utilized without a planned maintenance approach. Such practise frequently results in unexpected downtime, owing to certain unexpected failures. In scheduled maintenance, the condition of the manufacturing equipment is checked after fixed time interval and if any fault occurs, the component is replaced to avoid unexpected equipment stoppages. On the flip side, this leads to increase in time for which machine is non-functioning and cost of carrying out the maintenance. The emergence of Industry 4.0 and smart systems have led to increasing emphasis on predictive maintenance (PdM) strategies that can reduce the cost of downtime and increase the availability (utilization rate) of manufacturing equipment. PdM also has the potential to bring about new sustainable practices in manufacturing by fully utilizing the useful lives of components.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.07453 [pdf, other]

Regression Test Suite for Payment Switch using jPOS

Authors: Atharv Sardesai, Hatim Piplodwala, Sidhant Hargunani, Swarnim Sonawane, Prof. Pranjali Joshi, Vikram Mohite

Abstract: The Payment Switch is an integral component of all modern payment and banking systems in India. The NPCI currently provides a simulator to test payment switches. However, this system has a few disadvantages viz. it lacks an API, it requires manual generation of each test case and during high server loads, the testing process may take a long time. Currently there aren't any open source alternatives to the NPCI simulator. We propose a system which solves these shortcomings. Our proposed system simulates the NPCI system. It allows connection with switches that are to be tested and automates the process of generation and execution of test cases. It also has the capability to generate test reports and can be run locally.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2205.03702 [pdf, other]

Keratoconus Classifier for Smartphone-based Corneal Topographer

Authors: Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demography. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, especially in the Global South. Low-cost smartphone-based corneal topographers, such as SmartKC, have been proposed to make keratoconus diagnosis accessible. Similar to medical-grade topographers, SmartKC outputs curvature heatmaps and quantitative metrics that need to be evaluated by doctors for keratoconus diagnosis. An automatic scheme for evaluation of these heatmaps and quantitative values can play a crucial role in screening keratoconus in areas where doctors are not available. In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.

Submitted 7 May, 2022; originally announced May 2022.

Comments: 4 pages

arXiv:2204.03326 [pdf, other]

Enabling All In-Edge Deep Learning: A Literature Review

Authors: Praveen Joshi, Mohammed Hasanuzzaman, Chandra Thapa, Haithem Afli, Ted Scully

Abstract: In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition and natural language understanding. One of the significant contributors to its success is the proliferation of end devices that acted as a catalyst to provide data for data-hungry DL models. However, computing DL training and inference is the main challenge. Usually, central cloud servers are used for the computation, but it opens up other significant challenges, such as high latency, increased communication costs, and privacy concerns. To mitigate these drawbacks, considerable efforts have been made to push the processing of DL models to edge servers. Moreover, the confluence point of DL and edge has given rise to edge intelligence (EI). This survey paper focuses primarily on the fifth level of EI, called all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. All in-edge is suitable when the end devices have low computing resources, e.g., Internet-of-Things, and other requirements such as latency and communication cost are important in mission-critical applications, e.g., health care. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed. Secondly, this paper presents enabling technologies, such as model parallelism and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described because the standard cloud-based DL deployment cannot be directly applied to all in-edge due to its limited computational resources. Fourthly, this paper discusses eleven key performance metrics to evaluate the performance of DL at all in-edge efficiently. Finally, several open research challenges in the area of all in-edge are presented.

Submitted 12 December, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: 21 pages

arXiv:2204.00054 [pdf, ps, other]

Distributed Robust Geocast Multicast Routing for Inter-Vehicle Communication

Authors: Harshvardhan P. Joshi, Mihail L. Sichitiu, Maria Kihl

Abstract: Numerous protocols for geocast have been proposed in literature. It has been shown that explicit route setup approaches perform poorly with VANETs due to limited route lifetime and frequent network fragmentation. The broadcast based approaches have considerable redundancy and add significantly to the overhead of the protocol. A completely distributed and robust geocast approach is presented in this paper, that is resilient to frequent topology changes and network fragmentation. A distance-based backoff algorithm is used to reduce the number of hops and a novel mechanism to reduce redundant broadcasts is introduced. The performance of the proposed protocol is evaluated for various scenarios and compared with simple flooding and a protocol based on explicit route setup.

Submitted 31 March, 2022; originally announced April 2022.

Comments: 12 pages

Journal ref: Proceedings of WEIRD workshop on WiMax, Wireless and Mobility, 2007

arXiv:2203.13948 [pdf]

AI-augmented histopathologic review using image analysis to optimize DNA yield and tumor purity from FFPE slides

Authors: Bolesław L. Osinski, Aïcha BenTaieb, Irvin Ho, Ryan D. Jones, Rohan P. Joshi, Andrew Westley, Michael Carlson, Caleb Willis, Luke Schleicher, Brett M. Mahon, Martin C. Stumpe

Abstract: To achieve minimum DNA input and tumor purity requirements for next-generation sequencing (NGS), pathologists visually estimate macrodissection and slide count decisions. Misestimation may cause tissue waste and increased laboratory costs. We developed an AI-augmented smart pathology review system (SmartPath) to empower pathologists with quantitative metrics for determining tissue extraction parameters. Using digitized H&E-stained FFPE slides as inputs, SmartPath segments tumors, extracts cell-based features, and suggests macrodissection areas. To predict DNA yield per slide, the extracted features are correlated with known DNA yields. Then, a pathologist-defined target yield divided by the predicted DNA yield/slide gives the number of slides to scrape. Following model development, an internal validation trial was conducted within the Tempus Labs molecular sequencing laboratory. We evaluated our system on 501 clinical colorectal cancer slides, where half received SmartPath-augmented review and half traditional pathologist review. The SmartPath cohort had 25% more DNA yields within a desired target range of 100-2000ng. The SmartPath system recommended fewer slides to scrape for large tissue sections, saving tissue in these cases. Conversely, SmartPath recommended more slides to scrape for samples with scant tissue sections, helping prevent costly re-extraction due to insufficient extraction yield. A statistical analysis was performed to measure the impact of covariates on the results, offering insights on how to improve future applications of SmartPath. Overall, the study demonstrated that AI-augmented histopathologic review using SmartPath could decrease tissue waste, sequencing time, and laboratory costs by optimizing DNA yields and tumor purity.

Submitted 7 April, 2022; v1 submitted 25 March, 2022; originally announced March 2022.
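
The slide-count rule stated in the abstract above (target yield divided by predicted yield per slide) can be illustrated with a small sketch. This is not the SmartPath implementation: the function name, the nanogram units, and the choice to round up to a whole slide are assumptions made for illustration only.

```python
import math


def slides_to_scrape(target_yield_ng: float, predicted_yield_per_slide_ng: float) -> int:
    """Hypothetical helper: number of slides needed to reach a target DNA yield.

    Divides the target yield by the predicted per-slide yield, as the abstract
    describes. Rounding up is an assumption, since slides are whole units.
    """
    if predicted_yield_per_slide_ng <= 0:
        raise ValueError("predicted yield per slide must be positive")
    return math.ceil(target_yield_ng / predicted_yield_per_slide_ng)


# Example with made-up numbers: a 500 ng target at a predicted 120 ng per slide.
print(slides_to_scrape(500, 120))
```

Under this rule, a large tissue section (high predicted yield per slide) gives a small slide count, and a scant section gives a larger one, which matches the behaviour the abstract reports.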

arXiv:2203.12793 [pdf, other]

doi 10.1109/ICC45855.2022.9838876

A Reinforcement Approach for Detecting P2P Botnet Communities in Dynamic Communication Graphs

Authors: Harshvardhan P. Joshi, Rudra Dutta

Abstract: Peer-to-peer (P2P) botnets use decentralized command and control networks that make them resilient to disruptions. The P2P botnet overlay networks manifest structures in mutual-contact graphs, also called communication graphs, formed using network traffic information. It has been shown that these structures can be detected using community detection techniques from graph theory. These previous works, however, treat the communication graphs and the P2P botnet structures as static. In reality, communication graphs are dynamic as they represent the continuously changing network traffic flows. Similarly, the P2P botnets also evolve with time, as new bots join and existing bots leave either temporarily or permanently. In this paper we address the problem of detecting such evolving P2P botnet communities in dynamic communication graphs. We propose a reinforcement-based approach, suitable for large communication graphs, that improves precision and recall of P2P botnet community detection in dynamic communication graphs.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.10062 [pdf]

Imaging-based histological features are predictive of MET alterations in Non-Small Cell Lung Cancer

Authors: Rohan P. Joshi, Bolesław L. Osinski, Niha Beig, Lingdao Sha, Kshitij Ingale, Martin C. Stumpe

Abstract: MET is a proto-oncogene whose somatic activation in non-small cell lung cancer leads to increased cell growth and tumor progression. The two major classes of MET alterations are gene amplification and exon 14 deletion, both of which are therapeutic targets and detectable using existing molecular assays. However, existing tests are limited by their consumption of valuable tissue, cost and complexity that prevent widespread use. MET alterations could have an effect on cell morphology, and quantifying these associations could open new avenues for research and development of morphology-based screening tools. Using H&E-stained whole slide images (WSIs), we investigated the association of distinct cell-morphological features with MET amplifications and MET exon 14 deletions. We found that cell shape, color, grayscale intensity and texture-based features from both tumor infiltrating lymphocytes and tumor cells distinguished MET wild-type from MET amplified or MET exon 14 deletion cases. The association of individual cell features with MET alterations suggested a predictive model could distinguish MET wild-type from MET amplification or MET exon 14 deletion. We therefore developed an L1-penalized logistic regression model, achieving a mean Area Under the Receiver Operating Characteristic Curve (ROC-AUC) of 0.77 +/- 0.05sd in cross-validation and 0.77 on an independent holdout test set. A sparse set of 43 features differentiated these classes, which included features similar to what was found in the univariate analysis as well as the percent of tumor cells in the tissue. Our study demonstrates that MET alterations result in a detectable morphological signal in tumor cells and lymphocytes. These results suggest that development of low-cost predictive models based on H&E-stained WSIs may improve screening for MET altered tumors.

Submitted 29 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 30 pages, 4 figures

arXiv:2111.08477 [pdf, other]

On Reverse Elastic Channels and the Asymmetry of Commitment Capacity under Channel Elasticity

Authors: Amitalok J. Budkuley, Pranav Joshi, Manideep Mamindlapally, Anuj Kumar Yadav

Abstract: Commitment is an important cryptographic primitive. It is well known that noisy channels are a promising resource to realize commitment in an information-theoretically secure manner. However, oftentimes, channel behaviour may be poorly characterized thereby limiting the commitment throughput and/or degrading the security guarantees; particularly problematic is when a dishonest party, unbeknown to the honest one, can maliciously alter the channel characteristics. Reverse elastic channels (RECs) are an interesting class of such unreliable channels, where only a dishonest committer, say, Alice can maliciously alter the channel. RECs have attracted recent interest in the study of several cryptographic primitives. Our principal contribution is the REC commitment capacity characterization; this proves a recent related conjecture. A key result is our tight converse which analyses a specific cheating strategy by Alice. RECs are closely related to the classic unfair noisy channels (UNCs); elastic channels (ECs), where only a dishonest receiver Bob can alter the channel, are similarly related. In stark contrast to UNCs, both RECs and ECs always exhibit positive commitment throughput for all non-trivial parameters. Interestingly, our results show that channels with exclusive one-sided elasticity for dishonest parties, exhibit a fundamental asymmetry where a committer with one-sided elasticity has a more debilitating effect on the commitment throughput than a receiver.

Submitted 16 November, 2021; originally announced November 2021.

Comments: 16 pages, 3 figures

arXiv:2111.01354 [pdf, other]

SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause for partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising of a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike polynomials based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient =0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.

Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

arXiv:2109.09246 [pdf, other]

Splitfed learning without client-side synchronization: Analyzing client-side split network portion size to overall performance

Authors: Praveen Joshi, Chandra Thapa, Seyit Camtepe, Mohammed Hasanuzzamana, Ted Scully, Haithem Afli

Abstract: Federated Learning (FL), Split Learning (SL), and SplitFed Learning (SFL) are three recent developments in distributed machine learning that are gaining attention due to their ability to preserve the privacy of raw data. Thus, they are widely applicable in various domains where data is sensitive, such as large-scale medical image classification, internet-of-medical-things, and cross-organization phishing email detection. SFL is developed on the confluence point of FL and SL. It brings the best of FL and SL by providing parallel client-side machine learning model updates from the FL paradigm and a higher level of model privacy (while training) by splitting the model between the clients and server coming from SL. However, SFL has communication and computation overhead at the client-side due to the requirement of client-side model synchronization. For the resource-constrained client-side, removal of such requirements is required to gain efficiency in the learning. In this regard, this paper studies SFL without client-side model synchronization. The resulting architecture is known as Multi-head Split Learning. Our empirical studies considering the ResNet18 model on MNIST data under IID data distribution among distributed clients find that Multi-head Split Learning is feasible. Its performance is comparable to the SFL. Moreover, SFL provides only 1%-2% better accuracy than Multi-head Split Learning on the MNIST test set. To further strengthen our results, we study the Multi-head Split Learning with various client-side model portions and its impact on the overall performance. To this end, our results find a minimal impact on the overall performance of the model.

Submitted 19 September, 2021; originally announced September 2021.

Comments: CERC 2021

arXiv:2106.09536 [pdf, ps, other]

Single Event Transient Fault Analysis of ELEPHANT cipher

Authors: Priyanka Joshi, Bodhistwa Mazumdar

Abstract: In this paper, we propose a novel fault attack termed as Single Event Transient Fault Analysis (SETFA) attack, which is well suited for hardware implementations. The proposed approach pinpoints hotspots in the cypher's Sbox combinational logic circuit that significantly reduce the key entropy when subjected to faults. ELEPHANT is a parallel authenticated encryption and associated data (AEAD) scheme targeted to hardware implementations, a finalist in the Lightweight cryptography (LWC) competition launched by NIST. In this work, we investigate vulnerabilities of ELEPHANT against fault analysis. We observe that the use of 128-bit random nonce makes it resistant against many cryptanalysis techniques like differential, linear, etc., and their variants. However, the relaxed nature of Statistical Fault Analysis (SFA) methods makes them widely applicable in restrictive environments. We propose a SETFA-based key recovery attack on Elephant. We performed Single experiments with random plaintexts and keys, on Dumbo, a Sponge-based instance of the Elephant-AEAD scheme. Our proposed approach could recover the secret key in 85-250 ciphertexts. In essence, this work investigates new vulnerabilities towards fault analysis that may require to be addressed to ensure secure computations and communications in IoT scenarios.

Submitted 13 June, 2021; originally announced June 2021.

Comments: 5 pages, 2 figures

arXiv:2102.09866 [pdf]

KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text

Authors: Varsha Pathak, Manish Joshi, Prasad Joshi, Monica Mundada, Tanmay Joshi

Abstract: This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), at Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India. The datasets of two Dravidian languages Viz. Malayalam and Tamil of size 4000 observations, each were shared by the HASOC organizers. These datasets are used to train the machine using different machine learning algorithms, based on classification and regression models. The datasets consist of tweets or YouTube comments with two class labels offensive and not offensive. The machine is trained to classify such social media messages in these two categories. Appropriate n-gram feature sets are extracted to learn the specific characteristics of the Hate Speech text messages. These feature models are based on TFIDF weights of n-gram. The referred work and respective experiments show that the features such as word, character and combined model of word and character n-grams could be used to identify the term patterns of offensive text contents. As a part of the HASOC shared task, the test data sets are made available by the HASOC track organizers. The best performing classification models developed for both languages are applied on test datasets. The model which gives the highest accuracy result on training dataset for Malayalam language was experimented to predict the categories of respective test data. This system has obtained an F1 score of 0.77. Similarly the best performing model for Tamil language has obtained an F1 score of 0.87. This work has received 2nd and 3rd rank in this shared Task 2 for Malayalam and Tamil language respectively. The proposed system is named HASOC_kbcnmujal.

Submitted 19 February, 2021; originally announced February 2021.

arXiv:2102.06045 [pdf, other]

Artificial Intelligence based Autonomous Molecular Design for Medical Therapeutic: A Perspective

Authors: Rajendra P. Joshi, Neeraj Kumar

Abstract: Domain-aware machine learning (ML) models have been increasingly adopted for accelerating small molecule therapeutic design in the recent years. These models have been enabled by significant advancement in state-of-the-art artificial intelligence (AI) and computing infrastructures. Several ML architectures are pre-dominantly and independently used either for predicting the properties of small molecules, or for generating lead therapeutic candidates. Synergetically using these individual components along with robust representation and data generation techniques autonomously in closed loops holds enormous promise for accelerated drug design which is a time consuming and expensive task otherwise. In this perspective, we present the most recent breakthrough achieved by each of the components, and how such autonomous AI and ML workflow can be realized to radically accelerate the hit identification and lead optimization. Taken together, this could significantly shorten the timeline for end-to-end antiviral discovery and optimization times to weeks upon the arrival of a novel zoonotic transmission event. Our perspective serves as a guide for researchers to practice autonomous molecular design in therapeutic discovery.

Submitted 9 February, 2021; originally announced February 2021.

arXiv:2009.14505 [pdf, other]

TaxiNLI: Taking a Ride up the NLU Hill

Authors: Pratik Joshi, Somak Aditya, Aalok Sathe, Monojit Choudhury

Abstract: Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena, it remains unclear as to which specific concepts are learnt by the trained systems and where they can achieve strong generalization. To investigate this question, we propose a taxonomic hierarchy of categories that are relevant for the NLI task. We introduce TAXINLI, a new dataset, that has 10k examples from the MNLI dataset (Williams et al., 2018) with these taxonomic labels. Through various experiments on TAXINLI, we observe that whereas for certain taxonomic categories SOTA neural models have achieved near perfect accuracies - a large jump over the previous models - some categories still remain difficult. Our work adds to the growing body of literature that shows the gaps in the current NLI systems and datasets through a systematic presentation and analysis of reasoning categories.

Submitted 9 October, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

Comments: 15 pages, 9 figures, 4 tables. Accepted at CoNLL 2020

arXiv:2006.05787 [pdf, other]

Image Enhancement and Object Recognition for Night Vision Surveillance

Authors: Aashish Bhandari, Aayush Kafle, Pranjal Dhakal, Prateek Raj Joshi, Dinesh Baniya Kshatri

Abstract: Object recognition is a critical part of any surveillance system. It is the matter of utmost concern to identify intruders and foreign objects in the area where surveillance is done. The performance of surveillance system using the traditional camera in daylight is vastly superior as compared to night. The main problem for surveillance during the night is the objects captured by traditional cameras have low contrast against the background because of the absence of ambient light in the visible spectrum. Due to that reason, the image is taken in low light condition using an Infrared Camera and the image is enhanced to obtain an image with higher contrast using different enhancing algorithms based on the spatial domain. The enhanced image is then sent to the classification process. The classification is done by using convolutional neural network followed by a fully connected layer of neurons. The accuracy of classification after implementing different enhancement algorithms is compared in this paper.

Submitted 10 June, 2020; originally announced June 2020.

Comments: International Conference on Recent Trends in Computational Engineering and Technologies, 2018

arXiv:2004.09095 [pdf, other]

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Authors: Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury

Abstract: Language technologies contribute to promoting multilingualism and linguistic diversity around the world. However, only a very small number of the over 7000 languages of the world are represented in the rapidly evolving language technologies and applications. In this paper we look at the relation between the types of languages, resources, and their representation in NLP conferences to understand the trajectory that different languages have followed over time. Our quantitative investigation underlines the disparity between languages, especially in terms of their resources, and calls into question the "language agnostic" status of current models and systems. Through this paper, we attempt to convince the ACL community to prioritise the resolution of the predicaments highlighted here, so that no language is left behind.

Submitted 26 January, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted at ACL 2020 (10 pages + 2 pages Appendix). P.J., S.S. and A.B. contributed equally

arXiv:1912.03457 [pdf, other]

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

Authors: Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

Abstract: In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities. While doing so, we bring to light the successes and failures of past work in this area, challenges being faced in doing so, and what they have achieved. Throughout this paper, we take a problem-facing approach and describe essential factors which the success of such technologies hinges upon. We present the various aspects in a manner which clarify and lay out the different tasks involved, which can aid organizations looking to make an impact in this area. We take the example of Gondi, an extremely-low resource Indian language, to reinforce and complement our discussion.

Submitted 7 December, 2019; originally announced December 2019.

Comments: Accepted at ICON 2019; 9 pages

arXiv:1910.06859 [pdf]

Computational Psychology to Embed Emotions into News or Advertisements to Increase Reader Affinity

Authors: Hrishikesh Kulkarni, P Joshi, P Chande

Abstract: Readers take decisions about going through the complete news based on many factors. The emotional impact of the news title on reader is one of the most important factors. Cognitive ergonomics tries to strike the balance between work, product and environment with human needs and capabilities. The utmost need to integrate emotions in the news as well as advertisements cannot be denied. The idea is that news or advertisement should be able to engage the reader on emotional and behavioral platform. While achieving this objective there is need to learn about reader behavior and use computational psychology while presenting as well as writing news or advertisements. This paper based on Machine Learning, tries to map behavior of the reader with the news/advertisements and also provide inputs for affective value for building personalized news or advertisements presentations. The affective value of the news is determined and news artifacts are mapped to reader. The algorithm suggests the most suitable news for readers while understanding emotional traits required for personalization. This work can be used to improve reader satisfaction through embedding emotions in the reading material and prioritizing news presentations. It can be used to map personal reading material range, personalized programs and ranking programs, advertisements with reference to individuals.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1908.04332 [pdf]

LSTM vs. GRU vs. Bidirectional RNN for script generation

Authors: Sanidhya Mangal, Poorva Joshi, Rahul Modak

Abstract: Scripts are an important part of any TV series. They narrate movements, actions and expressions of characters. In this paper, a case study is presented on how different sequence to sequence deep learning models perform in the task of generating new conversations between characters as well as new scenarios on the basis of a script (previous conversations). A comprehensive comparison between these models, namely, LSTM, GRU and Bidirectional RNN is presented. All the models are designed to learn the sequence of recurring characters from the input sequence. Each input sequence will contain, say "n" characters, and the corresponding targets will contain the same number of characters, except, they will be shifted one character to the right. In this manner, input and output sequences are generated and used to train the models. A closer analysis of explored models performance and efficiency is delineated with the help of graph plots and generated texts by taking some input string. These graphs describe both, intraneural performance and interneural model performance for each model.

Submitted 12 August, 2019; originally announced August 2019.

Comments: 7 pages, 7 figures

arXiv:1908.01080 [pdf]

doi 10.17148/IARJSET.2019.6508

LSTM Based Music Generation System

Authors: Sanidhya Mangal, Rahul Modak, Poorva Joshi

Abstract: Traditionally, music was treated as an analogue signal and was generated manually. In recent years, music is conspicuous to technology which can generate a suite of music automatically without any human intervention. To accomplish this task, we need to overcome some technical challenges which are discussed descriptively in this paper. A brief introduction about music and its components is provided in the paper along with the citation and analysis of related work accomplished by different authors in this domain. Main objective of this paper is to propose an algorithm which can be used to generate musical notes using Recurrent Neural Networks (RNN), principally Long Short-Term Memory (LSTM) networks. A model is designed to execute this algorithm where data is represented with the help of musical instrument digital interface (MIDI) file format for easier access and better understanding. Preprocessing of data before feeding it into the model, revealing methods to read, process and prepare MIDI files for input are also discussed. The model used in this paper is used to learn the sequences of polyphonic musical notes over a single-layered LSTM network. The model must have the potential to recall past details of a musical sequence and its structure for better learning. Description of layered architecture used in LSTM model and its intertwining connections to develop a neural network is presented in this work.
This paper imparts a peek view of distributions of weights and biases in every layer of the model along with a precise representation of losses and accuracy at each step and batches. When the model was thoroughly analyzed, it produced stellar results in composing new melodies.

Submitted 2 August, 2019; originally announced August 2019.

Comments: 6 pages, 11 figures

Journal ref: IARJSET: Vol. 6, Issue 5 (2019) 47-54

arXiv:1902.05390 [pdf]

DeepIrisNet2: Learning Deep-IrisCodes from Scratch for Segmentation-Robust Visible Wavelength and Near Infrared Iris Recognition

Authors: Abhishek Gangwar, Akanksha Joshi, Padmaja Joshi, R. Raghavendra

Abstract: We first, introduce a deep learning based framework named as DeepIrisNet2 for visible spectrum and NIR Iris representation. The framework can work without classical iris normalization step or very accurate iris segmentation; allowing to work under non-ideal situation. The framework contains spatial transformer layers to handle deformation and supervision branches after certain intermediate layers to mitigate overfitting. In addition, we present a dual CNN iris segmentation pipeline comprising of a iris/pupil bounding boxes detection network and a semantic pixel-wise segmentation network. Furthermore, to get compact templates, we present a strategy to generate binary iris codes using DeepIrisNet2. Since, no ground truth dataset are available for CNN training for iris segmentation, We build large scale hand labeled datasets and make them public; i) iris, pupil bounding boxes, ii) labeled iris texture. The networks are evaluated on challenging ND-IRIS-0405, UBIRIS.v2, MICHE-I, and CASIA v4 Interval datasets. Proposed approach significantly improves the state-of-the-art and achieve outstanding performance surpassing all previous methods.

Submitted 6 February, 2019; originally announced February 2019.

Comments: 10 pages, 4 Figures

arXiv:1804.06438 [pdf, other]

Vision Based Dynamic Offside Line Marker for Soccer Games

Authors: Karthik Muthuraman, Pranav Joshi, Suraj Kiran Raman

Abstract: Offside detection in soccer has emerged as one of the most important decisions with an average of 50 offside decisions every game. False detections and rash calls adversely affect game conditions and in many cases drastically change the outcome of the game. The human eye has finite precision and can only discern a limited amount of detail in a given instance. Current offside decisions are made manually by sideline referees and tend to remain controversial in many games. This calls for automated offside detection techniques in order to assist accurate refereeing. In this work, we have explicitly used computer vision and image processing techniques like Hough transform, color similarity (quantization), graph connected components, and vanishing point ideas to identify the probable offside regions. Keywords: Hough transform, connected components, KLT tracking, color similarity.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1710.05363 [pdf]

Link Before You Share: Managing Privacy Policies through Blockchain

Authors: Agniva Banerjee, Karuna Pande Joshi

Abstract: With the advent of numerous online content providers, utilities and applications, each with their own specific version of privacy policies and its associated overhead, it is becoming increasingly difficult for concerned users to manage and track the confidential information that they share with the providers. Users consent to providers to gather and share their Personally Identifiable Information (PII). We have developed a novel framework to automatically track details about how a users' PII data is stored, used and shared by the provider. We have integrated our Data Privacy ontology with the properties of blockchain, to develop an automated access control and audit mechanism that enforces users' data privacy policies when sharing their data across third parties. We have also validated this framework by implementing a working system LinkShare. In this paper, we describe our framework on detail along with the LinkShare system. Our approach can be adopted by Big Data users to automatically apply their privacy policy on data operations and track the flow of that data across various stakeholders.

Submitted 15 October, 2017; originally announced October 2017.

Comments: 10 pages, 6 figures, Published in: 4th International Workshop on Privacy and Security of Big Data (PSBD 2017) in conjunction with 2017 IEEE International Conference on Big Data (IEEE BigData 2017) December 14, 2017, Boston, MA, USA

arXiv:1609.05296 [pdf]

Development of a Fuzzy Expert System based Liveliness Detection Scheme for Biometric Authentication

Authors: Avinash Kumar Singh, Piyush Joshi, G C Nandi

Abstract: Liveliness detection acts as a safe guard against spoofing attacks. Most of the researchers used vision based techniques to detect liveliness of the user, but they are highly sensitive to illumination effects. Therefore it is very hard to design a system, which will work robustly under all circumstances. Literature shows that most of the research utilize eye blink or mouth movement to detect the liveliness, while the other group used face texture to distinguish between real and imposter. The classification results of all these approaches decreases drastically in variable light conditions. Hence in this paper we are introducing fuzzy expert system which is sufficient enough to handle most of the cases comes in real time. We have used two testing parameters, (a) under bad illumination and (b) less movement in eyes and mouth in case of real user to evaluate the performance of the system. The system is behaving well in all, while in first case its False Rejection Rate (FRR) is 0.28, and in second case its FRR is 0.4.

Submitted 17 September, 2016; originally announced September 2016.

arXiv:1512.01568 [pdf, other]

doi 10.1007/978-3-319-21024-7_14

Hybrid Approach for Inductive Semi Supervised Learning using Label Propagation and Support Vector Machine

Authors: Aruna Govada, Pravin Joshi, Sahil Mittal, Sanjay K Sahay

Abstract: Semi supervised learning methods have gained importance in today's world because of large expenses and time involved in labeling the unlabeled data by human experts. The proposed hybrid approach uses SVM and Label Propagation to label the unlabeled data. In the process, at each step SVM is trained to minimize the error and thus improve the prediction quality. Experiments are conducted by using SVM and logistic regression(Logreg). Results prove that SVM performs tremendously better than Logreg. The approach is tested using 12 datasets of different sizes ranging from the order of 1000s to the order of 10000s. Results show that the proposed approach outperforms Label Propagation by a large margin with F-measure of almost twice on average. The parallel version of the proposed approach is also designed and implemented, the analysis shows that the training time decreases significantly when parallel version is used.

Submitted 2 December, 2015; originally announced December 2015.

Comments: Presented in the 11th International Conference, MLDM, Germany, July 20 - 21, 2015. Springer, Machine Learning and Data Mining in Pattern Recognition, LNAI Vol. 9166, p. 199-213, 2015

arXiv:1508.04131 [pdf]

Contextually proximate approach to develop smart user interface

Authors: Pushkar Joshi

Abstract: Researchers and experts are taking efforts in delivering an optimal user experience from a long time. Computer interfaces are being developed to keep user 'in the flow' as well as for making users more connected to the real world while using virtual environment. Developing ubiquitous user interfaces for novices and experts at the same time is crucial work for interaction designers. This paper molds the designing approach of user interfaces in bit different parameters by reviewing the existing literature and proposing a different way to develop a smart user interface to make user more familiar with the design and to keep user 'in the flow'. Contextually proximate approach (CPA) will help users to minimize their feeling of insecurity as designing process includes local resources of users to develop the user interfaces. These various resources and parameters are explained further in the paper by giving different examples.

Submitted 17 August, 2015; originally announced August 2015.

Comments: 4 pages 2 figures

Showing 1–40 of 40 results for author: Joshi, P