Search | arXiv e-print repository

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.13334 [pdf, other]

Transparency and Privacy: The Role of Explainable AI and Federated Learning in Financial Fraud Detection

Authors: Tomisin Awosika, Raj Mani Shukla, Bernardi Pranggono

Abstract: Fraudulent transactions and how to detect them remain a significant problem for financial institutions around the world. The need for advanced fraud detection systems to safeguard assets and maintain customer trust is paramount for financial institutions, but some factors make the development of effective and efficient fraud detection systems a challenge. One of such factors is the fact that fraud… ▽ More Fraudulent transactions and how to detect them remain a significant problem for financial institutions around the world. The need for advanced fraud detection systems to safeguard assets and maintain customer trust is paramount for financial institutions, but some factors make the development of effective and efficient fraud detection systems a challenge. One of such factors is the fact that fraudulent transactions are rare and that many transaction datasets are imbalanced; that is, there are fewer significant samples of fraudulent transactions than legitimate ones. This data imbalance can affect the performance or reliability of the fraud detection model. Moreover, due to the data privacy laws that all financial institutions are subject to follow, sharing customer data to facilitate a higher-performing centralized model is impossible. Furthermore, the fraud detection technique should be transparent so that it does not affect the user experience. Hence, this research introduces a novel approach using Federated Learning (FL) and Explainable AI (XAI) to address these challenges. FL enables financial institutions to collaboratively train a model to detect fraudulent transactions without directly sharing customer data, thereby preserving data privacy and confidentiality. Meanwhile, the integration of XAI ensures that the predictions made by the model can be understood and interpreted by human experts, adding a layer of transparency and trust to the system. Experimental results, based on realistic transaction datasets, reveal that the FL-based fraud detection system consistently demonstrates high performance metrics. This study grounds FL's potential as an effective and privacy-preserving tool in the fight against fraud. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Paper submitted to a journal for review

arXiv:2311.08621 [pdf, other]

Cross Device Federated Intrusion Detector for Early Stage Botnet Propagation in IoT

Authors: Angela Grace Famera, Raj Mani Shukla, Suman Bhunia

Abstract: A botnet is an army of zombified computers infected with malware and controlled by malicious actors to carry out tasks such as Distributed Denial of Service (DDoS) attacks. Billions of Internet of Things (IoT) devices are primarily targeted to be infected as bots since they are configured with weak credentials or contain common vulnerabilities. Detecting botnet propagation by monitoring the networ… ▽ More A botnet is an army of zombified computers infected with malware and controlled by malicious actors to carry out tasks such as Distributed Denial of Service (DDoS) attacks. Billions of Internet of Things (IoT) devices are primarily targeted to be infected as bots since they are configured with weak credentials or contain common vulnerabilities. Detecting botnet propagation by monitoring the network traffic is difficult as they easily blend in with regular network traffic. The traditional machine learning (ML) based Intrusion Detection System (IDS) requires the raw data to be captured and sent to the ML processor to detect intrusion. In this research, we examine the viability of a cross-device federated intrusion detection mechanism where each device runs the ML model on its data and updates the model weights to the central coordinator. This mechanism ensures the client's data is not shared with any third party, terminating privacy leakage. The model examines each data packet separately and predicts anomalies. We evaluate our proposed mechanism on a real botnet propagation dataset called MedBIoT. Overall, the proposed method produces an average accuracy of 71%, precision 78%, recall 71%, and f1-score 68%. In addition, we also examined whether any device taking part in federated learning can employ a poisoning attack on the overall system. △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: Paper submitted to conference

arXiv:2310.18953 [pdf, other]

TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression

Authors: Megh Shukla, Mathieu Salzmann, Alexandre Alahi

Abstract: Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. However, recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. While the literature addresses this by proposing alternate formulations to mitigate the impact of the predicted cov… ▽ More Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. However, recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. While the literature addresses this by proposing alternate formulations to mitigate the impact of the predicted covariance, we focus on improving the predicted covariance itself. We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean? (2) In the absence of supervision, how can we quantify the accuracy of covariance estimation? We address (1) with a Taylor Induced Covariance (TIC), which captures the randomness of the predicted mean by incorporating its gradient and curvature through the second order Taylor polynomial. Furthermore, we tackle (2) by introducing a Task Agnostic Correlations (TAC) metric, which combines the notion of correlations and absolute error to evaluate the covariance. We evaluate TIC-TAC across multiple experiments spanning synthetic and real-world datasets. Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood. Our code is available at https://github.com/vita-epfl/TIC-TAC △ Less

Submitted 31 May, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: ICML 2024. Please feel free to provide feedback!

arXiv:2310.07380 [pdf, other]

Histopathological Image Classification and Vulnerability Analysis using Federated Learning

Authors: Sankalp Vyas, Amar Nath Patra, Raj Mani Shukla

Abstract: Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue… ▽ More Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flip** increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the most optimal accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed. △ Less

Submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

arXiv:2310.07354 [pdf, other]

Give and Take: Federated Transfer Learning for Industrial IoT Network Intrusion Detection

Authors: Lochana Telugu Rajesh, Tapadhir Das, Raj Mani Shukla, Shamik Sengupta

Abstract: The rapid growth in Internet of Things (IoT) technology has become an integral part of today's industries forming the Industrial IoT (IIoT) initiative, where industries are leveraging IoT to improve communication and connectivity via emerging solutions like data analytics and cloud computing. Unfortunately, the rapid use of IoT has made it an attractive target for cybercriminals. Therefore, protec… ▽ More The rapid growth in Internet of Things (IoT) technology has become an integral part of today's industries forming the Industrial IoT (IIoT) initiative, where industries are leveraging IoT to improve communication and connectivity via emerging solutions like data analytics and cloud computing. Unfortunately, the rapid use of IoT has made it an attractive target for cybercriminals. Therefore, protecting these systems is of utmost importance. In this paper, we propose a federated transfer learning (FTL) approach to perform IIoT network intrusion detection. As part of the research, we also propose a combinational neural network as the centerpiece for performing FTL. The proposed technique splits IoT data between the client and server devices to generate corresponding models, and the weights of the client models are combined to update the server model. Results showcase high performance for the FTL setup between iterations on both the IIoT clients and the server. Additionally, the proposed FTL setup achieves better overall performance than contemporary machine learning algorithms at performing network intrusion detection. △ Less

Submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

arXiv:2310.04610 [pdf, other]

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri , et al. (67 additional authors not shown)

Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique… ▽ More In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research. △ Less

Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2307.08327 [pdf, other]

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak

Abstract: Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has show… ▽ More Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to. In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems. We develop an ML-based classification model for text data. Then, we introduce the adversarial perturbations on the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack △ Less

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.04777 [pdf, other]

MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment

Authors: Manan Shukla, Oshani Seneviratne

Abstract: Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized… ▽ More Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits. △ Less

Submitted 9 July, 2023; originally announced July 2023.

Comments: Accepted at AMIA 2023 Annual Symposium

arXiv:2307.03197 [pdf, ps, other]

Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks

Authors: Aysha Thahsin Zahir Ismail, Raj Mani Shukla

Abstract: Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. The Split learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently there have been an increased interest on the hybrid of FL and SL known as the SplitFed Learning (SFL). This research is the earliest… ▽ More Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. The Split learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently there have been an increased interest on the hybrid of FL and SL known as the SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies namely untargeted, targeted and distance-based attacks for SFL. All the attacks strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies for two different case studies on Electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results after the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have greater impacts in evading the classifier outcomes compared to targeted attacks in SFL △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2305.12528 [pdf, other]

IR Models and the COVID-19 Pandemic: A Comparative Study of Performance and Challenges

Authors: Moksh Shukla, Nitik Jain, Shubham Gupta

Abstract: This research study investigates the efficiency of different information retrieval (IR) systems in accessing relevant information from the scientific literature during the COVID-19 pandemic. The study applies the TREC framework to the COVID-19 Open Research Dataset (CORD-19) and evaluates BM25, Contriever, and Bag of Embeddings IR frameworks. The objective is to build a test collection for search… ▽ More This research study investigates the efficiency of different information retrieval (IR) systems in accessing relevant information from the scientific literature during the COVID-19 pandemic. The study applies the TREC framework to the COVID-19 Open Research Dataset (CORD-19) and evaluates BM25, Contriever, and Bag of Embeddings IR frameworks. The objective is to build a test collection for search engines that tackle the complex information landscape during a pandemic. The study uses the CORD-19 dataset to train and evaluate the IR models and compares the results to those manually labeled in the TREC-COVID IR Challenge. The results indicate that advanced IR models like BERT and Contriever better retrieve relevant information during a pandemic. However, the study also highlights the challenges in processing large datasets and the need for strategies to focus on abstracts or summaries. Overall, the research highlights the importance of effectively tailored IR systems in dealing with information overload during crises like COVID-19 and can guide future research and development in this field. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 7 pages, 2 figures

arXiv:2305.07161 [pdf, other]

A Deep Learning-based Compression and Classification Technique for Whole Slide Histopathology Images

Authors: Agnes Barsi, Suvendu Chandan Nayak, Sasmita Parida, Raj Mani Shukla

Abstract: This paper presents an autoencoder-based neural network architecture to compress histopathological images while retaining the denser and more meaningful representation of the original images. Current research into improving compression algorithms is focused on methods allowing lower compression rates for Regions of Interest (ROI-based approaches). Neural networks are great at extracting meaningful… ▽ More This paper presents an autoencoder-based neural network architecture to compress histopathological images while retaining the denser and more meaningful representation of the original images. Current research into improving compression algorithms is focused on methods allowing lower compression rates for Regions of Interest (ROI-based approaches). Neural networks are great at extracting meaningful semantic representations from images, therefore are able to select the regions to be considered of interest for the compression process. In this work, we focus on the compression of whole slide histopathology images. The objective is to build an ensemble of neural networks that enables a compressive autoencoder in a supervised fashion to retain a denser and more meaningful representation of the input histology images. Our proposed system is a simple and novel method to supervise compressive neural networks. We test the compressed images using transfer learning-based classifiers and show that they provide promising accuracy and classification performance. △ Less

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2210.06028 [pdf, other]

VL4Pose: Active Learning Through Out-Of-Distribution Detection For Pose Estimation

Authors: Megh Shukla, Roshan Roy, Pankaj Singh, Shuaib Ahmed, Alexandre Alahi

Abstract: Advances in computing have enabled widespread access to pose estimation, creating new sources of data streams. Unlike mock set-ups for data collection, tap** into these data streams through on-device active learning allows us to directly sample from the real world to improve the spread of the training distribution. However, on-device computing power is limited, implying that any candidate active… ▽ More Advances in computing have enabled widespread access to pose estimation, creating new sources of data streams. Unlike mock set-ups for data collection, tap** into these data streams through on-device active learning allows us to directly sample from the real world to improve the spread of the training distribution. However, on-device computing power is limited, implying that any candidate active learning algorithm should have a low compute footprint while also being reliable. Although multiple algorithms cater to pose estimation, they either use extensive compute to power state-of-the-art results or are not competitive in low-resource settings. We address this limitation with VL4Pose (Visual Likelihood For Pose Estimation), a first principles approach for active learning through out-of-distribution detection. We begin with a simple premise: pose estimators often predict incoherent poses for out-of-distribution samples. Hence, can we identify a distribution of poses the model has been trained on, to identify incoherent poses the model is unsure of? Our solution involves modelling the pose through a simple parametric Bayesian network trained via maximum likelihood estimation. Therefore, poses incurring a low likelihood within our framework are out-of-distribution samples making them suitable candidates for annotation. We also observe two useful side-outcomes: VL4Pose in-principle yields better uncertainty estimates by unifying joint and pose level ambiguity, as well as the unintentional but welcome ability of VL4Pose to perform pose refinement in limited scenarios. We perform qualitative and quantitative experiments on three datasets: MPII, LSP and ICVL, spanning human and hand pose estimation. Finally, we note that VL4Pose is simple, computationally inexpensive and competitive, making it suitable for challenging tasks such as on-device active learning. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted: BMVC 2022

arXiv:2110.10123 [pdf, other]

BlockIoT: Blockchain-based Health Data Integration using IoT Devices

Authors: Manan Shukla, Jian**g Lin, Oshani Seneviratne

Abstract: The development and adoption of Electronic Health Records (EHR) and health monitoring Internet of Things (IoT) Devices have enabled digitization of patient records and has also substantially transformed the healthcare delivery system in aspects such as remote patient monitoring, healthcare decision making, and medical research. However, data tends to be fragmented among health infrastructures and… ▽ More The development and adoption of Electronic Health Records (EHR) and health monitoring Internet of Things (IoT) Devices have enabled digitization of patient records and has also substantially transformed the healthcare delivery system in aspects such as remote patient monitoring, healthcare decision making, and medical research. However, data tends to be fragmented among health infrastructures and prevents interoperability of medical data at the point of care. In order to address this gap, we introduce BlockIoT that uses blockchain technology to transfer previously inaccessible and centralized data from medical devices to EHR systems, which provides greater insight to providers who can, in turn, provide better outcomes for patients. This notion of interoperability of medical device data is possible through an Application Programming Interface (API), which serves as a versatile endpoint for all incoming medical device data, a distributed file system that ensures data resilience, and knowledge templates that analyze, identify, and represent medical device data to providers. Our participatory design survey on BlockIoT demonstrates that BlockIoT is a suitable system to supplement physicians' clinical practice and increases efficiency in most healthcare specialties, including cardiology, pulmonology, endocrinology, and primary care. △ Less

Submitted 19 October, 2021; originally announced October 2021.

arXiv:2108.03760 [pdf]

Symptom based Hierarchical Classification of Diabetes and Thyroid disorders using Fuzzy Cognitive Maps

Authors: Anand M. Shukla, Pooja D. Pandit, Vasudev M. Purandare, Anuradha Srinivasaraghavan

Abstract: Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diag… ▽ More Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diagnosis with a different degree. Thus, FCMs are suitable to model Medical Decision Support Systems. The proposed work therefore uses FCMs arranged in hierarchical structure to classify between Diabetes, Thyroid disorders and their subtypes. Subtypes include type 1 and type 2 for diabetes and hyperthyroidism and hypothyroidism for thyroid. △ Less

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2105.14803 [pdf, other]

Gradient-based Data Subversion Attack Against Binary Classifiers

Authors: Rosni K Vasu, Sanjay Seetharaman, Shubham Malaviya, Manish Shukla, Sachin Lodha

Abstract: Machine learning based data-driven technologies have shown impressive performances in a variety of application domains. Most enterprises use data from multiple sources to provide quality applications. The reliability of the external data sources raises concerns for the security of the machine learning techniques adopted. An attacker can tamper the training or test datasets to subvert the predictio… ▽ More Machine learning based data-driven technologies have shown impressive performances in a variety of application domains. Most enterprises use data from multiple sources to provide quality applications. The reliability of the external data sources raises concerns for the security of the machine learning techniques adopted. An attacker can tamper the training or test datasets to subvert the predictions of models generated by these techniques. Data poisoning is one such attack wherein the attacker tries to degrade the performance of a classifier by manipulating the training data. In this work, we focus on label contamination attack in which an attacker poisons the labels of data to compromise the functionality of the system. We develop Gradient-based Data Subversion strategies to achieve model degradation under the assumption that the attacker has limited-knowledge of the victim model. We exploit the gradients of a differentiable convex loss function (residual errors) with respect to the predicted label as a warm-start and formulate different strategies to find a set of data instances to contaminate. Further, we analyze the transferability of attacks and the susceptibility of binary classifiers. Our experiments show that the proposed approach outperforms the baselines and is computationally efficient. △ Less

Submitted 31 May, 2021; originally announced May 2021.

Comments: 26 pages, 3 Figures, 8 tables, adversarial attacks, data poisoning attacks, label contamination, transferability of attack, susceptibility

arXiv:2105.03204 [pdf]

doi 10.1109/ICCIC.2013.6724156

Computational Intelligence based Intrusion Detection Systems for Wireless Communication

Authors: Abhishek Gupta, Om Jee Pandey, Mahendra Shukla, Anjali Dadhich, Samar Mathur, Anup Ingle

Abstract: The emerging trend of ubiquitous and pervasive computing aims at embedding everyday devices such as wristwatches, smart phones, home video systems, autofocus cameras, intelligent vehicles, musical instruments, kitchen appliances etc. with microprocessors and imparts them with wireless communication capability. This advanced computing paradigm, also known as the Internet of Things or cyber-physical… ▽ More The emerging trend of ubiquitous and pervasive computing aims at embedding everyday devices such as wristwatches, smart phones, home video systems, autofocus cameras, intelligent vehicles, musical instruments, kitchen appliances etc. with microprocessors and imparts them with wireless communication capability. This advanced computing paradigm, also known as the Internet of Things or cyber-physical computing, leads internet and computing to appear everywhere and anywhere using any device and location. With maximum appreciation and due regards to the evolutionary arc, depth and scope of ceaseless internet utilities, it is equally necessary to envisage the security and data confidentiality challenges posed by the free and ubiquitous availability of internet. This paper analyses the role of computational intelligence techniques to design adaptive and cognitive intrusion detection systems that can efficiently detect malicious network activities and proposes novel three-tier architecture for designing intelligent intrusion detection systems for wireless networks. △ Less

Submitted 22 April, 2021; originally announced May 2021.

arXiv:2104.13230 [pdf, other]

Influence Based Defense Against Data Poisoning Attacks in Online Learning

Authors: Sanjay Seetharaman, Shubham Malaviya, Rosni KV, Manish Shukla, Sachin Lodha

Abstract: Data poisoning is a type of adversarial attack on training data where an attacker manipulates a fraction of data to degrade the performance of machine learning model. Therefore, applications that rely on external data-sources for training data are at a significantly higher risk. There are several known defensive mechanisms that can help in mitigating the threat from such attacks. For example, data… ▽ More Data poisoning is a type of adversarial attack on training data where an attacker manipulates a fraction of data to degrade the performance of machine learning model. Therefore, applications that rely on external data-sources for training data are at a significantly higher risk. There are several known defensive mechanisms that can help in mitigating the threat from such attacks. For example, data sanitization is a popular defensive mechanism wherein the learner rejects those data points that are sufficiently far from the set of training instances. Prior work on data poisoning defense primarily focused on offline setting, wherein all the data is assumed to be available for analysis. Defensive measures for online learning, where data points arrive sequentially, have not garnered similar interest. In this work, we propose a defense mechanism to minimize the degradation caused by the poisoned training data on a learner's model in an online setup. Our proposed method utilizes an influence function which is a classic technique in robust statistics. Further, we supplement it with the existing data sanitization methods for filtering out some of the poisoned data points. We study the effectiveness of our defense mechanism on multiple datasets and across multiple attack strategies against an online learner. △ Less

Submitted 24 April, 2021; originally announced April 2021.

Comments: 18 pages, 3 Figures, 2 Tables, Adversarial Machine Learning, Data Poisoning, Online Learning, Defense, Influence Function

arXiv:2104.09493 [pdf, other]

doi 10.1109/WACV51458.2022.00208

Bayesian Uncertainty and Expected Gradient Length -- Regression: Two Sides Of The Same Coin?

Authors: Megh Shukla

Abstract: Active learning algorithms select a subset of data for annotation to maximize the model performance on a budget. One such algorithm is Expected Gradient Length, which as the name suggests uses the approximate gradient induced per example in the sampling process. While Expected Gradient Length has been successfully used for classification and regression, the formulation for regression remains intui… ▽ More Active learning algorithms select a subset of data for annotation to maximize the model performance on a budget. One such algorithm is Expected Gradient Length, which as the name suggests uses the approximate gradient induced per example in the sampling process. While Expected Gradient Length has been successfully used for classification and regression, the formulation for regression remains intuitively driven. Hence, our theoretical contribution involves deriving this formulation, thereby supporting the experimental evidence. Subsequently, we show that expected gradient length in regression is equivalent to Bayesian uncertainty. If certain assumptions are infeasible, our algorithmic contribution (EGL++) approximates the effect of ensembles with a single deterministic network. Instead of computing multiple possible inferences per input, we leverage previously annotated samples to quantify the probability of previous labels being the true label. Such an approach allows us to extend expected gradient length to a new task: human pose estimation. We perform experimental validation on two human pose datasets (MPII and LSP/LSPET), highlighting the interpretability and competitiveness of EGL++ with different active learning algorithms for human pose estimation. △ Less

Submitted 22 October, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted: WACV 2022, Algorithms track

arXiv:2104.09315 [pdf, other]

doi 10.1109/CVPRW53098.2021.00370

A Mathematical Analysis of Learning Loss for Active Learning in Regression

Authors: Megh Shukla, Shuaib Ahmed

Abstract: Active learning continues to remain significant in the industry since it is data efficient. Not only is it cost effective on a constrained budget, continuous refinement of the model allows for early detection and resolution of failure scenarios during the model development stage. Identifying and fixing failures with the model is crucial as industrial applications demand that the underlying model p… ▽ More Active learning continues to remain significant in the industry since it is data efficient. Not only is it cost effective on a constrained budget, continuous refinement of the model allows for early detection and resolution of failure scenarios during the model development stage. Identifying and fixing failures with the model is crucial as industrial applications demand that the underlying model performs accurately in all foreseeable use cases. One popular state-of-the-art technique that specializes in continuously refining the model via failure identification is Learning Loss. Although simple and elegant, this approach is empirically motivated. Our paper develops a foundation for Learning Loss which enables us to propose a novel modification we call LearningLoss++. We show that gradients are crucial in interpreting how Learning Loss works, with rigorous analysis and comparison of the gradients between Learning Loss and LearningLoss++. We also propose a convolutional architecture that combines features at different scales to predict the loss. We validate LearningLoss++ for regression on the task of human pose estimation (using MPII and LSP datasets), as done in Learning Loss. We show that LearningLoss++ outperforms in identifying scenarios where the model is likely to perform poorly, which on model refinement translates into reliable performance in the open world. △ Less

Submitted 19 April, 2021; originally announced April 2021.

Comments: Accepted: 2021 IEEE CVPR Workshop on Fair, Data Efficient and Trusted Computer Vision

arXiv:2011.12466 [pdf, other]

Learning Curves for Drug Response Prediction in Cancer Cell Lines

Authors: Alexander Partin, Thomas Brettin, Yvonne A. Evrard, Yitan Zhu, Hyunseung Yoo, Fangfang Xia, Songhao Jiang, Austin Clyde, Maulik Shukla, Michael Fonstein, James H. Doroshow, Rick Stevens

Abstract: Motivated by the size of cell line drug sensitivity data, researchers have been develo** machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves fo… ▽ More Motivated by the size of cell line drug sensitivity data, researchers have been develo** machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these predictors. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, suggesting that the shape of these curves depends on the unique model-dataset pair. The multi-input NN (mNN), in which gene expressions and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training sizes for two of the datasets, whereas the mNN performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate predictors, providing a broader perspective on the overall data scaling characteristics. The fitted power law curves provide a forward-looking performance metric and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments. △ Less

Submitted 24 November, 2020; originally announced November 2020.

Comments: 14 pages, 7 figures

arXiv:2005.10634 [pdf, other]

A Note on Cryptographic Algorithms for Private Data Analysis in Contact Tracing Applications

Authors: Rajan M A, Manish Shukla, Sachin Lodha

Abstract: Contact tracing is an important measure to counter the COVID-19 pandemic. In the early phase, many countries employed manual contact tracing to contain the rate of disease spread, however it has many issues. The manual approach is cumbersome, time consuming and also requires active participation of a large number of people to realize it. In order to overcome these drawbacks, digital contact tracin… ▽ More Contact tracing is an important measure to counter the COVID-19 pandemic. In the early phase, many countries employed manual contact tracing to contain the rate of disease spread, however it has many issues. The manual approach is cumbersome, time consuming and also requires active participation of a large number of people to realize it. In order to overcome these drawbacks, digital contact tracing has been proposed that typically involves deploying a contact tracing application on people's mobile devices which can track their movements and close social interactions. While studies suggest that digital contact tracing is more effective than manual contact tracing, it has been observed that higher adoption rates of the contact tracing app may result in a better controlled epidemic. This also increases the confidence in the accuracy of the collected data and the subsequent analytics. One key reason for low adoption rate of contact tracing applications is the concern about individual privacy. In fact, several studies report that contact tracing applications deployed in multiple countries are not privacy friendly and have potential to be used for mass surveillance by the concerned governments. Hence, privacy respecting contact tracing application is the need of the hour that can lead to highly effective, efficient contact tracing. As part of this study, we focus on various cryptographic techniques that can help in addressing the Private Set Intersection problem which lies at the heart of privacy respecting contact tracing. We analyze the computation and communication complexities of these techniques under the typical client-server architecture utilized by contact tracing applications. Further we evaluate those computation and communication complexity expressions for India scenario and thus identify cryptographic techniques that can be more suitably deployed there. △ Less

Submitted 19 May, 2020; originally announced May 2020.

Comments: 12 Pages, 3 Figures

arXiv:2005.09572 [pdf]

Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug Response

Authors: Yitan Zhu, Thomas Brettin, Yvonne A. Evrard, Alexander Partin, Fangfang Xia, Maulik Shukla, Hyunseung Yoo, James H. Doroshow, Rick Stevens

Abstract: Transfer learning has been shown to be effective in many applications in which training data for the target problem are limited but data for a related (source) problem are abundant. In this paper, we apply transfer learning to the prediction of anti-cancer drug response. Previous transfer learning studies for drug response prediction focused on building models that predict the response of tumor ce… ▽ More Transfer learning has been shown to be effective in many applications in which training data for the target problem are limited but data for a related (source) problem are abundant. In this paper, we apply transfer learning to the prediction of anti-cancer drug response. Previous transfer learning studies for drug response prediction focused on building models that predict the response of tumor cells to a specific drug treatment. We target the more challenging task of building general prediction models that can make predictions for both new tumor cells and new drugs. We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset, and extends the framework through ensemble. The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures. Uniquely, we investigate its power for three application settings including drug repurposing, precision oncology, and new drug development, through different data partition schemes in cross-validation. We test the proposed ensemble transfer learning on benchmark in vitro drug screening datasets, taking one dataset as the source domain and another dataset as the target domain. The analysis results demonstrate the benefit of applying ensemble transfer learning for predicting anti-cancer drug response in all three applications with both LightGBM and DNN models. Compared between the different prediction models, a DNN model with two subnetworks for the inputs of tumor features and drug features separately outperforms LightGBM and the other DNN model that concatenates tumor features and drug features for input in the drug repurposing and precision oncology applications. In the more challenging application of new drug development, LightGBM performs better than the other two DNN models. △ Less

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2004.13328 [pdf, other]

Privacy Guidelines for Contact Tracing Applications

Authors: Manish Shukla, Rajan M A, Sachin Lodha, Gautam Shroff, Ramesh Raskar

Abstract: Contact tracing is a very powerful method to implement and enforce social distancing to avoid spreading of infectious diseases. The traditional approach of contact tracing is time consuming, manpower intensive, dangerous and prone to error due to fatigue or lack of skill. Due to this there is an emergence of mobile based applications for contact tracing. These applications primarily utilize a comb… ▽ More Contact tracing is a very powerful method to implement and enforce social distancing to avoid spreading of infectious diseases. The traditional approach of contact tracing is time consuming, manpower intensive, dangerous and prone to error due to fatigue or lack of skill. Due to this there is an emergence of mobile based applications for contact tracing. These applications primarily utilize a combination of GPS based absolute location and Bluetooth based relative location remitted from user's smartphone to infer various insights. These applications have eased the task of contact tracing; however, they also have severe implication on user's privacy, for example, mass surveillance, personal information leakage and additionally revealing the behavioral patterns of the user. This impact on user's privacy leads to trust deficit in these applications, and hence defeats their purpose. In this work we discuss the various scenarios which a contact tracing application should be able to handle. We highlight the privacy handling of some of the prominent contact tracing applications. Additionally, we describe the various threat actors who can disrupt its working, or misuse end user's data, or hamper its mass adoption. Finally, we present privacy guidelines for contact tracing applications from different stakeholder's perspective. To best of our knowledge, this is the first generic work which provides privacy guidelines for contact tracing applications. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: 10 pages, 0 images

arXiv:2003.13311 [pdf, other]

doi 10.1016/j.comcom.2020.03.017

Unmanned Aerial Vehicle for Internet of Everything: Opportunities and Challenges

Authors: Yalin Liu, Hong-Ning Dai, Qubeijian Wang, Mahendra K. Shukla, Muhammad Imran

Abstract: The recent advances in information and communication technology (ICT) have further extended Internet of Things (IoT) from the sole "things" aspect to the omnipotent role of "intelligent connection of things". Meanwhile, the concept of internet of everything (IoE) is presented as such an omnipotent extension of IoT. However, the IoE realization meets critical challenges including the restricted net… ▽ More The recent advances in information and communication technology (ICT) have further extended Internet of Things (IoT) from the sole "things" aspect to the omnipotent role of "intelligent connection of things". Meanwhile, the concept of internet of everything (IoE) is presented as such an omnipotent extension of IoT. However, the IoE realization meets critical challenges including the restricted network coverage and the limited resource of existing network technologies. Recently, Unmanned Aerial Vehicles (UAVs) have attracted significant attentions attributed to their high mobility, low cost, and flexible deployment. Thus, UAVs may potentially overcome the challenges of IoE. This article presents a comprehensive survey on opportunities and challenges of UAV-enabled IoE. We first present three critical expectations of IoE: 1) scalability requiring a scalable network architecture with ubiquitous coverage, 2) intelligence requiring a global computing plane enabling intelligent things, 3) diversity requiring provisions of diverse applications. Thereafter, we review the enabling technologies to achieve these expectations and discuss four intrinsic constraints of IoE (i.e., coverage constraint, battery constraint, computing constraint, and security issues). We then present an overview of UAVs. We next discuss the opportunities brought by UAV to IoE. Additionally, we introduce a UAV-enabled IoE (Ue-IoE) solution by exploiting UAVs's mobility, in which we show that Ue-IoE can greatly enhance the scalability, intelligence and diversity of IoE. Finally, we outline the future directions in Ue-IoE. △ Less

Submitted 12 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: 21 pages, 9 figures

Journal ref: Computer Communications, 2020

arXiv:1910.08790 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053924

LEt-SNE: A Hybrid Approach To Data Embedding and Visualization of Hyperspectral Imagery

Authors: Megh Shukla, Biplab Banerjee, Krishna Mohan Buddhiraju

Abstract: Hyperspectral Imagery (and Remote Sensing in general) captured from UAVs or satellites are highly voluminous in nature due to the large spatial extent and wavelengths captured by them. Since analyzing these images requires a huge amount of computational time and power, various dimensionality reduction techniques have been used for feature reduction. Some popular techniques among these falter when… ▽ More Hyperspectral Imagery (and Remote Sensing in general) captured from UAVs or satellites are highly voluminous in nature due to the large spatial extent and wavelengths captured by them. Since analyzing these images requires a huge amount of computational time and power, various dimensionality reduction techniques have been used for feature reduction. Some popular techniques among these falter when applied to Hyperspectral Imagery due to the famed curse of dimensionality. In this paper, we propose a novel approach, LEt-SNE, which combines graph based algorithms like t-SNE and Laplacian Eigenmaps into a model parameterized by a shallow feed forward network. We introduce a new term, Compression Factor, that enables our method to combat the curse of dimensionality. The proposed algorithm is suitable for manifold visualization and sample clustering with labelled or unlabelled data. We demonstrate that our method is competitive with current state-of-the-art methods on hyperspectral remote sensing datasets in public domain. △ Less

Submitted 8 February, 2020; v1 submitted 19 October, 2019; originally announced October 2019.

Comments: Accepted, ICASSP 2020

Showing 1–26 of 26 results for author: Shukla, M