Search | arXiv e-print repository

Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks

Authors: Abeer Banerjee, Naval K. Mehta, Shyam S. Prasad, Himanshu, Sumeet Saurav, Sanjay Singh

Abstract: In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.

Submitted 5 March, 2024; originally announced March 2024.
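
The "100-pixel accuracy" figure quoted in the abstract above is not defined in this listing. A minimal sketch of one plausible reading follows, assuming the metric is the fraction of predicted gaze points that fall within 100 pixels (Euclidean distance) of the ground-truth point. The function name `pixel_accuracy` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def pixel_accuracy(predicted, actual, threshold_px=100.0):
    """Fraction of predictions within threshold_px of the ground truth.

    Assumption: distance is Euclidean and measured in image pixels.
    `predicted` and `actual` are equal-length sequences of (x, y) pairs.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(
        1 for p, a in zip(predicted, actual) if math.dist(p, a) <= threshold_px
    )
    return hits / len(predicted)
```

Under this reading, a reported value of 100% would mean that every prediction in the evaluation set landed within the 100-pixel radius.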

arXiv:2305.19400 [pdf, other]

Automating GPU Scalability for Complex Scientific Models: Phonon Boltzman Transport Equation

Authors: Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Ponnuswamy Sadayappan, Hari Sundar

Abstract: Heterogeneous computing environments combining CPU and GPU resources provide a great boost to large-scale scientific computing applications. Code generation utilities that partition the work into CPU and GPU tasks while considering data movement costs allow researchers to more quickly and easily develop high-performance solutions, and make these resources accessible to a larger user base. We present developments for a domain-specific language (DSL) and code generation framework for solving partial differential equations (PDEs). These enhancements facilitate GPU-accelerated solution of the Boltzmann transport equation (BTE) for phonons, which is the governing equation for simulating thermal transport in semiconductor materials at sub-micron scales. The solution of the BTE involves thousands of coupled PDEs as well as complicated boundary conditions and nonlinear processing at each time step. These developments enable the DSL to generate configurable hybrid GPU/CPU code that couples accelerated kernels with user-defined code. We observed performance improvements of around 18X compared to a CPU-only version produced by this same DSL with minimal additional programming effort.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2303.08187 [pdf, other]

Vehicle lateral control using Machine Learning for automated vehicle guidance

Authors: Akash Fogla, Kanish Kumar, Sunnay Saurav, Bishnu Ramanujan

Abstract: Uncertainty in decision-making is crucial in the machine learning model used for a safety-critical system that operates in the real world. Therefore, it is important to handle uncertainty in a graceful manner for the safe operation of the CPS. In this work, we design a vehicle's lateral controller using a machine-learning model. To this end, we train a random forest model that is an ensemble model and a deep neural network model. Due to the ensemble in the random forest model, we can predict the confidence/uncertainty in the prediction. We train our controller on data generated from running the car on one track in the simulator and tested it on other tracks. Due to prediction in confidence, we could decide when the controller is less confident in prediction and takes control if needed. We have two results to share: first, even on a very small number of labeled data, a very good generalization capability of the random forest-based regressor in comparison with a deep neural network and accordingly random forest controller can drive on another similar track, where the deep neural network-based model fails to drive, and second confidence in predictions in random forest controller makes it possible to let us know when the controller is not confident in prediction and likely to fail. By creating a threshold, it was possible to take control when the controller is not safe and that is missing in a deep neural network-based controller.

Submitted 14 March, 2023; originally announced March 2023.
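
The confidence-threshold idea described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the per-tree outputs of the ensemble are already available as a list of numbers, uses the spread (population standard deviation) across trees as the uncertainty signal, and hands over control when that spread exceeds a threshold. The name `predict_with_confidence` and the `max_std` parameter are hypothetical.

```python
from statistics import mean, pstdev


def predict_with_confidence(tree_predictions, max_std):
    """Combine per-tree steering predictions and flag low confidence.

    Assumption: each element of `tree_predictions` is one ensemble
    member's steering estimate for the same input. The ensemble output
    is their mean, and disagreement between members (standard deviation)
    is used as the uncertainty measure.

    Returns (steering, spread, confident), where `confident` is False
    when the spread exceeds `max_std` and control should be handed over.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    steering = mean(tree_predictions)
    spread = pstdev(tree_predictions)
    return steering, spread, spread <= max_std
```

A single deep network that outputs one point estimate has no comparable disagreement signal, which is the gap the abstract points to.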

arXiv:2208.08295 [pdf, other]

ParaColorizer: Realistic Image Colorization using Parallel Generative Networks

Authors: Himanshu Kumar, Abeer Banerjee, Sumeet Saurav, Sanjay Singh

Abstract: Grayscale image colorization is a fascinating application of AI for information restoration. The inherently ill-posed nature of the problem makes it even more challenging since the outputs could be multi-modal. The learning-based methods currently in use produce acceptable results for straightforward cases but usually fail to restore the contextual information in the absence of clear figure-ground separation. Also, the images suffer from color bleeding and desaturated backgrounds since a single model trained on full image features is insufficient for learning the diverse data modes. To address these issues, we present a parallel GAN-based colorization framework. In our approach, each separately tailored GAN pipeline colorizes the foreground (using object-level features) or the background (using full-image features). The foreground pipeline employs a Residual-UNet with self-attention as its generator trained using the full-image features and the corresponding object-level features from the COCO dataset. The background pipeline relies on full-image features and additional training examples from the Places dataset. We design a DenseFuse-based fusion network to obtain the final colorized image by feature-based fusion of the parallelly generated outputs. We show the shortcomings of the non-perceptual evaluation metrics commonly used to assess multi-modal problems like image colorization and perform extensive performance evaluation of our framework using multiple perceptual metrics. Our approach outperforms most of the existing learning-based methods and produces results comparable to the state-of-the-art. Further, we performed a runtime analysis and obtained an average inference time of 24 ms per image.

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.03288 [pdf, other]

doi: 10.1145/3571600.3571607

Convolutional Ensembling based Few-Shot Defect Detection Technique

Authors: Soumyajit Karmakar, Abeer Banerjee, Prashant Sadashiv Gidde, Sumeet Saurav, Sanjay Singh

Abstract: Over the past few years, there has been a significant improvement in the domain of few-shot learning. This learning paradigm has shown promising results for the challenging problem of anomaly detection, where the general task is to deal with heavy class imbalance. Our paper presents a new approach to few-shot classification, where we employ the knowledge-base of multiple pre-trained convolutional models that act as the backbone for our proposed few-shot framework. Our framework uses a novel ensembling technique for boosting the accuracy while drastically decreasing the total parameter count, thus paving the way for real-time implementation. We perform an extensive hyperparameter search using a power-line defect detection dataset and obtain an accuracy of 92.30% for the 5-way 5-shot task. Without further tuning, we evaluate our model on competing standards with the existing state-of-the-art methods and outperform them.

Submitted 23 November, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: 7 pages, 7 images

arXiv:2204.07032 [pdf]

Farmer-Bot: An Interactive Bot for Farmers

Authors: Narayana Darapaneni, Rajiv Tiwari, Anwesh Reddy Paduri, Suman Saurav, Rohit Chaoji, Sohil

Abstract: The Indian Agricultural sector generates huge employment, accounting for over 54% of the country's workforce. Its overall stand in GDP is close to 14%. However, this sector has been plagued by knowledge and infrastructure deficit, especially in the rural sectors. Like other sectors, the Indian Agricultural sector has seen rapid digitization with the use of technology, and Kisan Call Center (KCC) is one such example. It is a Government of India initiative launched on 21st January 2004 which is a synthesis of two hitherto separate sectors: the Information Technology and Agriculture sectors. However, studies have shown constraints for KCC beneficiaries, especially in light of network congestion and incomplete knowledge of the call center representatives. With the advent of new technologies, like first-generation SMS based and next-generation social media tools like WhatsApp, farmers in India are digitally more connected to the agricultural information services. Previous studies have shown that the KCC dataset can be used as a viable alternative for Chat-bot. We will base our study on the available KCC dataset to build an NLP model by getting the semantic similarity of the queries made by farmers in the past and use it to automatically answer future queries. We will attempt to make a WhatsApp based chat-bot to easily communicate with farmers using RASA as a tool.

Submitted 7 April, 2022; originally announced April 2022.

Showing 1–6 of 6 results for author: Saurav, S