Search | arXiv e-print repository

AUV trajectory optimization with hydrodynamic forces for Icy Moon Exploration

Authors: Lukas Rust, Shubham Vyas, Bilal Wehbe

Abstract: To explore oceans on ice-covered moons in the solar system, energy-efficient Autonomous Underwater Vehicles (AUVs) with long ranges must cover enough distance to record and collect enough data. These usually underactuated vehicles are hard to control when performing tasks such as vertical docking or the inspection of vertical walls. This paper introduces a control strategy for DeepLeng to navigate in the ice-covered ocean of Jupiter's moon Europa and presents simulation results preceding a discussion on what is further needed for robust control during the mission.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 7 pages, 8 figures

Journal ref: In 17th Symposium on Advanced Space Technologies in Robotics and Automation, 18-20 October 2023. 2023

arXiv:2405.06726 [pdf, other]

Region of Attraction Estimation for Free-Floating Systems under Time-Varying LQR Control

Authors: Lasse Shala, Shubham Vyas, Mohamed Khalil Ben-Larbi, Shivesh Kumar, Enrico Stoll

Abstract: Future Active Debris Removal (ADR) and On Orbit Servicing (OOS) missions demand elaborate closed-loop controllers. Feasible control architectures should take into consideration the inherent coupling of the free-floating dynamics and the kinematics of the system. Recently, Time-Varying Linear Quadratic Regulators (TVLQR) have been used to stabilize underactuated systems that exhibit a similar kinodynamic coupling. Furthermore, this control approach integrates synergistically with Lyapunov-based region of attraction (ROA) estimation, which, in the context of ADR and OOS, allows for reasoning about composability of different sub-maneuvers. In this paper, TVLQR was used to stabilize an ADR detumbling maneuver in simulation. Moreover, the ROA of the closed-loop dynamics was estimated using a probabilistic method. In order to demonstrate the real-world applicability for free-floating robots, further experiments were conducted onboard a free-floating testbed.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.13693 [pdf, other]

PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images

Authors: Abhishek Jha, Yogesh Rawat, Shruti Vyas

Abstract: Photovoltaic (PV) systems allow us to tap into abundant solar energy; however, they require regular maintenance for high efficiency and to prevent degradation. Traditional manual health checks using Electroluminescence (EL) imaging are expensive and logistically challenging, making automated defect detection essential. Current automation approaches require extensive manual expert labeling, which is time-consuming, expensive, and prone to errors. We propose PV-S3 (Photovoltaic-Semi Supervised Segmentation), a Semi-Supervised Learning approach for semantic segmentation of defects in EL images that reduces reliance on extensive labeling. PV-S3 is a deep learning model trained using a few labeled images along with numerous unlabeled images. We introduce a novel Semi Cross-Entropy loss function to train PV-S3 which addresses the challenges specific to automated PV defect detection, such as diverse defect types and class imbalance. We evaluate PV-S3 on multiple datasets and demonstrate its effectiveness and adaptability. With merely 20% labeled samples, we achieve an absolute improvement of 9.7% in IoU, 29.9% in Precision, 12.75% in Recall, and 20.42% in F1-Score over the prior state-of-the-art supervised method (which uses 100% labeled samples) on the UCF-EL dataset (the largest dataset available for semantic segmentation of EL images), showing improvement in performance while reducing the annotation costs by 80%.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2401.13801 [pdf, other]

Exploring Adversarial Threat Models in Cyber Physical Battery Systems

Authors: Shanthan Kumar Padisala, Shashank Dhananjay Vyas, Satadru Dey

Abstract: Technological advancements like the Internet of Things (IoT) have facilitated data exchange across various platforms. This data exchange across various platforms has transformed the traditional battery system into a cyber physical system. Such connectivity makes modern cyber physical battery systems vulnerable to cyber threats where a cyber attacker can manipulate sensing and actuation signals to bring the battery system into an unsafe operating condition. Hence, it is essential to build resilience in modern cyber physical battery systems (CPBS) under cyber attacks. The first step of building such resilience is to analyze potential adversarial behavior, that is, how the adversaries can inject attacks into the battery systems. However, it has been found that in this under-explored area of battery cyber physical security, such an adversarial threat model has not been studied in a systematic manner. In this study, we address this gap and explore adversarial attack generation policies based on optimal control framework. The framework is developed by performing theoretical analysis, which is subsequently supported by evaluation with experimental data generated from a commercial battery cell.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2312.10788 [pdf, other]

Linear Model Predictive Control for a planar free-floating platform: A comparison of binary input constraint formulations

Authors: Franek Stark, Shubham Vyas, Georg Schildbach, Frank Kirchner

Abstract: This work develops a first Model Predictive Control for the European Space Agency's 3-DoF free-floating platform. The challenges of the platform are the on/off thrusters, which cannot be actuated continuously and which are subject to certain timing constraints. This work compares penalty-term, Linear Complementarity Constraints, and classical Mixed Integer formulations in order to develop a controller that natively handles binary inputs. Furthermore, linear constraints are proposed which enforce the timing constraints. Only the Mixed Integer formulation turns out to work sufficiently. Hence, this work develops a new Mixed Integer MPC on the decoupled model of the platform. Feasibility analysis and simulation results show that for a short enough prediction horizon, this controller can (sub)optimally stabilize and control the system under consideration of the constraints in real-time.

Submitted 17 December, 2023; originally announced December 2023.

Comments: 17th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA 2023)

arXiv:2312.07169 [pdf, other]

Semi-supervised Active Learning for Video Action Detection

Authors: Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat

Abstract: In this work, we focus on label efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing the relevant activity region within a video. We evaluate the proposed approach on three different benchmark datasets, UCF-101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos. The code and models are publicly available at https://github.com/AKASH2907/semi-sup-active-learning.

Submitted 3 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: AAAI Conference on Artificial Intelligence, Main Technical Track (AAAI), 2024, Code: https://github.com/AKASH2907/semi-sup-active-learning

arXiv:2310.07380 [pdf, other]

Histopathological Image Classification and Vulnerability Analysis using Federated Learning

Authors: Sankalp Vyas, Amar Nath Patra, Raj Mani Shukla

Abstract: Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flipping increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the most optimal accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.

Submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

arXiv:2305.08373 [pdf, other]

doi 10.1109/LRA.2023.3269296

AcroMonk: A Minimalist Underactuated Brachiating Robot

Authors: Mahdi Javadi, Daniel Harnack, Paula Stocco, Shivesh Kumar, Shubham Vyas, Daniel Pizzutilo, Frank Kirchner

Abstract: Brachiation is a dynamic, coordinated swinging maneuver of body and arms used by monkeys and apes to move between branches. As a unique underactuated mode of locomotion, it is interesting to study from a robotics perspective since it can broaden the deployment scenarios for humanoids and animaloids. While several brachiating robots of varying complexity have been proposed in the past, this paper presents the simplest possible prototype of a brachiation robot, using only a single actuator and unactuated grippers. The novel passive gripper design allows it to snap on and release from monkey bars, while guaranteeing well defined start and end poses of the swing. The brachiation behavior is realized in three different ways, using trajectory optimization via direct collocation and stabilization by a model-based time-varying linear quadratic regulator (TVLQR) or model-free proportional derivative (PD) control, as well as by a reinforcement learning (RL) based control policy. The three control schemes are compared in terms of robustness to disturbances, mass uncertainty, and energy consumption. The system design and controllers have been open-sourced. Due to its minimal and open design, the system can serve as a canonical underactuated platform for education and research.

Submitted 15 May, 2023; originally announced May 2023.

Comments: The open-source implementation is available at https://github.com/dfki-ric-underactuated-lab/acromonk and a video demonstration of the experiments can be accessed at https://youtu.be/FIcDNtJo9Jc

Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3637-3644, 2023

arXiv:2303.04926 [pdf, other]

Automated Cyber Defence: A Review

Authors: Sanyam Vyas, John Hannay, Andrew Bolton, Professor Pete Burnap

Abstract: Within recent times, cybercriminals have curated a variety of organised and resolute cyber attacks within a range of cyber systems, leading to consequential ramifications to private and governmental institutions. Current security-based automation and orchestrations focus on automating fixed purpose and hard-coded solutions, which are easily surpassed by modern-day cyber attacks. Research within Automated Cyber Defence will allow the development and enabling intelligence response by autonomously defending networked systems through sequential decision-making agents. This article comprehensively elaborates the developments within Automated Cyber Defence through a requirement analysis divided into two sub-areas, namely, automated defence and attack agents and Autonomous Cyber Operation (ACO) Gyms. The requirement analysis allows the comparison of automated agents and highlights the importance of ACO Gyms for their continual development. The requirement analysis is also used to critique ACO Gyms with an overall aim to develop them for deploying automated agents within real-world networked systems. Relevant future challenges were addressed from the overall analysis to accelerate development within the area of Automated Cyber Defence.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2207.10693 [pdf, other]

Trajectory Optimization and Following for a Three Degrees of Freedom Overactuated Floating Platform

Authors: Anton Bredenbeck, Shubham Vyas, Martin Zwick, Dorit Borrmann, Miguel Olivares-Mendez, Andreas Nüchter

Abstract: Space robotics applications, such as Active Space Debris Removal (ASDR), require representative testing before launch. A commonly used approach to emulate the microgravity environment in space is air-bearing based platforms on flat-floors, such as the European Space Agency's Orbital Robotics and GNC Lab (ORGL). This work proposes a control architecture for a floating platform at the ORGL, equipped with eight solenoid-valve-based thrusters and one reaction wheel. The control architecture consists of two main components: a trajectory planner that finds optimal trajectories connecting two states and a trajectory follower that follows any physically feasible trajectory. The controller is first evaluated within an introduced simulation, achieving a 100 % success rate at finding and following trajectories to the origin within a Monte-Carlo test. Individual trajectories are also successfully followed by the physical system. In this work, we showcase the ability of the controller to reject disturbances and follow a straight-line trajectory within tens of centimeters.

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted to IROS2022, code at https://gitlab.com/anton.bredenbeck/ff-trajectories

arXiv:2207.02431 [pdf, other]

GAMa: Cross-view Video Geo-localization

Authors: Shruti Vyas, Chen Chen, Mubarak Shah

Abstract: The existing work in cross-view geo-localization is based on images where a ground panorama is matched to an aerial image. In this work, we focus on ground videos instead of images which provides additional contextual cues which are important for this task. There are no existing datasets for this problem, therefore we propose GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images. We also propose a novel approach to solve this problem. At clip-level, a short video clip is matched with corresponding aerial image and is later used to get video-level geo-localization of a long video. Moreover, we propose a hierarchical approach to further improve the clip-level geolocalization. It is a challenging dataset, unaligned and limited field of view, and our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0mile. Code and dataset are available at following link: https://github.com/svyas23/GAMa.

Submitted 6 July, 2022; originally announced July 2022.

Journal ref: ECCV 2022

arXiv:2207.02159 [pdf, other]

Robustness Analysis of Video-Language Models Against Visual and Language Perturbations

Authors: Madeline C. Schiappa, Shruti Vyas, Hamid Palangi, Yogesh S. Rawat, Vibhav Vineet

Abstract: Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single modal learning. However, robustness of these approaches against real-world perturbations has not been studied. In this work, we perform the first extensive robustness study of video-language models against various real-world perturbations. We focus on text-to-video retrieval and propose two large-scale benchmark datasets, MSRVTT-P and YouCook2-P, which utilize 90 different visual and 35 different text perturbations. The study reveals some interesting initial findings from the studied models: 1) models are generally more susceptible when only video is perturbed as opposed to when only text is perturbed, 2) models that are pre-trained are more robust than those trained from scratch, 3) models attend more to scene and objects rather than motion and action. We hope this study will serve as a benchmark and guide future research in robust video-language learning. The benchmark introduced in this study along with the code and datasets is available at https://bit.ly/3CNOly4.

Submitted 18 July, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: NeurIPS 2022 Datasets and Benchmarks Track. The project's webpage is located at https://bit.ly/3CNOly4

Journal ref: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (2022)

arXiv:2207.01398 [pdf, other]

Large-scale Robustness Analysis of Video Action Recognition Models

Authors: Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh Rawat

Abstract: We have seen great progress in video action recognition in recent years. There are several models based on convolutional neural networks (CNNs) and some recent transformer-based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution shift perturbations instead of adversarial perturbations. We propose four different benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P, to perform this analysis. We study the robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer-based models are consistently more robust compared to CNN-based models, 2) pretraining improves robustness for transformer-based models more than CNN-based models, and 3) all of the studied models are robust to temporal perturbations for all datasets but SSv2, suggesting that the importance of temporal information for action recognition varies based on the dataset and activities. Next, we study the role of augmentations in model robustness and present a real-world dataset, UCF101-DS, which contains realistic distribution shifts, to further validate some of these findings. We believe this study will serve as a benchmark for future research in robust video action recognition.

Submitted 7 April, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted in 2023 Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2206.03993 [pdf, other]

Finding and Following Optimal Trajectories for an Overactuated Floating Robotic Platform

Authors: Anton Bredenbeck, Shubham Vyas, Willem Suter, Martin Zwick, Dorit Borrmann, Miguel Olivares-Mendez, Andreas Nüchter

Abstract: The recent increase in yearly spacecraft launches and the high number of planned launches have raised questions about maintaining accessibility to space for all interested parties. A key to sustaining the future of space-flight is the ability to service malfunctioning - and actively remove dysfunctional spacecraft from orbit. Robotic platforms that autonomously perform these tasks are a topic of ongoing research and thus must undergo thorough testing before launch. For representative system-level testing, the European Space Agency (ESA) uses, among other things, the Orbital Robotics and GNC Lab (ORGL), a flat-floor facility where air-bearing based platforms exhibit free-floating behavior in three Degrees of Freedom (DoF). This work introduces a representative simulation of a free-floating platform in the testing environment and a software framework for controller development. Finally, this work proposes a controller within that framework for finding and following optimal trajectories between arbitrary states, which is evaluated in simulation and reality.

Submitted 19 July, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: 16th Symposium on Advanced Space Technologies in Robotics and Automation 2022

arXiv:2205.08109 [pdf]

Forecasting Solar Power Generation on the basis of Predictive and Corrective Maintenance Activities

Authors: Soham Vyas, Yuvraj Goyal, Neel Bhatt, Sanskar Bhuwania, Hardik Patel, Shakti Mishra, Brijesh Tripathi

Abstract: Solar energy forecasting has seen tremendous growth in the last decade using historical time series collected from a weather station, such as weather variables wind speed and direction, solar radiance, and temperature. It helps in the overall management of solar power plants. However, the solar power plant regularly requires preventive and corrective maintenance activities that further impact energy production. This paper presents a novel work for forecasting solar power energy production based on maintenance activities, problems observed at a power plant, and weather data. The results accomplished on the datasets obtained from the 1MW solar power plant of PDEU (our university) that has generated data set with 13 columns as daily entries from 2012 to 2020. There are 12 structured columns and one unstructured column with manual text entries about different maintenance activities, problems observed, and weather conditions daily. The unstructured column is used to create a new feature column vector using Hash Map, flag words, and stop words. The final dataset comprises five important feature vector columns based on correlation and causality analysis.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2204.07892 [pdf, other]

Video Action Detection: Analysing Limitations and Challenges

Authors: Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah

Abstract: Beyond possessing large enough size to feed data hungry machines (e.g., transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify among their relative existences? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA) which overcomes these limitations and is more suitable for real world applications. In addition, we perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect. This reveals if the actions in these datasets really need the motion information of an actor, or whether they predict the occurrence of an action even by looking at a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show existence of biases which have managed to creep into existing methods in spite of careful modeling.

Submitted 16 April, 2022; originally announced April 2022.

Comments: CVPRW'22

arXiv:2110.10899 [pdf, other]

LARNet: Latent Action Representation for Human Action Synthesis

Authors: Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat

Abstract: We present LARNet, a novel end-to-end approach for generating human action videos. A joint generative modeling of appearance and dynamics to synthesize a video is very challenging and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space avoiding the need of a driving video during inference. The generated action dynamics is integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse as well as fine level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets demonstrating the effectiveness of the proposed approach in generating human actions. Code available at https://github.com/aayushjr/larnet.

Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: British Machine Vision Conference (BMVC) 2021

arXiv:2110.07993 [pdf, other]

Pose-guided Generative Adversarial Net for Novel View Action Synthesis

Authors: Xianhang Li, Junhao Zhang, Kunchang Li, Shruti Vyas, Yogesh S Rawat

Abstract: We focus on the problem of novel-view human action synthesis. Given an action video, the goal is to generate the same action from an unseen viewpoint. Naturally, novel view video synthesis is more challenging than image synthesis. It requires the synthesis of a sequence of realistic frames with temporal coherency. Besides, transferring the different actions to a novel target view requires awareness of action category and viewpoint change simultaneously. To address these challenges, we propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN), which utilizes pose to alleviate the difficulty of this task. First, we propose a recurrent pose-transformation module which transforms actions from the source view to the target view and generates a novel view pose sequence in 2D coordinate space. Second, a well-transformed pose sequence enables us to separate the action and background in the target view. We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view using these action and background features. Finally, the generated video features are used to synthesize human action with the help of a 3D decoder. Moreover, to focus on dynamic action in the video, we propose a novel multi-scale action-separable loss which further improves the video quality. We conduct extensive experiments on two large-scale multi-view human action datasets, NTU-RGBD and PKU-MMD, demonstrating the effectiveness of PAS-GAN which outperforms existing approaches.

Submitted 8 December, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted by WACV2022

arXiv:2110.06827 [pdf, other]

NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels

Authors: Mohit Sharma, Raj Patra, Harshal Desai, Shruti Vyas, Yogesh Rawat, Rajiv Ratn Shah

Abstract: Deep learning has shown remarkable progress in a wide range of problems. However, efficient training of such models requires large-scale datasets, and getting annotations for such datasets can be challenging and costly. In this work, we explore the use of user-generated freely available labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness with existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets. We present this as a benchmark dataset in noisy learning for video understanding. The dataset, code, and trained models will be publicly available for future research.

Submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted at ACM Multimedia Asia 2021

arXiv:2107.11494 [pdf, other]

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

Authors: Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

Abstract: This paper summarizes the TinyAction challenge which was organized in the ActivityNet workshop at CVPR 2021. This challenge focuses on recognizing real-world low-resolution activities present in videos. The action recognition task is currently focused around classifying the actions from high-quality videos where the actors and the action are clearly visible. While various approaches have been shown effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region. However, many real world security videos often have the actual action captured in a small resolution, making action recognition in a tiny region a challenging task. In this work, we propose a benchmark dataset, TinyVIRAT-v2, which is comprised of naturally occurring low-resolution actions. This is an extension of the TinyVIRAT dataset and consists of actions with multiple labels. The videos are extracted from security videos which makes them realistic and more challenging. We use current state-of-the-art action recognition methods on the dataset as a benchmark, and propose the TinyAction Challenge.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 8 pages. arXiv admin note: text overlap with arXiv:2007.07355

arXiv:2106.03956 [pdf, other]

Novel View Video Prediction Using a Dual Representation

Authors: Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah

Abstract: We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view. The proposed approach does not require any priors and is able to predict the video from wider angular distances, up to 45 degrees, as compared to the recent studies predicting small variations in viewpoint. Moreover, our method relies only on RGB frames to learn a dual representation which is used to generate the video from a novel viewpoint. The dual representation encompasses a view-dependent and a global representation which incorporates complementary details to enable novel view video prediction. We demonstrate the effectiveness of our framework on two real world datasets: NTU-RGB+D and CMU Panoptic. A comparison with the state-of-the-art novel view video prediction methods shows an improvement of 26.1% in SSIM, 13.6% in PSNR, and 60% in FVD scores without using explicit priors from target views.

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted in ICIP 2021

arXiv:2104.00235 [pdf, ps, other]

doi 10.21437/Interspeech.2021-1339

Multilingual and code-switching ASR challenges for low resource Indian languages

Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

Abstract: Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple languages are freely interchanged within a single sentence or between sentences. The success of low-resource multilingual and code-switching ASR often depends on the variety of languages in terms of their acoustics, linguistic characteristics as well as the amount of data available and how these are carefully considered in building the ASR system. In this challenge, we would like to focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali. For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.

Submitted 31 March, 2021; originally announced April 2021.

Comments: 6 pages

arXiv:2009.00638 [pdf, other]

doi 10.1007/978-3-030-03243-2_878-1

View-invariant action recognition

Authors: Yogesh S Rawat, Shruti Vyas

Abstract: Human action recognition is an important problem in computer vision. It has a wide range of applications in surveillance, human-computer interaction, augmented reality, video indexing, and retrieval. The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action. We have seen a lot of research exploring this dynamics of spatio-temporal appearance for learning a visual representation of human actions. However, most of the research in action recognition is focused on some common viewpoints, and these approaches do not perform well when there is a change in viewpoint. Human actions are performed in a 3-dimensional environment and are projected to a 2-dimensional space when captured as a video from a given viewpoint. Therefore, an action will have a different spatio-temporal appearance from different viewpoints. The research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:1811.10699 [pdf, other]

Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning

Authors: Shruti Vyas, Yogesh S Rawat, Mubarak Shah

Abstract: The recent success in deep learning has led to various effective representation learning methods for videos. However, the current approaches for video representation require large amounts of human labeled data for effective learning. We present an unsupervised representation learning framework to encode scene dynamics in videos captured from multiple viewpoints. The proposed framework has two main components: Representation Learning Network (RL-NET), which learns a representation with the help of Blending Network (BL-NET), and Video Rendering Network (VR-NET), which is used for video synthesis. The framework takes as input video clips from different viewpoints and time, learns an internal representation and uses this representation to render a video clip from an arbitrary given viewpoint and time. The ability of the proposed network to render video frames from arbitrary viewpoints and time enables it to learn a meaningful and robust representation of the scene dynamics. We demonstrate the effectiveness of the proposed method in rendering view-aware as well as time-aware video clips on two different real-world datasets including UCF-101 and NTU-RGB+D. To further validate the effectiveness of the learned representation, we use it for the task of view-invariant activity classification where we observe a significant improvement (~26%) in the performance on NTU-RGB+D dataset compared to the existing state-of-the-art methods.

Submitted 29 November, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

arXiv:1209.2368 [pdf]

Impact of E-Banking on Traditional Banking Services

Authors: Shilpan Dineshkumar Vyas

Abstract: Internet banking is changing the banking industry, having major effects on banking relationships. Banking is now no longer confined to the branches where one has to approach the branch in person, to withdraw cash or deposit a cheque or request a statement of accounts. In true Internet banking, any inquiry or transaction is processed online without any reference to the branch (anywhere banking) at any time. Providing Internet banking is increasingly becoming a "need to have" rather than a "nice to have" service. Net banking, thus, is now more of a norm than an exception in many developed countries due to the fact that it is the cheapest way of providing banking services. This research paper will introduce you to e-banking, giving the meaning, functions, types, advantages and limitations of e-banking. It will also show the impact of e-banking on traditional services and finally the result documentation.

Submitted 31 August, 2012; originally announced September 2012.

Journal ref: International Journal of Computer Science & Communication Networks, Vol 2(3), 310-313, June-July 2012, ISSN: 2249-5789

arXiv:1207.2741 [pdf]

E-banking and E-commerce in India and USA

Authors: Shilpan Vyas

Abstract: Web based e-banking is becoming an important aspect of worldwide commerce. The United Nations predicts 17% of purchases by firms and individuals will be conducted online by 2006. The future of Web-based e-banking in developed areas appears bright but consumers and merchants in developing countries face a number of barriers to successful e-banking, including less reliable telecommunications infrastructure and power supplies, less access to online payment mechanisms, and relatively high costs for personal computers and Internet access. How should managers in charge of e-banking prepare for global implementation? What can they do to reach consumers in developing countries? What factors influence the adoption of consumer-oriented e-banking in various countries? This research paper will give you an idea of the local conditions in India, Hofstede's dimensions of culture in India and the USA, the Diffusion of Innovation theory and hence the hypotheses for the innovation characteristics of interest.

Submitted 10 July, 2012; originally announced July 2012.

Comments: E-commerce and E-banking, Diffusion of Innovation theory, Hofstede's dimensions

Journal ref: Published in IJCSI Journal, Volume 9, Issue 3, No 2, May 2012

Showing 1–26 of 26 results for author: Vyas, S