-
AUV trajectory optimization with hydrodynamic forces for Icy Moon Exploration
Authors:
Lukas Rust,
Shubham Vyas,
Bilal Wehbe
Abstract:
To explore oceans on ice-covered moons in the solar system, energy-efficient Autonomous Underwater Vehicles (AUVs) with long ranges must cover enough distance to record and collect enough data. These usually underactuated vehicles are hard to control when performing tasks such as vertical docking or the inspection of vertical walls. This paper introduces a control strategy for DeepLeng to navigate in the ice-covered ocean of Jupiter's moon Europa and presents simulation results preceding a discussion on what is further needed for robust control during the mission.
Submitted 14 June, 2024;
originally announced June 2024.
-
Region of Attraction Estimation for Free-Floating Systems under Time-Varying LQR Control
Authors:
Lasse Shala,
Shubham Vyas,
Mohamed Khalil Ben-Larbi,
Shivesh Kumar,
Enrico Stoll
Abstract:
Future Active Debris Removal (ADR) and On Orbit Servicing (OOS) missions demand elaborate closed-loop controllers. Feasible control architectures should take into consideration the inherent coupling of the free-floating dynamics and the kinematics of the system. Recently, Time-Varying Linear Quadratic Regulators (TVLQR) have been used to stabilize underactuated systems that exhibit a similar kinodynamic coupling. Furthermore, this control approach integrates synergistically with Lyapunov-based region of attraction (ROA) estimation, which, in the context of ADR and OOS, allows for reasoning about composability of different sub-maneuvers. In this paper, TVLQR was used to stabilize an ADR detumbling maneuver in simulation. Moreover, the ROA of the closed-loop dynamics was estimated using a probabilistic method. In order to demonstrate the real-world applicability for free-floating robots, further experiments were conducted onboard a free-floating testbed.
Submitted 10 May, 2024;
originally announced May 2024.
-
PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images
Authors:
Abhishek Jha,
Yogesh Rawat,
Shruti Vyas
Abstract:
Photovoltaic (PV) systems allow us to tap into abundant solar energy; however, they require regular maintenance for high efficiency and to prevent degradation. Traditional manual health checks, using Electroluminescence (EL) imaging, are expensive and logistically challenging, making automated defect detection essential. Current automation approaches require extensive manual expert labeling, which is time-consuming, expensive, and prone to errors. We propose PV-S3 (Photovoltaic-Semi Supervised Segmentation), a Semi-Supervised Learning approach for semantic segmentation of defects in EL images that reduces reliance on extensive labeling. PV-S3 is a deep learning model trained using a few labeled images along with numerous unlabeled images. We introduce a novel Semi Cross-Entropy loss function to train PV-S3 which addresses the challenges specific to automated PV defect detection, such as diverse defect types and class imbalance. We evaluate PV-S3 on multiple datasets and demonstrate its effectiveness and adaptability. With merely 20% labeled samples, we achieve an absolute improvement of 9.7% in IoU, 29.9% in Precision, 12.75% in Recall, and 20.42% in F1-Score over the prior state-of-the-art supervised method (which uses 100% labeled samples) on the UCF-EL dataset (the largest dataset available for semantic segmentation of EL images), showing improvement in performance while reducing the annotation costs by 80%.
Submitted 21 April, 2024;
originally announced April 2024.
-
Exploring Adversarial Threat Models in Cyber Physical Battery Systems
Authors:
Shanthan Kumar Padisala,
Shashank Dhananjay Vyas,
Satadru Dey
Abstract:
Technological advancements like the Internet of Things (IoT) have facilitated data exchange across various platforms. This data exchange has transformed the traditional battery system into a cyber physical system. Such connectivity makes modern cyber physical battery systems vulnerable to cyber threats where a cyber attacker can manipulate sensing and actuation signals to bring the battery system into an unsafe operating condition. Hence, it is essential to build resilience in modern cyber physical battery systems (CPBS) under cyber attacks. The first step of building such resilience is to analyze potential adversarial behavior, that is, how the adversaries can inject attacks into the battery systems. However, it has been found that in this under-explored area of battery cyber physical security, such an adversarial threat model has not been studied in a systematic manner. In this study, we address this gap and explore adversarial attack generation policies based on an optimal control framework. The framework is developed by performing theoretical analysis, which is subsequently supported by evaluation with experimental data generated from a commercial battery cell.
Submitted 24 January, 2024;
originally announced January 2024.
-
Linear Model Predictive Control for a planar free-floating platform: A comparison of binary input constraint formulations
Authors:
Franek Stark,
Shubham Vyas,
Georg Schildbach,
Frank Kirchner
Abstract:
This work develops a first Model Predictive Control for the European Space Agency's 3-DoF free-floating platform. The challenges of the platform are the on/off thrusters, which cannot be actuated continuously and which are subject to certain timing constraints. This work compares penalty-term, Linear Complementarity Constraints, and classical Mixed Integer formulations in order to develop a controller that natively handles binary inputs. Furthermore, linear constraints are proposed which enforce the timing constraints. Only the Mixed Integer formulation turns out to work sufficiently well. Hence, this work develops a new Mixed Integer MPC on the decoupled model of the platform. Feasibility analysis and simulation results show that for a short enough prediction horizon, this controller can (sub)optimally stabilize and control the system under consideration of the constraints in real-time.
Submitted 17 December, 2023;
originally announced December 2023.
-
Semi-supervised Active Learning for Video Action Detection
Authors:
Ayush Singh,
Aayush J Rana,
Akash Kumar,
Shruti Vyas,
Yogesh Singh Rawat
Abstract:
In this work, we focus on label efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing the relevant activity region within a video. We evaluate the proposed approach on three different benchmark datasets, UCF-101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos. The code and models are publicly available at: https://github.com/AKASH2907/semi-sup-active-learning.
Submitted 3 April, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Histopathological Image Classification and Vulnerability Analysis using Federated Learning
Authors:
Sankalp Vyas,
Amar Nath Patra,
Raj Mani Shukla
Abstract:
Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices.
However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flipping increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the best accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.
Submitted 11 October, 2023;
originally announced October 2023.
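The federated round and label-flipping attack described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic model, the plain unweighted averaging, and every name here (`flip_labels`, `local_sgd`, `federated_average`, `run_round`) are assumptions chosen for illustration, and the data are lists of feature vectors with binary labels rather than skin cancer images.

```python
import math
import random


def flip_labels(labels, fraction, rng):
    """Return a copy of binary labels with `fraction` of them flipped (0 <-> 1).

    Hypothetical helper modelling a poisoning client.
    """
    flipped = list(labels)
    n_flip = int(round(fraction * len(flipped)))
    for i in rng.sample(range(len(flipped)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def local_sgd(weights, features, labels, lr=0.1, epochs=1):
    """Client step: start from the global weights and run SGD on local data.

    Assumes a logistic-regression model; the abstract only states that
    stochastic gradient descent is used.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(len(w)):
                w[j] -= lr * (p - y) * x[j]
    return w


def federated_average(client_weights):
    """Server step: unweighted mean of the clients' weight vectors."""
    n = len(client_weights)
    dim = len(client_weights[0])
    return [sum(ws[j] for ws in client_weights) / n for j in range(dim)]


def run_round(global_weights, clients, lr=0.1, epochs=1):
    """One round: send the global weights out, train locally, average the updates.

    `clients` is a list of (features, labels) pairs; raw data never leaves
    a client, only the updated weights do.
    """
    updates = [local_sgd(global_weights, X, y, lr, epochs) for X, y in clients]
    return federated_average(updates)
```

To mimic the attack, one client's labels can be passed through `flip_labels(labels, fraction, random.Random(seed))` before calling `run_round`, and the accuracy of the resulting global weights compared across different values of `fraction`.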
-
AcroMonk: A Minimalist Underactuated Brachiating Robot
Authors:
Mahdi Javadi,
Daniel Harnack,
Paula Stocco,
Shivesh Kumar,
Shubham Vyas,
Daniel Pizzutilo,
Frank Kirchner
Abstract:
Brachiation is a dynamic, coordinated swinging maneuver of body and arms used by monkeys and apes to move between branches. As a unique underactuated mode of locomotion, it is interesting to study from a robotics perspective since it can broaden the deployment scenarios for humanoids and animaloids. While several brachiating robots of varying complexity have been proposed in the past, this paper presents the simplest possible prototype of a brachiation robot, using only a single actuator and unactuated grippers. The novel passive gripper design allows it to snap on and release from monkey bars, while guaranteeing well defined start and end poses of the swing. The brachiation behavior is realized in three different ways, using trajectory optimization via direct collocation and stabilization by a model-based time-varying linear quadratic regulator (TVLQR) or model-free proportional derivative (PD) control, as well as by a reinforcement learning (RL) based control policy. The three control schemes are compared in terms of robustness to disturbances, mass uncertainty, and energy consumption. The system design and controllers have been open-sourced. Due to its minimal and open design, the system can serve as a canonical underactuated platform for education and research.
Submitted 15 May, 2023;
originally announced May 2023.
-
Automated Cyber Defence: A Review
Authors:
Sanyam Vyas,
John Hannay,
Andrew Bolton,
Professor Pete Burnap
Abstract:
In recent times, cybercriminals have curated a variety of organised and resolute cyber attacks within a range of cyber systems, leading to consequential ramifications for private and governmental institutions. Current security-based automation and orchestrations focus on automating fixed purpose and hard-coded solutions, which are easily surpassed by modern-day cyber attacks. Research within Automated Cyber Defence will enable the development of intelligent responses by autonomously defending networked systems through sequential decision-making agents. This article comprehensively elaborates the developments within Automated Cyber Defence through a requirement analysis divided into two sub-areas, namely, automated defence and attack agents and Autonomous Cyber Operation (ACO) Gyms. The requirement analysis allows the comparison of automated agents and highlights the importance of ACO Gyms for their continual development. The requirement analysis is also used to critique ACO Gyms with an overall aim to develop them for deploying automated agents within real-world networked systems. Relevant future challenges were addressed from the overall analysis to accelerate development within the area of Automated Cyber Defence.
Submitted 8 March, 2023;
originally announced March 2023.
-
Trajectory Optimization and Following for a Three Degrees of Freedom Overactuated Floating Platform
Authors:
Anton Bredenbeck,
Shubham Vyas,
Martin Zwick,
Dorit Borrmann,
Miguel Olivares-Mendez,
Andreas Nüchter
Abstract:
Space robotics applications, such as Active Space Debris Removal (ASDR), require representative testing before launch. A commonly used approach to emulate the microgravity environment in space is air-bearing based platforms on flat-floors, such as the European Space Agency's Orbital Robotics and GNC Lab (ORGL). This work proposes a control architecture for a floating platform at the ORGL, equipped with eight solenoid-valve-based thrusters and one reaction wheel. The control architecture consists of two main components: a trajectory planner that finds optimal trajectories connecting two states and a trajectory follower that follows any physically feasible trajectory. The controller is first evaluated within an introduced simulation, achieving a 100 % success rate at finding and following trajectories to the origin within a Monte-Carlo test. Individual trajectories are also successfully followed by the physical system. In this work, we showcase the ability of the controller to reject disturbances and follow a straight-line trajectory within tens of centimeters.
Submitted 21 July, 2022;
originally announced July 2022.
-
GAMa: Cross-view Video Geo-localization
Authors:
Shruti Vyas,
Chen Chen,
Mubarak Shah
Abstract:
The existing work in cross-view geo-localization is based on images where a ground panorama is matched to an aerial image. In this work, we focus on ground videos instead of images, which provide additional contextual cues that are important for this task. There are no existing datasets for this problem, therefore we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images. We also propose a novel approach to solve this problem. At clip-level, a short video clip is matched with the corresponding aerial image and is later used to get video-level geo-localization of a long video. Moreover, we propose a hierarchical approach to further improve the clip-level geo-localization. It is a challenging dataset, unaligned and with a limited field of view, and our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0mile. Code and dataset are available at the following link: https://github.com/svyas23/GAMa.
Submitted 6 July, 2022;
originally announced July 2022.
-
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Authors:
Madeline C. Schiappa,
Shruti Vyas,
Hamid Palangi,
Yogesh S. Rawat,
Vibhav Vineet
Abstract:
Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single modal learning. However, robustness of these approaches against real-world perturbations has not been studied. In this work, we perform the first extensive robustness study of video-language models against various real-world perturbations. We focus on text-to-video retrieval and propose two large-scale benchmark datasets, MSRVTT-P and YouCook2-P, which utilize 90 different visual and 35 different text perturbations. The study reveals some interesting initial findings from the studied models: 1) models are generally more susceptible when only video is perturbed as opposed to when only text is perturbed, 2) models that are pre-trained are more robust than those trained from scratch, 3) models attend more to scene and objects rather than motion and action. We hope this study will serve as a benchmark and guide future research in robust video-language learning. The benchmark introduced in this study along with the code and datasets is available at https://bit.ly/3CNOly4.
Submitted 18 July, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Large-scale Robustness Analysis of Video Action Recognition Models
Authors:
Madeline Chantry Schiappa,
Naman Biyani,
Prudvi Kamtam,
Shruti Vyas,
Hamid Palangi,
Vibhav Vineet,
Yogesh Rawat
Abstract:
We have seen great progress in video action recognition in recent years. There are several models based on convolutional neural networks (CNNs) and some recent transformer based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution shift perturbations instead of adversarial perturbations. We propose four different benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P to perform this analysis. We study robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer based models are consistently more robust compared to CNN based models, 2) pretraining improves robustness for transformer based models more than CNN based models, and 3) all of the studied models are robust to temporal perturbations for all datasets but SSv2, suggesting the importance of temporal information for action recognition varies based on the dataset and activities. Next, we study the role of augmentations in model robustness and present a real-world dataset, UCF101-DS, which contains realistic distribution shifts, to further validate some of these findings. We believe this study will serve as a benchmark for future research in robust video action recognition.
Submitted 7 April, 2023; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Finding and Following Optimal Trajectories for an Overactuated Floating Robotic Platform
Authors:
Anton Bredenbeck,
Shubham Vyas,
Willem Suter,
Martin Zwick,
Dorit Borrmann,
Miguel Olivares-Mendez,
Andreas Nüchter
Abstract:
The recent increase in yearly spacecraft launches and the high number of planned launches have raised questions about maintaining accessibility to space for all interested parties. A key to sustaining the future of space-flight is the ability to service malfunctioning spacecraft and to actively remove dysfunctional spacecraft from orbit. Robotic platforms that autonomously perform these tasks are a topic of ongoing research and thus must undergo thorough testing before launch. For representative system-level testing, the European Space Agency (ESA) uses, among other things, the Orbital Robotics and GNC Lab (ORGL), a flat-floor facility where air-bearing based platforms exhibit free-floating behavior in three Degrees of Freedom (DoF). This work introduces a representative simulation of a free-floating platform in the testing environment and a software framework for controller development. Finally, this work proposes a controller within that framework for finding and following optimal trajectories between arbitrary states, which is evaluated in simulation and reality.
Submitted 19 July, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Forecasting Solar Power Generation on the basis of Predictive and Corrective Maintenance Activities
Authors:
Soham Vyas,
Yuvraj Goyal,
Neel Bhatt,
Sanskar Bhuwania,
Hardik Patel,
Shakti Mishra,
Brijesh Tripathi
Abstract:
Solar energy forecasting has seen tremendous growth in the last decade using historical time series collected from a weather station, such as the weather variables wind speed and direction, solar radiance, and temperature. It helps in the overall management of solar power plants. However, the solar power plant regularly requires preventive and corrective maintenance activities that further impact energy production. This paper presents a novel work for forecasting solar power energy production based on maintenance activities, problems observed at a power plant, and weather data. The results were obtained on datasets from the 1 MW solar power plant of PDEU (our university), which has generated a data set with 13 columns of daily entries from 2012 to 2020. There are 12 structured columns and one unstructured column with manual text entries about different maintenance activities, problems observed, and weather conditions daily. The unstructured column is used to create a new feature column vector using Hash Map, flag words, and stop words. The final dataset comprises five important feature vector columns based on correlation and causality analysis.
Submitted 17 May, 2022;
originally announced May 2022.
-
Video Action Detection: Analysing Limitations and Challenges
Authors:
Rajat Modi,
Aayush Jung Rana,
Akash Kumar,
Praveen Tirupattur,
Shruti Vyas,
Yogesh Singh Rawat,
Mubarak Shah
Abstract:
Beyond possessing large enough size to feed data hungry machines (e.g., transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify their relative existence? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA), which overcomes these limitations and is more suitable for real world applications. In addition, we perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect. This reveals if the actions in these datasets really need the motion information of an actor, or whether they predict the occurrence of an action even by looking at a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show the existence of biases which have managed to creep into existing methods in spite of careful modeling.
Submitted 16 April, 2022;
originally announced April 2022.
-
LARNet: Latent Action Representation for Human Action Synthesis
Authors:
Naman Biyani,
Aayush J Rana,
Shruti Vyas,
Yogesh S Rawat
Abstract:
We present LARNet, a novel end-to-end approach for generating human action videos. A joint generative modeling of appearance and dynamics to synthesize a video is very challenging and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space, avoiding the need for a driving video during inference. The generated action dynamics are integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse as well as fine level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets demonstrating the effectiveness of the proposed approach in generating human actions. Code available at https://github.com/aayushjr/larnet.
Submitted 26 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Pose-guided Generative Adversarial Net for Novel View Action Synthesis
Authors:
Xianhang Li,
Junhao Zhang,
Kunchang Li,
Shruti Vyas,
Yogesh S Rawat
Abstract:
We focus on the problem of novel-view human action synthesis. Given an action video, the goal is to generate the same action from an unseen viewpoint. Naturally, novel view video synthesis is more challenging than image synthesis. It requires the synthesis of a sequence of realistic frames with temporal coherency. Besides, transferring the different actions to a novel target view requires awareness of action category and viewpoint change simultaneously. To address these challenges, we propose a novel framework named Pose-guided Action Separable Generative Adversarial Net (PAS-GAN), which utilizes pose to alleviate the difficulty of this task. First, we propose a recurrent pose-transformation module which transforms actions from the source view to the target view and generates a novel view pose sequence in 2D coordinate space. Second, a well-transformed pose sequence enables us to separate the action and background in the target view. We employ a novel local-global spatial transformation module to effectively generate sequential video features in the target view using these action and background features. Finally, the generated video features are used to synthesize human action with the help of a 3D decoder. Moreover, to focus on dynamic action in the video, we propose a novel multi-scale action-separable loss which further improves the video quality. We conduct extensive experiments on two large-scale multi-view human action datasets, NTU-RGBD and PKU-MMD, demonstrating the effectiveness of PAS-GAN which outperforms existing approaches.
Submitted 8 December, 2021; v1 submitted 15 October, 2021;
originally announced October 2021.
-
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Authors:
Mohit Sharma,
Raj Patra,
Harshal Desai,
Shruti Vyas,
Yogesh Rawat,
Rajiv Ratn Shah
Abstract:
Deep learning has shown remarkable progress in a wide range of problems. However, efficient training of such models requires large-scale datasets, and getting annotations for such datasets can be challenging and costly. In this work, we explore the use of user-generated freely available labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We utilize the collected dataset for action classification and demonstrate its usefulness with existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets. We present this as a benchmark dataset in noisy learning for video understanding. The dataset, code, and trained models will be publicly available for future research.
Submitted 13 October, 2021;
originally announced October 2021.
-
TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos
Authors:
Praveen Tirupattur,
Aayush J Rana,
Tushar Sangam,
Shruti Vyas,
Yogesh S Rawat,
Mubarak Shah
Abstract:
This paper summarizes the TinyAction challenge, which was organized at the ActivityNet workshop at CVPR 2021. This challenge focuses on recognizing real-world low-resolution activities present in videos. Action recognition currently focuses on classifying actions from high-quality videos where the actors and the action are clearly visible. While various approaches have been shown effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region. However, many real-world security videos have the actual action captured at a small resolution, making action recognition in a tiny region a challenging task. In this work, we propose a benchmark dataset, TinyVIRAT-v2, which comprises naturally occurring low-resolution actions. This is an extension of the TinyVIRAT dataset and consists of actions with multiple labels. The videos are extracted from security videos, which makes them realistic and more challenging. We use current state-of-the-art action recognition methods on the dataset as a benchmark, and propose the TinyAction Challenge.
Submitted 23 July, 2021;
originally announced July 2021.
-
Novel View Video Prediction Using a Dual Representation
Authors:
Sarah Shiraz,
Krishna Regmi,
Shruti Vyas,
Yogesh S. Rawat,
Mubarak Shah
Abstract:
We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view. The proposed approach does not require any priors and is able to predict the video from wider angular distances, up to 45 degrees, as compared to the recent studies predicting small variations in viewpoint. Moreover, our method relies only on RGB frames to learn a dual representation which is used to generate the video from a novel viewpoint. The dual representation encompasses a view-dependent and a global representation which incorporates complementary details to enable novel view video prediction. We demonstrate the effectiveness of our framework on two real-world datasets: NTU-RGB+D and CMU Panoptic. A comparison with the state-of-the-art novel view video prediction methods shows an improvement of 26.1% in SSIM, 13.6% in PSNR, and 60% in FVD scores without using explicit priors from target views.
Submitted 7 June, 2021;
originally announced June 2021.
-
Multilingual and code-switching ASR challenges for low resource Indian languages
Authors:
Anuj Diwan,
Rakesh Vaideeswaran,
Sanket Shah,
Ankita Singh,
Srinivasa Raghavan,
Shreya Khare,
Vinit Unni,
Saurabh Vyas,
Akash Rajpuria,
Chiranjeevi Yarra,
Ashish Mittal,
Prasanta Kumar Ghosh,
Preethi Jyothi,
Kalika Bali,
Vivek Seshadri,
Sunayana Sitaram,
Samarth Bharadwaj,
Jai Nanavati,
Raoul Nanavati,
Karthik Sankaranarayanan,
Tejaswi Seeram,
Basil Abraham
Abstract:
Recently, there has been increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple languages are freely interchanged within a single sentence or between sentences. The success of low-resource multilingual and code-switching ASR often depends on the variety of languages in terms of their acoustics and linguistic characteristics, as well as the amount of data available and how carefully these are considered in building the ASR system. In this challenge, we would like to focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali. For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. We also provide a baseline recipe for both tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
Submitted 31 March, 2021;
originally announced April 2021.
-
View-invariant action recognition
Authors:
Yogesh S Rawat,
Shruti Vyas
Abstract:
Human action recognition is an important problem in computer vision. It has a wide range of applications in surveillance, human-computer interaction, augmented reality, video indexing, and retrieval. The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action. We have seen a lot of research exploring these dynamics of spatio-temporal appearance for learning a visual representation of human actions. However, most of the research in action recognition is focused on some common viewpoints, and these approaches do not perform well when there is a change in viewpoint. Human actions are performed in a 3-dimensional environment and are projected to a 2-dimensional space when captured as a video from a given viewpoint. Therefore, an action will have a different spatio-temporal appearance from different viewpoints. The research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.
Submitted 1 September, 2020;
originally announced September 2020.
-
Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning
Authors:
Shruti Vyas,
Yogesh S Rawat,
Mubarak Shah
Abstract:
The recent success in deep learning has led to various effective representation learning methods for videos. However, the current approaches for video representation require large human-labeled datasets for effective learning. We present an unsupervised representation learning framework to encode scene dynamics in videos captured from multiple viewpoints. The proposed framework has two main components: Representation Learning Network (RL-NET), which learns a representation with the help of Blending Network (BL-NET), and Video Rendering Network (VR-NET), which is used for video synthesis. The framework takes as input video clips from different viewpoints and times, learns an internal representation and uses this representation to render a video clip from an arbitrary given viewpoint and time. The ability of the proposed network to render video frames from arbitrary viewpoints and times enables it to learn a meaningful and robust representation of the scene dynamics. We demonstrate the effectiveness of the proposed method in rendering view-aware as well as time-aware video clips on two different real-world datasets including UCF-101 and NTU-RGB+D. To further validate the effectiveness of the learned representation, we use it for the task of view-invariant activity classification where we observe a significant improvement (~26%) in the performance on NTU-RGB+D dataset compared to the existing state-of-the-art methods.
Submitted 29 November, 2018; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Impact of E-Banking on Traditional Banking Services
Authors:
Shilpan Dineshkumar Vyas
Abstract:
Internet banking is changing the banking industry, with major effects on banking relationships. Banking is no longer confined to the branches, where one has to approach the branch in person to withdraw cash, deposit a cheque, or request a statement of accounts. In true Internet banking, any inquiry or transaction is processed online without any reference to the branch (anywhere banking) at any time. Providing Internet banking is increasingly becoming a "need to have" rather than a "nice to have" service. Net banking is thus now more of a norm than an exception in many developed countries, due to the fact that it is the cheapest way of providing banking services. This research paper introduces e-banking, giving the meaning, functions, types, advantages and limitations of e-banking. It also shows the impact of e-banking on traditional services and finally the result documentation.
Submitted 31 August, 2012;
originally announced September 2012.
-
E-banking and E-commerce in India and USA
Authors:
Shilpan Vyas
Abstract:
Web-based e-banking is becoming an important aspect of worldwide commerce. The United Nations predicts 17% of purchases by firms and individuals will be conducted online by 2006. The future of Web-based e-banking in developed areas appears bright, but consumers and merchants in developing countries face a number of barriers to successful e-banking, including less reliable telecommunications infrastructure and power supplies, less access to online payment mechanisms, and relatively high costs for personal computers and Internet access. How should managers in charge of e-banking prepare for global implementation? What can they do to reach consumers in developing countries? What factors influence the adoption of consumer-oriented e-banking in various countries? This research paper gives an overview of the local conditions in India, Hofstede's dimensions of culture in India and the USA, the Diffusion of Innovation theory, and hence the hypotheses for the innovation characteristics of interest.
Submitted 10 July, 2012;
originally announced July 2012.