Search | arXiv e-print repository

Datasets, Clues and State-of-the-Arts for Multimedia Forensics: An Extensive Review

Authors: Ankit Yadav, Dinesh Kumar Vishwakarma

Abstract: With the large chunks of social media data being created daily and the parallel rise of realistic multimedia tampering methods, detecting and localising tampering in images and videos has become essential. This survey focusses on approaches for tampering detection in multimedia data using deep learning models. Specifically, it presents a detailed analysis of benchmark datasets for malicious manipulation detection that are publicly available. It also offers a comprehensive list of tampering clues and commonly used deep learning architectures. Next, it discusses the current state-of-the-art tampering detection methods, categorizing them into meaningful types such as deepfake detection methods, splice tampering detection methods, copy-move tampering detection methods, etc. and discussing their strengths and weaknesses. Top results achieved on benchmark datasets, comparison of deep learning approaches against traditional methods and critical insights from the recent tampering detection methods are also discussed. Lastly, the research gaps, future direction and conclusion are discussed to provide an in-depth understanding of the tampering detection research arena.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.06998 [pdf]

Towards Effective Image Forensics via A Novel Computationally Efficient Framework and A New Image Splice Dataset

Authors: Ankit Yadav, Dinesh Kumar Vishwakarma

Abstract: Splice detection models are the need of the hour since splice manipulations can be used to mislead, spread rumors and create disharmony in society. However, there is a severe lack of image splicing datasets, which restricts the capabilities of deep learning models to extract discriminative features without overfitting. This manuscript presents two-fold contributions toward splice detection. Firstly, a novel splice detection dataset is proposed having two variants. The two variants include spliced samples generated from code and through manual editing. Spliced images in both variants have corresponding binary masks to aid localization approaches. Secondly, a novel Spatio-Compression Lightweight Splice Detection Framework is proposed for accurate splice detection with minimum computational cost. The proposed dual-branch framework extracts discriminative spatial features from a lightweight spatial branch. It uses original resolution compression data to extract double compression artifacts from the second branch, thereby making it 'information preserving.' Several CNNs are tested in combination with the proposed framework on a composite dataset of images from the proposed dataset and the CASIA v2.0 dataset. The best model accuracy of 0.9382 is achieved and compared with similar state-of-the-art methods, demonstrating the superiority of the proposed framework.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.06995 [pdf]

A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler

Authors: Ankit Yadav, Dinesh Kumar Vishwakarma

Abstract: Image splice manipulation presents a severe challenge in today's society. With easy access to image manipulation tools, it is easier than ever to modify images that can mislead individuals, organizations or society. In this work, a novel, "Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler" has been proposed. It contains a unique "visually attentive multi-domain feature extractor" (VA-MDFE) that extracts attentional features from the RGB, edge and depth domains. Next, a "visually attentive downsampler" (VA-DS) is responsible for fusing and downsampling the multi-domain features. Finally, a novel "visually attentive multi-receptive field upsampler" (VA-MRFU) module employs multiple receptive field-based convolutions to upsample attentional features by focussing on different information scales. Experimental results conducted on the public benchmark dataset CASIA v2.0 prove the potency of the proposed model. It comfortably beats the existing state-of-the-arts by achieving an IoU score of 0.851, pixel F1 score of 0.9195 and pixel AUC score of 0.8989.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2311.18676 [pdf, other]

DQSSA: A Quantum-Inspired Solution for Maximizing Influence in Online Social Networks (Student Abstract)

Authors: Aryaman Rao, Parth Singh, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Influence Maximization is the task of selecting optimal nodes maximising the influence spread in social networks. This study proposes a Discretized Quantum-based Salp Swarm Algorithm (DQSSA) for optimizing influence diffusion in social networks. By discretizing meta-heuristic algorithms and infusing them with quantum-inspired enhancements, we address issues like premature convergence and low efficacy. The proposed method, guided by quantum principles, offers a promising solution for Influence Maximisation. Experiments on four real-world datasets reveal DQSSA's superior performance as compared to established cutting-edge algorithms.

Submitted 30 November, 2023; originally announced November 2023.

Comments: AAAI Conference on Artificial Intelligence 2024

arXiv:2301.05220 [pdf, other]

Adversarial Adaptation for French Named Entity Recognition

Authors: Arjun Choudhry, Inder Khatri, Pankaj Gupta, Aaryan Gupta, Maxime Nicol, Marie-Jean Meurs, Dinesh Kumar Vishwakarma

Abstract: Named Entity Recognition (NER) is the task of identifying and classifying named entities in large-scale texts into predefined classes. NER in French and other relatively limited-resource languages cannot always benefit from approaches proposed for languages like English due to a dearth of large, robust datasets. In this paper, we present our work that aims to mitigate the effects of this dearth of large, labeled datasets. We propose a Transformer-based NER approach for French, using adversarial adaptation to similar domain or general corpora to improve feature extraction and enable better generalization. Our approach allows learning better features using large-scale unlabeled corpora from the same domain or mixed domains to introduce more variations during training and reduce overfitting. Experimental results on three labeled datasets show that our adaptation framework outperforms the corresponding non-adaptive models for various combinations of Transformer models, source datasets, and target corpora. We also show that adversarial adaptation to large-scale unlabeled corpora can help mitigate the performance dip incurred on using Transformer models pre-trained on smaller corpora.

Submitted 12 January, 2023; originally announced January 2023.

Comments: Preprint version of short paper accepted for the ECIR 2023 conference

arXiv:2212.03692 [pdf, other]

Transformer-Based Named Entity Recognition for French Using Adversarial Adaptation to Similar Domain Corpora

Authors: Arjun Choudhry, Pankaj Gupta, Inder Khatri, Aaryan Gupta, Maxime Nicol, Marie-Jean Meurs, Dinesh Kumar Vishwakarma

Abstract: Named Entity Recognition (NER) involves the identification and classification of named entities in unstructured text into predefined classes. NER in languages with limited resources, like French, is still an open problem due to the lack of large, robust, labelled datasets. In this paper, we propose a transformer-based NER approach for French using adversarial adaptation to similar domain or general corpora for improved feature extraction and better generalization. We evaluate our approach on three labelled datasets and show that our adaptation framework outperforms the corresponding non-adaptive models for various combinations of transformer models, source datasets and target corpora.

Submitted 5 December, 2022; originally announced December 2022.

Comments: Author version of Student Abstract to appear in AAAI 2023 - Student Abstract and Poster Program

arXiv:2211.17200 [pdf, other]

CKS: A Community-based K-shell Decomposition Approach using Community Bridge Nodes for Influence Maximization

Authors: Inder Khatri, Aaryan Gupta, Arjun Choudhry, Aryan Tyagi, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Social networks have enabled user-specific advertisements and recommendations on their platforms, which puts a significant focus on Influence Maximisation (IM) for target advertising and related tasks. The aim is to identify nodes in the network which can maximize the spread of information through a diffusion cascade. We propose a community structures-based approach that employs K-Shell algorithm with community structures to generate a score for the connections between seed nodes and communities. Further, our approach employs entropy within communities to ensure the proper spread of information within the communities. We validate our approach on four publicly available networks and show its superiority to four state-of-the-art approaches while still being relatively efficient.

Submitted 26 November, 2022; originally announced November 2022.

Comments: Accepted in the Student Abstract & Poster Presentation Track at AAAI 2023

arXiv:2211.17108 [pdf, other]

An Emotion-guided Approach to Domain Adaptive Fake News Detection using Adversarial Learning

Authors: Arkajyoti Chakraborty, Inder Khatri, Arjun Choudhry, Pankaj Gupta, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Recent works on fake news detection have shown the efficacy of using emotions as a feature for improved performance. However, the cross-domain impact of emotion-guided features for fake news detection still remains an open problem. In this work, we propose an emotion-guided, domain-adaptive, multi-task approach for cross-domain fake news detection, proving the efficacy of emotion-guided models in cross-domain settings for various datasets.

Submitted 26 November, 2022; originally announced November 2022.

Comments: Accepted in the Student Abstract & Poster Presentation track at AAAI 2023. arXiv admin note: substantial text overlap with arXiv:2211.13718

arXiv:2211.13718 [pdf, other]

Emotion-guided Cross-domain Fake News Detection using Adversarial Domain Adaptation

Authors: Arjun Choudhry, Inder Khatri, Arkajyoti Chakraborty, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Recent works on fake news detection have shown the efficacy of using emotions as a feature or emotions-based features for improved performance. However, the impact of these emotion-guided features for fake news detection in cross-domain settings, where we face the problem of domain shift, is still largely unexplored. In this work, we evaluate the impact of emotion-guided features for cross-domain fake news detection, and further propose an emotion-guided, domain-adaptive approach using adversarial learning. We prove the efficacy of emotion-guided models in cross-domain settings for various combinations of source and target datasets from FakeNewsAMT, Celeb, Politifact and Gossipcop datasets.

Submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted as a Short Paper in the 19th International Conference on Natural Language Processing (ICON) 2022

arXiv:2211.12374 [pdf, other]

An Emotion-Aware Multi-Task Approach to Fake News and Rumour Detection using Transfer Learning

Authors: Arjun Choudhry, Inder Khatri, Minni Jain, Dinesh Kumar Vishwakarma

Abstract: Social networking sites, blogs, and online articles are instant sources of news for internet users globally. However, in the absence of strict regulations mandating the genuineness of every text on social media, it is probable that some of these texts are fake news or rumours. Their deceptive nature and ability to propagate instantly can have an adverse effect on society. This necessitates the need for more effective detection of fake news and rumours on the web. In this work, we annotate four fake news detection and rumour detection datasets with their emotion class labels using transfer learning. We show the correlation between the legitimacy of a text with its intrinsic emotion for fake news and rumour detection, and prove that even within the same emotion class, fake and real news are often represented differently, which can be used for improved feature extraction. Based on this, we propose a multi-task framework for fake news and rumour detection, predicting both the emotion and legitimacy of the text. We train a variety of deep learning models in single-task and multi-task settings for a more comprehensive comparison. We further analyze the performance of our multi-task approach for fake news detection in cross-domain settings to verify its efficacy for better generalization across datasets, and to verify that emotions act as a domain-independent feature. Experimental results verify that our multi-task models consistently outperform their single-task counterparts in terms of accuracy, precision, recall, and F1 score, both for in-domain and cross-domain settings. We also qualitatively analyze the difference in performance in single-task and multi-task learning models.

Submitted 7 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted in IEEE Transaction on Computational Social Systems 18 pages 5 figures

arXiv:2211.09683 [pdf, other]

Influence Maximization in Social Networks using Discretized Harris Hawks Optimization Algorithm and Neighbour Scout Strategy

Authors: Inder Khatri, Arjun Choudhry, Aryaman Rao, Aryan Tyagi, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Influence Maximization (IM) is the task of determining k optimal influential nodes in a social network to maximize the influence spread using a propagation model. IM is a prominent problem for viral marketing, and helps significantly in social media advertising. However, developing effective algorithms with minimal time complexity for real-world social networks still remains a challenge. While traditional heuristic approaches have been applied for IM, they often result in minimal performance gains over the computationally expensive Greedy-based and Reverse Influence Sampling-based approaches. In this paper, we propose the discretization of the nature-inspired Harris Hawks Optimisation meta-heuristic algorithm using community structures for optimal selection of seed nodes for influence spread. In addition to Harris Hawks intelligence, we employ a neighbour scout strategy algorithm to avoid blindness and enhance the searching ability of the hawks. Further, we use a candidate nodes-based random population initialization approach, and these candidate nodes aid in accelerating the convergence process for the entire populace. We evaluate the efficacy of our proposed DHHO approach on six social networks using the Independent Cascade model for information diffusion. We observe that DHHO is comparable or better than competing meta-heuristic approaches for Influence Maximization across five metrics, and performs noticeably better than competing heuristic approaches.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 24 pages, 7 figures
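
Note: the abstract above evaluates seed sets under the Independent Cascade model. As a rough illustration of that diffusion model only (not the authors' DHHO method), the sketch below runs Monte Carlo simulations with a single uniform activation probability. The function names, the adjacency-dict graph format, and the uniform probability `p` are assumptions made for this example.

```python
import random


def independent_cascade(graph, seeds, p=0.1, rng=None):
    """Run one Independent Cascade simulation and return the activated nodes.

    graph: dict mapping each node to an iterable of its out-neighbours
           (hypothetical format chosen for this sketch).
    p:     uniform activation probability per edge (an assumption; real
           studies often use per-edge probabilities).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly activated node gets exactly one chance to
                # activate each of its currently inactive neighbours.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Estimate the expected influence spread by averaging many simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs
```

A seed-selection heuristic would typically call `expected_spread` as its fitness function and compare candidate seed sets of the same size.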

arXiv:2211.09657 [pdf, other]

A Spreader Ranking Algorithm for Extremely Low-budget Influence Maximization in Social Networks using Community Bridge Nodes

Authors: Aaryan Gupta, Inder Khatri, Arjun Choudhry, Pranav Chandhok, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: In recent years, social networking platforms have gained significant popularity among the masses for connecting with people and propagating one's thoughts and opinions. This has opened the door to user-specific advertisements and recommendations on these platforms, bringing along a significant focus on Influence Maximisation (IM) on social networks due to its wide applicability in target advertising, viral marketing, and personalized recommendations. The aim of IM is to identify certain nodes in the network which can help maximize the spread of certain information through a diffusion cascade. While several works have been proposed for IM, most were inefficient in exploiting community structures to their full extent. In this work, we propose a community structures-based approach, which employs a K-Shell algorithm in order to generate a score for the connections between seed nodes and communities for low-budget scenarios. Further, our approach employs entropy within communities to ensure the proper spread of information within the communities. We choose the Independent Cascade (IC) model to simulate information spread and evaluate it on four evaluation metrics. We validate our proposed approach on eight publicly available networks and find that it significantly outperforms the baseline approaches on these metrics, while still being relatively efficient.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 21 pages, 7 figures
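
Note: the abstract above builds on K-Shell decomposition. The sketch below shows only the standard peeling procedure that assigns each node a shell index; it does not reproduce the paper's community-bridge scoring or entropy terms. The function name and the adjacency-dict input (an undirected simple graph with symmetric neighbour sets) are assumptions made for this example.

```python
def k_shell(adj):
    """Return a dict mapping each node to its k-shell index.

    adj: dict mapping each node to the set of its neighbours; the graph is
         assumed undirected, so adj must be symmetric (hypothetical format).
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The shell index never decreases as nodes are peeled away.
        k = max(k, min(degree[u] for u in remaining))
        queue = [u for u in remaining if degree[u] <= k]
        while queue:
            u = queue.pop()
            if u not in remaining:
                continue  # already peeled via a duplicate queue entry
            remaining.discard(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    # Removing u may drop v into the current shell.
                    if degree[v] <= k:
                        queue.append(v)
    return shell
```

For a triangle with one pendant node attached, the pendant node lands in shell 1 and the three triangle nodes in shell 2, which matches the intuition that higher shells mark the densely connected core.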

arXiv:2112.08611 [pdf]

Clickbait in YouTube Prevention, Detection and Analysis of the Bait using Ensemble Learning

Authors: Peya Mowar, Mini Jain, Ruchika Goel, Dinesh Kumar Vishwakarma

Abstract: Unscrupulous content creators on YouTube employ deceptive techniques such as spam and clickbait to reach a broad audience and trick users into clicking on their videos to increase their advertisement revenue. Clickbait detection on YouTube requires an in-depth examination and analysis of the intricate relationship between the video content and the video descriptors (title and thumbnail). However, the current solutions are mostly centred around the study of video descriptors and other metadata such as likes, tags, comments, etc., and fail to utilize the video content, both video and audio. Therefore, we introduce a novel model to detect clickbait on YouTube that considers the relationship between video content and title or thumbnail. The proposed model consists of a stacking classifier framework composed of six base models (K Nearest Neighbours, Support Vector Machine, XGBoost, Naive Bayes, Logistic Regression, and Multilayer Perceptron) and a meta classifier. The developed clickbait detection model achieved a high accuracy of 92.89% for the novel BollyBAIT dataset and 95.38% for Misleading Video Dataset. Additionally, the stated classifier does not use meta features or other statistics dependent on user interaction with the video (the number of likes, followers, or comments) for classification, and thus can be used to detect potential clickbait videos before they are uploaded, thereby preventing the nuisance of clickbait altogether and improving the users' streaming experience.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 26 pages, 16 figures

arXiv:2109.13476 [pdf]

Fake News Detection using Semi-Supervised Graph Convolutional Network

Authors: Priyanka Meel, Dinesh Kumar Vishwakarma

Abstract: Social media has become the central way for people to obtain and consume news, owing to its speed and the low cost of distributing data. However, these same features also make social media platforms a root cause of fake news distribution, with adverse consequences for both people and culture. Hence, detecting fake news has become a significant research interest for bringing feasible real-time solutions to the problem. Most current fake news detection techniques are supervised and require a large cost in time and effort to build a reliably annotated dataset. The proposed framework concentrates on the text-based detection of fake news items while considering that only a limited number of labels are available. Graphs are used extensively for many real-world problems because of their ability to structure things easily. Deep neural networks produce strong results on tasks that involve graph classification. The Graph Convolutional Network (GCN) is a deep learning paradigm that works on graphs. Because our proposed framework deals with a limited amount of labelled data, we adopt a semi-supervised learning method and propose a semi-supervised fake news detection technique based on GCN. The proposed architecture comprises three basic components: collecting word embeddings from the news articles in the datasets using GloVe, building a similarity graph using Word Mover's Distance (WMD), and finally applying a GCN for binary classification of news articles in a semi-supervised paradigm. The implemented technique is validated on three different datasets by varying the volume of labelled data, achieving a highest accuracy of 95.27% on the Real or Fake dataset. Comparison with other contemporary techniques also reinforces the superiority of the proposed framework.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 25 pages, 7 figures

arXiv:2109.13063 [pdf]

An Automated Multi-Web Platform Voting Framework to Predict Misleading Information Proliferated during COVID-19 Outbreak using Ensemble Method

Authors: Deepika Varshney, Dinesh Kumar Vishwakarma

Abstract: Spreading of misleading information on social web platforms has fuelled huge panic and confusion among the public regarding the Corona disease, the detection of which is of paramount importance. To address this issue, in this paper, we have developed an automated system that can collect and validate facts from multiple web platforms to decide the credibility of the content. To identify the credibility of the posted claim, probable instances/clues (titles) of news information are first gathered from various web platforms. Later, the crucial set of features is retrieved and fed into the ensemble-based machine learning model to classify the news as misleading or real. Four sets of features, based on the content, linguistic/semantic cues, similarity, and sentiments gathered from web platforms, together with voting, are applied to validate the news. Finally, the combined voting decides the support given to a specific claim. In addition to the validation part, a unique source platform is designed for collecting data/facts from three web platforms (Twitter, Facebook, Google) based on certain queries/words. This unique platform can also help researchers build datasets and gather useful/efficient clues from various web platforms. It has been observed that our proposed intelligent strategy gives promising results and is quite effective in predicting misleading information. The proposed work provides practical implications for policy makers and health practitioners that could be useful in protecting the world from misleading information proliferation during this pandemic.

Submitted 19 September, 2021; originally announced September 2021.

Comments: 22 pages, 06 figures

arXiv:2109.12547 [pdf]

Multi-modal Fusion using Fine-tuned Self-attention and Transfer Learning for Veracity Analysis of Web Information

Authors: Priyanka Meel, Dinesh Kumar Vishwakarma

Abstract: The nuisance of misinformation and fake news has escalated manyfold since the advent of online social networks. Human consciousness and decision-making capabilities are negatively influenced by manipulated, fabricated, biased or unverified news posts. Therefore, there is a high demand for designing veracity analysis systems to detect fake information content in multiple data modalities. In an attempt to find a sophisticated solution to this critical issue, we proposed an architecture to consider both the textual and visual attributes of the data. After the data pre-processing is done, text and image features are extracted from the training data using separate deep learning models. Feature extraction from text is done using BERT and ALBERT language models that leverage the benefits of bidirectional training of transformers using a deep self-attention mechanism. The Inception-ResNet-v2 deep neural network model is employed for image data to perform the task. The proposed framework focused on two independent multi-modal fusion architectures: BERT with Inception-ResNet-v2, and ALBERT with Inception-ResNet-v2. Multi-modal fusion of textual and visual branches is extensively experimented with and analysed using concatenation of feature vectors and weighted averaging of probabilities, named Early Fusion and Late Fusion respectively. Three publicly available, broadly accepted datasets (All Data, Weibo and MediaEval 2016), which comprise English news articles, Chinese news articles, and tweets respectively, are used so that our designed framework's outcomes can be properly tested and compared with previous notable work in the domain.

Submitted 26 September, 2021; originally announced September 2021.

Comments: 31 pages, 12 figures
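
Note: the abstract above defines Early Fusion as concatenation of feature vectors and Late Fusion as weighted averaging of probabilities. The sketch below illustrates just those two operations on plain Python lists. The function names, the `text_weight` parameter, and the example values are assumptions for illustration; the paper's actual feature extractors (BERT, ALBERT, Inception-ResNet-v2) and weights are not reproduced here.

```python
def early_fusion(text_features, image_features):
    """Early fusion: concatenate the two modality feature vectors.

    The fused vector would then be fed to a single downstream classifier.
    """
    return list(text_features) + list(image_features)


def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Late fusion: weighted average of per-class probabilities.

    text_weight is a hypothetical tuning parameter in [0, 1]; the image
    branch receives the complementary weight (1 - text_weight).
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    if len(text_probs) != len(image_probs):
        raise ValueError("both branches must predict the same classes")
    image_weight = 1.0 - text_weight
    return [text_weight * t + image_weight * i
            for t, i in zip(text_probs, image_probs)]
```

Early fusion lets one classifier learn cross-modal interactions, while late fusion keeps the branches independent and only combines their final predictions, which makes the relative trust in each modality an explicit weight.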

arXiv:2109.09929 [pdf]

A Unified Approach of Detecting Misleading Images via Tracing its Instances on Web and Analysing its Past Context for the Verification of Content

Authors: Deepika Varshney, Dinesh Kumar Vishwakarma

Abstract: The verification of multimedia content over social media is one of the challenging and crucial issues in the current scenario and is gaining prominence in an age where user-generated content and online social web platforms are the leading sources in shaping and propagating news stories. As these sources allow users to share their opinions without restriction, opportunistic users often post misleading/unreliable content on social media such as Twitter, Facebook, etc. At present, to lure users towards the news story, the text is often attached with some multimedia content (images/videos/audios). Verifying these contents to maintain the credibility and reliability of social media information is of paramount importance. Motivated by this, we proposed a generalized system that supports the automatic classification of images into credible or misleading. In this paper, we investigated machine learning-based as well as deep learning-based approaches utilized to verify misleading multimedia content, where the available image traces are used to identify the credibility of the content. The experiment is performed on the real-world dataset (Media-eval-2015 dataset) collected from Twitter. It also demonstrates the efficiency of our proposed approach and features using both Machine and Deep Learning Model (Bi-directional LSTM). The experiment result reveals that Microsoft's Bing image search engine is quite effective in retrieving titles and performs better than the Google image search engine in our study. It also shows that gathering clues from attached multimedia content (image) is more effective than detecting only posted content-based features.

Submitted 20 September, 2021; originally announced September 2021.

Comments: 22 pages, 8 figures

arXiv:2109.06488 [pdf]

Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers

Authors: Dinesh Kumar Vishwakarma, Mayank Jindal, Ayush Mittal, Aditya Sharma

Abstract: Automated movie genre classification has emerged as an active and essential area of research and exploration. Short duration movie trailers provide useful insights about the movie as video content consists of the cognitive and the affective level features. Previous approaches were focused upon either cognitive or affective content analysis. In this paper, we propose a novel multi-modality: situation, dialogue, and metadata-based movie genre classification framework that takes both cognition and affect-based features into consideration. A pre-features fusion-based framework that takes into account: situation-based features from a regular snapshot of a trailer that includes nouns and verbs providing the useful affect-based mapping with the corresponding genres, dialogue (speech) based feature from audio, metadata which together provides the relevant information for cognitive and affect based video analysis. We also develop the English movie trailer dataset (EMTD), which contains 2000 Hollywood movie trailers belonging to five popular genres: Action, Romance, Comedy, Horror, and Science Fiction, and perform cross-validation on the standard LMTD-9 dataset for validating the proposed framework. The results demonstrate that the proposed methodology for movie genre classification has performed excellently as depicted by the F1 scores, precision, recall, and area under the precision-recall curves.

Submitted 14 September, 2021; originally announced September 2021.

Comments: 21 pages, 7 figures

arXiv:2105.05708 [pdf, other]

Deep and Shallow Covariance Feature Quantization for 3D Facial Expression Recognition

Authors: Walid Hariri, Nadir Farah, Dinesh Kumar Vishwakarma

Abstract: Facial expressions recognition (FER) of 3D face scans has received a significant amount of attention in recent years. Most of the facial expression recognition methods have been proposed using mainly 2D images. These methods suffer from several issues like illumination changes and pose variations. Moreover, 2D mapping from 3D images may lack some geometric and topological characteristics of the face. Hence, to overcome this problem, a multi-modal 2D + 3D feature-based method is proposed. We extract shallow features from the 3D images, and deep features using Convolutional Neural Networks (CNN) from the transformed 2D images. Combining these features into a compact representation uses covariance matrices as descriptors for both features instead of single-handedly descriptors. A covariance matrix learning is used as a manifold layer to reduce the deep covariance matrices size and enhance their discrimination power while preserving their manifold structure. We then use the Bag-of-Features (BoF) paradigm to quantize the covariance matrices after flattening. Accordingly, we obtained two codebooks using shallow and deep features. The global codebook is then used to feed an SVM classifier. High classification performances have been achieved on the BU-3DFE and Bosphorus datasets compared to the state-of-the-art methods.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2012.13318 [pdf]

Person Re-Identification using Deep Learning Networks: A Systematic Review

Authors: Ankit Yadav, Dinesh Kumar Vishwakarma

Abstract: Person re-identification has received a lot of attention from the research community in recent times. Due to its vital role in security based applications, person re-identification lies at the heart of research relevant to tracking robberies, preventing terrorist attacks and other security critical events. While the last decade has seen tremendous growth in re-id approaches, very little review literature exists to comprehend and summarize this progress. This review deals with the latest state-of-the-art deep learning based approaches for person re-identification. While the few existing re-id review works have analysed re-id techniques from a singular aspect, this review evaluates numerous re-id techniques from multiple deep learning aspects such as deep architecture types, common Re-Id challenges (variation in pose, lighting, view, scale, partial or complete occlusion, background clutter), multi-modal Re-Id, cross-domain Re-Id challenges, metric learning approaches and video Re-Id contributions. This review also includes several re-id benchmarks collected over the years, describing their characteristics, specifications and top re-id results obtained on them. The inclusion of the latest deep re-id works makes this a significant contribution to the re-id literature. Lastly, the conclusion and future directions are included.

Submitted 24 December, 2020; originally announced December 2020.

Comments: 34 pages, 15 figures

arXiv:2012.08256 [pdf]

doi 10.1145/3517139

A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis

Authors: Ashima Yadav, Dinesh Kumar Vishwakarma

Abstract: Multimodal sentiment analysis has attracted increasing attention with broad application prospects. Existing methods focus on a single modality, which fails to capture social media content spanning multiple modalities. Moreover, in multi-modal learning, most of the works have focused on simply combining the two modalities, without exploring the complicated correlations between them. This resulted in unsatisfying performance for multimodal sentiment classification. Motivated by the status quo, we propose a Deep Multi-Level Attentive network, which exploits the correlation between image and text modalities to improve multimodal learning. Specifically, we generate the bi-attentive visual map along the spatial and channel dimensions to magnify the CNN's representation power. Then we model the correlation between the image regions and semantics of the word by extracting the textual features related to the bi-attentive visual features by applying semantic attention. Finally, self-attention is employed to automatically fetch the sentiment-rich multimodal features for the classification. We conduct extensive evaluations on four real-world datasets, namely, MVSA-Single, MVSA-Multiple, Flickr, and Getty Images, which verify the superiority of our method.

Submitted 15 December, 2020; originally announced December 2020.

Comments: 11 pages, 7 figures

Journal ref: ACM Transactions on Multimedia Computing, Communications, and Applications, 2022

arXiv:2011.10358 [pdf]

A Deep Language-independent Network to analyze the impact of COVID-19 on the World via Sentiment Analysis

Authors: Ashima Yadav, Dinesh Kumar Vishwakarma

Abstract: Towards the end of 2019, Wuhan experienced an outbreak of novel coronavirus, which soon spread all over the world, resulting in a deadly pandemic that infected millions of people around the globe. The government and public health agencies followed many strategies to counter the fatal virus. However, the virus severely affected the social and economic lives of the people. In this paper, we extract and study the opinion of people from the top five worst affected countries by the virus, namely USA, Brazil, India, Russia, and South Africa. We propose a deep language-independent Multilevel Attention-based Conv-BiGRU network (MACBiG-Net), which includes embedding layer, word-level encoded attention, and sentence-level encoded attention mechanism to extract the positive, negative, and neutral sentiments. The embedding layer encodes the sentence sequence into a real-valued vector. The word-level and sentence-level encoding is performed by a 1D Conv-BiGRU based mechanism, followed by word-level and sentence-level attention, respectively. We further develop a COVID-19 Sentiment Dataset by crawling the tweets from Twitter. Extensive experiments on our proposed dataset demonstrate the effectiveness of the proposed MACBiG-Net. Also, attention-weights visualization and in-depth results analysis show that the proposed network has effectively captured the sentiments of the people.

Submitted 20 November, 2020; originally announced November 2020.

arXiv:1912.03632 [pdf]

doi 10.1109/TIP.2020.2965299

View-invariant Deep Architecture for Human Action Recognition using late fusion

Authors: Chhavi Dhiman, Dinesh Kumar Vishwakarma

Abstract: Human action recognition for unknown views is a challenging task. We propose a view-invariant deep human action recognition framework, which is a novel integration of two important action cues: motion and shape temporal dynamics (STD). The motion stream encapsulates the motion content of action as RGB Dynamic Images (RGB-DIs) which are processed by the fine-tuned InceptionV3 model. The STD stream learns long-term view-invariant shape dynamics of action using human pose model (HPM) based view-invariant features mined from structural similarity index matrix (SSIM) based key depth human pose frames. To predict the score of the test sample, three types of late fusion (maximum, average and product) techniques are applied on individual stream scores. To validate the performance of the proposed novel framework, the experiments are performed using both cross-subject and cross-view validation schemes on three publicly available benchmarks: NUCLA multi-view dataset, UWA3D-II Activity dataset and NTU RGB-D Activity dataset. Our algorithm significantly outperforms existing state-of-the-art methods, as reported in terms of accuracy, receiver operating characteristic (ROC) curve and area under the curve (AUC).

Submitted 8 December, 2019; originally announced December 2019.

Comments: 10 pages, 7 figures

Report number: 8960517

Journal ref: 2019

arXiv:1912.00576 [pdf]

Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues

Authors: Chhavi Dhiman, Dinesh Kumar Vishwakarma, Paras Aggarwal

Abstract: There exist a wide range of intra class variations of the same actions and inter class similarity among the actions, at the same time, which makes the action recognition in videos very challenging. In this paper, we present a novel skeleton-based part-wise Spatiotemporal CNN RIAC Network-based 3D human action recognition framework to visualise the action dynamics in part wise manner and utilise each part for action recognition by applying weighted late fusion mechanism. Part wise skeleton based motion dynamics helps to highlight local features of the skeleton which is performed by partitioning the complete skeleton in five parts such as Head to Spine, Left Leg, Right Leg, Left Hand, Right Hand. The RIAFNet architecture is greatly inspired by the InceptionV4 architecture which unified the ResNet and Inception based Spatio-temporal feature representation concept and achieving the highest top-1 accuracy till date. To extract and learn salient features for action recognition, attention driven residues are used which enhance the performance of residual components for effective 3D skeleton-based Spatio-temporal action representation. The robustness of the proposed framework is evaluated by performing extensive experiments on three challenging datasets such as UT Kinect Action 3D, Florence 3D action Dataset, and MSR Daily Action3D datasets, which consistently demonstrate the superiority of our method.

Submitted 1 December, 2019; originally announced December 2019.

Comments: 20 pages, 9 figures

arXiv:1903.04090 [pdf]

A Hybrid Framework for Action Recognition in Low-Quality Video Sequences

Authors: Tej Singh, Dinesh Kumar Vishwakarma

Abstract: Vision-based activity recognition is essential for security, monitoring and surveillance applications. Further, real-time analysis often involves low-quality video that contains less information about the surroundings due to poor illumination and occlusions. Therefore, it needs a more robust and integrated model for low quality and night security operations. In this context, we proposed a hybrid model for illumination invariant human activity recognition based on sub-image histogram equalization enhancement and k-key pose human silhouettes. This feature vector gives good average recognition accuracy on three low exposure video sequences subset of original actions video datasets. Finally, the performance of the proposed approach is tested over three manually downgraded low-quality datasets: Weizmann action, KTH, and Ballet Movement. This model outperformed existing techniques on low exposure videos and achieved comparable classification accuracy to similar state-of-the-art methods.

Submitted 10 March, 2019; originally announced March 2019.

Comments: 13 pages, 9 Figures

arXiv:1805.07720 [pdf]

doi 10.1109/TMSCS.2018.2870592

A Deep Structure of Person Re-Identification using Multi-Level Gaussian Models

Authors: Dinesh Kumar Vishwakarma, Sakshi Upadhyay

Abstract: Person re-identification is being widely used in the forensic, and security and surveillance system, but person re-identification is a challenging task in real life scenario. Hence, in this work, a new feature descriptor model has been proposed using a multilayer framework of Gaussian distribution model on pixel features, which include color moments, color space values and Schmid filter responses. An image of a person usually consists of distinct body regions, usually with differentiable clothing followed by local colors and texture patterns. Thus, the image is evaluated locally by dividing the image into overlapping regions. Each region is further fragmented into a set of local Gaussians on small patches. A global Gaussian encodes these local Gaussians for each region, creating a multi-level structure. Hence, the global picture of a person is described by local level information present in it, which is often ignored. Also, we have analyzed the efficiency of earlier metric learning methods on this descriptor. The performance of the descriptor is evaluated on four publicly available challenging datasets and the highest accuracy achieved on these datasets is compared with similar state-of-the-art methods, which demonstrates the superior performance.

Submitted 20 May, 2018; originally announced May 2018.

Comments: 9 pages

Report number: 8469037

Journal ref: IEEE Transactions on Multi-Scale Computing Systems 4 (2018) 513 - 521

arXiv:1611.06683 [pdf]

Covariate conscious approach for Gait recognition based upon Zernike moment invariants

Authors: Himanshu Aggarwal, Dinesh K. Vishwakarma

Abstract: Gait recognition, i.e. identification of an individual from his/her walking pattern, is an emerging field. While existing gait recognition techniques perform satisfactorily in normal walking conditions, their performance tends to suffer drastically with variations in clothing and carrying conditions. In this work, we propose a novel covariate cognizant framework to deal with the presence of such covariates. We describe gait motion by forming a single 2D spatio-temporal template from video sequence, called Average Energy Silhouette image (AESI). Zernike moment invariants (ZMIs) are then computed to screen the parts of AESI infected with covariates. Following this, features are extracted from Spatial Distribution of Oriented Gradients (SDOGs) and novel Mean of Directional Pixels (MDPs) methods. The obtained features are fused together to form the final well-endowed feature set. Experimental evaluation of the proposed framework on three publicly available datasets, i.e. CASIA dataset B, OU-ISIR Treadmill dataset B and USF Human-ID challenge dataset, against recently published gait recognition approaches proves its superior performance.

Submitted 21 November, 2016; originally announced November 2016.

Comments: 11 pages

Showing 1–27 of 27 results for author: Vishwakarma, D K