
Bohr radius for invariant families of bounded analytic functions and certain Integral transforms

Authors: Molla Basir Ahamed, Partha Pratim Roy, Sabir Ahammed

Abstract: In this paper, we first obtain a refined Bohr radius for invariant families of bounded analytic functions on the unit disk $\mathbb{D}$. Then we obtain Bohr inequalities for certain integral transforms, namely the discrete Fourier and discrete Laplace transforms, of bounded analytic functions $f(z)=\sum_{n=0}^{\infty}a_nz^n$ in the simply connected domain \begin{align*} Ω_γ=\biggl\{z\in\mathbb{C}: \bigg|z+\dfrac{γ}{1-γ}\bigg|<\dfrac{1}{1-γ}\;\mbox{for}\; 0\leq γ<1\biggr\}, \end{align*} where $Ω_0=\mathbb{D}$. These results generalize some existing results. We also show that a better estimate of the radius can be obtained, and that the inequality is sharp for the Laplace transform of $f$.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 18 pages, 1 figure

MSC Class: 30C45; 30C50; 30C65; 30C80
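As a quick geometric check on the domain above: $Ω_γ$ is the open disk with centre $-γ/(1-γ)$ and radius $1/(1-γ)$, so it contains $\mathbb{D}$ and, for $0<γ<1$, its boundary meets the unit circle only at $z=1$. The Python sketch below is our own illustration (the helper names `omega_center_radius` and `in_omega` are hypothetical, not from the paper) and simply tests membership numerically.

```python
def omega_center_radius(gamma: float) -> tuple:
    """Centre and radius of the disk Omega_gamma = {z : |z + g/(1-g)| < 1/(1-g)}."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return -gamma / (1 - gamma), 1 / (1 - gamma)


def in_omega(z: complex, gamma: float) -> bool:
    """True if z lies in the open disk Omega_gamma."""
    center, radius = omega_center_radius(gamma)
    return abs(z - center) < radius


if __name__ == "__main__":
    print(omega_center_radius(0.0))  # Omega_0 is the unit disk: centre 0, radius 1
    print(omega_center_radius(0.5))  # centre -1, radius 2
    print(in_omega(-1j, 0.5))        # a unit-circle point other than 1 lies inside
    print(in_omega(1, 0.5))          # z = 1 lies on the boundary, so not inside
```

Since $|z+γ/(1-γ)|\le|z|+γ/(1-γ)<1/(1-γ)$ whenever $|z|<1$, every point of $\mathbb{D}$ passes the test for every admissible $γ$.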

arXiv:2405.01895 [pdf, other]

The Bohr inequality on a simply connected domain and its applications

Authors: Sabir Ahammed, Molla Basir Ahamed, Partha Pratim Roy

Abstract: In this article, we first establish a generalized Bohr inequality and examine its sharpness for a class of analytic functions $f$ in a simply connected domain $Ω_γ$, where $0\leq γ<1$, with a sequence $\{\varphi_n(r) \}^{\infty}_{n=0}$ of non-negative continuous functions defined on $[0,1)$ such that the series $\sum_{n=0}^{\infty}\varphi_n(r)$ converges locally uniformly on $[0,1)$. Our results represent twofold generalizations of those obtained for the classes $\mathcal{B}(\mathbb{D})$ and $\mathcal{B}(Ω_γ)$, where \begin{align*} Ω_γ:=\biggl\{z\in \mathbb{C}: \bigg|z+\dfrac{γ}{1-γ}\bigg|<\dfrac{1}{1-γ}\biggr\}. \end{align*} As a convolution counterpart, we determine the Bohr radius for hypergeometric functions on $Ω_γ$. Lastly, we establish a generalized Bohr inequality and its sharpness for the class of $K$-quasiconformal, sense-preserving harmonic maps of the form $f=h+\overline{g}$ in $Ω_γ$.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2402.15689 [pdf, ps, other]

Revisiting Bohr Inequalities with Analytic and Harmonic Mappings on unit disk

Authors: Molla Basir Ahamed, Partha Pratim Roy

Abstract: In this paper, we study some improved and refined versions of the classical Bohr inequality applicable to the class $\mathcal{B}$, which consists of analytic self-mappings of the unit disk $\mathbb{D}$. First, we improve the Bohr inequality for the class $\mathcal{B}$ of analytic self-maps by incorporating the area of the sub-disks $\mathbb{D}_r$ of $\mathbb{D}$. Secondly, we establish a sharp inequality in a suitable setting as an improved version of the classical Bohr inequality. Then we obtain a sharp refined Bohr inequality in which the coefficients $|a_k|$ $(k=0, 1, 2, 3)$ in the majorant series $M_f(r)$ of $f$ are replaced by $|f^{(k)}(z)|/k!$. Finally, for a certain class $\mathcal{P}^0_{\mathcal{H}}(M)$ of harmonic mappings of the form $f=h+\overline{g}$, we generalize the Bohr inequality by incorporating a sequence $\{\varphi_n(r)\}_{n=0}^{\infty}$ of continuous functions of $r$ in $[0, 1)$ and give some applications.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 25 pages, 0 figures

MSC Class: Primary 30A10; 30H05; 30C35; 30C50 Secondary 30C45
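For reference, the classical result being refined here: if $f(z)=\sum_{n=0}^{\infty}a_nz^n$ is analytic in $\mathbb{D}$ with $|f(z)|<1$, then $\sum_{n=0}^{\infty}|a_n|r^n\le 1$ for $r\le 1/3$, and $1/3$ cannot be improved. A minimal Python sketch (our illustration, not the authors' code; function names are hypothetical) evaluates the majorant series of the disk automorphism $φ_a(z)=(a-z)/(1-az)$, $0<a<1$, whose coefficients are $a_0=a$ and $a_n=-(1-a^2)a^{n-1}$ for $n\ge1$, so that $M(r)=a+(1-a^2)r/(1-ar)$. At $r=1/3$ one has $1-M(1/3)=2(1-a)^2/(3-a)\ge 0$, while for any $r>1/3$ the value exceeds $1$ once $a$ is close enough to $1$.

```python
def majorant_automorphism(a: float, r: float) -> float:
    """Closed form of sum_n |a_n| r^n for phi_a(z) = (a - z)/(1 - a z), 0 < a < 1, 0 <= r < 1."""
    return a + (1 - a * a) * r / (1 - a * r)


def majorant_partial_sum(a: float, r: float, terms: int = 200) -> float:
    """The same sum computed term by term from the Taylor coefficients (truncated)."""
    total = a  # |a_0| = a
    for n in range(1, terms):
        total += (1 - a * a) * a ** (n - 1) * r ** n  # |a_n| r^n
    return total


if __name__ == "__main__":
    grid = [k / 100 for k in range(1, 100)]
    # At r = 1/3 the majorant stays at most 1 for every a in the grid.
    print(max(majorant_automorphism(a, 1 / 3) for a in grid))
    # At r = 0.4, values of a near 1 push the majorant above 1.
    print(max(majorant_automorphism(a, 0.4) for a in grid))
```

For example, with $a=0.9$ and $r=0.4$ the closed form gives $0.9+0.19\cdot0.4/0.64=1.01875>1$.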

arXiv:2402.11808 [pdf, other]

Bohr inequalities via proper combinations for a certain class of close-to-convex harmonic mappings

Authors: Molla Basir Ahamed, Partha Pratim Roy

Abstract: Let $\mathcal{H}(Ω)$ be the class of complex-valued functions harmonic in $Ω\subset\mathbb{C}$, where each $f=h+\overline{g}\in \mathcal{H}(Ω)$ with $h$ and $g$ analytic. In the study of the Bohr phenomenon for certain classes of harmonic mappings, the aim is to find a constant $r_f\in (0, 1)$ such that the inequality \begin{align*} M_f(r):=r+\sum_{n=2}^{\infty}\left(|a_n|+|b_n|\right)r^n\leq d\left(f(0), \partial Ω\right) \;\mbox{for}\;|z|=r\leq r_f \end{align*} holds, where $d\left(f(0), \partial Ω\right)$ is the Euclidean distance between $f(0)$ and the boundary of $Ω:=f(\mathbb{D})$. The largest such radius $r_f$ is called the Bohr radius, and the inequality $M_f(r)\leq d\left(f(0), \partial Ω\right)$ is called the Bohr inequality for the class $\mathcal{H}(Ω)$. In this paper, we study the Bohr phenomenon for the class of close-to-convex harmonic mappings by establishing several inequalities. All the results are proved to be sharp.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 26 pages, 9 figures

MSC Class: Primary 30C45; 30C50; 30C80

arXiv:2401.16878 [pdf, other]

Enhancing EEG Signal-Based Emotion Recognition with Synthetic Data: Diffusion Model Approach

Authors: Gourav Siddhad, Masakazu Iwamura, Partha Pratim Roy

Abstract: Emotions are crucial in human life, influencing perceptions, relationships, behaviour, and choices. Emotion recognition using electroencephalography (EEG) in the brain-computer interface (BCI) domain presents significant challenges, particularly the need for extensive datasets. This study aims to generate synthetic EEG samples that are similar to, but distinct from, real samples by adding noise to a conditional denoising diffusion probabilistic model, thus addressing the prevalent issue of data scarcity in EEG research. The proposed method is tested on the DEAP dataset, showing a 1.94% improvement in classification performance when using synthetic data, which is higher than that of traditional GAN-based and DDPM-based approaches. The proposed diffusion-based approach for EEG data generation appears promising for improving the accuracy of emotion recognition systems and marks a notable contribution to EEG-based emotion recognition. Our research further evaluates the effectiveness of state-of-the-art classifiers on EEG data, employing both real and synthetic data with varying noise levels.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 8 Pages, 3 Figures, 2 Tables

arXiv:2311.11250 [pdf, other]

A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications

Authors: Sudhanshu Kumar, Partha Pratim Roy, Debi Prosad Dogra, Byung-Gyu Kim

Abstract: Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in text on different social media platforms. Social media plays an essential role in understanding customer attitudes towards products, services, and the latest market trends. Most organizations depend on customer responses and feedback to upgrade the products and services they offer. SA, or opinion mining, appears to be a promising research area for various domains and plays a vital role in analyzing the big data generated daily, in structured and unstructured formats, over the internet. This survey defines sentiment and reviews its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.16527 [pdf, other]

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

Authors: Tofik Ali, Partha Pratim Roy

Abstract: This paper introduces a deep learning model tailored for document information analysis, emphasizing document classification, entity relation extraction, and document visual question answering. The proposed model leverages transformer-based models to encode all the information present in a document image, including textual, visual, and layout information. The model is pre-trained and subsequently fine-tuned for various document image analysis tasks. It incorporates three additional tasks during the pre-training phase: reading order identification of the layout segments in a document image, categorization of layout segments as per PubLayNet, and generation of the text sequence within a given layout segment (text block). The model also uses a collective pre-training scheme in which the losses of all tasks under consideration, including pre-training and fine-tuning tasks on all datasets, are considered. Additional encoder and decoder blocks are added to the RoBERTa network to generate results for all tasks. The proposed model achieved strong results across all tasks, with an accuracy of 95.87% on the RVL-CDIP dataset for document classification; F1 scores of 0.9306, 0.9804, 0.9794, and 0.8742 on the FUNSD, CORD, SROIE, and Kleister-NDA datasets, respectively, for entity relation extraction; and an ANLS score of 0.8468 on the DocVQA dataset for visual question answering.
The results highlight the effectiveness of the proposed model in understanding and interpreting complex document layouts and content, making it a promising tool for document analysis tasks.

Submitted 25 October, 2023; originally announced October 2023.
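ANLS, the DocVQA metric quoted above, is commonly defined as follows: for each question, take the best score over the ground-truth answers, where an answer scores $1-\mathrm{NL}$ if its normalized Levenshtein distance $\mathrm{NL}$ to the prediction is below a threshold $τ=0.5$ and $0$ otherwise, then average over questions. The Python sketch below assumes that definition, with case-insensitive comparison after stripping whitespace; it is our own illustration with hypothetical function names, not the evaluation code used in the paper.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def anls(predictions, gold_answers, tau=0.5):
    """Average normalized Levenshtein similarity over questions.

    predictions: one predicted string per question.
    gold_answers: one list of acceptable answers per question.
    """
    scores = []
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(g), len(p))
            nl = levenshtein(g, p) / denom if denom else 0.0
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

For instance, the prediction "invoce" against the answer "Invoice" has one edit over seven characters and scores $6/7$, while a completely wrong answer scores $0$ rather than a small positive value.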

arXiv:2308.02515 [pdf, other]

Feature Reweighting for EEG-based Motor Imagery Classification

Authors: Taveena Lotey, Prateek Keserwani, Debi Prosad Dogra, Partha Pratim Roy

Abstract: Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective, as it is used to predict a subject's intended limb movements. In recent research, convolutional neural network (CNN) based methods have been widely used for MI-EEG classification. The challenges of training neural networks for MI-EEG signal classification include the low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information; consequently, the feature maps of the CNN-based network computed from these noisy and irrelevant features also contain irrelevant information. Thus, many non-contributing features often mislead neural network training and degrade classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method provides a noise-reduction mechanism, the feature reweighting module, that suppresses irrelevant temporal and channel feature maps. This module generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals on the Physionet EEG-MMIDB and BCI Competition IV 2a datasets, by margins of 9.34% and 3.82%, respectively, compared to the state-of-the-art methods.

Submitted 29 July, 2023; originally announced August 2023.

arXiv:2308.01548 [pdf, ps, other]

Hankel and Toeplitz determinants of logarithmic coefficients of Inverse functions for certain classes of univalent functions

Authors: Sanju Mandal, Partha Pratim Roy, Molla Basir Ahamed

Abstract: The Hankel and Toeplitz determinants $H_{2,1}(F_{f^{-1}}/2)$ and $T_{2,1}(F_{f^{-1}}/2)$ are defined as \begin{align*} H_{2,1}(F_{f^{-1}}/2):= \begin{vmatrix} Γ_1 & Γ_2 \\ Γ_2 & Γ_3 \end{vmatrix} \;\;\mbox{and} \;\; T_{2,1}(F_{f^{-1}}/2):= \begin{vmatrix} Γ_1 & Γ_2 \\ Γ_2 & Γ_1 \end{vmatrix}, \end{align*} where $Γ_1, Γ_2,$ and $Γ_3$ are the first, second, and third logarithmic coefficients of inverse functions belonging to the class $\mathcal{S}$ of normalized univalent functions. In this article, we establish the sharp inequalities $|H_{2,1}(F_{f^{-1}}/2)|\leq 1/4$, $|H_{2,1}(F_{f^{-1}}/2)| \leq 1/36$, $|T_{2,1}(F_{f^{-1}}/2)|\leq 5/16$, and $|T_{2,1}(F_{f^{-1}}/2)|\leq 145/2304$ for the logarithmic coefficients of inverse functions for the classes of starlike and convex functions with respect to symmetric points. In addition, our findings are further substantiated by illustrative examples that exhibit the strict inequality.

Submitted 3 August, 2023; originally announced August 2023.

Comments: 15 pages. arXiv admin note: substantial text overlap with arXiv:2305.12500, arXiv:2307.14365

MSC Class: Primary 30A10; 30H05; 30C35; Secondary 30C45
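To make the definitions concrete, the sketch below computes $Γ_1,Γ_2,Γ_3$ for the inverse of a given $f\in\mathcal{S}$ by series reversion, then evaluates $H_{2,1}=Γ_1Γ_3-Γ_2^2$ and $T_{2,1}=Γ_1^2-Γ_2^2$ in exact rational arithmetic. It assumes the usual normalization $\log(F(w)/w)=2\sum_{n\ge1}Γ_nw^n$ for $F=f^{-1}$; the helper names are hypothetical and this is not the authors' method. The Koebe function $z/(1-z)^2$ is used only because its coefficients $a_n=n$ are explicit; it is not claimed to belong to the classes studied in the paper.

```python
from fractions import Fraction


def poly_mul(p, q, n):
    """Multiply two power series given as coefficient lists, truncating at degree n."""
    out = [Fraction(0)] * (n + 1)
    for i, pi in enumerate(p[: n + 1]):
        for j, qj in enumerate(q[: n + 1 - i]):
            out[i + j] += pi * qj
    return out


def invert_series(a):
    """Coefficients [0, 1, A_2, ..., A_n] of f^{-1} from f's [0, 1, a_2, ..., a_n]."""
    n = len(a) - 1
    A = [Fraction(0), Fraction(1)] + [Fraction(0)] * (n - 1)
    for m in range(2, n + 1):
        # Compose f(F(w)) = sum_k a_k F(w)^k with A_m still 0. Because f'(0) = 1,
        # A_m enters the w^m coefficient linearly, so A_m = -(that coefficient).
        comp = [Fraction(0)] * (n + 1)
        power = [Fraction(1)] + [Fraction(0)] * n  # F^0
        for k in range(1, n + 1):
            power = poly_mul(power, A, n)  # F^k
            for i in range(n + 1):
                comp[i] += a[k] * power[i]
        A[m] = -comp[m]
    return A


def log_coefficients(c):
    """[G_1, ..., G_{n-1}] with log(F(z)/z) = 2 * sum_k G_k z^k, from [0, 1, c_2, ..., c_n]."""
    b = [Fraction(x) for x in c[1:]]  # F(z)/z = 1 + b_1 z + b_2 z^2 + ...
    n = len(b)
    ell = [Fraction(0)] * n           # log(F(z)/z) = sum_k ell_k z^k
    for k in range(1, n):
        # From (F/z)' = (F/z) * (log(F/z))': k*ell_k = k*b_k - sum_{j<k} j*ell_j*b_{k-j}
        s = sum((j * ell[j] * b[k - j] for j in range(1, k)), Fraction(0))
        ell[k] = b[k] - s / k
    return [x / 2 for x in ell[1:]]


def hankel_2_1(G):
    return G[0] * G[2] - G[1] ** 2   # det [[G1, G2], [G2, G3]]


def toeplitz_2_1(G):
    return G[0] ** 2 - G[1] ** 2     # det [[G1, G2], [G2, G1]]


if __name__ == "__main__":
    koebe = [0, 1, 2, 3, 4]          # z + 2z^2 + 3z^3 + 4z^4 + ...
    G = log_coefficients(invert_series(koebe))
    print(G, hankel_2_1(G), toeplitz_2_1(G))
```

For the Koebe function the inverse coefficients are $1,-2,5,-14$, giving $Γ_1=-1$, $Γ_2=3/2$, $Γ_3=-10/3$, hence $H_{2,1}=13/12$ and $T_{2,1}=-5/4$.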

arXiv:2307.15991 [pdf, other]

Separate Scene Text Detector for Unseen Scripts is Not All You Need

Authors: Prateek Keserwani, Taveena Lotey, Rohit Keshari, Partha Pratim Roy

Abstract: Text detection in the wild is a well-known problem that becomes more challenging when handling multiple scripts. In the last decade, some scripts have gained the attention of the research community and achieved good detection performance. However, many scripts are low-resourced for training deep learning-based scene text detectors. This raises a critical question: is separate training needed for new scripts? This question has not been explored in the field of scene text detection. This paper addresses this problem and proposes a solution to detect scripts not present during training. In this work, an analysis is performed to understand cross-script text detection, i.e., training on one script and testing on another. We found that an identical nature of text annotation (word-level/line-level) is crucial for better cross-script text detection, whereas a different nature of text annotation between scripts degrades cross-script text detection performance. Additionally, for unseen script detection, the proposed solution uses vector embedding to map the stroke information of text to the corresponding script category. The proposed method is validated on a well-known multi-lingual scene text dataset under a zero-shot setting. The results show the potential of the proposed method for unseen script detection in natural images.

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.02746 [pdf, ps, other]

The third Hankel determinant for inverse coefficients of starlike functions of order 1/2

Authors: Molla Basir Ahamed, Partha Pratim Roy

Abstract: The sharp bound for the third Hankel determinant of the coefficients of the inverse function of a starlike function of order $1/2$ is obtained. In light of this, we can deduce that the functionals $|H_3(1)(f)|$ and $|H_3(1)(f^{-1})|$ exhibit invariance on the class $\mathcal{S}^*(1/2)$.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 9 pages

MSC Class: Primary 30A10; 30H05; 30C35; Secondary 30C45
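With $a_1=1$, the third Hankel determinant is $H_3(1)(f)=\begin{vmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix}=a_3(a_2a_4-a_3^2)-a_4(a_4-a_2a_3)+a_5(a_3-a_2^2)$. The sketch below (our own illustration; names hypothetical) evaluates it exactly. As a sanity check it uses $f(z)=z/(1-z)$, which satisfies $\mathrm{Re}\,(zf'(z)/f(z))=\mathrm{Re}\,(1/(1-z))>1/2$ and so lies in $\mathcal{S}^*(1/2)$, together with its inverse $w/(1+w)$; both determinants vanish, which is consistent with, though of course no proof of, the invariance stated above.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def hankel3(coeffs):
    """H_3(1) from the Taylor coefficients [a_1, a_2, a_3, a_4, a_5] (a_1 = 1)."""
    a1, a2, a3, a4, a5 = coeffs[:5]
    return det3([[a1, a2, a3], [a2, a3, a4], [a3, a4, a5]])


def hankel3_expanded(coeffs):
    """The same determinant written out explicitly, using a_1 = 1."""
    _, a2, a3, a4, a5 = coeffs[:5]
    return a3 * (a2 * a4 - a3 ** 2) - a4 * (a4 - a2 * a3) + a5 * (a3 - a2 ** 2)


if __name__ == "__main__":
    f = [1, 1, 1, 1, 1]        # z/(1-z) = z + z^2 + z^3 + ...
    f_inv = [1, -1, 1, -1, 1]  # w/(1+w) = w - w^2 + w^3 - ...
    print(hankel3(f), hankel3(f_inv))
    g = [1, Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
    print(hankel3(g) == hankel3_expanded(g))  # the two forms agree
```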

arXiv:2305.12500 [pdf, ps, other]

Sharp bounds for second Hankel determinant of logarithmic coefficients for certain classes of univalent functions

Authors: Sanju Mandal, Partha Pratim Roy, Molla Basir Ahamed

Abstract: The Hankel determinant $H_{2,2}(F_{f}/2)$ is defined as \begin{align*} H_{2,2}(F_{f}/2):= \begin{vmatrix} γ_2 & γ_3 \\ γ_3 & γ_4 \end{vmatrix}, \end{align*} where $γ_2, γ_3,$ and $γ_4$ are the second, third, and fourth logarithmic coefficients of functions belonging to the class $\mathcal{S}$ of normalized univalent functions. In this article, we establish the sharp inequalities $|H_{2,2}(F_{f}/2)|\leq (1272 + 113\sqrt{678})/32856$ and $|H_{2,2}(F_{f}/2)| \leq 13/1080$ for the logarithmic coefficients of starlike and convex functions with respect to symmetric points. Moreover, we provide examples showing that the strict inequality holds.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 10 pages, 0 figures

MSC Class: Primary 30A10; 30H05; 30C35; Secondary 30C45
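The logarithmic coefficients are defined by $\log(f(z)/z)=2\sum_{n\ge1}γ_nz^n$, and expanding the determinant gives $H_{2,2}(F_f/2)=γ_2γ_4-γ_3^2$. The Python sketch below (hypothetical helper names, our own illustration) obtains the $γ_n$ from the Taylor coefficients of $f$ through the recurrence implied by $(f/z)'=(f/z)\,(\log(f/z))'$ and evaluates the determinant exactly. For the Koebe function, whose coefficients are $a_n=n$, it returns $γ_n=1/n$ and $H_{2,2}=1/8-1/9=1/72$; the Koebe function is used here only as a known test case, not as a member of the classes studied in the paper.

```python
from fractions import Fraction


def log_coefficients(a):
    """[g_1, ..., g_{n-1}] with log(f(z)/z) = 2 * sum_k g_k z^k, from [0, 1, a_2, ..., a_n]."""
    b = [Fraction(x) for x in a[1:]]  # f(z)/z = 1 + b_1 z + b_2 z^2 + ...
    n = len(b)
    ell = [Fraction(0)] * n           # log(f(z)/z) = sum_k ell_k z^k
    for k in range(1, n):
        # Matching z^(k-1) in (f/z)' = (f/z) * (log(f/z))':
        # k*b_k = k*ell_k + sum_{j<k} j*ell_j*b_{k-j}
        s = sum((j * ell[j] * b[k - j] for j in range(1, k)), Fraction(0))
        ell[k] = b[k] - s / k
    return [x / 2 for x in ell[1:]]


def hankel_2_2(g):
    """det [[g_2, g_3], [g_3, g_4]] for g = [g_1, g_2, g_3, g_4, ...]."""
    return g[1] * g[3] - g[2] ** 2


if __name__ == "__main__":
    koebe = [0, 1, 2, 3, 4, 5]        # z/(1-z)^2 = z + 2z^2 + 3z^3 + ...
    g = log_coefficients(koebe)
    print(g, hankel_2_2(g))
```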

arXiv:2204.09019 [pdf, other]

Hybrid Transformer Network for Different Horizons-based Enriched Wind Speed Forecasting

Authors: Dr. M. Madhiarasan, Prof. Partha Pratim Roy

Abstract: Highly accurate wind speed forecasting over different horizons facilitates a better modern power system. This paper proposes a novel hybrid wind speed forecasting model and applies it to different horizons. The proposed hybrid forecasting model decomposes the original wind speed data into intrinsic mode functions (IMFs) using Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN). The subseries obtained from ICEEMDAN are fed to transformer networks; each transformer network forecasts its subseries and passes the result to the fusion phase, where the individual subseries forecasts are fused to obtain the primary wind speed forecast. The residual error values are then estimated and predicted using a multilayer perceptron neural network, and the forecast error is added to the primary forecast to increase the accuracy of the wind speed forecast. Comparative analysis on a real-time dataset from the Kethanur wind farm, India, shows that the proposed ICEEMDAN-TNF-MLPN-RECS hybrid model outperforms state-of-the-art methods, with MAE=1.7096*10^-07, MAPE=2.8416*10^-06, MRE=2.8416*10^-08, MSE=5.0206*10^-14, and RMSE=2.2407*10^-07 for case study 1 and MAE=6.1565*10^-07, MAPE=9.5005*10^-06, MRE=9.5005*10^-08, MSE=8.9289*10^-13, and RMSE=9.4493*10^-07 for case study 2, and reduces the burden on the power system engineer.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Communicated to IEEE Transactions on Power Systems (status: under review)
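The error measures quoted above can be sketched as follows, assuming the usual definitions with $e_i=\hat y_i-y_i$. The abstract does not define MRE; here it is assumed to be the mean relative error $\frac1n\sum_i|e_i|/|y_i|$, so that MAPE $=100\times$ MRE, which matches the ratio between the reported MAPE and MRE values. The function name is hypothetical and the code is not from the paper.

```python
import math


def forecast_errors(actual, predicted):
    """MAE, MSE, RMSE, MRE and MAPE for paired observations and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("relative errors need non-zero actual values")
    errors = [p - y for y, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mre = sum(abs(e) / abs(y) for e, y in zip(errors, actual)) / n  # assumed definition
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MRE": mre, "MAPE": 100 * mre}


if __name__ == "__main__":
    print(forecast_errors([2.0, 4.0], [3.0, 3.0]))
```

RMSE is reported alongside MSE because it has the same units as wind speed, while MRE and MAPE are scale-free.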

arXiv:2204.03328 [pdf, other]

A Comprehensive Review of Sign Language Recognition: Different Types, Modalities, and Datasets

Authors: Dr. M. Madhiarasan, Prof. Partha Pratim Roy

Abstract: A machine that can understand human activities and the meaning of signs can help overcome the communication barriers between deaf and hearing people. Sign Language Recognition (SLR) is a fascinating research area and a crucial task in computer vision and pattern recognition. Recently, SLR usage has increased in many applications, but the environment, background, image resolution, modalities, and datasets strongly affect performance. Many researchers have been striving to develop generic real-time SLR models. This review paper provides a comprehensive overview of SLR and discusses the needs, challenges, and problems associated with SLR. We study related work on manual and non-manual signs, various modalities, and datasets. Research progress and existing state-of-the-art SLR models over the past decade are reviewed. Finally, we identify the research gaps and limitations in this domain and suggest future directions. This review paper will help readers and researchers obtain complete guidance on SLR and on the progressive design of state-of-the-art SLR models.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Communicated to Computer Science Review (Elsevier) (status: with editor)

arXiv:2202.05170 [pdf, ps, other]

doi 10.1016/j.bspc.2023.105488

Efficacy of Transformer Networks for Classification of Raw EEG Data

Authors: Gourav Siddhad, Anmol Gupta, Debi Prosad Dogra, Partha Pratim Roy

Abstract: With the unprecedented success of transformer networks in natural language processing (NLP), they have recently been adapted successfully to areas such as computer vision, generative adversarial networks (GAN), and reinforcement learning. Classifying electroencephalogram (EEG) data has been challenging, and researchers have been overly dependent on pre-processing and hand-crafted feature extraction. Although deep learning has achieved automated feature extraction in several other domains, this has not yet been accomplished for EEG. In this paper, the efficacy of the transformer network for the classification of raw EEG data (cleaned and pre-processed) is explored. The performance of transformer networks was evaluated on a local dataset (age and gender data) and a public dataset (STEW). First, a classifier using a transformer network is built to classify the age and gender of a person from raw resting-state EEG data. Second, the classifier is tuned for mental workload classification with open-access raw multi-tasking mental workload EEG data (STEW). The network achieves accuracy comparable to the state of the art on both the local dataset (Age and Gender dataset; 94.53% (gender) and 87.79% (age)) and the public dataset (STEW dataset; 95.28% (two workload levels) and 88.72% (three workload levels)). These accuracies were achieved using raw EEG data without feature extraction. Results indicate that transformer-based deep learning models can successfully reduce the need for heavy feature extraction of EEG data for successful classification.

Submitted 8 February, 2022; originally announced February 2022.

Journal ref: Biomedical Signal Processing and Control, Vol 87, 2023

arXiv:2106.15989 [pdf, other]

Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions

Authors: Mizuki Maruyama, Shuvozit Ghose, Katsufumi Inoue, Partha Pratim Roy, Masakazu Iwamura, Michifumi Yoshioka

Abstract: In recent years, Word-level Sign Language Recognition (WSLR) research has gained popularity in the computer vision community, and various approaches have been proposed. Among these approaches, the method using the I3D network achieves the highest recognition accuracy on large public datasets for WSLR. However, the method with I3D only uses appearance information of the signer's upper body to recognize sign language words. On the other hand, in WSLR, the information of local regions, such as hand shape and facial expression, and the positional relationship among the body and both hands are important. Thus, in this work, we use local region images of both hands and the face, along with skeletal information, to capture local information and the positions of both hands relative to the body, respectively. In other words, we propose a novel multi-stream WSLR framework in which a stream with local region images and a stream with skeletal information are introduced by extending the I3D network to improve the recognition accuracy of WSLR. The experimental results on the WLASL dataset show that the proposed method achieves an improvement of about 15% in Top-1 accuracy over existing conventional methods.

Submitted 30 June, 2021; originally announced June 2021.

arXiv:2010.12669 [pdf, other]

Position and Rotation Invariant Sign Language Recognition from 3D Kinect Data with Recurrent Neural Networks

Authors: Prasun Roy, Saumik Bhattacharya, Partha Pratim Roy, Umapada Pal

Abstract: Sign language is a gesture-based symbolic communication medium among speech- and hearing-impaired people. It also serves as a communication bridge between non-impaired and impaired populations. Unfortunately, in most situations, a non-impaired person is not conversant in such symbolic languages, which restricts the natural flow of information between these two groups. Therefore, an automated mechanism that seamlessly translates sign language into natural language can be highly advantageous. In this paper, we attempt to recognize 30 basic Indian sign gestures. Gestures are represented as temporal sequences of 3D maps (RGB + depth), each consisting of the 3D coordinates of 20 body joints captured by the Kinect sensor. A recurrent neural network (RNN) is employed as the classifier. To improve the classifier's performance, we use a geometric transformation for the alignment correction of depth frames. In our experiments, the model achieves 84.81% accuracy.

Submitted 14 March, 2023; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: 10 pages
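The abstract does not spell out the geometric transformation used for alignment correction. One common way to obtain position and rotation invariance for skeleton frames is sketched below, under explicit assumptions: the $y$-axis is vertical, a chosen reference joint is moved to the origin, and each frame is rotated about $y$ so that the line from the left to the right shoulder lies along the positive $x$-axis. The joint indices and the function name are hypothetical, and this is not necessarily the authors' transformation.

```python
import math


def normalize_frame(joints, ref, left_shoulder, right_shoulder):
    """Translate and rotate one skeleton frame (list of (x, y, z) joint positions).

    Assumptions: y is the vertical axis; ref, left_shoulder and right_shoulder are
    hypothetical joint indices chosen by the user.
    """
    ox, oy, oz = joints[ref]
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in joints]  # reference joint -> origin
    lx, _, lz = shifted[left_shoulder]
    rx, _, rz = shifted[right_shoulder]
    theta = math.atan2(rz - lz, rx - lx)  # angle of the shoulder line in the x-z plane
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about y by -theta so the shoulder vector ends up along +x with zero z.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in shifted]


if __name__ == "__main__":
    frame = [(0.0, 0.0, 0.0), (-0.2, 0.5, 0.1), (0.2, 0.5, -0.1), (0.3, 0.2, 0.4)]
    for joint in normalize_frame(frame, ref=0, left_shoulder=1, right_shoulder=2):
        print(joint)
```

Because both the translation and the rotation angle are recomputed from each frame, the same pose captured at a different position or facing direction (about the vertical axis) maps to the same normalized coordinates.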

arXiv:2010.06200 [pdf, other]

End-to-end Triplet Loss based Emotion Embedding System for Speech Emotion Recognition

Authors: Puneet Kumar, Sidharth Jain, Balasubramanian Raman, Partha Pratim Roy, Masakazu Iwamura

Abstract: In this paper, an end-to-end neural embedding system based on triplet loss and residual learning is proposed for speech emotion recognition. The proposed system learns embeddings from the emotional information of speech utterances, and the learned embeddings are used to recognize the emotions portrayed by speech samples of various lengths. The system implements a Residual Neural Network architecture and is trained using softmax pre-training and a triplet loss function. The weights between the fully connected and embedding layers of the trained network are used to calculate the embedding values. The embedding representations of the various emotions are mapped onto a hyperplane, and the angles among them are computed using cosine similarity. These angles are used to classify a new speech sample into its appropriate emotion class. The proposed system achieved 91.67% and 64.44% accuracy in recognizing emotions on the RAVDESS and IEMOCAP datasets, respectively.

Submitted 13 October, 2020; originally announced October 2020.

Comments: Accepted in ICPR 2020

arXiv:2007.07075 [pdf, other]

UDBNET: Unsupervised Document Binarization Network via Adversarial Game

Authors: Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Abstract: Degraded document image binarization is one of the most challenging tasks in the domain of document image analysis. In this paper, we present a novel approach to document image binarization by introducing a three-player min-max adversarial game. We train the network in an unsupervised setup, assuming that no paired training data are available. In our approach, an Adversarial Texture Augmentation Network (ATANet) first superimposes the texture of a degraded reference image over a clean image. The clean image, along with its generated degraded version, then constitutes the pseudo-paired data used to train the Unsupervised Document Binarization Network (UDBNet). In this way, we enlarge the document binarization datasets, since the approach generates multiple images with the same content features but different texture features. These generated noisy images are then fed into the UDBNet to recover the clean version. The joint discriminator, which is the third player of our three-player min-max adversarial game, tries to couple the ATANet and the UDBNet. The game stops when the distributions modelled by the ATANet and the UDBNet align to the same joint distribution over time. Thus, the joint discriminator forces the UDBNet to perform better on real degraded images. The experimental results indicate the superior performance of the proposed model over existing state-of-the-art algorithms on the widely used DIBCO datasets.
The source code of the proposed system is publicly available at https://github.com/VIROBO-15/UDBNET.

Submitted 27 October, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: Accepted in ICPR 2020

arXiv:2007.05764 [pdf, ps, other]

Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis

Authors: Ankit Sharma, Puneet Kumar, Vikas Maddukuri, Nagasai Madamshettib, Kishore KG, Sahit Sai Sriram Kavurub, Balasubramanian Raman, Partha Pratim Roy

Abstract: The performance of text-to-speech (TTS) systems depends heavily on spectrogram-to-waveform generation, also known as the speech reconstruction phase. The time this phase requires is known as the synthesis delay. In this paper, an approach to reduce speech synthesis delay is proposed. It aims to enhance TTS systems for real-time applications such as digital assistants, mobile phones, and embedded devices. The proposed approach applies the Fast Griffin-Lim Algorithm (FGLA) instead of the Griffin-Lim Algorithm (GLA) as the vocoder in the speech synthesis phase. GLA and FGLA are both iterative, but FGLA converges faster than GLA. The proposed approach is tested on the LJSpeech, Blizzard and Tatoeba datasets, and the results for FGLA are compared against GLA and a neural Generative Adversarial Network (GAN) based vocoder. Performance is evaluated in terms of synthesis delay and speech quality. A 36.58% reduction in speech synthesis delay has been observed. The quality of the output speech has also improved, as indicated by higher Mean Opinion Scores (MOS) and faster convergence with FGLA compared with GLA.
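GLA and FGLA differ only by a momentum step. In the usual FGLA formulation, c_n = P_C1(P_C2(t_{n-1})) and t_n = c_n + alpha (c_n - c_{n-1}). Here P_C2 imposes the target magnitude and P_C1 projects onto consistent spectrograms. Setting alpha = 0 recovers plain GLA.

To keep the sketch self-contained, it makes two simplifications. It replaces the STFT with a single full-length DFT, and it uses "spectrum of a real signal" as the consistency set. It is therefore a toy illustration of the update rule under these assumptions, not a usable vocoder. All function names are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def project_consistent(spec):
    """Toy stand-in for STFT consistency: nearest spectrum of a real-valued signal."""
    return dft([v.real for v in idft(spec)])

def project_magnitude(spec, target_mag):
    """Keep each bin's phase but impose the target magnitude."""
    return [m * (v / abs(v)) if abs(v) > 1e-12 else complex(m) for v, m in zip(spec, target_mag)]

def fast_griffin_lim(target_mag, n_iter=50, alpha=0.99, seed=0):
    """GLA iteration plus the FGLA momentum term alpha * (c_n - c_{n-1}); alpha=0 is plain GLA."""
    rng = random.Random(seed)
    # Start from the target magnitudes with random phases.
    t = [m * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for m in target_mag]
    c_prev = t
    for _ in range(n_iter):
        c = project_consistent(project_magnitude(t, target_mag))
        t = [ci + alpha * (ci - cp) for ci, cp in zip(c, c_prev)]
        c_prev = c
    return [v.real for v in idft(c_prev)]
```

A real vocoder would swap `dft`/`idft` for an overlapping STFT/inverse STFT pair. The momentum line is the part the abstract credits for the faster convergence.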

Submitted 11 July, 2020; originally announced July 2020.

Comments: Accepted for publication in Springer Multimedia Tools and Applications Journal

arXiv:2004.08141 [pdf, other]

Modeling Extent-of-Texture Information for Ground Terrain Recognition

Authors: Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Abstract: Ground terrain recognition is a difficult task because the context information varies significantly across the regions of a ground terrain image. In this paper, we propose a novel approach to ground terrain recognition that models Extent-of-Texture information to locally balance the order-less texture component and the ordered spatial information. First, the proposed method uses a CNN backbone feature extractor network to capture meaningful information from a ground terrain image and to model the extent of texture and shape information locally. Then, the order-less texture information and the ordered shape information are encoded patch-wise and used by an intra-domain message passing module that makes every patch aware of the others for rich feature learning. Next, the Extent-of-Texture (EoT) Guided Inter-domain Message Passing module combines the extent of texture and shape information with the encoded texture and shape information patch-wise, sharing knowledge to balance the order-less texture information against the ordered shape information. Further, a bilinear model generates a pairwise correlation between the order-less texture information and the ordered shape information. Finally, ground terrain image classification is performed by a fully connected layer. The experimental results indicate superior performance of the proposed model over existing state-of-the-art techniques on publicly available datasets such as DTD, MINC and GTOS-mobile.
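Bilinear pooling is commonly implemented as the flattened outer product of two feature vectors, so each entry is the pairwise product of one texture feature and one shape feature. The minimal sketch below makes two assumptions: each stream has already been pooled into a single vector, and the signed square root plus L2 normalisation common in bilinear-CNN practice is applied. The paper's exact normalisation is not stated, and the function name is hypothetical.

```python
import math

def bilinear_pool(texture, shape):
    """Flattened outer product: entry i * len(shape) + j equals texture[i] * shape[j]."""
    pooled = [t * s for t in texture for s in shape]
    # Signed square root, then L2 normalisation (assumed, as in common bilinear-CNN practice).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]
```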

Submitted 27 October, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: Accepted in ICPR 2020

arXiv:2003.05626 [pdf, other]

Understanding Crowd Flow Movements Using Active-Langevin Model

Authors: Shreetam Behera, Debi Prosad Dogra, Malay Kumar Bandyopadhyay, Partha Pratim Roy

Abstract: Crowd flow describes the elementary group behavior of crowds. Understanding the dynamics behind these movements can help to identify various abnormalities in crowds. However, developing a crowd model that describes these flows is a challenging task. In this paper, a physics-based model is proposed to describe the movements in dense crowds. The crowd model is based on the active Langevin equation, where the motion points are assumed to be similar to active colloidal particles in fluids. The model is further augmented with computer vision techniques to segment both linear and non-linear motion flows in a dense crowd. The active Langevin equation-based crowd segmentation has been evaluated on publicly available crowd videos and on our own videos. The proposed method segments the flow with lower optical flow error and better accuracy than existing state-of-the-art methods.
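The abstract treats motion points as active colloidal particles. A standard model for such particles is the 2D active Brownian particle: a particle with constant self-propulsion speed v0 along a heading theta, where theta diffuses with rotational diffusion coefficient D_r. This model is assumed here and is not necessarily the paper's exact equation. The Euler-Maruyama sketch below uses hypothetical parameter names and defaults.

```python
import math
import random

def simulate_active_particle(x0, y0, theta0, v0=1.0, rot_diff=0.1, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama integration of a 2D active Brownian particle.

    dx = v0 cos(theta) dt,  dy = v0 sin(theta) dt,  dtheta = sqrt(2 D_r) dW
    """
    rng = random.Random(seed)
    x, y, theta = x0, y0, theta0
    path = [(x, y)]
    for _ in range(steps):
        x += v0 * math.cos(theta) * dt
        y += v0 * math.sin(theta) * dt
        # Rotational noise: Gaussian increment with variance 2 * D_r * dt.
        theta += math.sqrt(2.0 * rot_diff * dt) * rng.gauss(0.0, 1.0)
        path.append((x, y))
    return path
```

With `rot_diff=0` the particle moves in a straight line, which corresponds to the linear flows. A larger `rot_diff` bends the paths into the non-linear flows the abstract also segments.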

Submitted 18 August, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

arXiv:1904.07233 [pdf, other]

Estimation of Linear Motion in Dense Crowd Videos using Langevin Model

Authors: Shreetam Behera, Debi Prosad Dogra, Malay Kumar Bandyopadhyay, Partha Pratim Roy

Abstract: Crowd gatherings at social and cultural events are increasing by leaps and bounds as the population grows. Surveillance through computer vision and expert decision-making systems can help to understand crowd phenomena at large gatherings. Understanding crowd phenomena can be helpful in the early identification and prevention of unwanted incidents. Motion flow is one of the important crowd phenomena that can be instrumental in describing crowd behavior. Flows can be useful in understanding instabilities in the crowd. However, extracting motion flows is a challenging task due to randomness in crowd movement and the limitations of the sensing device. Moreover, low-level features such as optical flow can be misleading if the randomness is high. In this paper, we propose a new model based on the Langevin equation to analyze the linear dominant flows in videos of densely crowded scenarios. We assume a force model with three components, namely external force, confinement/drift force, and disturbance force. These forces are found to be sufficient to describe linear or near-linear motion in dense crowd videos. The method is significantly faster than existing popular crowd segmentation methods. The proposed model has been evaluated on publicly available datasets as well as on our own dataset. The proposed method is observed to estimate and segment the linear flows in dense crowds with better accuracy than state-of-the-art techniques, with a substantial decrease in computational overhead.
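The three-component force model can be illustrated with a one-dimensional Langevin equation integrated by Euler-Maruyama. The specific form below is an assumption for illustration, not the paper's equation. It uses a constant external force `f_ext`, a confinement/drift term `-gamma * (v - drift)` that relaxes the velocity towards a drift velocity, and a Gaussian disturbance of strength `sigma`. Without noise, the velocity settles at `drift + f_ext / gamma`.

```python
import random

def simulate_langevin_velocity(v0, f_ext, gamma, drift, sigma, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama for dv = [f_ext - gamma * (v - drift)] dt + sigma dW (one component).

    f_ext: external force; -gamma * (v - drift): confinement/drift force;
    sigma dW: disturbance force (hypothetical parameterisation).
    """
    rng = random.Random(seed)
    v = v0
    for _ in range(steps):
        v += (f_ext - gamma * (v - drift)) * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
    return v
```

For instance, `simulate_langevin_velocity(0.0, 0.5, 1.0, 1.0, 0.0, dt=0.1, steps=300)` approaches 1.5. For a 2D flow, the same update is applied to each velocity component.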

Submitted 15 April, 2019; originally announced April 2019.

arXiv:1902.04955 [pdf, other]

Can We Automate Diagrammatic Reasoning?

Authors: Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad

Abstract: Learning to solve diagrammatic reasoning (DR) problems can be a challenging but interesting task for the computer vision research community. It is believed that next-generation pattern recognition applications should be able to simulate the human brain to understand and analyze reasoning in images. However, due to the lack of diagrammatic reasoning benchmarks, present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks are compared with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research, with several challenging avenues.

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1902.03514 [pdf, other]

doi 10.1109/ICASSP.2019.8683737

Facial Micro-Expression Spotting and Recognition using Time Contrasted Feature with Visual Memory

Authors: Sauradip Nag, Ayan Kumar Bhunia, Aishik Konwer, Partha Pratim Roy

Abstract: Facial micro-expressions are sudden, involuntary, minute muscle movements that reveal true emotions people try to conceal. Spotting and recognizing a micro-expression is a major challenge owing to its short duration and low intensity. Many works have pursued traditional and deep learning based approaches to this problem but have compromised on learning low-level features and on accuracy due to the unavailability of datasets. This motivated us to propose a novel joint architecture of spatial and temporal networks that extracts time-contrasted features from the feature maps to contrast micro-expressions against rapid muscle movements. The use of time-contrasted features greatly improved the spotting of micro-expressions among inconspicuous facial movements. We also include a memory module to predict the class and intensity of the micro-expression across the temporal frames of the micro-expression clip. Our method achieves superior performance compared with other conventional approaches on the CASME II dataset.

Submitted 18 April, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

arXiv:1901.08292 [pdf, other]

doi 10.1145/3417989

Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey

Authors: Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

Abstract: Over the last decade, computer vision has evolved into a key technology for numerous applications, replacing human supervision. In this paper, we present a survey of visual surveillance research relevant to anomaly detection in public places, focusing primarily on roads. First, we revisit the surveys done in this field over the last 10 years. Since learning is the underlying building block of typical anomaly detection, we place more emphasis on learning methods applied to video scenes. We then summarize the important contributions made during the last six years on anomaly detection, focusing primarily on features, underlying techniques, applied scenarios and types of anomalies using a single static camera. Finally, we discuss the challenges in computer vision based anomaly detection techniques and some important future possibilities.

Submitted 24 January, 2019; originally announced January 2019.

Journal ref: ACM Computing Surveys (2020), 6(53):Article 119, 2020

arXiv:1812.07203 [pdf, other]

Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE

Authors: Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy, Adway Mitra

Abstract: Classifying time series data using neural networks is a challenging problem when the length of the data varies. Video object trajectories, which are key to many visual surveillance applications, are often of varying length. If such trajectories are used to understand the behavior (normal or anomalous) of moving objects, they need to be represented correctly. In this paper, we propose video object trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and Variational Autoencoder (VAE) architecture. First, we introduce a high-level representation of object trajectories in color gradient form. In the next stage, a semi-supervised method is applied to annotate moving object trajectories, extracted using Temporal Unknown Incremental Clustering (TUIC), with trajectory class labels. Anomalous trajectories are separated using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is used for trajectory classification and anomaly detection. The results obtained on publicly available surveillance video datasets reveal that the proposed method can successfully identify some important traffic anomalies, such as vehicles not following lane discipline, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in the wrong direction. The proposed method detects these anomalies with higher accuracy than existing anomaly detection methods.
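A colour-gradient representation turns a variable-length trajectory into a fixed-size image that a CNN can consume. The abstract does not give the exact encoding. The sketch below assumes trajectory coordinates are normalised to [0, 1) and stores the normalised time index (t + 1) / n in each visited grid cell. Intensity therefore increases along the direction of motion, and empty cells stay 0. The grid size and function name are hypothetical.

```python
def trajectory_to_gradient_image(points, size=8):
    """Rasterise a variable-length trajectory of (x, y) points in [0, 1) into a size x size grid.

    Each visited cell stores (t + 1) / n, so values rise from the first point to the
    last (encoding temporal order); later points overwrite earlier ones in shared cells.
    """
    grid = [[0.0] * size for _ in range(size)]
    n = len(points)
    for t, (x, y) in enumerate(points):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] = (t + 1) / n
    return grid
```

Trajectories of any length map to the same `size x size` input, which addresses the varying-length problem raised at the start of the abstract.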

Submitted 18 December, 2018; originally announced December 2018.

Comments: First version submitted to a journal on 8-10-2018

arXiv:1811.10804 [pdf, other]

Movie Recommendation System using Sentiment Analysis from Microblogging Data

Authors: Sudhanshu Kumar, Shirsendu Sukanta Halder, Kanjar De, Partha Pratim Roy

Abstract: Recommendation systems are important intelligent systems that play a vital role in providing selective information to users. Traditional approaches in recommendation systems include collaborative filtering and content-based filtering. However, these approaches have certain limitations, such as requiring prior user history and habits to perform the recommendation task. To reduce the effect of such dependencies, this paper proposes a hybrid recommendation system that combines collaborative filtering and content-based filtering with sentiment analysis of movie tweets. The movie tweets have been collected from microblogging websites to understand current trends and user responses to the movie. Experiments conducted on a public database produce promising results.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 19 pages, 7 tables, 5 figures

arXiv:1811.10801 [pdf, other]

Perceptual Conditional Generative Adversarial Networks for End-to-End Image Colourization

Authors: Shirsendu Sukanta Halder, Kanjar De, Partha Pratim Roy

Abstract: Colours are everywhere. They embody a significant part of human visual perception. In this paper, we explore the paradigm of hallucinating colours from a given gray-scale image. The problem of colourization has been dealt with in previous literature, but mostly in a supervised manner involving user intervention. With the emergence of Deep Learning methods, numerous tasks related to computer vision and pattern recognition have been automated and carried out in an end-to-end fashion, owing to the availability of large datasets and high-power computing systems. We investigate and build upon the recent success of Conditional Generative Adversarial Networks (cGANs) for Image-to-Image translation. On top of the basic cGAN training scheme, we propose an encoder-decoder generator network whose objective adds a class-specific cross-entropy loss and a perceptual loss to the original cGAN objective function. We train our model on a large-scale dataset and present illustrative qualitative and quantitative analyses of our results. Our results vividly display the versatility and proficiency of our methods through life-like colourization outcomes.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 16 pages, 8 figures, 3 tables

arXiv:1811.01401 [pdf, other]

doi 10.1109/WACV.2019.00070

Texture Synthesis Guided Deep Hashing for Texture Image Retrieval

Authors: Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy

Abstract: With the large-scale explosion of images and videos on the internet, efficient hashing methods have been developed to facilitate memory- and time-efficient retrieval of similar images. However, none of the existing works uses hashing to address texture image retrieval, mostly because of the lack of sufficiently large texture image databases. Our work addresses this problem by developing a novel deep learning architecture that generates binary hash codes for input texture images. For this, we first pre-train a Texture Synthesis Network (TSN) that takes a texture patch as input and outputs an enlarged view of the texture by injecting newer texture content. This indicates that the TSN encodes the learnt texture-specific information in its intermediate layers. In the next stage, a second network gathers multi-scale feature representations from the TSN's intermediate layers using channel-wise attention and progressively combines them into a dense continuous representation, which is finally converted into a binary hash code with the help of individual and pairwise label information. The new enlarged texture patches also help in data augmentation to alleviate the problem of insufficient texture data and are used to train the second stage of the network. Experiments on three public texture image retrieval datasets indicate the superiority of our texture synthesis guided hashing approach over current state-of-the-art methods.
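Once the network has produced a continuous representation and converted it into a binary hash code, retrieval typically ranks database items by Hamming distance. The sketch below illustrates only that retrieval step. It assumes sign thresholding at zero in place of the paper's learned hashing layer, and its helper names are hypothetical.

```python
def to_hash_code(features):
    """Binarise a continuous feature vector by sign: 1 if positive, else 0 (assumed threshold)."""
    return tuple(1 if v > 0 else 0 for v in features)

def hamming(a, b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_features, database, k=2):
    """Rank database items (name -> feature vector) by Hamming distance to the query's code."""
    q = to_hash_code(query_features)
    codes = {name: to_hash_code(f) for name, f in database.items()}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(codes, key=lambda name: (hamming(q, codes[name]), name))[:k]
```

Comparing short bit strings instead of dense vectors is what makes hashing memory- and time-efficient for large texture databases.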

Submitted 5 June, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019 Video Presentation: https://www.youtube.com/watch?v=tXaXTGhzaJo

arXiv:1811.01396 [pdf, other]

Handwriting Recognition in Low-resource Scripts using Adversarial Learning

Authors: Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy

Abstract: Handwritten word recognition and spotting is a challenging field dealing with handwritten text that has irregular and complex shapes. The design of deep neural network models makes it necessary to extend training datasets in order to introduce variations and increase the number of samples; word retrieval is therefore very difficult in low-resource scripts. Much of the existing literature comprises preprocessing strategies, which are seldom sufficient to cover all possible variations. We propose the Adversarial Feature Deformation Module (AFDM), which learns ways to elastically warp extracted features in a scalable manner. The AFDM is inserted between intermediate layers and trained alternately with the original framework, boosting its capability to learn highly informative features rather than trivial ones. We test our meta-framework, which is built on top of popular word-spotting and word-recognition frameworks and enhanced by the AFDM, not only on extensive Latin word datasets but also on sparser Indic scripts. We record results for varying training data sizes and observe that our enhanced network generalizes much better in the low-data regime; the overall word error rates and mAP scores also improve.

Submitted 25 February, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019

arXiv:1811.01395 [pdf, other]

doi 10.1016/j.patcog.2019.106965

A Deep One-Shot Network for Query-based Logo Retrieval

Authors: Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal

Abstract: Logo detection in real-world scene images is an important problem with applications in advertisement and marketing. Existing general-purpose object detection methods require large training data with annotations for every logo class. These methods do not satisfy the incremental demand for logo classes necessary for practical deployment, since it is practically impossible to have such annotated data for every new, unseen logo. In this work, we develop an easy-to-implement query-based logo detection and localization system by employing a one-shot learning technique. Given an image of a query logo, our model searches for it within a given target image and predicts the possible location of the logo by estimating a binary segmentation mask. The proposed model consists of a conditional branch and a segmentation branch. The former gives a conditional latent representation of the given query logo, which is combined with feature maps of the segmentation branch at multiple scales in order to find the matching position of the query logo in a target image, should it be present. Feature matching between the latent query representation and the multi-scale feature maps of the segmentation branch, using a simple concatenation operation followed by a 1x1 convolution layer, makes our model scale-invariant. Despite its simplicity, our query-based logo retrieval framework achieved superior performance on the FlickrLogos-32 and TopLogos-10 datasets over different existing baselines.
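The matching step (concatenation followed by a 1x1 convolution) works in two stages. First, the query's latent vector is copied to every spatial position of a feature map and appended along the channel axis. Then a 1x1 convolution applies the same linear map over channels at each pixel. The sketch below is a plain-Python illustration on nested lists in H x W x C layout. The weights are arbitrary examples rather than learned values, and the function names are hypothetical.

```python
def tile_and_concat(feature_map, query_vec):
    """Append query_vec to the channel vector at every position of an H x W x C feature map."""
    return [[pixel + list(query_vec) for pixel in row] for row in feature_map]

def conv1x1(feature_map, weights, bias):
    """1x1 convolution: the same C_in -> C_out linear map applied at every pixel.

    weights: C_out lists of C_in values; bias: C_out values.
    """
    return [
        [[sum(w * x for w, x in zip(w_row, pixel)) + b for w_row, b in zip(weights, bias)]
         for pixel in row]
        for row in feature_map
    ]
```

Because the operation is applied per pixel, the same code runs unchanged on feature maps of any spatial size. This lets one query vector be matched against each scale of the segmentation branch.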

Submitted 13 July, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

Comments: Accepted in Pattern Recognition, Elsevier (2019)

arXiv:1811.00201 [pdf, other]

Cogni-Net: Cognitive Feature Learning through Deep Visual Perception

Authors: Pranay Mukherjee, Abhirup Das, Ayan Kumar Bhunia, Partha Pratim Roy

Abstract: Can we ask computers to recognize what we see from brain signals alone? Our paper seeks to utilize the knowledge learnt in the visual domain by popular pre-trained vision models and use it to teach a recurrent model being trained on brain signals to learn a discriminative manifold of the human brain's cognition of different visual object categories in response to perceived visual cues. For this, we make use of brain EEG signals triggered by visual stimuli such as images and leverage the natural synchronization between images and their corresponding brain signals to learn a novel representation of the cognitive feature space. Knowledge distillation is used to train the deep cognition model, CogniNet (source code publicly available at https://www.github.com/53X/CogniNET), by employing a student-teacher learning technique that bridges inter-modal knowledge transfer. The proposed novel architecture obtains state-of-the-art results, significantly surpassing other existing models. Our experiments also suggest that if visual stimuli information such as brain EEG signals can be gathered on a large scale, it would help to obtain a better understanding of the largely unexplored domain of human brain cognition.
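The abstract names knowledge distillation through a student-teacher scheme but does not give the loss. A common choice, assumed here rather than taken from the paper, is the temperature-softened KL divergence introduced by Hinton et al. It is computed between the teacher's class distribution (the pre-trained image model) and the student's (the EEG model) and scaled by T squared. The temperature default and function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the EEG student reproduces the image teacher's softened predictions and grows as they diverge. In practice it is usually combined with an ordinary cross-entropy term on the true labels.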

Submitted 1 May, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

Comments: IEEE International Conference on Image Processing (ICIP), 2019

arXiv:1810.13054 [pdf, other]

doi 10.1109/ICASSP.2019.8683761

User Constrained Thumbnail Generation using Adaptive Convolutions

Authors: Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy

Abstract: Thumbnails are widely used all over the world as previews of digital images. In this work, we propose a deep neural framework to generate thumbnails of any size and aspect ratio, even for values unseen during training, with high accuracy and precision. We use Global Context Aggregation (GCA) and a modified Region Proposal Network (RPN) with adaptive convolutions to generate thumbnails in real time. GCA is used to selectively attend to and aggregate global context information from the entire image, while the RPN is used to predict candidate bounding boxes for the thumbnail image. Adaptive convolution addresses the problem of generating thumbnails of various aspect ratios by using filter weights dynamically generated from the aspect ratio information. The experimental results indicate the superior performance of the proposed model over existing state-of-the-art techniques.

Submitted 18 April, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

arXiv:1810.11120 [pdf, other]

Improving Document Binarization via Adversarial Noise-Texture Augmentation

Authors: Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy

Abstract: Binarization of degraded document images is an elementary step in most problems in the document image analysis domain. The paper revisits the binarization problem by introducing an adversarial learning approach. We construct a Texture Augmentation Network that transfers the texture element of a degraded reference document image to a clean binary image. In this way, the network creates multiple versions of the same textual content with various noisy textures, thus enlarging the available document binarization datasets. Finally, the newly generated images are passed through a Binarization network to recover the clean version. By jointly training the two networks, we can increase the adversarial robustness of our system. Notably, our model can learn from unpaired data. Experimental results suggest that the proposed method achieves superior performance on the widely used DIBCO datasets.

Submitted 1 May, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: IEEE International Conference on Image Processing (ICIP), 2019. The full source code of the proposed system is publicly available at https://github.com/ankanbhunia/AdverseBiNet

arXiv:1810.10581 [pdf, other]

doi 10.1016/j.displa.2019.03.001

Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures

Authors: Abhik Singla, Partha Pratim Roy, Debi Prosad Dogra

Abstract: Designing touchless user interfaces is gaining popularity in various contexts. Using such interfaces, users can interact with electronic devices even when their hands are dirty or non-conductive. Users with partial physical disabilities can also interact with electronic devices through such systems. Research in this direction has received a major boost from the emergence of low-cost sensors such as Leap Motion, Kinect or RealSense devices. In this paper, we propose a Leap Motion controller-based methodology to facilitate rendering of 2D and 3D shapes on display devices. The proposed method tracks finger movements while users perform natural gestures within the sensor's field of view. Next, the trajectories are analyzed to extract extended Npen++ features in 3D. These features represent finger movements during the gestures and are fed to a unidirectional left-to-right Hidden Markov Model (HMM) for training. A one-to-one mapping between gestures and shapes is proposed. Finally, the shapes corresponding to these gestures are rendered on the display using the MuPad interface. We have created a dataset of 5400 samples recorded by 10 volunteers. Our dataset contains 18 geometric and 18 non-geometric shapes such as "circle", "rectangle", "flower", "cone", "sphere", etc. The proposed methodology achieves an accuracy of 92.87% when evaluated using 5-fold cross-validation. Our experiments reveal that the extended 3D features perform better than existing 3D features for shape representation and classification. The method can be used for developing useful HCI applications for smart display devices.
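To make "unidirectional left-to-right HMM" concrete, here is a minimal sketch of that topology with forward-algorithm scoring and maximum-likelihood classification. It assumes discrete (quantized) observation symbols and hand-set probabilities; the paper uses continuous extended Npen++ features and trained models, so this only illustrates the model structure.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[List[float]], List[List[float]]]  # (A transitions, B emissions)

def left_to_right_transitions(n_states: int, p_stay: float = 0.6) -> List[List[float]]:
    """Left-to-right topology: state i may only stay in i or advance to i + 1.
    The last state is absorbing, so the chain never moves backwards."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i < n_states - 1:
            A[i][i] = p_stay
            A[i][i + 1] = 1.0 - p_stay
        else:
            A[i][i] = 1.0
    return A

def forward_likelihood(obs: Sequence[int], A: List[List[float]],
                       B: List[List[float]]) -> float:
    """P(obs | model) via the forward algorithm, always starting in state 0.
    B[state][symbol] holds the discrete emission probabilities."""
    n = len(A)
    alpha = [0.0] * n
    alpha[0] = B[0][obs[0]]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Return the name of the gesture model with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

In a gesture recognizer of this kind, one such model would be trained per gesture class and an unseen trajectory assigned to the best-scoring class. The one-to-one gesture-to-shape mapping then decides what to render.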

Submitted 22 October, 2018; originally announced October 2018.

Comments: Submitted to Elsevier Displays Journal, 32 pages, 18 figures, 7 tables

arXiv:1809.03016 [pdf, other]

doi 10.1016/j.eswa.2019.06.034

Fingertip Detection and Tracking for Recognition of Air-Writing in Videos

Authors: Sohom Mukherjee, Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy

Abstract: Air-writing is the process of writing characters or words in free space using finger or hand movements without the aid of any hand-held device. In this work, we address the problem of mid-air finger writing using webcam video as input. In spite of recent advances in object detection and tracking, accurate and robust detection and tracking of the fingertip remains a challenging task, primarily due to the small size of the fingertip. Moreover, the initialization and termination of mid-air finger writing are also challenging due to the absence of any standard delimiting criterion. To solve these problems, we propose a new writing-hand pose detection algorithm for initializing air-writing, which uses the Faster R-CNN framework for accurate hand detection, followed by hand segmentation and finally counting the number of raised fingers based on geometrical properties of the hand. Further, we propose a robust fingertip detection and tracking approach using a new signature function called distance-weighted curvature entropy. Finally, a fingertip-velocity-based termination criterion is used as a delimiter to mark the completion of the air-writing gesture. Experiments show the superiority of the proposed fingertip detection and tracking algorithm over state-of-the-art approaches, giving a mean precision of 73.1% while achieving real-time performance at 18.5 fps, a condition of vital importance to air-writing. Character recognition experiments give a mean accuracy of 96.11% using the proposed air-writing system, a result comparable to that of existing handwritten character recognition systems.
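A velocity-based termination criterion can be sketched as follows. The abstract does not give the rule or its parameters, so the speed threshold `speed_thresh` and the run length `min_still_frames` below are hypothetical choices: writing is treated as finished once the fingertip stays slow for several consecutive frames.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def fingertip_speeds(track: List[Point], fps: float) -> List[float]:
    """Per-step fingertip speed (pixels/second); entry k covers frame k -> k+1."""
    return [math.dist(p, q) * fps for p, q in zip(track, track[1:])]

def termination_frame(track: List[Point], fps: float,
                      speed_thresh: float = 20.0,
                      min_still_frames: int = 5) -> Optional[int]:
    """Index of the frame at which the fingertip came to rest: the start of the
    first run of `min_still_frames` consecutive steps below `speed_thresh`.
    Returns None if the fingertip never stays still long enough."""
    still = 0
    for k, v in enumerate(fingertip_speeds(track, fps)):
        still = still + 1 if v < speed_thresh else 0
        if still == min_still_frames:
            return k - min_still_frames + 1
    return None
```

Requiring a run of slow frames, rather than a single one, keeps brief pauses inside a character from ending the gesture prematurely.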

Submitted 9 September, 2018; originally announced September 2018.

Comments: 32 pages, 10 figures, 2 tables. Submitted to Journal of Expert Systems with Applications

Journal ref: Expert Systems with Applications Volume 136, 1 December 2019, Pages 217-229

arXiv:1807.06772 [pdf, ps, other]

Bag-of-Visual-Words for Signature-Based Multi-Script Document Retrieval

Authors: Ranju Mandal, Partha Pratim Roy, Umapada Pal, Michael Blumenstein

Abstract: An end-to-end architecture for multi-script document retrieval using handwritten signatures is proposed in this paper. The user supplies a query signature sample and the system returns exclusively the set of documents that contain the query signature. In the first stage, a component-wise classification technique separates potential signature components from all other components. A bag-of-visual-words model powered by SIFT descriptors in a patch-based framework is proposed to compute the features, and a Support Vector Machine (SVM)-based classifier is used to separate signatures from the rest of the document. In the second stage, features from the foreground (i.e., signature strokes) and background spatial information (i.e., background loops, reservoirs, etc.) are combined to characterize the signature object and match it with the query signature. Finally, three distance measures are used to match a query signature with the signatures present in target documents for retrieval. The "Tobacco" document database and an Indian script database containing 560 documents of Devanagari (Hindi) and Bangla scripts were used for performance evaluation. The proposed system was also tested on noisy documents, with promising results. A comparative study shows that the proposed method outperforms state-of-the-art approaches.
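The bag-of-visual-words step can be sketched as follows. It assumes a codebook has already been learned (typically by k-means over training descriptors) and uses tiny 2-D vectors in place of 128-D SIFT descriptors. Each descriptor votes for its nearest visual word, and the normalized histogram is the feature vector that a classifier such as the SVM mentioned above would consume.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def nearest_word(desc: Vector, codebook: List[Vector]) -> int:
    """Index of the closest visual word under Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: math.dist(desc, codebook[k]))

def bovw_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """L1-normalised histogram of visual-word occurrences for one component."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

Normalizing by the number of descriptors makes components of different sizes comparable. This is one common choice, not necessarily the one used in the paper.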

Submitted 18 July, 2018; originally announced July 2018.

arXiv:1804.06680 [pdf, other]

doi 10.1109/TITS.2018.2834958

Temporal Unknown Incremental Clustering (TUIC) Model for Analysis of Traffic Surveillance Videos

Authors: Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

Abstract: Optimized scene representation is an important characteristic of a framework for detecting abnormalities in live videos. One challenge in detecting abnormalities in live videos is real-time detection of objects in a non-parametric way. Another challenge is to efficiently represent the state of objects temporally across frames. In this paper, a Gibbs sampling based heuristic model referred to as Temporal Unknown Incremental Clustering (TUIC) is proposed to cluster pixels with motion. Pixel motion is first detected using optical flow, and a Bayesian algorithm is applied to associate pixels belonging to similar clusters in subsequent frames. The algorithm is fast and produces accurate results in $Θ(kn)$ time, where $k$ is the number of clusters and $n$ the number of pixels. Our experimental validation on publicly available datasets reveals that the proposed framework has good potential to open up new opportunities for real-time traffic analysis.

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1804.06254 [pdf]

Synthetic data generation for Indic handwritten text recognition

Authors: Partha Pratim Roy, Akash Mohta, Bidyut B. Chaudhuri

Abstract: This paper presents a novel approach to generating synthetic datasets for handwritten word recognition systems. It is difficult to recognize handwritten scripts for which sufficient training data is not readily available or is expensive to collect. Hence, it becomes hard to train recognition systems owing to the lack of a proper dataset. To overcome such problems, synthetic data can be used to create or expand the existing training dataset and improve recognition performance. Any available digital data from online newspapers and similar sources can be used to generate synthetic data. In this paper, we propose adding distortion/deformation to digital data in such a way that the underlying pattern is preserved, so that the resulting image bears a close similarity to actual handwritten samples. The images thus produced can be used independently to train the system or combined with natural handwritten data to augment the original dataset and improve the recognition system. We experimented with synthetic data to improve the recognition accuracy of isolated characters and words. The framework is tested on two Indic scripts, Devanagari (Hindi) and Bengali (Bangla), for numeral, character and word recognition. We obtained encouraging results from the experiments. Finally, an experiment with Latin text verifies the utility of the approach.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1803.06613 [pdf, other]

Trajectory-based Scene Understanding using Dirichlet Process Mixture Model

Authors: Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy, Bidyut Baran Chaudhuri

Abstract: Appropriate modeling of a surveillance scene is essential for detecting anomalies in road traffic. Learning usual paths can provide valuable insight into road traffic conditions and thus help identify unusual routes taken by commuters/vehicles. If usual traffic paths are learned in a nonparametric way, manual intervention in road marking can be avoided. In this paper, we propose an unsupervised and nonparametric method to learn frequently used paths from the tracks of moving objects in $Θ(kn)$ time, where $k$ denotes the number of paths and $n$ the number of tracks. In the proposed method, temporal dependencies of the moving objects are considered to make the clustering meaningful, using the Temporally Incremental Gravity Model (TIGM). In addition, the distance-based scene learning makes it intuitive to estimate the model parameters. Further, we have extended TIGM hierarchically as the Dynamically Evolving Model (DEM) to represent notable traffic dynamics of a scene. Experimental validation reveals that the proposed method can learn a scene quickly without prior knowledge of the number of paths ($k$). We have compared the results with various state-of-the-art methods. We have also highlighted the advantages of the proposed method over existing techniques popularly used for designing traffic monitoring applications. It can be used for administrative decision making to control traffic at junctions or crowded places and to generate alarm signals, if necessary.

Submitted 16 June, 2019; v1 submitted 18 March, 2018; originally announced March 2018.

Comments: 14 pages, 27 figures

arXiv:1803.06480 [pdf, other]

doi 10.1016/j.eswa.2018.09.057

Queuing Theory Guided Intelligent Traffic Scheduling through Video Analysis using Dirichlet Process Mixture Model

Authors: Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

Abstract: Accurate prediction of traffic signal duration for roadway junctions is a challenging problem due to the dynamic nature of traffic flows. Though supervised learning can be used, parameters may vary across roadway junctions. In this paper, we present a computer vision guided expert system that can learn the departure rate of a given traffic junction modeled using traditional queuing theory. First, we temporally group the optical flow of the moving vehicles using a Dirichlet Process Mixture Model (DPMM). These groups are referred to as tracklets or temporal clusters. Tracklet features are then used to learn the dynamic behavior of a traffic junction, especially during the on/off cycles of a signal. The proposed queuing-theory-based approach can predict the signal open duration for the next cycle with higher accuracy than other popular features used for tracking. The hypothesis has been verified on two publicly available video datasets. The results reveal that the DPMM-based features are better than existing tracking frameworks for estimating the departure rate $μ$. Thus, signal duration prediction is more accurate when tested on these datasets. The method can be used for designing intelligent, operator-independent traffic control systems for roadway junctions in cities and on highways.
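A departure rate $μ$ can be turned into a signal open duration as follows. This sketch assumes a simple deterministic fluid-queue model that the abstract does not spell out. With a constant arrival rate λ and departure rate μ, a queue of $Q_0$ vehicles present at the start of green evolves as $Q(t) = Q_0 + (λ - μ)t$ and empties at $t = Q_0/(μ - λ)$. The helper `estimate_departure_rate` is a hypothetical stand-in for the paper's DPMM-tracklet-based estimate.

```python
def green_time_to_clear(queue_len: float, arrival_rate: float,
                        departure_rate: float) -> float:
    """Green duration (s) needed to discharge a queue under a deterministic
    fluid model: Q(t) = Q0 + (lambda - mu) * t reaches zero at
    t = Q0 / (mu - lambda). Rates are in vehicles per second."""
    if departure_rate <= arrival_rate:
        raise ValueError("queue never clears: departure rate must exceed arrival rate")
    return queue_len / (departure_rate - arrival_rate)

def estimate_departure_rate(departures: int, green_seconds: float) -> float:
    """Crude empirical mu: vehicles discharged per second of observed green.
    (Hypothetical stand-in for a vision-based estimate.)"""
    return departures / green_seconds
```

For example, with 30 vehicles observed leaving during 60 s of green, μ = 0.5 vehicles/s. A queue of 20 vehicles with arrivals at 0.25 vehicles/s would then need 20 / (0.5 − 0.25) = 80 s of green. The better μ is estimated, the better this prediction, which matches the abstract's motivation.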

Submitted 17 March, 2018; originally announced March 2018.

Journal ref: Expert Systems with Applications Volume 118, 15 March 2019, Pages 169-181

arXiv:1802.08568 [pdf, other]

Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

Authors: Ayan Kumar Bhunia, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

Abstract: In this paper, we propose a novel approach to word-level Indic script identification that uses only character-level data in the training stage. The advantages of using character-level data for training are outlined in Section I. Our method uses a multimodal deep network that takes both the offline and online modalities of the data as input in order to jointly exploit the information from both modalities for the script identification task. We take handwritten data in either modality as input and generate the other modality through intermodality conversion. We then feed this offline-online modality pair to our network. Hence, besides utilizing information from both modalities, it works as a single framework for both offline and online script identification, which removes the need to design two separate script identification modules, one per modality. Another major contribution is a novel conditional multimodal fusion scheme that combines the information from the offline and online modalities while taking into account the real origin of the data fed to the network, and thus combines them adaptively. Exhaustive experiments have been conducted on a dataset consisting of English and six Indic scripts. Our proposed framework outperforms, by a clear margin, frameworks based on traditional classifiers with handcrafted features as well as deep learning based methods. Extensive experiments show that, within our framework, training on character-level data alone can achieve state-of-the-art performance similar to that obtained with traditional training on word-level data.

Submitted 15 October, 2019; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: Accepted in Information Fusion, Elsevier

arXiv:1801.07211 [pdf]

Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network

Authors: Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal

Abstract: In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, online acquisition has an advantage over its offline counterpart because the online technique keeps track of the pen movement. Hence, pen-tip trajectory retrieval from offline text can bridge the gap between online and offline methods. Our proposed framework employs a sequence-to-sequence model consisting of an encoder-decoder LSTM module. The encoder is a Convolutional LSTM network that takes an offline character image as input and encodes the feature sequence into a hidden representation. The output of the encoder is fed to a decoder LSTM, and we obtain successive coordinate points from every time step of the decoder LSTM. Although the sequence-to-sequence model is a popular paradigm in various computer vision and language translation tasks, the main contribution of our work lies in designing an end-to-end network for a decade-old, popular problem in the Document Image Analysis community. Tamil, Telugu and Devanagari characters from the LIPI Toolkit dataset are used in our experiments. Our proposed method achieves superior performance compared to other conventional approaches.

Submitted 3 June, 2018; v1 submitted 22 January, 2018; originally announced January 2018.

Comments: To appear in ICPR 2018 (International Conference on Pattern Recognition). Code link: https://drive.google.com/file/d/1clT-UuXgPp6uFn1tmIXx481qvPUcY0fV/view

arXiv:1801.07156 [pdf]

Word Level Font-to-Font Image Translation using Convolutional Recurrent Generative Adversarial Networks

Authors: Ankan Kumar Bhunia, Ayan Kumar Bhunia, Prithaj Banerjee, Aishik Konwer, Abir Bhowmick, Partha Pratim Roy, Umapada Pal

Abstract: Conversion of one font to another is very useful in real-life applications. In this paper, we propose a Convolutional Recurrent Generative model to solve the word-level font transfer problem. Our network can convert the font style of any printed text image from its current font to the required font. The network is trained end-to-end on complete word images, which eliminates the need for pre-processing steps such as character segmentation. We extend our model to a conditional setting that helps it learn a one-to-many mapping function. We employ a novel convolutional recurrent architecture in the Generator that efficiently deals with word images of arbitrary width. It also helps maintain the consistency of the final images after the generated image patches of the target font are concatenated. Besides the Generator and the Discriminator network, we employ a Classification network to classify the generated word images of the converted font style into their corresponding font categories. Most earlier works on image translation are performed on square images. Our proposed architecture is the first work that can handle images of varying widths. Word images generally vary in width depending on the number of characters present. Hence, we test our model on a synthetically generated font dataset. We compare our method with some state-of-the-art methods for image translation. The superior performance of our network on the same dataset demonstrates the ability of our model to learn font distributions.

Submitted 23 May, 2018; v1 submitted 22 January, 2018; originally announced January 2018.

Comments: To appear in ICPR 2018 (International Conference on Pattern Recognition)

arXiv:1801.07141 [pdf]

Staff line Removal using Generative Adversarial Networks

Authors: Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal

Abstract: Staff line removal is a crucial pre-processing step in Optical Music Recognition. It is challenging to reduce noise while retaining the quality of the music symbol context in ancient, degraded music score images. In this paper, we propose a novel approach to staff line removal based on Generative Adversarial Networks. We convert staff line images into patches and feed them into a U-Net used as the Generator. The Generator aims to produce staff-less images at its output. The Discriminator then performs binary classification, distinguishing the generated (fake) staff-less image from the real ground-truth staff-less image. For training, we use a loss function that is a weighted combination of L2 loss and adversarial loss. The L2 loss minimizes the difference between the real and fake staff-less images, while the adversarial loss helps recover higher-quality textures in the generated images. Thus our architecture favors solutions that are closer to the ground truth, which is reflected in our results. For evaluation, we use the ICDAR/GREC 2013 staff removal database. Our method achieves superior performance in comparison to other conventional approaches.
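The generator objective described above, a weighted sum of an L2 term and an adversarial term, can be sketched on flat pixel lists. It assumes the common non-saturating form −log D(G(x)) for the adversarial term. The weights `lambda_l2` and `lambda_adv` are hypothetical defaults, since the abstract does not state the values used.

```python
import math
from typing import Sequence

def l2_loss(fake: Sequence[float], real: Sequence[float]) -> float:
    """Mean squared error between generated and ground-truth staff-less pixels."""
    return sum((f - r) ** 2 for f, r in zip(fake, real)) / len(real)

def adversarial_loss(d_fake: float, eps: float = 1e-12) -> float:
    """Non-saturating generator loss -log D(G(x)); d_fake is the discriminator's
    probability that the generated patch is real (clamped to avoid log(0))."""
    return -math.log(max(d_fake, eps))

def generator_loss(fake: Sequence[float], real: Sequence[float], d_fake: float,
                   lambda_l2: float = 100.0, lambda_adv: float = 1.0) -> float:
    """Weighted combination of the adversarial and L2 terms (weights assumed)."""
    return lambda_adv * adversarial_loss(d_fake) + lambda_l2 * l2_loss(fake, real)
```

A large L2 weight keeps the output close to the ground truth pixel by pixel, while the adversarial term penalizes outputs the discriminator can tell apart from real staff-less patches. This is the balance the abstract describes between fidelity and texture quality.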

Submitted 5 June, 2018; v1 submitted 22 January, 2018; originally announced January 2018.

Comments: To appear in ICPR 2018 (International Conference on Pattern Recognition), Oral

arXiv:1801.00879 [pdf]

A Novel Feature Descriptor for Image Retrieval by Combining Modified Color Histogram and Diagonally Symmetric Co-occurrence Texture Pattern

Authors: Ayan Kumar Bhunia, Avirup Bhattacharyya, Prithaj Banerjee, Partha Pratim Roy, Subrahmanyam Murala

Abstract: In this paper, we propose a novel feature descriptor that combines color and texture information. In the color component of our descriptor, we explore the inter-channel relationship between the Hue (H) and Saturation (S) channels of the HSV color space, which has not been done before. We quantize the H channel into a number of bins and vote with saturation values, and vice versa, following a principle similar to that of the HOG descriptor, where the gradient orientation is quantized into a certain number of bins and votes are weighted by gradient magnitude. This lets us study how saturation varies with hue and how hue varies with saturation. The texture component of our descriptor considers the co-occurrence relationship between pixels symmetric about both diagonals of a 3x3 window. Our work is inspired by the work of Dubey et al. [1]. Each of the two components, viz. color and texture, individually performs better than existing texture and color descriptors. Moreover, when concatenated, the proposed descriptors provide a significant improvement over existing descriptors for content-based color image retrieval. The proposed descriptor has been tested for image retrieval on five databases: the texture image databases MIT VisTex and Salzburg Texture, and the natural scene databases Corel 1K, Corel 5K and Corel 10K. The precision and recall values obtained on these databases are compared with some state-of-the-art local patterns. The proposed method provided satisfactory results in these experiments.
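The HOG-like hue/saturation voting can be sketched as follows. It assumes H and S are scaled to [0, 1], uses arbitrary bin counts (`hue_bins`, `sat_bins`), and L1-normalizes each histogram. These are illustrative choices, not the paper's settings. It also ignores the circular nature of hue and simply clamps H = 1.0 into the last bin.

```python
from typing import List, Sequence, Tuple

def _bin(value: float, n_bins: int) -> int:
    """Map a value in [0, 1] to one of n_bins equal-width bins (1.0 -> last bin)."""
    return min(int(value * n_bins), n_bins - 1)

def weighted_vote_histogram(keys: Sequence[float], weights: Sequence[float],
                            n_bins: int) -> List[float]:
    """HOG-style voting: choose the bin from `keys`, add `weights` (not counts)."""
    hist = [0.0] * n_bins
    for k, w in zip(keys, weights):
        hist[_bin(k, n_bins)] += w
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def hue_saturation_descriptor(pixels: Sequence[Tuple[float, float]],
                              hue_bins: int = 8, sat_bins: int = 4) -> List[float]:
    """pixels: (H, S) pairs in [0, 1]. Concatenates the hue-binned histogram
    voted by saturation with the saturation-binned histogram voted by hue."""
    hues = [h for h, _ in pixels]
    sats = [s for _, s in pixels]
    return (weighted_vote_histogram(hues, sats, hue_bins)
            + weighted_vote_histogram(sats, hues, sat_bins))
```

As with HOG, which bin a pixel votes into and how much it contributes come from different quantities. This is what lets the descriptor capture how one channel varies with the other.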

Submitted 2 January, 2018; originally announced January 2018.

Comments: Preprint Submitted

arXiv:1801.00470 [pdf]

doi 10.1016/j.patcog.2018.07.034

Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

Authors: Ankan Kumar Bhunia, Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Partha P. Roy, Umapada Pal

Abstract: Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video frames. Because of low image quality, complex backgrounds and similar character layouts shared by some scripts such as Greek and Latin, text recognition in these cases becomes challenging. We propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into a CNN-LSTM framework. Attention-based patch weights are computed by applying a softmax layer after the LSTM. Next, we multiply these weights patch-wise with the corresponding CNN features to yield local features. Global features are extracted from the last cell state of the LSTM. We employ a fusion technique that dynamically weights the local and global features for each patch. Experiments have been conducted on four public script identification datasets: SIW-13, CVSI2015, ICDAR-17 and MLe2e. The proposed framework achieves superior results in comparison to conventional methods.
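The attention step above (softmax patch weights multiplied patch-wise with CNN features) and the local/global fusion can be sketched with plain lists. It assumes one scalar attention score per patch and a fixed fusion gate `gate`. In the paper the fusion weights are computed dynamically, which this sketch does not model.

```python
import math
from typing import List, Sequence

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-patch attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_local_features(patch_feats: List[List[float]],
                            scores: Sequence[float]) -> List[List[float]]:
    """Multiply each patch's CNN feature vector by its attention weight."""
    weights = softmax(scores)
    return [[w * x for x in feat] for w, feat in zip(weights, patch_feats)]

def fuse(local: Sequence[float], global_: Sequence[float], gate: float) -> List[float]:
    """Convex combination of one patch's local feature and the global feature.
    `gate` in [0, 1] is given here; a learned model would predict it."""
    return [gate * l + (1.0 - gate) * g for l, g in zip(local, global_)]
```

Because the softmax weights sum to one, patches the LSTM finds more informative dominate the local features, while the global feature from the last cell state supplies context shared by all patches.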

Submitted 7 August, 2018; v1 submitted 1 January, 2018; originally announced January 2018.

Comments: The first and second authors contributed equally. Accepted in Pattern Recognition Journal

arXiv:1801.00187 [pdf]

Fractional Local Neighborhood Intensity Pattern for Image Retrieval using Genetic Algorithm

Authors: Shuvozit Ghose, Abhirup Das, Ayan Kumar Bhunia, Partha Pratim Roy

Abstract: In this paper, a new texture descriptor named "Fractional Local Neighborhood Intensity Pattern" (FLNIP) has been proposed for content based image retrieval (CBIR). It is an extension of the Local Neighborhood Intensity Pattern (LNIP)[1]. FLNIP calculates the relative intensity difference between a particular pixel and the center pixel of a 3x3 window by considering the relationship with adjacent n… ▽ More In this paper, a new texture descriptor named "Fractional Local Neighborhood Intensity Pattern" (FLNIP) has been proposed for content based image retrieval (CBIR). It is an extension of the Local Neighborhood Intensity Pattern (LNIP)[1]. FLNIP calculates the relative intensity difference between a particular pixel and the center pixel of a 3x3 window by considering the relationship with adjacent neighbors. In this work, the fractional change in the local neighborhood involving the adjacent neighbors has been calculated first with respect to one of the eight neighbors of the center pixel of a 3x3 window. Next, the fractional change has been calculated with respect to the center itself. The two values of fractional change are next compared to generate a binary bit pattern. Both sign and magnitude information are encoded in a single descriptor as it deals with the relative change in magnitude in the adjacent neighborhood i.e., the comparison of the fractional change. The descriptor is applied on four multi-resolution images -- one being the raw image and the other three being filtered gaussian images obtained by applying gaussian filters of different standard deviations on the raw image to signify the importance of exploring texture information at different resolutions in an image. The four sets of distances obtained between the query and the target image are then combined with a genetic algorithm based approach to improve the retrieval performance by minimizing the distance between similar class images. 
The performance of the method has been tested for image retrieval on four popular databases. The precision and recall values observed on these databases have been compared with those of recent state-of-the-art local patterns. The proposed method shows a significant improvement over many existing methods.
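Precision and recall for one query follow the usual definitions: precision is the fraction of retrieved images that are relevant, and recall is the fraction of all relevant images that were retrieved. The Python sketch below pairs them with a toy genetic search over weights for the four per-resolution distances. The abstract states only that a genetic-algorithm-based approach combines the distances to bring same-class images closer. Here, as a plainly swapped-in stand-in, the fitness is mean precision at k and the combination is a weighted sum with weights normalized to 1. The population size, mutation scale, crossover scheme and the helper names (`top_k`, `evolve_weights`) are all arbitrary assumptions.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# One training query: for every database image, its four per-resolution
# distances to the query, plus the set of images sharing the query's class.
Query = Tuple[Dict[str, List[float]], Set[str]]


def precision_recall(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard precision and recall for one retrieved list."""
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def combined_distance(dists: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed combination rule: weighted sum of the per-resolution distances."""
    return sum(w * d for w, d in zip(weights, dists))


def top_k(dists: Dict[str, List[float]], weights: Sequence[float], k: int) -> List[str]:
    """Names of the k database images closest to the query under `weights`."""
    return sorted(dists, key=lambda name: combined_distance(dists[name], weights))[:k]


def normalize(weights: List[float]) -> List[float]:
    total = sum(weights)
    return [w / total for w in weights]


def evolve_weights(queries: List[Query], k: int, pop_size: int = 20,
                   generations: int = 30, seed: int = 0) -> List[float]:
    """Toy genetic search for four weights; fitness is mean precision@k (a stand-in)."""
    rng = random.Random(seed)

    def fitness(weights: List[float]) -> float:
        return sum(precision_recall(top_k(d, weights, k), rel)[0]
                   for d, rel in queries) / len(queries)

    population = [normalize([rng.random() + 1e-9 for _ in range(4)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then a small Gaussian mutation kept positive.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [max(1e-9, w + rng.gauss(0.0, 0.05)) for w in child]
            children.append(normalize(child))
        population = parents + children
    return max(population, key=fitness)
```

Because the better half survives each generation, the best fitness in the population never decreases; whether precision is a good proxy for the authors' objective is left open.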

Submitted 20 November, 2019; v1 submitted 30 December, 2017; originally announced January 2018.

Comments: MTAP, Springer (Minor Revision)

arXiv:1712.06908 [pdf]

doi: 10.1016/j.patcog.2018.01.034

Cross-language Framework for Word Recognition and Spotting of Indic Scripts

Authors: Ayan Kumar Bhunia, Partha Pratim Roy, Akash Mohta, Umapada Pal

Abstract: Handwritten word recognition and spotting of low-resource scripts are difficult because sufficient training data is not available and collecting data for such scripts is often expensive. This paper presents a novel cross-language platform for handwritten word recognition and spotting for such low-resource scripts, where training is performed with a sufficiently large dataset of an available script (the source script) and testing is done on other scripts (the target scripts). Training with one source script and testing with another to obtain a reasonable result is not easy in the handwriting domain because of the complex variability of handwriting among scripts. It is also difficult to map source characters to target characters when they appear in cursive word images. The proposed Indic cross-language framework exploits a large training dataset and uses it to recognize and spot text of other target scripts for which sufficient training data is not available. Since Indic scripts are mostly written in three zones, namely upper, middle and lower, we employ zone-wise character (or component) mapping for efficient learning. The performance of our cross-language framework depends on the extent of similarity between the source and target scripts. Hence, we devise an entropy-based script similarity score, computed from the source-to-target character mapping, that indicates the feasibility of cross-language transcription.
We have tested our approach on three Indic scripts, namely Bangla, Devanagari and Gurumukhi, and report the corresponding results.
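The abstract does not give the formula for the entropy-based script similarity score. The Python sketch below shows one hypothetical formulation. It counts how often each target character is mapped to each source character and takes the Shannon entropy of each target character's mapping distribution, normalized by the log of the number of source classes. It then reports one minus the count-weighted mean. The function names, the normalization and the weighting are assumptions for illustration, not the authors' definition.

```python
import math
from typing import Dict


def normalized_entropy(counts: Dict[str, int], num_source_classes: int) -> float:
    """Shannon entropy of one target character's mapping counts, scaled to [0, 1]."""
    total = sum(counts.values())
    if total == 0 or num_source_classes < 2:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h / math.log(num_source_classes)


def script_similarity(mapping_counts: Dict[str, Dict[str, int]],
                      num_source_classes: int) -> float:
    """Hypothetical score: 1 minus the count-weighted mean normalized entropy.

    mapping_counts[target_char][source_char] = how often target_char was
    mapped to source_char (an assumed input format).
    """
    total = sum(sum(row.values()) for row in mapping_counts.values())
    if total == 0:
        return 0.0
    weighted = sum(sum(row.values()) * normalized_entropy(row, num_source_classes)
                   for row in mapping_counts.values())
    return 1.0 - weighted / total
```

Under these assumptions, a score near 1 means each target character maps consistently to a single source character, while a score near 0 means the mappings are spread uniformly over the source classes.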

Submitted 28 January, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

Comments: Accepted in Pattern Recognition, Elsevier (2018)
