Search | arXiv e-print repository

arXiv:2406.19631 [pdf, other]

Personalized Interpretation on Federated Learning: A Virtual Concepts approach

Authors: Peng Yan, Guodong Long, **g Jiang, Michael Blumenstein

Abstract: Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without consideration of interpreting non-IID across clients. This paper aims to design a novel FL method to robust and interpret the non-IID data across clients. Specifically, we interpret each client's dataset as a mixt… ▽ More Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without consideration of interpreting non-IID across clients. This paper aims to design a novel FL method to robust and interpret the non-IID data across clients. Specifically, we interpret each client's dataset as a mixture of conceptual vectors that each one represents an interpretable concept to end-users. These conceptual vectors could be pre-defined or refined in a human-in-the-loop process or be learnt via the optimization procedure of the federated learning system. In addition to the interpretability, the clarity of client-specific personalization could also be applied to enhance the robustness of the training process on FL system. The effectiveness of the proposed method have been validated on benchmark datasets. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2403.19211 [pdf, other]

Dual-Personalizing Adapter for Federated Foundation Models

Authors: Yiyuan Yang, Guodong Long, Tao Shen, **g Jiang, Michael Blumenstein

Abstract: Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning large amounts of instruction data. Notably, federated foundation models emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. T… ▽ More Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning large amounts of instruction data. Notably, federated foundation models emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. To alleviate communication and computation overhead, parameter-efficient methods are introduced for efficiency, and some research adapted personalization methods to federated foundation models for better user preferences alignment. However, a critical gap in existing research is the neglect of test-time distribution shifts in real-world applications. Therefore, to bridge this gap, we propose a new setting, termed test-time personalization, which not only concentrates on the targeted local task but also extends to other tasks that exhibit test-time distribution shifts. To address challenges in this new setting, we explore a simple yet effective solution to learn a comprehensive foundation model. Specifically, a dual-personalizing adapter architecture (FedDPA) is proposed, comprising a global adapter and a local adapter for addressing test-time distribution shifts and personalization, respectively. Additionally, we introduce an instance-wise dynamic weighting mechanism to optimize the balance between the global and local adapters, enhancing overall performance. The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2308.02905 [pdf, other]

FAST: Font-Agnostic Scene Text Editing

Authors: Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

Abstract: Scene Text Editing (STE) is a challenging research problem, and it aims to modify existing texts in an image while preserving the background and the font style of the original text of the image. Due to its various real-life applications, researchers have explored several approaches toward STE in recent years. However, most of the existing STE methods show inferior editing performance because of (1… ▽ More Scene Text Editing (STE) is a challenging research problem, and it aims to modify existing texts in an image while preserving the background and the font style of the original text of the image. Due to its various real-life applications, researchers have explored several approaches toward STE in recent years. However, most of the existing STE methods show inferior editing performance because of (1) complex image backgrounds, (2) various font styles, and (3) varying word lengths within the text. To address such inferior editing performance issues, in this paper, we propose a novel font-agnostic scene text editing framework, named FAST, for simultaneously generating text in arbitrary styles and locations while preserving a natural and realistic appearance through combined mask generation and style transfer. The proposed approach differs from the existing methods as they directly modify all image pixels. Instead, the proposed method has introduced a filtering mechanism to remove background distractions, allowing the network to focus solely on the text regions where editing is required. Additionally, a text-style transfer module has been designed to mitigate the challenges posed by varying word lengths. Extensive experiments and ablations have been conducted, and the results demonstrate that the proposed method outperforms the existing methods both qualitatively and quantitatively. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: 13 pages, in submission

arXiv:2304.11993 [pdf, other]

MMC: Multi-Modal Colorization of Images using Textual Descriptions

Authors: Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein

Abstract: Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of… ▽ More Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To do so, we have proposed a deep network that takes two inputs (grayscale image and the respective encoded text description) and tries to predict the relevant color components. Also, we have predicted each object in the image and have colorized them with their individual description to incorporate their specific attributes in the colorization process. After that, a fusion model fuses all the image objects (segments) to generate the final colorized image. As the respective textual descriptions contain color information of the objects present in the image, text encoding helps to improve the overall quality of predicted colors. In terms of performance, the proposed method outperforms existing colorization techniques in terms of LPIPS, PSNR and SSIM metrics. △ Less

Submitted 25 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 9 pages

arXiv:2302.14728 [pdf, other]

Global Context-Aware Person Image Generation

Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

Abstract: We propose a data-driven approach for context-aware person image generation. Specifically, we attempt to generate a person image such that the synthesized instance can blend into a complex scene. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene. The proposed technique is divided into three sequential steps.… ▽ More We propose a data-driven approach for context-aware person image generation. Specifically, we attempt to generate a person image such that the synthesized instance can blend into a complex scene. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene. The proposed technique is divided into three sequential steps. At first, we employ a Pix2PixHD model to infer a coarse semantic mask that represents the new person's spatial location, scale, and potential pose. Next, we use a data-centric approach to select the closest representation from a precomputed cluster of fine semantic masks. Finally, we adopt a multi-scale, attention-guided architecture to transfer the appearance attributes from an exemplar image. The proposed strategy enables us to synthesize semantically coherent realistic persons that can blend into an existing scene without altering the global context. We conclude our findings with relevant qualitative and quantitative evaluations. △ Less

Submitted 28 February, 2023; originally announced February 2023.

Comments: 14 pages

arXiv:2208.02843 [pdf, other]

TIC: Text-Guided Image Colorization

Authors: Subhankar Ghosh, Prasun Roy, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

Abstract: Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as… ▽ More Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To the best of our knowledge, this is one of the first attempts to incorporate textual conditioning in the colorization pipeline. To do so, we have proposed a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) and tries to predict the relevant color gamut. As the respective textual descriptions contain color information of the objects present in the scene, the text encoding helps to improve the overall quality of the predicted colors. We have evaluated our proposed model using different metrics and found that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively. △ Less

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2207.11718 [pdf, other]

TIPS: Text-Induced Pose Synthesis

Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

Abstract: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying… ▽ More In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying process challenging to apply in real-world scenarios as the generation of the target image is the actual aim. In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. We divide the problem into three independent stages: (a) text to pose representation, (b) pose refinement, and (c) pose rendering. To the best of our knowledge, this is one of the first attempts to develop a text-based pose transfer framework where we also introduce a new dataset DF-PASS, by adding descriptive pose annotations for the images of the DeepFashion dataset. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments. △ Less

Submitted 24 July, 2022; originally announced July 2022.

Comments: Accepted in The European Conference on Computer Vision (ECCV) 2022

arXiv:2206.02717 [pdf, other]

Scene Aware Person Image Generation through Global Contextual Conditioning

Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

Abstract: Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the pe… ▽ More Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blends in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. At first, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person. In our experiments, we achieve high-resolution photo-realistic generation results while preserving the general context of the scene. We conclude our paper with multiple qualitative and quantitative benchmarks on the results. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted in The International Conference on Pattern Recognition (ICPR) 2022

arXiv:2005.12524 [pdf]

A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

Authors: Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein

Abstract: Detecting text located on the torsos of marathon runners and sports players in video is a challenging issue due to poor quality and adverse effects caused by flexible/colorful clothing, and different structures of human bodies or actions. This paper presents a new unified method for tackling the above challenges. The proposed method fuses gradient magnitude and direction coherence of text pixels i… ▽ More Detecting text located on the torsos of marathon runners and sports players in video is a challenging issue due to poor quality and adverse effects caused by flexible/colorful clothing, and different structures of human bodies or actions. This paper presents a new unified method for tackling the above challenges. The proposed method fuses gradient magnitude and direction coherence of text pixels in a new way for detecting candidate regions. Candidate regions are used for determining the number of temporal frame clusters obtained by K-means clustering on frame differences. This process in turn detects key frames. The proposed method explores Bayesian probability for skin portions using color values at both pixel and component levels of temporal frames, which provides fused images with skin components. Based on skin information, the proposed method then detects faces and torsos by finding structural and spatial coherences between them. We further propose adaptive pixels linking a deep learning model for text detection from torso regions. The proposed method is tested on our own dataset collected from marathon/sports video and three standard datasets, namely, RBNR, MMM and R-ID of marathon images, to evaluate the performance. In addition, the proposed method is also tested on the standard natural scene datasets, namely, CTW1500 and MS-COCO text datasets, to show the objectiveness of the proposed method. A comparative study with the state-of-the-art methods on bib number/text detection of different datasets shows that the proposed method outperforms the existing methods. △ Less

Submitted 26 May, 2020; originally announced May 2020.

Comments: Accepted in Pattern Recognition, Elsevier

arXiv:2005.04605 [pdf, other]

doi 10.1109/TIP.2020.3033151

Robust Tensor Decomposition for Image Representation Based on Generalized Correntropy

Authors: Miaohua Zhang, Yongsheng Gao, Changming Sun, Michael Blumenstein

Abstract: Traditional tensor decomposition methods, e.g., two dimensional principal component analysis and two dimensional singular value decomposition, that minimize mean square errors, are sensitive to outliers. To overcome this problem, in this paper we propose a new robust tensor decomposition method using generalized correntropy criterion (Corr-Tensor). A Lagrange multiplier method is used to effective… ▽ More Traditional tensor decomposition methods, e.g., two dimensional principal component analysis and two dimensional singular value decomposition, that minimize mean square errors, are sensitive to outliers. To overcome this problem, in this paper we propose a new robust tensor decomposition method using generalized correntropy criterion (Corr-Tensor). A Lagrange multiplier method is used to effectively optimize the generalized correntropy objective function in an iterative manner. The Corr-Tensor can effectively improve the robustness of tensor decomposition with the existence of outliers without introducing any extra computational cost. Experimental results demonstrated that the proposed method significantly reduces the reconstruction error on face reconstruction and improves the accuracies on handwritten digit recognition and facial image clustering. △ Less

Submitted 10 May, 2020; originally announced May 2020.

Comments: 13 pages

arXiv:2005.04541 [pdf]

A Robust Matching Pursuit Algorithm Using Information Theoretic Learning

Authors: Miaohua Zhang, Yongsheng Gao, Changming Sun, Michael Blumenstein

Abstract: Current orthogonal matching pursuit (OMP) algorithms calculate the correlation between two vectors using the inner product operation and minimize the mean square error, which are both suboptimal when there are non-Gaussian noises or outliers in the observation data. To overcome these problems, a new OMP algorithm is developed based on the information theoretic learning (ITL), which is built on the… ▽ More Current orthogonal matching pursuit (OMP) algorithms calculate the correlation between two vectors using the inner product operation and minimize the mean square error, which are both suboptimal when there are non-Gaussian noises or outliers in the observation data. To overcome these problems, a new OMP algorithm is developed based on the information theoretic learning (ITL), which is built on the following new techniques: (1) an ITL-based correlation (ITL-Correlation) is developed as a new similarity measure which can better exploit higher-order statistics of the data, and is robust against many different types of noise and outliers in a sparse representation framework; (2) a non-second order statistic measurement and minimization method is developed to improve the robustness of OMP by overcoming the limitation of Gaussianity inherent in cost function based on second-order moments. The experimental results on both simulated and real-world data consistently demonstrate the superiority of the proposed OMP algorithm in data recovery, image reconstruction, and classification. △ Less

Submitted 9 May, 2020; originally announced May 2020.

Comments: Accepted by "Pattern Recognition"

arXiv:2002.10061 [pdf, other]

Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification

Authors: Wensi Tang, Guodong Long, Lu Liu, Tianyi Zhou, Michael Blumenstein, **g Jiang

Abstract: The Receptive Field (RF) size has been one of the most important factors for One Dimensional Convolutional Neural Networks (1D-CNNs) on time series classification tasks. Large efforts have been taken to choose the appropriate size because it has a huge influence on the performance and differs significantly for each dataset. In this paper, we propose an Omni-Scale block (OS-block) for 1D-CNNs, wher… ▽ More The Receptive Field (RF) size has been one of the most important factors for One Dimensional Convolutional Neural Networks (1D-CNNs) on time series classification tasks. Large efforts have been taken to choose the appropriate size because it has a huge influence on the performance and differs significantly for each dataset. In this paper, we propose an Omni-Scale block (OS-block) for 1D-CNNs, where the kernel sizes are decided by a simple and universal rule. Particularly, it is a set of kernel sizes that can efficiently cover the best RF size across different datasets via consisting of multiple prime numbers according to the length of the time series. The experiment result shows that models with the OS-block can achieve a similar performance as models with the searched optimal RF size and due to the strong optimal RF size capture ability, simple 1D-CNN models with OS-block achieves the state-of-the-art performance on four time series benchmarks, including both univariate and multivariate data from multiple domains. Comprehensive analysis and discussions shed light on why the OS-block can capture optimal RF sizes across different datasets. Code available [https://github.com/Wensi-Tang/OS-CNN] △ Less

Submitted 17 June, 2022; v1 submitted 23 February, 2020; originally announced February 2020.

Comments: Accepted by ICLR 2022

Journal ref: The Tenth International Conference on Learning Representations(ICLR 2022)

arXiv:1912.12168 [pdf, other]

doi 10.1109/TIFS.2020.2991833

Intra-Variable Handwriting Inspection Reinforced with Idiosyncrasy Analysis

Authors: Chandranath Adak, Bidyut B. Chaudhuri, Chin-Teng Lin, Michael Blumenstein

Abstract: In this paper, we work on intra-variable handwriting, where the writing samples of an individual can vary significantly. Such within-writer variation throws a challenge for automatic writer inspection, where the state-of-the-art methods do not perform well. To deal with intra-variability, we analyze the idiosyncrasy in individual handwriting. We identify/verify the writer from highly idiosyncratic… ▽ More In this paper, we work on intra-variable handwriting, where the writing samples of an individual can vary significantly. Such within-writer variation throws a challenge for automatic writer inspection, where the state-of-the-art methods do not perform well. To deal with intra-variability, we analyze the idiosyncrasy in individual handwriting. We identify/verify the writer from highly idiosyncratic text-patches. Such patches are detected using a deep recurrent reinforcement learning-based architecture. An idiosyncratic score is assigned to every patch, which is predicted by employing deep regression analysis. For writer identification, we propose a deep neural architecture, which makes the final decision by the idiosyncratic score-induced weighted average of patch-based decisions. For writer verification, we propose two algorithms for patch-fed deep feature aggregation, which assist in authentication using a triplet network. The experiments were performed on two databases, where we obtained encouraging results. △ Less

Submitted 7 May, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

Journal ref: IEEE Transactions on Information Forensics and Security, 2020

arXiv:1909.06886 [pdf, other]

doi 10.1109/ICDM.2019.00060

Temporal Self-Attention Network for Medical Concept Embedding

Authors: Xue** Peng, Guodong Long, Tao Shen, Sen Wang, **g Jiang, Michael Blumenstein

Abstract: In longitudinal electronic health records (EHRs), the event records of a patient are distributed over a long period of time and the temporal relations between the events reflect sufficient domain knowledge to benefit prediction tasks such as the rate of inpatient mortality. Medical concept embedding as a feature extraction method that transforms a set of medical concepts with a specific time stamp… ▽ More In longitudinal electronic health records (EHRs), the event records of a patient are distributed over a long period of time and the temporal relations between the events reflect sufficient domain knowledge to benefit prediction tasks such as the rate of inpatient mortality. Medical concept embedding as a feature extraction method that transforms a set of medical concepts with a specific time stamp into a vector, which will be fed into a supervised learning algorithm. The quality of the embedding significantly determines the learning performance over the medical data. In this paper, we propose a medical concept embedding method based on applying a self-attention mechanism to represent each medical concept. We propose a novel attention mechanism which captures the contextual information and temporal relationships between medical concepts. A light-weight neural net, "Temporal Self-Attention Network (TeSAN)", is then proposed to learn medical concept embedding based solely on the proposed attention mechanism. To test the effectiveness of our proposed methods, we have conducted clustering and prediction tasks on two public EHRs datasets comparing TeSAN against five state-of-the-art embedding methods. The experimental results demonstrate that the proposed TeSAN model is superior to all the compared methods. To the best of our knowledge, this work is the first to exploit temporal self-attentive relations between medical events. △ Less

Submitted 15 September, 2019; originally announced September 2019.

Comments: 10 pages, 7 figures, accepted at IEEE ICDM 2019

MSC Class: 68T30 ACM Class: I.2.1

arXiv:1904.09405 [pdf, other]

doi 10.1007/s11432-019-2713-1

FACLSTM: ConvLSTM with Focused Attention for Scene Text Recognition

Authors: Qingqing Wang, Wen**g Jia, Xiangjian He, Yue Lu, Michael Blumenstein, Ye Huang

Abstract: Scene text recognition has recently been widely treated as a sequence-to-sequence prediction problem, where traditional fully-connected-LSTM (FC-LSTM) has played a critical role. Due to the limitation of FC-LSTM, existing methods have to convert 2-D feature maps into 1-D sequential feature vectors, resulting in severe damages of the valuable spatial and structural information of text images. In th… ▽ More Scene text recognition has recently been widely treated as a sequence-to-sequence prediction problem, where traditional fully-connected-LSTM (FC-LSTM) has played a critical role. Due to the limitation of FC-LSTM, existing methods have to convert 2-D feature maps into 1-D sequential feature vectors, resulting in severe damages of the valuable spatial and structural information of text images. In this paper, we argue that scene text recognition is essentially a spatiotemporal prediction problem for its 2-D image inputs, and propose a convolution LSTM (ConvLSTM)-based scene text recognizer, namely, FACLSTM, i.e., Focused Attention ConvLSTM, where the spatial correlation of pixels is fully leveraged when performing sequential prediction with LSTM. Particularly, the attention mechanism is properly incorporated into an efficient ConvLSTM structure via the convolutional operations and additional character center masks are generated to help focus attention on right feature areas. The experimental results on benchmark datasets IIIT5K, SVT and CUTE demonstrate that our proposed FACLSTM performs competitively on the regular, low-resolution and noisy text images, and outperforms the state-of-the-art approaches on the curved text with large margins. △ Less

Submitted 5 January, 2020; v1 submitted 20 April, 2019; originally announced April 2019.

Comments: Accepted by Science China Information Science

arXiv:1807.06772 [pdf, ps, other]

Bag-of-Visual-Words for Signature-Based Multi-Script Document Retrieval

Authors: Ranju Mandal, Partha Pratim Roy, Umapada Pal, Michael Blumenstein

Abstract: An end-to-end architecture for multi-script document retrieval using handwritten signatures is proposed in this paper. The user supplies a query signature sample and the system exclusively returns a set of documents that contain the query signature. In the first stage, a component-wise classification technique separates the potential signature components from all other components. A bag-of-visual-… ▽ More An end-to-end architecture for multi-script document retrieval using handwritten signatures is proposed in this paper. The user supplies a query signature sample and the system exclusively returns a set of documents that contain the query signature. In the first stage, a component-wise classification technique separates the potential signature components from all other components. A bag-of-visual-words powered by SIFT descriptors in a patch-based framework is proposed to compute the features and a Support Vector Machine (SVM)-based classifier was used to separate signatures from the documents. In the second stage, features from the foreground (i.e. signature strokes) and the background spatial information (i.e. background loops, reservoirs etc.) were combined to characterize the signature object to match with the query signature. Finally, three distance measures were used to match a query signature with the signature present in target documents for retrieval. The `Tobacco' document database and an Indian script database containing 560 documents of Devanagari (Hindi) and Bangla scripts were used for the performance evaluation. The proposed system was also tested on noisy documents and promising results were obtained. A comparative study shows that the proposed method outperforms the state-of-the-art approaches. △ Less

Submitted 18 July, 2018; originally announced July 2018.

arXiv:1709.02243 [pdf, other]

Towards a Dedicated Computer Vision Tool set for Crowd Simulation Models

Authors: Sultan Daud Khan, Muhammad Saqib, Michael Blumenstein

Abstract: As the population of world is increasing, and even more concentrated in urban areas, ensuring public safety is becoming a taunting job for security personnel and crowd managers. Mass events like sports, festivals, concerts, political gatherings attract thousand of people in a constraint environment,therefore adequate safety measures should be adopted. Despite safety measures, crowd disasters still… ▽ More As the population of world is increasing, and even more concentrated in urban areas, ensuring public safety is becoming a taunting job for security personnel and crowd managers. Mass events like sports, festivals, concerts, political gatherings attract thousand of people in a constraint environment,therefore adequate safety measures should be adopted. Despite safety measures, crowd disasters still occur frequently. Understanding underlying dynamics and behavior of crowd is becoming areas of interest for most of computer scientists. In recent years, researchers developed several models for understanding crowd dynamics. These models should be properly calibrated and validated by means of data acquired in the field. In this paper, we developed a computer vision tool set that can be helpful not only in initializing the crowd simulation models but can also validate the simulation results. The main features of proposed tool set are: (1) Crowd flow segmentation and crowd counting, (2) Identifying source/sink location for understanding crowd behavior, (3) Group detection and tracking in crowds. △ Less

Submitted 1 September, 2017; originally announced September 2017.

arXiv:1708.03361 [pdf, other]

doi 10.1109/ACCESS.2019.2899908

An Empirical Study on Writer Identification & Verification from Intra-variable Individual Handwriting

Authors: Chandranath Adak, Bidyut B. Chaudhuri, Michael Blumenstein

Abstract: The handwriting of an individual may vary substantially with factors such as mood, time, space, writing speed, writing medium and tool, writing topic, etc. It becomes challenging to perform automated writer verification/identification on a particular set of handwritten patterns (e.g., speedy handwriting) of a person, especially when the system is trained using a different set of writing patterns (… ▽ More The handwriting of an individual may vary substantially with factors such as mood, time, space, writing speed, writing medium and tool, writing topic, etc. It becomes challenging to perform automated writer verification/identification on a particular set of handwritten patterns (e.g., speedy handwriting) of a person, especially when the system is trained using a different set of writing patterns (e.g., normal speed) of that same person. However, it would be interesting to experimentally analyze if there exists any implicit characteristic of individuality which is insensitive to high intra-variable handwriting. In this paper, we study some handcrafted features and auto-derived features extracted from intra-variable writing. Here, we work on writer identification/verification from offline Bengali handwriting of high intra-variability. To this end, we use various models mainly based on handcrafted features with SVM (Support Vector Machine) and features auto-derived by the convolutional network. For experimentation, we have generated two handwritten databases from two different sets of 100 writers and enlarged the dataset by a data-augmentation technique. We have obtained some interesting results. △ Less

Submitted 20 January, 2019; v1 submitted 10 August, 2017; originally announced August 2017.

Showing 1–18 of 18 results for author: Blumenstein, M