1 College of Computer Science, Sichuan University, Chengdu 610045, China,
[email protected],[email protected],[email protected]
2 School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea
[email protected]
Corresponding author

Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

Chengrui Gao1,2    Ziyuan Yang1    Andrew Beng Jin Teoh2,∗    Min Zhu1,∗
Abstract

Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly rely on first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order textures, which describe the curves and arcs of the textures, encompass this overlooked structural information. This paper introduces a novel FKP recognition approach, the Dual-Order Texture Competition Network (DOTCNet), designed to capture texture information in FKP images comprehensively. DOTCNet incorporates three dual-order texture competitive modules (DTCMs), each targeting textures at a different scale. Each DTCM employs a learnable texture descriptor, specifically a learnable Gabor filter (LGF), to extract texture features. By leveraging LGFs, the network extracts first- and second-order textures to describe fine textures and structural features thoroughly. Furthermore, an attention mechanism is applied to the first-order features to highlight significant texture details. For second-order features, a competitive mechanism emphasizes structural information while reducing noise from higher-order features. Extensive experimental results show that DOTCNet achieves the lowest equal error rate (EER) among the compared methods on the publicly available PolyU-FKP dataset.

Keywords:
Finger Knuckle Print Recognition · Dual-order texture · Learnable Gabor filter · Competitive mechanism

1 Introduction

Biometric recognition is becoming increasingly prevalent across various application domains such as healthcare systems, public safety systems, and electronic banking [1]. The range of biometric technologies developed so far includes facial recognition, iris recognition, fingerprints, palmprints, and finger knuckle prints (FKP), among others [2]. Recently, FKP has garnered increasing research attention due to its numerous advantages [3]. For example, unlike fingerprints, which may become damaged or worn from frequent handling of objects, the surface of FKP remains largely intact. Compared to facial recognition, FKP targets are smaller and more difficult to capture maliciously, thus offering greater privacy [5]. Additionally, the skin creases on the outer side of the FKP exhibit unique lines and wrinkles with rich textures and distinct features [4]. Furthermore, the data collection process for FKP is either non-contact or involves minimal contact, enhancing hygiene and making it a more user-friendly biometric modality.

In recent decades, numerous methods for FKP recognition have been developed. These approaches can be broadly categorized into two main types: (1) handcrafted methods and (2) deep learning methods. Since FKP images contain directional features similar to those in palmprints, many recent methods use palmprint encoding techniques [6]. For example, Zhang et al. [7] encoded the dominant directional features of FKP images using a competitive code based on Gabor filter responses. These traditional descriptors are typically manually designed and leverage prior information. However, handcrafted algorithms often struggle to adapt to diverse modalities and varying image quality.

Deep learning methods, such as Convolutional Neural Networks (CNNs), have recently garnered significant attention in FKP recognition [8]. Many current CNN-based methods either train generic network models or directly adopt pre-trained models (such as VGGNet [9] and ResNet [10]) to extract deep features from the corresponding databases. However, these generic models are primarily trained on large-scale image datasets (such as ImageNet), whose images often differ significantly in feature distribution from those used in the FKP recognition task. Consequently, the performance of these models is often compromised. Thus, developing an effective neural network architecture tailored to the characteristics of FKP images is essential. For instance, Cheng et al. [11] achieved FKP recognition by investigating the learning of minimally dimensional discriminative feature vectors to represent FKP images. Li et al. [12] developed a Sparse and Discriminative Multi-modal Feature Coding (SDMFC) model for jointly learning specific and common features. Li et al. [13] proposed a Joint Discriminative Feature Learning (JDFL) model, which extracts discriminative binary codes from Gabor features for FKP recognition. However, these methods overlook the importance of multi-order feature learning despite its crucial role in thoroughly modeling spatial correlations and thereby enhancing recognition accuracy. As illustrated in Fig. 1, for FKP images, features processed by first-order learnable Gabor filters contain rich, detailed information. In contrast, those processed by second-order learnable Gabor filters encapsulate major structural features crucial for recognition.

To address the aforementioned issues, we introduce DOTCNet, an FKP recognition method that comprises three different scale branches, facilitating the propagation of multi-scale texture information and enhancing the features’ nonlinear representation capabilities. Additionally, each branch incorporates a DTCM that employs diverse feature extraction techniques tailored to different feature orders in FKP images, thereby effectively capturing comprehensive texture and discriminative structure features. Specifically, first-order Gabor filtering is integrated with triplet attention mechanisms to enhance local and global features during the initial feature extraction phase. For second-order Gabor filtering, a competitive mechanism is utilized to select the optimal structure feature response, effectively eliminating redundant information, such as noise and irrelevant data, while preserving the important orientation of structure features. Then, we combine first-order and second-order features, allowing higher-dimensional feature vectors to be created and offering a more comprehensive texture description.

The main contributions of this article can be summarized as follows:

  • We design the DTCM to fully utilize texture and structure information while avoiding additional noise. Attention and competitive mechanisms are applied to first-order and second-order features, respectively, to focus on important characteristics.

  • We propose an advanced FKP recognition method, DOTCNet, which combines the parallel texture-ordering feature extraction branches and the DTCM for comprehensive feature extraction.

  • Experimental results on the open-access PolyU-FKP dataset substantiate the effectiveness of our method and demonstrate significant improvements in recognition performance.

Figure 1: The FKP image is processed by 1st/2nd order Gabor filters

2 Proposed method

We propose DOTCNet, a network designed to enhance the global flow of multi-scale information. DOTCNet aims to achieve more refined multi-scale features and capture dual-order feature information from each scale, leading to a more comprehensive understanding of the data. The overall network architecture is shown in Fig. 2. This network consists of three branches, each containing the DTCM of different scales. For the DTCM, the first-order LGF captures features that contain rich details, while the second-order LGF captures features that contain the main structural elements. We use a spatial and channel-based attention mechanism for first-order features to extract local and global texture detail features comprehensively. Meanwhile, we focus only on the competitive relationships in the textural structure orientation of second-order features to avoid noise introduced by high-order filtering.

2.1 Multi-scale network structure

Different scales of DTCM are applied to the input FKP image, generating feature maps at various resolutions. This process constructs multi-scale features and integrates multi-scale contextual information. Specifically, DOTCNet is divided into three stages: Stage 1, Stage 2, and Stage 3. Features obtained from different scales of filters — large-scale, medium-scale, and tiny-scale — contain texture information at varying scales. Tiny-scale features often have relatively large spatial extents to capture more detailed texture information, whereas large-scale features contain strong semantic information. Information at different scales is interrelated and complementary. Global multi-scale features are then obtained by concatenating features from different scales. These concatenated features are processed through two fully connected layers to generate the final feature vector. The following expression illustrates this process:

$$F = \mathrm{FC}\left(\mathrm{FC}\left(\mathrm{Concat}(F_{ls}, F_{ms}, F_{ts})\right)\right), \tag{1}$$

where FC denotes a fully connected layer, $F_{ls}$, $F_{ms}$, and $F_{ts}$ represent the feature maps generated by the large-scale, medium-scale, and tiny-scale branches, respectively, Concat denotes the concatenation operation, and $F$ represents the final feature vector.
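The fusion in Eq. (1) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the branch output shapes, the hidden width of 64, the output width of 32, and the helper names `fully_connected` and `fuse_scales` are hypothetical and are not specified in the paper. Following Eq. (1) literally, no activation is placed between the two FC layers.

```python
import numpy as np

def fully_connected(x, weight, bias):
    """Affine layer: x is (batch, in_dim), weight is (in_dim, out_dim)."""
    return x @ weight + bias

def fuse_scales(f_ls, f_ms, f_ts, w1, b1, w2, b2):
    """Eq. (1): flatten each branch output, concatenate, apply two FC layers."""
    batch = f_ls.shape[0]
    flat = [f.reshape(batch, -1) for f in (f_ls, f_ms, f_ts)]
    concat = np.concatenate(flat, axis=1)
    return fully_connected(fully_connected(concat, w1, b1), w2, b2)

rng = np.random.default_rng(0)
# Hypothetical branch outputs with layout (batch, channels, height, width).
f_ls = rng.standard_normal((2, 12, 2, 2))
f_ms = rng.standard_normal((2, 72, 2, 2))
f_ts = rng.standard_normal((2, 24, 2, 2))

in_dim = f_ls[0].size + f_ms[0].size + f_ts[0].size
w1, b1 = rng.standard_normal((in_dim, 64)) * 0.01, np.zeros(64)
w2, b2 = rng.standard_normal((64, 32)) * 0.01, np.zeros(32)

feature = fuse_scales(f_ls, f_ms, f_ts, w1, b1, w2, b2)
print(feature.shape)
```

Flattening before concatenation means the three branches do not need to share a spatial resolution, which matters when the kernel sizes differ across scales.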

Figure 2: The overview of the proposed DOTCNet.

2.2 Dual-order texture competitive module

For each DTCM, the process includes capturing the initial detailed texture features through the first-order LGF and the initial structure features through the second-order LGF. Then, the attention and competition mechanisms are applied to the first-order and second-order features, respectively. Taking the large-scale branch as an example, dual-order features are generated by concatenating the first-order and second-order features, and the expression is as follows:

$$F_{ls} = \mathrm{Concat}(F_1, F_2), \tag{2}$$

where $F_{ls}$ denotes the dual-order features in the large-scale branch, $F_1$ represents the first-order detailed texture features, and $F_2$ signifies the second-order structure features.

The Gabor filter feature extractor utilized in this paper is extensively employed in image processing because of its biological relevance and robust texture extraction capabilities. The Gabor filter is mathematically defined as follows:

$$g(x, y; \lambda, \theta, \phi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \phi\right)\right), \tag{3}$$

where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$. $(x, y)$ denotes the pixel position. $\lambda$ denotes the wavelength of the sinusoidal plane wave component of the Gabor function. $\theta$ specifies the angle of the plane wave, while $\phi$ indicates the phase shift. Additionally, $\gamma$ denotes the ellipticity of the Gaussian support of the Gabor function, and $\sigma$ determines the standard deviation of the Gaussian filter within the Gabor function.

An effective Gabor filter requires an appropriate combination of parameters to match the given task. Previous studies [14] have typically selected parameters by hand, relying on inherent rules, which cannot guarantee effectiveness for the current task. To overcome this limitation, we employ the LGF as the texture extractor [15], which learns the parameters ($\lambda$, $\sigma$, $\gamma$) used to extract texture features. In this paper, we employ the real part of the Gabor function, using Gabor filters with sizes of 7, 17, and 35 for tiny-scale, medium-scale, and large-scale textures, respectively. The numbers of filters for these three scales are set to 12, 36, and 6, respectively.
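As an illustration of Eq. (3) and the kernel sizes above, the following NumPy sketch builds the real part of a Gabor kernel and stacks one bank per scale. Several details are assumptions made only for this example and are not stated in the paper: the orientations are spaced evenly over $[0, \pi)$, the phase is zero, and the values of `lam`, `sigma`, and `gamma` are arbitrary placeholders for what an LGF would learn during training.

```python
import numpy as np

def real_gabor(size, lam, theta, phi, sigma, gamma):
    """Real part of the Gabor function in Eq. (3) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / lam + phi)

def gabor_bank(size, n_filters, lam, sigma, gamma, phi=0.0):
    """Stack n_filters kernels; even orientation spacing is an assumption."""
    thetas = np.arange(n_filters) * np.pi / n_filters
    return np.stack([real_gabor(size, lam, t, phi, sigma, gamma) for t in thetas])

# (kernel size, number of filters) per scale, as given in the text.
SCALES = {"tiny": (7, 12), "medium": (17, 36), "large": (35, 6)}

# Placeholder initial values; in an LGF these would be trainable parameters.
banks = {
    name: gabor_bank(size, n, lam=size / 2.0, sigma=size / 4.0, gamma=1.0)
    for name, (size, n) in SCALES.items()
}

for name, bank in banks.items():
    print(name, bank.shape)
```

Because the kernels are generated from a handful of scalars, gradients only need to flow to those scalars, which is what distinguishes a learnable Gabor filter from a free-form convolution kernel of the same size.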

2.2.1 The first-order detailed texture feature extraction

We first use the first-order LGF to extract preliminary edge and texture information from the image. The expression is as follows:

$$\chi = \mathrm{LGF}(X), \tag{4}$$

where $X$ represents the input FKP image, $\chi$ denotes the initial detailed texture features extracted by the first-order LGF, and $\mathrm{LGF}(\cdot)$ is the LGF operation.

For the generated feature $\chi$, we apply triplet attention [16], which establishes dimensional dependencies through rotation operations and residual transformations, captures cross-dimensional interactions to calculate attention weights, and encodes channel and spatial information. The architecture of the triplet attention module is shown in Fig. 2. Given an input feature $\chi \in \mathbb{R}^{C \times H \times W}$, we obtain the $(C, H)$ branch feature map $\hat{\chi}_1$ by rotating $\chi$ by $90^{\circ}$ anti-clockwise along the $H$ axis. Similarly, we rotate $\chi$ by $90^{\circ}$ anti-clockwise along the $W$ axis to obtain the $(C, W)$ branch feature map $\hat{\chi}_2$. The unrotated $\chi$ is the $(H, W)$ branch feature, denoted as $\chi_3$. Then, the Z-Pool process concatenates the max-pooled and average-pooled results of each branch, which can be expressed as:

$$\hat{\chi}_i^{*} = \mathrm{Z\_Pool}(\hat{\chi}_i) = [\mathrm{MaxPool}(\hat{\chi}_i), \mathrm{AvgPool}(\hat{\chi}_i)], \tag{5}$$

where $\hat{\chi}_i^{*}$ denotes the feature map produced by the Z-Pool process.
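A small worked example may help make Eq. (5) concrete. The sketch below assumes that pooling runs over the leading axis of the (rotated) tensor, so a tensor of shape $(D, A, B)$ is reduced to two planes of shape $(A, B)$; the function name `z_pool` and the toy input are illustrative only.

```python
import numpy as np

def z_pool(x):
    """Stack the max and the mean over the leading axis: (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)], axis=0)

# Toy tensor with D=2 planes: values 0..11 in the first, 12..23 in the second.
x = np.arange(24, dtype=float).reshape(2, 3, 4)
pooled = z_pool(x)

print(pooled.shape)   # two planes, same spatial size as the input planes
print(pooled[0])      # element-wise maximum, equal to the second input plane here
print(pooled[1])      # element-wise mean, equal to the first input plane plus 6
```

Whatever the depth of the pooled axis, the output always has two planes, so the convolution that follows stays cheap.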

Finally, the interaction of spatial and channel attention across different dimensions can be represented as:

$$F_1 = \frac{1}{3}\left(\overline{\hat{\chi}_1 \sigma(\psi_1(\hat{\chi}_1^{*}))} + \overline{\hat{\chi}_2 \sigma(\psi_2(\hat{\chi}_2^{*}))} + \chi \sigma(\psi_3(\chi^{*}))\right), \tag{6}$$

where $F_1$ represents the first-order detailed texture feature and $\sigma$ represents the sigmoid activation function. $\psi_1$, $\psi_2$, and $\psi_3$ represent the standard two-dimensional convolutional layers with kernel size 3 in the three branches of triplet attention. The overline represents rotating the input tensor $90^{\circ}$ clockwise.
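The three-branch computation in Eq. (6) can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: the rotations are realized as axis permutations (an implementation assumption), each $\psi_i$ is reduced to a single bias-free 3 × 3 convolution with zero padding and no normalization, and all function names and tensor sizes are hypothetical.

```python
import numpy as np

def z_pool(x):
    """Eq. (5): stack max and mean over the leading axis, (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)], axis=0)

def conv2d_same(x, weight):
    """Cross-correlate x (C_in, A, B) with weight (C_in, k, k) -> (A, B), zero padded."""
    _, k, _ = weight.shape
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    a, b = x.shape[1:]
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,cab->ab", weight[:, i, j], padded[:, i:i + a, j:j + b])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attend(x, weight):
    """Scale x (D, A, B) by a sigmoid gate computed from its Z-pooled summary."""
    gate = sigmoid(conv2d_same(z_pool(x), weight))
    return x * gate[None, :, :]

def triplet_attention(chi, w1, w2, w3):
    """Eq. (6) for a single feature map chi of shape (C, H, W)."""
    chi1 = np.transpose(chi, (2, 1, 0))               # (W, H, C): gate over (H, C)
    chi2 = np.transpose(chi, (1, 0, 2))               # (H, C, W): gate over (C, W)
    out1 = np.transpose(attend(chi1, w1), (2, 1, 0))  # rotate back to (C, H, W)
    out2 = np.transpose(attend(chi2, w2), (1, 0, 2))  # rotate back to (C, H, W)
    out3 = attend(chi, w3)                            # gate over (H, W)
    return (out1 + out2 + out3) / 3.0

rng = np.random.default_rng(0)
chi = rng.standard_normal((6, 8, 10))
w1, w2, w3 = (rng.standard_normal((2, 3, 3)) * 0.1 for _ in range(3))
f1 = triplet_attention(chi, w1, w2, w3)
print(f1.shape)
```

A quick sanity check on the sketch: with all convolution weights set to zero, every gate equals sigmoid(0) = 0.5, so the output is exactly half of the input.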

2.2.2 The second-order structure feature extraction

After the process above, the first-order LGF captures detailed features and ensures a balance between local and global information through the attention mechanism. Building on this, the second-order LGF further strengthens important structural features, making the final feature representation more discriminative. The specific expression is as follows:

$$g = \mathrm{LGF}(F_1), \tag{7}$$

where $F_1$ represents the generated first-order texture feature and $g$ denotes the initial structure features extracted by the second-order LGF.

To capture the important structure information, soft competitive code (SCC) [17] is introduced to extract the ordering relationship as the feature using the Softmax function. The process is formulated as follows:

$$F_2 = \mathrm{softmax}(g), \tag{8}$$

where $g$ is the input of the competitive mechanism and $F_2$ is its output, the second-order structure feature. $\mathrm{softmax}(\cdot)$ denotes the competition extraction process along the channel dimension.
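Eq. (8) amounts to a softmax taken across the filter (channel) axis at every pixel. The NumPy sketch below illustrates this under the assumption that $g$ has layout (channels, height, width); the function name `soft_competition` and the toy values are illustrative.

```python
import numpy as np

def soft_competition(g):
    """Softmax over the channel axis of g with shape (C, H, W)."""
    shifted = g - g.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

# Three filter responses at a single pixel: the second filter responds most strongly.
g = np.array([[[1.0]], [[3.0]], [[2.0]]])
f2 = soft_competition(g)

print(f2[:, 0, 0])        # normalised competition scores for the three filters
print(f2.sum(axis=0))     # scores sum to 1 at every pixel
```

Because softmax is monotonic, the ranking of filter responses at each pixel is preserved while the scores stay differentiable, which is what allows the ordering relationship to be learned end to end.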

3 Experiments and Results

3.1 Datasets and Experimental Settings

This section introduces the PolyU Finger Knuckle Print Database (PolyU-FKP) used for experimental analysis [18]. The database comprises FKP images collected by the Hong Kong Polytechnic University using a low-resolution camera in a peg-free environment. The dataset involves 148 individuals and includes images of the left index FKP, left middle FKP, right index FKP, and right middle FKP. The images were captured in a contactless mode with a resolution of 110 × 220 pixels in BMP format. Each finger is represented by 12 images, and the dataset contains a total of 591 finger classes. We randomly selected six samples from each finger class to create the training set and reserved the remaining samples for testing.
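The per-class split described above can be sketched with the Python standard library. The file names, the class labels, the seed, and the helper name `split_per_class` are hypothetical; only the rule of six randomly chosen training samples out of twelve per finger class comes from the text.

```python
import random

def split_per_class(samples_by_class, n_train=6, seed=0):
    """Randomly pick n_train samples per class for training; the rest go to testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in sorted(samples_by_class.items()):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        train += [(name, label) for name in shuffled[:n_train]]
        test += [(name, label) for name in shuffled[n_train:]]
    return train, test

# Toy stand-in for the dataset: 3 finger classes with 12 images each.
toy = {
    label: [f"finger{label:03d}_{i:02d}.bmp" for i in range(12)]
    for label in range(3)
}
train_set, test_set = split_per_class(toy)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, which matters when several networks are compared on the same partition.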

Our method is implemented using the PyTorch framework and optimized with the Adam optimizer [19], with a learning rate of 0.01 and a batch size of 1024. The experiments are conducted on an NVIDIA RTX 3090 GPU. The loss function combines cross-entropy loss with contrastive loss [20]. For comparison, other deep learning methods are implemented by replacing the network while keeping all other parameters consistent.

Table 1: Comparison of the proposed method with state-of-the-art methods in terms of EER (%).

| Methods | Left index | Left middle | Right index | Right middle | Total |
| --- | --- | --- | --- | --- | --- |
| BOCV [6] | 8.753 | 8.712 | 8.845 | 8.667 | 8.657 |
| ResNet18 [10] | 5.349 | 4.392 | 5.545 | 4.657 | 3.484 |
| DenseNet101 [22] | 5.686 | 5.968 | 5.799 | 6.708 | 4.725 |
| VGG16 [9] | 6.550 | 4.842 | 6.250 | 4.649 | 4.039 |
| Compnet [17] | 2.558 | 3.228 | 4.940 | 3.498 | 2.934 |
| PCANet-FKP [21] | 4.542 | 2.614 | 4.054 | 3.307 | 2.431 |
| CO3Net [23] | 4.748 | 4.354 | 5.405 | 5.291 | 4.294 |
| CCNet [24] | 2.477 | 3.059 | 3.941 | 3.042 | 2.622 |
| Ours | 2.327 | 2.571 | 2.909 | 2.664 | 2.186 |

Figure 3: The ROC curves of the proposed and compared methods on PolyU-FKP: (a) left index FKP, (b) left middle FKP, (c) right index FKP, (d) right middle FKP, (e) all dataset.

3.2 Recognition Performance

We compared the proposed method with the classical FKP recognition methods BOCV [6], ResNet18 [10], DenseNet101 [22], VGG16 [9], Compnet [17], and PCANet-FKP [21]. We also implemented two advanced texture description methods, CO3Net [23] and CCNet [24]. All methods were evaluated on the PolyU-FKP dataset (comprising left index FKP, left middle FKP, right index FKP, and right middle FKP). Table 1 presents the FKP recognition performance of the various models on the dataset. For instance, our method achieved an Equal Error Rate (EER) of 2.186% on the overall FKP dataset, 0.245 percentage points lower than the second-ranked PCANet-FKP (2.431%), which corresponds to an improvement of 11.2% relative to our EER. Our method also achieved the lowest EER on each of the four PolyU-FKP subsets, indicating superior accuracy and robustness in FKP recognition tasks.
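The arithmetic behind the reported improvement can be checked directly from the overall EERs in Table 1. The short calculation below uses only those two values; the variable names are illustrative.

```python
ours = 2.186        # overall EER (%) of the proposed method, Table 1
runner_up = 2.431   # overall EER (%) of PCANet-FKP, Table 1

absolute_gap = runner_up - ours
relative_to_ours = 100.0 * absolute_gap / ours
relative_to_runner_up = 100.0 * absolute_gap / runner_up

print(round(absolute_gap, 3))           # gap in percentage points
print(round(relative_to_ours, 1))       # gap as a percentage of our EER
print(round(relative_to_runner_up, 1))  # gap as a percentage of the runner-up's EER
```

The 11.2% figure matches the gap divided by our EER; dividing by the runner-up's EER instead gives roughly 10.1%, so the choice of baseline should be kept in mind when comparing against other papers.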

To further validate the effectiveness of our method, we plotted the corresponding Receiver Operating Characteristic (ROC) curves, as shown in Fig. 3. The closer the curve is to the top left corner of the plot, the better the performance of the corresponding algorithm. These curves illustrate that our method performs better than the compared methods across different thresholds. Compared with traditional techniques and other deep learning-based approaches, our method achieves the lowest EER values and the best ROC curves on the PolyU-FKP dataset, showcasing its robustness and efficacy.
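For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The NumPy sketch below approximates it by scanning thresholds over a set of matching scores. It assumes that higher scores mean greater similarity; the toy scores and the function name `equal_error_rate` are illustrative and unrelated to the paper's data.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER (as a fraction) from similarity scores by threshold scanning."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)    # genuine pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores: one genuine pair scores below one impostor pair.
genuine_scores = [0.6, 0.4, 0.9, 0.8]
impostor_scores = [0.5, 0.1, 0.2, 0.3]
eer_overlap = equal_error_rate(genuine_scores, impostor_scores)

# Perfectly separated scores give an EER of zero.
eer_separated = equal_error_rate([0.8, 0.9], [0.1, 0.2])
print(eer_overlap, eer_separated)
```

In the first toy case, a threshold of 0.5 accepts one of four impostor pairs and rejects one of four genuine pairs, so both error rates equal 0.25.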

Table 2: Accuracies (%) and EERs (%) obtained with different scale branches on PolyU-FKP.

| Large-scale | Medium-scale | Tiny-scale | ACC (%) | EER (%) |
| --- | --- | --- | --- | --- |
| ✓ | × | × | 98.84 | 4.308 |
| × | ✓ | × | 99.40 | 2.949 |
| × | × | ✓ | 99.32 | 2.980 |
| ✓ | ✓ | × | 99.37 | 2.702 |
| × | ✓ | ✓ | 99.51 | 2.338 |
| ✓ | ✓ | ✓ | 99.60 | 2.186 |

Table 3: Accuracies (%) and EERs (%) obtained with different mechanisms on PolyU-FKP.

| 1st-order feature: TAM | 1st-order feature: CM | 2nd-order feature: TAM | 2nd-order feature: CM | ACC (%) | EER (%) |
| --- | --- | --- | --- | --- | --- |
| ✓ | ✓ | × | × | 99.37 | 2.881 |
| ✓ | ✓ | ✓ | ✓ | 99.49 | 2.610 |
| ✓ | × | ✓ | × | 99.51 | 2.570 |
| × | ✓ | × | ✓ | 99.49 | 2.585 |
| × | ✓ | ✓ | × | 99.54 | 2.468 |
| ✓ | × | × | ✓ | 99.60 | 2.186 |

3.3 Ablation Experiments

3.3.1 The efficiency of multi-scale branches

To validate the necessity of multi-scale branches for recognition performance, we conducted several ablation experiments on the PolyU-FKP dataset. The network layers were kept consistent throughout to ensure comparable results. The experimental results are shown in Table 2. First, we tested the recognition performance of the large-scale, medium-scale, and tiny-scale branches individually. The medium-scale branch performed best among the three. Next, starting from the medium-scale branch, we added either the large-scale or the tiny-scale branch. The combination of medium-scale and tiny-scale branches outperformed the combination of medium-scale and large-scale branches, further validating the importance of the tiny-scale branch in the multi-scale structure. Based on the performance of the three branches and the analysis of their importance, we allocated network layers to each branch accordingly. Specifically, the large-scale branch was assigned 6 layers, the medium-scale branch 36 layers, and the tiny-scale branch 12 layers. With this allocation, the full three-branch model achieved the best recognition performance (99.60% accuracy and 2.186% EER).

3.3.2 The efficiency of dual-order texture extraction

We conduct ablation experiments to evaluate the importance of dual-order texture features and the contributions of the Triplet Attention Mechanism (TAM) and the Competition Mechanism (CM) for first-order and second-order textures. These experiments isolate and elucidate the impact of each mechanism on overall model performance. The configurations of the ablation experiments on the PolyU-FKP dataset are as follows: (1) Only retaining the first-order texture extraction, (2) TAM and CM for both first-order and second-order textures, (3) TAM for both first-order and second-order textures, (4) CM for both first-order and second-order textures, (5) CM for first-order texture and TAM for second-order texture, and (6) TAM for first-order texture and CM for second-order texture.

As shown in Table 3, comparing configuration (1) with configuration (2) shows that retaining dual-order features is better than retaining only first-order features (EER of 2.610% versus 2.881%). The highest performance was observed in configuration (6), indicating that TAM effectively captures the essential features of first-order textures, whereas CM is more suitable for handling the complexity and feature differentiation in second-order textures. This suggests that first-order textures benefit more from TAM's focused attention on salient features, whereas second-order textures require effective differentiation between significantly varying directional structure features.

4 Conclusion

In this paper, we propose a novel FKP recognition network termed DOTCNet. This network incorporates multi-scale branches and the DTCM to achieve comprehensive feature extraction. In the DTCM, detailed texture features are captured by the first-order LGF, and inter-dimensional dependencies are established through rotational operations and residual transformations within the triplet attention mechanism, preserving the cross-dimensional texture features. Subsequently, structural features are captured by the second-order LGF and processed via a competitive mechanism to differentiate feature directions. Finally, the dual-order features are concatenated to achieve comprehensive feature extraction. To validate the effectiveness of the proposed method, we conducted extensive experiments on the open-access PolyU-FKP dataset. The experimental results indicate that our method achieves lower EERs than the compared methods. Future research directions include establishing the consistency of templates for left and right FKP images, so that registered left (right) FKP images can be used to verify the identity of right (left) FKP images.

References

  • [1] Yang, Z., Teoh, A.B.J., Zhang, B., et al.: Physics-driven spectrum-consistent federated learning for palmprint verification. International Journal of Computer Vision pp. 1-16 (2024)
  • [2] Hattab, A., Behloul, A.: Face-iris multimodal biometric recognition system based on deep learning. Multimedia Tools and Applications 83(14), 43349-43376 (2024)
  • [3] Su, L., Fei, L., Zhang, B., et al.: Complete region of interest for unconstrained palmprint recognition. IEEE Transactions on Image Processing (2024)
  • [4] Fei, L., Zhang, B., Wen, J., et al.: Jointly learning compact multi-view hash codes for few-shot fkp recognition. Pattern Recognition 115, 107894 (2021)
  • [5] Yang, Y., Fei, L., Alshehri, A.H., et al.: Joint multi-type feature learning for multi-modality fkp recognition. Engineering Applications of Artificial Intelligence 126, 106960 (2023)
  • [6] Guo, Z., Zhang, D., Zhang, L., et al.: Palmprint verification using binary orientation co-occurrence vector. Pattern Recognition Letters 30(13), 1219-1227 (2009)
  • [7] Zhang, D., Lu, G., Zhang, L., et al.: Finger-knuckle-print verification. Advanced biometrics pp. 85-109 (2018)
  • [8] Li, S., Fei, L., Zhang, B., et al.: Hand-based multimodal biometric fusion: A review. Information Fusion p. 102418 (2024)
  • [9] Hong, H.G., Lee, M.B., Park, K.R.: Convolutional neural network-based finger-vein recognition using nir image sensors. Sensors 17(6), 1297 (2017)
  • [10] Kim, W., Song, J.M., Park, K.R.: Multimodal biometric recognition based on convolutional neural network by the fusion of finger-vein and finger shape using near-infrared (nir) camera sensor. Sensors 18(7), 2296 (2018)
  • [11] Cheng, K.H., Kumar, A.: Accurate 3d finger knuckle recognition using auto-generated similarity functions. IEEE Transactions on Biometrics, Behavior, and Identity Science 3(2), 203-213 (2021)
  • [12] Li, S., Zhang, B., Fei, L., et al.: Learning sparse and discriminative multimodal feature codes for finger recognition. IEEE Transactions on Multimedia 25, 805-815 (2021)
  • [13] Li, S., Zhang, B., Fei, L., et al.: Joint discriminative feature learning for multimodal finger recognition. Pattern Recognition 111, 107704 (2021)
  • [14] Aliraid, R., Ouamane, A.: A novel descriptor (lgbq) based on gabor filters. Multimedia Tools and Applications 83(4), 11669-11686 (2024)
  • [15] Chen, P., Li, W., Sun, L., et al.: Lgcn: learnable gabor convolution network for human gender recognition in the wild. IEICE TRANSACTIONS on Information and Systems 102(10), 2067-2071 (2019)
  • [16] Misra, D., Nalamada, T., Arasanipalai, A.U., et al.: Rotate to attend: Convolutional triplet attention module. In: IEEE/CVF winter conference on applications of computer vision. pp. 3139-3148 (2021)
  • [17] Liang, X., Yang, J., Lu, G., et al.: Compnet: Competitive neural network for palmprint recognition using learnable gabor kernels. IEEE Signal Processing Letters 28, 1739-1743 (2021)
  • [18] Zhang, L., Zhang, L., Zhang, D.: Finger-knuckle-print: a new biometric identifier. In: IEEE international conference on image processing. pp. 1981-1984. (2009)
  • [19] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [20] Gao, C., Yang, Z., Zhu, M., et al.: Scale-aware competition network for palmprint recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 4580-4584. (2024)
  • [21] Attia, A., Mazaa, S., Akhtar, Z., et al.: Deep learning-driven palmprint and finger knuckle pattern-based multimodal person recognition system. Multimedia Tools and Applications 81(8), 10961-10980 (2022)
  • [22] Song, J.M., Kim, W., Park, K.R.: Finger-vein recognition based on deep densenet using composite image. IEEE Access 7, 66845-66863 (2019)
  • [23] Yang, Z., Xia, W., Qiao, Y., et al.: CO3Net: Coordinate-aware contrastive competitive neural network for palmprint recognition. IEEE Transactions on Instrumentation and Measurement (2023)
  • [24] Yang, Z., Huangfu, H., Leng, L., et al.: Comprehensive competition mechanism in palmprint recognition. IEEE Transactions on Information Forensics and Security (2023)