¹ College of Computer Science, Sichuan University, Chengdu 610045, China,
[email protected],[email protected],[email protected]
² School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, South Korea
[email protected]
^∗ Corresponding author

Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

Chengrui Gao^1,2 Ziyuan Yang^1 Andrew Beng Jin Teoh^2,∗ Min Zhu^1,∗

Abstract

Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order textures, which describe the curves and arcs of the textures, encompass this overlooked structural information. This paper introduces a novel FKP recognition approach, the Dual-Order Texture Competition Network (DOTCNet), designed to capture texture information in FKP images comprehensively. DOTCNet incorporates three dual-order texture competitive modules (DTCMs), each targeting textures at different scales. Each DTCM employs a learnable texture descriptor, specifically a learnable Gabor filter (LGF), to extract texture features. By leveraging LGFs, the network extracts first and second order textures to describe fine textures and structural features thoroughly. Furthermore, an attention mechanism enhances relevant features in the first-order features, thereby highlighting significant texture details. For second-order features, a competitive mechanism emphasizes structural information while reducing noise from higher-order features. Extensive experimental results reveal that DOTCNet significantly outperforms several standard algorithms on the publicly available PolyU-FKP dataset.

Keywords: Finger Knuckle Print Recognition · Dual-order texture · Learnable Gabor filter · Competitive mechanism

1 Introduction

Biometric recognition is becoming increasingly prevalent across various application domains such as healthcare systems, public safety systems, and electronic banking [1]. The range of biometric technologies developed so far includes facial recognition, iris recognition, fingerprints, palmprints, and finger knuckle prints (FKP), among others [2]. Recently, FKP has garnered increasing research attention due to its numerous advantages [3]. For example, unlike fingerprints, which may become damaged or worn from frequent handling of objects, the surface of FKP remains largely intact. Compared to facial recognition, FKP targets are smaller and more difficult to capture maliciously, thus offering greater privacy [5]. Additionally, the skin creases on the outer side of the FKP exhibit unique lines and wrinkles with rich textures and distinct features [4]. Furthermore, the data collection process for FKP is either non-contact or involves minimal contact, enhancing hygiene and making it a more user-friendly biometric modality.

In recent decades, numerous methods for FKP recognition have been developed. These approaches can be broadly categorized into two main types: (1) handcrafted methods and (2) deep learning methods. Since FKP images contain directional features similar to those in palmprints, many recent methods use palmprint encoding techniques [6]. Zhang et al. [7] encoded the dominant directional features of FKP images using a competitive code based on Gabor filter responses. These traditional descriptors are typically manually designed and leverage prior information. However, handcrafted algorithms often struggle to adapt to diverse modalities and varying image quality.

Deep learning methods, such as Convolutional Neural Networks (CNNs), have recently garnered significant attention in FKP recognition [8]. Many current CNN-based methods either utilize generic network models for training or directly adopt pre-trained models (such as VGGNet [9] and ResNet [10]) to extract deep features from corresponding databases. However, these generic models are primarily trained on large-scale image datasets (such as ImageNet), and the images in these datasets often exhibit a significant difference in feature distribution compared to those used in the FKP recognition task. Consequently, the performance of these models is often compromised. Thus, developing an effective neural network architecture tailored to the characteristics of FKP images is essential. For instance, Cheng et al. [11] achieved FKP recognition by investigating the learning of minimally dimensional discriminative feature vectors to represent FKP images. Li et al. [12] developed a Sparse and Discriminative Multi-modal Feature Coding (SDMFC) model for jointly learning specific and common features. Li et al. [13] proposed a Joint Discriminative Feature Learning (JDFL) model, which extracts discriminative binary codes from Gabor features for FKP recognition. However, these methods overlook multi-order feature learning, which plays a crucial role in thoroughly modeling spatial correlations and thereby in enhancing recognition accuracy. As illustrated in Fig. 1, for FKP images, features processed by first-order learnable Gabor filters contain rich, detailed information. In contrast, those processed by second-order learnable Gabor filters encapsulate major structural features crucial for recognition.

To address the aforementioned issues, we introduce DOTCNet, an FKP recognition method that comprises three different scale branches, facilitating the propagation of multi-scale texture information and enhancing the features’ nonlinear representation capabilities. Additionally, each branch incorporates a DTCM that employs diverse feature extraction techniques tailored to different feature orders in FKP images, thereby effectively capturing comprehensive texture and discriminative structure features. Specifically, first-order Gabor filtering is integrated with triplet attention mechanisms to enhance local and global features during the initial feature extraction phase. For second-order Gabor filtering, a competitive mechanism is utilized to select the optimal structure feature response, effectively eliminating redundant information, such as noise and irrelevant data, while preserving the important orientation of structure features. Then, we combine first-order and second-order features, allowing higher-dimensional feature vectors to be created and offering a more comprehensive texture description.

The main contributions of this article can be summarized as follows:

• We design the DTCM to fully utilize texture and structure information while avoiding additional noise. Attention and competitive mechanisms are applied to the first-order and second-order features, respectively, to focus on important characteristics.

• We propose an advanced FKP recognition method, DOTCNet, which combines the parallel texture-ordering feature extraction branches and the DTCM for comprehensive feature extraction.

• Experimental results on the open-access PolyU-FKP dataset substantiate the effectiveness of our method and demonstrate significant improvements in recognition performance.

Figure 1: The FKP image is processed by 1st/2nd order Gabor filters

2 Proposed method

We propose DOTCNet, a network designed to enhance the global flow of multi-scale information. DOTCNet aims to achieve more refined multi-scale features and capture dual-order feature information from each scale, leading to a more comprehensive understanding of the data. The overall network architecture is shown in Fig. 2. This network consists of three branches, each containing the DTCM of different scales. For the DTCM, the first-order LGF captures features that contain rich details, while the second-order LGF captures features that contain the main structural elements. We use a spatial and channel-based attention mechanism for first-order features to extract local and global texture detail features comprehensively. Meanwhile, we focus only on the competitive relationships in the textural structure orientation of second-order features to avoid noise introduced by high-order filtering.

2.1 Multi-scale network structure

Different scales of DTCM are applied to the input FKP image, generating feature maps at various resolutions. This process constructs multi-scale features and integrates multi-scale contextual information. Specifically, DOTCNet is divided into three stages: Stage 1, Stage 2, and Stage 3. Features obtained from different scales of filters — large-scale, medium-scale, and tiny-scale — contain texture information at varying scales. Tiny-scale features often have relatively large spatial extents to capture more detailed texture information, whereas large-scale features contain strong semantic information. Information at different scales is interrelated and complementary. Global multi-scale features are then obtained by concatenating features from different scales. These concatenated features are processed through two fully connected layers to generate the final feature vector. The following expression illustrates this process:

F=\mathrm{FC}\left(\mathrm{FC}\left(\mathrm{Concat}(F_{ls},F_{ms},F_{ts})\right)\right),

(1)

where FC denotes a fully connected layer, $F_{ls}$ , $F_{ms}$ , and $F_{ts}$ represent the feature maps generated by the large-scale, medium-scale, and tiny-scale branches, Concat denotes the concatenation operation, and $F$ represents the final feature vector.
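As a rough illustration of Eq. (1), the following minimal sketch concatenates three flattened branch features and passes them through two fully connected layers. The feature sizes, the random weights, and the helper names `linear`, `make_layer`, and `fuse` are assumptions made for the example; no activation is inserted between the layers because Eq. (1) does not specify one.

```python
import random

def linear(x, weight, bias):
    """Fully connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def make_layer(in_dim, out_dim, rng):
    """Random weights standing in for learned parameters (illustrative only)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def fuse(f_ls, f_ms, f_ts, fc1, fc2):
    """Eq. (1): F = FC(FC(Concat(F_ls, F_ms, F_ts)))."""
    concat = f_ls + f_ms + f_ts      # Concat of the flattened branch features
    hidden = linear(concat, *fc1)    # first FC layer
    return linear(hidden, *fc2)      # second FC layer gives the final vector F

rng = random.Random(0)
# Toy flattened features for the large-, medium-, and tiny-scale branches.
f_ls, f_ms, f_ts = [0.1] * 4, [0.2] * 6, [0.3] * 5
fc1 = make_layer(15, 8, rng)         # 15 = 4 + 6 + 5 concatenated inputs
fc2 = make_layer(8, 3, rng)          # hypothetical 3-dimensional output
F = fuse(f_ls, f_ms, f_ts, fc1, fc2)
print(len(F))
```

In a real network the branch outputs would be tensors flattened per sample, and the layer widths would be chosen to suit the target feature dimension.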

2.2 Dual-order texture competitive module

For each DTCM, the process includes capturing the initial detail texture features through first-order LGF and the initial structure features through second-order LGF. Then, the attention and competition mechanisms are used for the two order features, respectively. Taking the large-scale branch as an example, dual-order features are generated by cascading first-order features and second-order features, and the expression is as follows:

F_{ls}=\mathrm{Concat}(F_{1},F_{2}),

(2)

where $F_{ls}$ denotes the dual-order features in the large-scale branch, $F_{1}$ represents the first-order detailed texture features, and $F_{2}$ denotes the second-order structure features.

The Gabor filter feature extractor utilized in this paper is extensively employed in image processing because of its biological relevance and robust texture extraction capabilities. The Gabor filter is mathematically defined as follows:

g(x,y;\lambda,\theta,\phi,\sigma,\gamma)=\exp\left(-\frac{x^{\prime 2}+\gamma^{2}y^{\prime 2}}{2\sigma^{2}}\right)\exp\left(i\left(2\pi\frac{x^{\prime}}{\lambda}+\phi\right)\right),

(3)

where $x^{\prime}=x\cos\theta+y\sin\theta$ and $y^{\prime}=-x\sin\theta+y\cos\theta$ . (x, y) denotes the pixel position. $\lambda$ denotes the wavelength of the sinusoidal plane wave component of the Gabor function. $\theta$ specifies the angle of the plane wave, while $\phi$ indicates the phase shift. Additionally, $\gamma$ denotes the ellipticity of the Gaussian support of the Gabor function, and $\sigma$ determines the standard deviation of the Gaussian filter within the Gabor function.

An effective Gabor filter requires an appropriate combination of parameters to match the given task. Previous studies [14] have typically selected parameters by hand, relying on fixed rules that cannot guarantee effectiveness for the task at hand. To overcome this limitation, we employ the LGF as the texture extractor [15], which facilitates learning optimal parameters ( $\lambda,\sigma,\gamma$ ) to extract texture features. In this paper, we employ the real part of the Gabor function, using Gabor filters with sizes of 7, 17, and 35 for tiny-scale, medium-scale, and large-scale textures. The numbers of filters are set to 12, 36, and 6, respectively.
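To make the real part of Eq. (3) concrete, the sketch below samples it on a square grid centred at the origin. It is only an illustration: the parameter values, the choice of six evenly spaced orientations, and the function name `gabor_kernel` are assumptions, and in an LGF the values of $\lambda$, $\sigma$, and $\gamma$ would be learned rather than fixed as here.

```python
import math

def gabor_kernel(size, lam, theta, phi, sigma, gamma):
    """Real part of Eq. (3) sampled on a size x size grid (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotated coordinates x' and y' as defined below Eq. (3).
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times the cosine carrier (real part).
            envelope = math.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2)
                                / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / lam + phi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Hypothetical bank: six orientations at the tiny-scale kernel size of 7.
bank = [gabor_kernel(7, lam=4.0, theta=k * math.pi / 6,
                     phi=0.0, sigma=2.0, gamma=0.5)
        for k in range(6)]
print(len(bank), len(bank[0]), len(bank[0][0]))
```

With $\phi=0$ the kernel equals 1 at the centre and is symmetric under point reflection, which is a quick sanity check on the sampling grid.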

2.2.1 The first-order detailed texture feature extraction

We first use the first-order LGF to perform a preliminary extraction of the edge and texture information of the image. The expression is as follows:

\chi=\mathrm{LGF}(X),

(4)

where X represents the input FKP image, $\chi$ is the initial detail texture features through first-order LGF, and LGF( $\cdot$ ) is the LGF operation.

For the generated feature $\chi$ , we establish dimensional dependencies through rotation operations and residual transformations, capture cross-dimensional interactions to calculate attention weights, and encode channel and spatial information through the Triplet attention [16]. The architecture of the triplet attention module is shown in Fig. 2. Given an input feature $\chi\in\mathbb{R}^{C\times H\times W}$ , we obtain the $(C,H)$ branch feature map $\hat{\chi}_{1}$ by rotating it $90^{\circ}$ anti-clockwise along the $H$ axis. We rotate $\chi$ by $90^{\circ}$ anti-clockwise along the $W$ axis to obtain the $(C,W)$ branch feature map $\hat{\chi}_{2}$ . The unrotated $\chi$ serves as the $(H,W)$ branch feature, denoted as $\chi_{3}$ . Then, the Z-Pool process concatenates each branch's average pooling and maximum pooling results, which can be expressed as:

\hat{\chi}_{i}^{*}=\mathrm{Z\_Pool}(\hat{\chi}_{i})=[\mathrm{MaxPool}(\hat{\chi}_{i}),\mathrm{AvgPool}(\hat{\chi}_{i})],

(5)

where $\hat{\chi}_{i}^{*}$ means the feature map through the Z-Pool process.
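The Z-Pool step of Eq. (5) can be sketched on a small nested-list tensor as follows. This assumes, as in the triplet attention module of [16], that pooling is taken along the leading dimension so that a $(C,H,W)$ input becomes a $(2,H,W)$ output; the function name `z_pool` and the toy input are illustrative.

```python
def z_pool(tensor):
    """Stack max- and average-pooling over the leading axis.

    tensor: nested list of shape (C, H, W); returns shape (2, H, W).
    """
    c, h, w = len(tensor), len(tensor[0]), len(tensor[0][0])
    max_map = [[max(tensor[k][i][j] for k in range(c)) for j in range(w)]
               for i in range(h)]
    avg_map = [[sum(tensor[k][i][j] for k in range(c)) / c for j in range(w)]
               for i in range(h)]
    return [max_map, avg_map]   # order follows Eq. (5): [MaxPool, AvgPool]

# Toy input with C = 2, H = 2, W = 2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 0.0], [1.0, 8.0]]]
pooled = z_pool(x)
print(pooled)
```

For the rotated branches the same function would be applied after permuting the axes, so the pooled dimension is $H$ or $W$ instead of $C$.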

Finally, the interaction of spatial and channel attention across different dimensions can be represented as:

F_{1}=\frac{1}{3}(\overline{\hat{\chi}_{1}\sigma(\psi_{1}(\hat{\chi}_{1}^{*}))}+\overline{\hat{\chi}_{2}\sigma(\psi_{2}(\hat{\chi}_{2}^{*}))}+\chi\sigma(\psi_{3}(\chi^{*}))),

(6)

where $F_{1}$ represents the first-order detailed texture feature, $\sigma$ represents the sigmoid activation function. $\psi_{1}$ , $\psi_{2}$ , and $\psi_{3}$ represent the standard two-dimensional convolutional layers defined by kernel size 3 in the three branches of triplet attention. The overline represents rotating the input tensor $90^{\circ}$ clockwise.

2.2.2 The second-order structure feature extraction

After the process above, the first-order LGF captures detailed features and ensures a balance between local and global information through the attention mechanism. Building on this, the second-order LGF further strengthens important structural features, making the final feature representation more discriminative. The specific expression is as follows:

g=\mathrm{LGF}(F_{1}),

(7)

where $F_{1}$ represents the generated first-order texture feature, $g$ means the initial structure features through second-order LGF.

To capture the important structure information, soft competitive code (SCC) [17] is introduced to extract the ordering relationship as the feature using the Softmax function. The process is formulated as follows:

F_{2}=\mathrm{softmax}(g)

(8)

where $g$ is the input of the competitive mechanism and $F_{2}$ is the resulting second-order structure feature. Softmax( $\cdot$ ) denotes the competition extraction process along the channel dimension.
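The channel-wise competition of Eq. (8) can be illustrated with a small sketch in which each channel is treated as one filter response and the softmax is taken across channels at every pixel. This follows only Eq. (8) as written; any additional details of the soft competitive code in [17] are not reproduced, and the function name and toy responses are assumptions.

```python
import math

def channel_softmax(responses):
    """Softmax across the channel axis at every pixel.

    responses: nested list of shape (C, H, W); returns the same shape.
    """
    c, h, w = len(responses), len(responses[0]), len(responses[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            vals = [responses[k][i][j] for k in range(c)]
            m = max(vals)                         # subtract max for stability
            exps = [math.exp(v - m) for v in vals]
            total = sum(exps)
            for k in range(c):
                out[k][i][j] = exps[k] / total
    return out

# Toy second-order responses g with C = 3 channels on a 1 x 2 image.
g = [[[2.0, 0.0]],
     [[1.0, 0.0]],
     [[0.0, 0.0]]]
f2 = channel_softmax(g)
print([f2[k][0][0] for k in range(3)])
```

At the first pixel the strongest response receives the largest weight, while at the second pixel, where all responses tie, every channel receives the same weight of one third.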

3 Experiments and Results

3.1 Datasets and Experimental Settings

This section introduces the PolyU Finger Knuckle Print Database (PolyU-FKP) used for experimental analysis [18]. The database comprises FKP images collected by the Hong Kong Polytechnic University using a low-resolution camera in a peg-free environment. The dataset involves 148 individuals and includes images of the left index FKP, left middle FKP, right index FKP, and right middle FKP. The images were captured in a contactless mode with a resolution of 110 × 220 pixels in BMP format. Each finger is represented by 12 images, and the dataset contains 591 finger classes in total. We randomly selected six samples from each finger to create the training set and reserved the remaining samples for testing.

Our method is implemented using the PyTorch framework and optimized with the Adam optimizer [19], utilizing a learning rate of 0.01 and a batch size of 1024. The experiments are conducted on an NVIDIA RTX 3090 GPU. In this paper, we develop a loss function by integrating cross-entropy loss with contrastive loss [20]. For comparison, other deep learning methods are implemented by replacing the network while keeping all other parameters consistent.

Table 1: Comparison of the proposed method with state-of-the-art methods (EER, %).

Methods	Left index	Left middle	Right index	Right middle	Total
BOCV [6]	8.753	8.712	8.845	8.667	8.657
ResNet18 [10]	5.349	4.392	5.545	4.657	3.484
DenseNet101 [22]	5.686	5.968	5.799	6.708	4.725
VGG16 [9]	6.550	4.842	6.250	4.649	4.039
Compnet [17]	2.558	3.228	4.940	3.498	2.934
PCANet-FKP [21]	4.542	2.614	4.054	3.307	2.431
CO3Net [23]	4.748	4.354	5.405	5.291	4.294
CCNet [24]	2.477	3.059	3.941	3.042	2.622
Ours	2.327	2.571	2.909	2.664	2.186

3.2 Recognition Performance

We compared the proposed method with the classical recognition methods BOCV [6], ResNet18 [10], DenseNet101 [22], VGG16 [9], Compnet [17], and PCANet-FKP [21], as well as two advanced texture description methods, CO3Net [23] and CCNet [24]. All methods were evaluated on the four PolyU-FKP subsets (left index FKP, left middle FKP, right index FKP, and right middle FKP) and on the full dataset. Table 1 presents the resulting Equal Error Rates (EERs). Our method achieved an EER of 2.186% on the overall FKP dataset, compared with 2.431% for the second-ranked PCANet-FKP, a relative reduction of about 10.1%. Our method also achieved the lowest EER on each of the four subsets, indicating superior accuracy and robustness in FKP recognition tasks.
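For readers unfamiliar with the metric, the sketch below shows one simple way an EER can be approximated from matching scores: sweep a decision threshold and report the point where the false acceptance rate and the false rejection rate are closest. It assumes distance scores (smaller means more similar) and toy score lists; it is not the evaluation code used in the experiments.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from distance scores (smaller = more similar)."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # Genuine pairs with distance above the threshold are falsely rejected.
        frr = sum(d > t for d in genuine) / len(genuine)
        # Impostor pairs at or below the threshold are falsely accepted.
        far = sum(d <= t for d in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores: one genuine pair overlaps with the impostor distribution.
genuine = [0.10, 0.15, 0.20, 0.35]
impostor = [0.30, 0.50, 0.60, 0.70]
eer = equal_error_rate(genuine, impostor)
print(eer)  # 0.25
```

With perfectly separated score distributions the same function returns 0, and finer threshold grids or interpolation give smoother estimates on real data.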

To further validate the effectiveness of our method, we plotted the corresponding Receiver Operating Characteristic (ROC) curves, as shown in Fig. 3. The closer the curve is to the top left corner of the plot, the better the performance of the corresponding algorithm. These curves illustrate that our method performs better across different thresholds than the other compared methods. Compared with traditional techniques and other deep learning-based approaches, our method achieves the lowest EER values and the best ROC curves on the PolyU-FKP dataset, showcasing its robustness and efficacy.

Table 2: Accuracies (%) and EERs (%) obtained with different scale branches on PolyU-FKP.

Large-scale	Medium-scale	Tiny-scale	ACC(%)	EER(%)
✓	$\times$	$\times$	98.84	4.308
$\times$	✓	$\times$	99.40	2.949
$\times$	$\times$	✓	99.32	2.980
✓	✓	$\times$	99.37	2.702
$\times$	✓	✓	99.51	2.338
✓	✓	✓	99.60	2.186

Table 3: Accuracies (%) and EERs (%) obtained with different mechanisms on PolyU-FKP.

1st-order TAM	1st-order CM	2nd-order TAM	2nd-order CM	ACC(%)	EER(%)
✓	✓	$\times$	$\times$	99.37	2.881
✓	✓	✓	✓	99.49	2.610
✓	$\times$	✓	$\times$	99.51	2.570
$\times$	✓	$\times$	✓	99.49	2.585
$\times$	✓	✓	$\times$	99.54	2.468
✓	$\times$	$\times$	✓	99.60	2.186

3.3 Ablation Experiments

3.3.1 The efficiency of multi-scale branches

To validate the necessity of multi-scale branches for recognition performance, we conducted several ablation experiments on the PolyU-FKP dataset. The network layers were kept consistent throughout so that the results are comparable. The experimental results are shown in Table 2. First, we tested the recognition performance of the large-scale, medium-scale, and tiny-scale branches individually. The medium-scale branch performed best among the three. Next, based on the medium-scale branch, we added the large-scale and tiny-scale branches for combination testing. The results indicated that the combination of medium-scale and tiny-scale branches outperformed the combination of medium-scale and large-scale branches, further validating the importance of the tiny-scale branch in the multi-scale structure. Based on the performance of the three branches in the experiments and the analysis of their importance, we allocated appropriate network layers to each branch. Specifically, the large-scale branch was assigned six layers, the medium-scale branch 36 layers, and the tiny-scale branch 12 layers. This allocation of network layers effectively improved the model’s recognition performance.

3.3.2 The efficiency of dual-order texture extraction

We conduct ablation experiments to evaluate the importance of dual-order texture features and the contributions of the Triplet Attention Mechanism (TAM) and the Competition Mechanism (CM) for first-order and second-order textures. These experiments isolate and elucidate the impact of each mechanism on overall model performance. The configurations of the ablation experiments on the PolyU-FKP dataset are as follows: (1) Only retaining the first-order texture extraction, (2) TAM and CM for both first-order and second-order textures, (3) TAM for both first-order and second-order textures, (4) CM for both first-order and second-order textures, (5) CM for first-order texture and TAM for second-order texture, and (6) TAM for first-order texture and CM for second-order texture.

As shown in Table 3, comparing configuration (1) with configuration (2) shows that retaining dual-order features is significantly better than retaining only first-order features. The highest performance was observed in configuration (6), indicating that TAM effectively captured the essential features of first-order textures. In contrast, the CM was more suitable for handling the complexity and feature differentiation in second-order textures. This suggests that first-order textures benefit more from TAM’s focused attention on salient features, whereas second-order textures require effective differentiation between significantly varying directional structure features.

4 Conclusion

In this paper, we propose a novel FKP recognition network termed DOTCNet. This network incorporates multi-scale branches and the DTCM to achieve comprehensive feature extraction. For DTCM, detailed texture features are captured by the first-order LGF. At the same time, inter-dimensional dependencies are established through rotational operations and residual transformations within the triplet attention mechanism, preserving the cross-dimensional texture features. Subsequently, structural features are captured by the second-order LGF and processed via a competitive mechanism to differentiate feature directions. Finally, the dual-order features are concatenated to achieve comprehensive feature extraction. To validate the effectiveness of the proposed method, we conducted extensive experiments on the open-access dataset. The experimental results indicate that our method has significant advantages over several other methods. Future research directions include establishing the consistency of templates for left and right FKP images. This approach utilizes registered left (right) FKP images to verify the identity of right (left) FKP images.

References

[1] Yang, Z., Teoh, A.B.J., Zhang, B., et al.: Physics-driven spectrum-consistent federated learning for palmprint verification. International Journal of Computer Vision pp. 1-16 (2024)
[2] Hattab, A., Behloul, A.: Face-iris multimodal biometric recognition system based on deep learning. Multimedia Tools and Applications 83(14), 43349-43376 (2024)
[3] Su, L., Fei, L., Zhang, B., et al.: Complete region of interest for unconstrained palmprint recognition. IEEE Transactions on Image Processing (2024)
[4] Fei, L., Zhang, B., Wen, J., et al.: Jointly learning compact multi-view hash codes for few-shot fkp recognition. Pattern Recognition 115, 107894 (2021)
[5] Yang, Y., Fei, L., Alshehri, A.H., et al.: Joint multi-type feature learning for multi-modality fkp recognition. Engineering Applications of Artificial Intelligence 126, 106960 (2023)
[6] Guo, Z., Zhang, D., Zhang, L., et al.: Palmprint verification using binary orientation co-occurrence vector. Pattern Recognition Letters 30(13), 1219-1227 (2009)
[7] Zhang, D., Lu, G., Zhang, L., et al.: Finger-knuckle-print verification. Advanced biometrics pp. 85-109 (2018)
[8] Li, S., Fei, L., Zhang, B., et al.: Hand-based multimodal biometric fusion: A review. Information Fusion p. 102418 (2024)
[9] Hong, H.G., Lee, M.B., Park, K.R.: Convolutional neural network-based finger-vein recognition using nir image sensors. Sensors 17(6), 1297 (2017)
[10] Kim, W., Song, J.M., Park, K.R.: Multimodal biometric recognition based on convolutional neural network by the fusion of finger-vein and finger shape using near-infrared (nir) camera sensor. Sensors 18(7), 2296 (2018)
[11] Cheng, K.H., Kumar, A.: Accurate 3d finger knuckle recognition using auto-generated similarity functions. IEEE Transactions on Biometrics, Behavior, and Identity Science 3(2), 203-213 (2021)
[12] Li, S., Zhang, B., Fei, L., et al.: Learning sparse and discriminative multimodal feature codes for finger recognition. IEEE Transactions on Multimedia 25, 805-815 (2021)
[13] Li, S., Zhang, B., Fei, L., et al.: Joint discriminative feature learning for multimodal finger recognition. Pattern Recognition 111, 107704 (2021)
[14] Aliraid, R., Ouamane, A.: A novel descriptor (lgbq) based on gabor filters. Multimedia Tools and Applications 83(4), 11669-11686 (2024)
[15] Chen, P., Li, W., Sun, L., et al.: Lgcn: learnable gabor convolution network for human gender recognition in the wild. IEICE TRANSACTIONS on Information and Systems 102(10), 2067-2071 (2019)
[16] Misra, D., Nalamada, T., Arasanipalai, A.U., et al.: Rotate to attend: Convolutional triplet attention module. In: IEEE/CVF winter conference on applications of computer vision. pp. 3139-3148 (2021)
[17] Liang, X., Yang, J., Lu, G., et al.: Compnet: Competitive neural network for palmprint recognition using learnable gabor kernels. IEEE Signal Processing Letters 28, 1739-1743 (2021)
[18] Zhang, L., Zhang, L., Zhang, D.: Finger-knuckle-print: a new biometric identifier. In: IEEE international conference on image processing. pp. 1981-1984. (2009)
[19] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[20] Gao, C., Yang, Z., Zhu, M., et al.: Scale-aware competition network for palmprint recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 4580-4584. (2024)
[21] Attia, A., Mazaa, S., Akhtar, Z., et al.: Deep learning-driven palmprint and finger knuckle pattern-based multimodal person recognition system. Multimedia Tools and Applications 81(8), 10961-10980 (2022)
[22] Song, J.M., Kim, W., Park, K.R.: Finger-vein recognition based on deep densenet using composite image. IEEE Access 7, 66845-66863 (2019)
[23] Yang, Z., Xia, W., Qiao, Y., et al.: CO3Net: Coordinate-aware contrastive competitive neural network for palmprint recognition. IEEE Transactions on Instrumentation and Measurement (2023)
[24] Yang, Z., Huangfu, H., Leng, L., et al.: Comprehensive competition mechanism in palmprint recognition. IEEE Transactions on Information Forensics and Security (2023)