License: arXiv.org perpetual non-exclusive license
arXiv:2401.07525v2 [cs.CL] 17 Jan 2024

TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

Abstract

Person-job fit is an essential part of online recruitment platforms, serving various downstream applications such as Job Search and Candidate Recommendation. Recently, pretrained large language models have further improved effectiveness by leveraging the richer textual information in user profiles and job descriptions, in addition to user behavior features and job metadata. However, their general domain-oriented design struggles to capture the unique structural information within user profiles and job descriptions, leading to a loss of latent semantic correlations. We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings. TAROT targets the semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks that constrain the semantic information acquired at each level. Experiments on a real-world LinkedIn dataset show significant performance improvements, demonstrating its effectiveness in person-job fit tasks.

Index Terms—  person-job fit, structured text, multi-task

1 Introduction

Fig. 1: The framework of TAROT. Gray boxes refer to different pretraining tasks at each level.

Reducing unemployment is a permanent concern for labor and government, especially during an epidemic [1]. Improving recruitment efficiency, for example Person-Job fit accuracy, can significantly lower unemployment rates, reduce expensive recruitment costs, and eliminate job seekers' wasted effort [2].

Traditional approaches to Person-Job fit rely on features from user behaviors or job metadata, such as collaborative filtering for job recommendation [3, 4]. The rapid growth of online recruitment platforms like LinkedIn and Indeed has enabled deep learning methods that learn text embeddings from large-scale user profiles and job descriptions [5, 6]. More recently, large language models (LLMs, e.g., BERT [7] and GPT-3 [8]) have proven effective in natural language processing and understanding, and they have become a new choice for learning text representations [9, 10, 11]. However, Person-Job fit text is organized in domain-specific structures and carries domain-specific semantics, which can cause LLMs trained on general-domain text to fail.

This practical scenario raises new challenges for pretraining LLMs on semi-structured, domain-specific data. First, pretrained LLMs are usually trained on general-domain corpora, which are of low relevance to, or even contradict, domain-specific corpora (e.g., abbreviations); without domain-specific pretraining, this can lead to embedding collapse [12, 13]. Second, unlike plain text, the text in user profiles and job descriptions is largely organized in domain-specific hierarchical structures, because people format it to present their purposes more clearly. Such structural information can help models understand semantics [14, 15, 16, 17, 18], yet it is ignored by current approaches, leaving room for improvement.

To tackle these challenges, we propose TAROT, a hierarchically designed multitask framework that co-pretrains large language models while incorporating structural information. As Figure 1 shows, some unstructured job descriptions are segmented by LinkedIn services into pre-defined sections; together with the naturally structured user profiles and the remaining, already structured jobs, they constitute the recruitment data. To match this structure, TAROT is organized into corresponding hierarchical levels, from bottom to top: sentence → section → individual → interaction. Sentence semantics are extracted by BERT, and each upper-level embedding is derived by an attention fusion layer over the level below. Interaction between job and user embeddings is enhanced via a cross-attention layer, so that each side's embedding can be adaptively adjusted to the needs of the other side. In addition, hierarchical pretraining tasks are designed to encourage the integration of structural information and to constrain the model to learn domain-relevant semantics at each level. We evaluate the output embeddings on Person-Job Fit downstream tasks, and experimental results on two tasks demonstrate the superiority of TAROT over baseline methods. The main contributions are summarized as follows:

  • We propose a hierarchically structured framework for learning representations of domain-specific person-job fit text, as a complement to traditional features.

  • We propose multi-grained pretraining tasks specifically for the person-job fit area.

  • Extensive experiments are conducted to verify the effectiveness of our design and the benefits to downstream tasks.

2 TAROT

2.1 Preliminaries

We denote the set of users as $U=\{u\}$ and the set of jobs as $J=\{j\}$. Job descriptions $j$ are divided into sections $S^{j}$: Responsibilities, Qualifications, Requirements, Job Title, Functions, Skills, Benefits and Company; profile sections $S^{u}$ include Summary, Headline, Education, Position and Skills. Each section consists of sentences, e.g., $S^{j}=[s^{j}_{1},\cdots,s^{j}_{k}]$, where $k$ is the number of sentences in $S^{j}$. The objective of Person-job fit is to predict the matching degree between $u$ and $j$ based on the output embeddings of a learned language model.
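To make the notation concrete, the hierarchy above can be held in a small container. This is a minimal sketch, not TAROT's actual data pipeline: the `Document` class, its fields, and the validation rule are hypothetical; only the section names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Section names listed in Section 2.1.
JOB_SECTIONS = ["Responsibilities", "Qualifications", "Requirements", "Job Title",
                "Functions", "Skills", "Benefits", "Company"]
PROFILE_SECTIONS = ["Summary", "Headline", "Education", "Position", "Skills"]


@dataclass
class Document:
    """A profile (kind="user") or job description (kind="job"): section -> sentences."""
    doc_id: str
    kind: str
    sections: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject section names that do not belong to this document type.
        allowed = PROFILE_SECTIONS if self.kind == "user" else JOB_SECTIONS
        unknown = set(self.sections) - set(allowed)
        if unknown:
            raise ValueError(f"unknown {self.kind} sections: {sorted(unknown)}")

    def num_sentences(self, section: str) -> int:
        """k for S = [s_1, ..., s_k]; 0 when the section is absent."""
        return len(self.sections.get(section, []))


profile = Document("u1", "user", {
    "Headline": ["Data scientist"],
    "Skills": ["Python", "SQL", "Machine learning"],
})
print(profile.num_sentences("Skills"))  # 3
```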

2.2 Hierarchical Structured Language Model

2.2.1 Language Model

BERT [7] is a pretrained language model that has shown strong performance on natural language processing tasks in recent years. To adapt BERT to the recruitment domain, TAROT continues pretraining it on large-scale corpora of user profiles and job descriptions from LinkedIn. Sentences are fed into TAROT's language model section by section to obtain sentence embeddings.

2.2.2 Attention Fusion Layer

As a hierarchically structured model, TAROT must aggregate the information at each level into the level above. Section-level representations require weighing the importance of different sentences, while individual-level embeddings require distinguishing between sections. We therefore adopt the attention-based fusion method [19] to adaptively learn these differences at both levels. Take profile representation learning at the individual level as an example, and denote the sequence of section embeddings as $E^{u}=[E^{u}_{*}]$. The attention-based fusion embedding is then computed as:

$$
\begin{aligned}
\tilde{E}^{u} &= \textbf{Pooling}(E^{u}_{*}),\\
E'_{u} &= \tilde{E}^{u} + \textbf{Attention}(Q=\tilde{E}^{u},\, K=E^{u},\, V=E^{u}).
\end{aligned}
\tag{1}
$$

The pooled embedding $\tilde{E}^{u}$ guides the attention fusion layer to assign appropriate weights to each section and to generate a global context-aware individual representation $E'_{u}$ for user $u$.
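Eq. (1) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the text does not specify the pooling operator, so mean pooling is assumed, and `attention` is a single-query scaled dot-product attention without the learned projections or multiple heads a real layer would use. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed Pooling(): element-wise mean of the lower-level embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attention_fusion(section_embs: List[Vector]) -> Vector:
    """E'_u = Pooling(E^u_*) + Attention(Q=pooled, K=E^u, V=E^u), as in Eq. (1)."""
    pooled = mean_pool(section_embs)
    attended = attention(pooled, section_embs, section_embs)
    return [p + a for p, a in zip(pooled, attended)]


# Three hypothetical 2-d section embeddings (e.g. Headline, Position, Skills).
sections = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_fusion(sections))
```

With identical section embeddings the attention weights are uniform and the fused vector is simply twice the pooled one; with distinct sections, those with a larger dot product against the pooled context receive larger weights.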

2.2.3 Cross-Attention Layer

Empirical evidence suggests that different users are attracted by different sections of the same job description, and the same holds for how recruiters read profiles. This suggests that profile and job description representations should not be learned in isolation. We therefore design a cross-attention layer: the job-oriented attention uses the job embedding $E'_{j}$ as queries over the profile embedding $E'_{u}$ and produces $A_{j}$; similarly, the user-oriented attention produces $A_{u}$. $A_{j}$ and $A_{u}$ are concatenated as the output, which is then combined with $E'_{j}$ or $E'_{u}$ to form the final job or user embedding.
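The cross-attention layer can be sketched as below under several assumptions not stated in the text: each side is represented by a short sequence of equal-dimension vectors (for example, one per section), the per-query attention outputs are averaged, attention uses no learned projections, and "combined" is read as concatenation. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def cross_attention(job_seq: List[Vector], user_seq: List[Vector]) -> Tuple[Vector, Vector]:
    # Job-oriented attention: job vectors query the profile vectors -> A_j.
    a_j = mean_pool([attention(q, user_seq, user_seq) for q in job_seq])
    # User-oriented attention: profile vectors query the job vectors -> A_u.
    a_u = mean_pool([attention(q, job_seq, job_seq) for q in user_seq])
    shared = a_j + a_u  # list concatenation [A_j; A_u]
    # Assumption: "combined with E'_j or E'_u" means concatenation.
    job_final = mean_pool(job_seq) + shared
    user_final = mean_pool(user_seq) + shared
    return job_final, user_final


job_final, user_final = cross_attention([[1.0, 0.0], [0.0, 1.0]], [[0.5, -1.0]])
print(len(job_final), len(user_final))  # 6 6
```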

2.3 Hierarchical Pretraining Tasks

2.3.1 Sentence-level: Masked Language Model

Although job descriptions and user profiles are semi-structured, the sequential text within each section still plays a critical role. Therefore, the classical Masked Language Model (MLM) task [7] is adopted to adapt TAROT to recruitment-related corpora.
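The masking step behind MLM can be sketched as below. The 15% selection rate and the 80/10/10 split between [MASK], a random token, and the unchanged token follow the original BERT recipe [7]; the text does not state which ratios TAROT uses, so treat them as assumptions. `mask_tokens` is a hypothetical helper operating on already-tokenized text.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); labels are None where no prediction is required."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = token  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


rng = random.Random(0)
tokens = "senior data scientist with sql and python experience".split()
print(mask_tokens(tokens, vocab=tokens, rng=rng))
```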

2.3.2 Section-level: Experience Classification

Given a section $S^{u}$ from user profile $u$, the Experience Classification task is a multi-class classification that uses the content of $S^{u}$ to predict its section name from {Summary, Headline, Education, Position, Skills}. Technically, the sentences of $S^{u}$ are fed into BERT to obtain their representations, and the output for the entire section is taken as the input of a Multi-Layer Perceptron (MLP) that predicts the section-name label $\tilde{y}_{S^{u}}$. Experience Classification minimizes the cross-entropy loss between $\tilde{y}_{S^{u}}$ and the true section-name label $y_{S^{u}}$ over $n_{Exp}$ samples. For convenience, each task objective is written in the form of Eq. 2; e.g., the Experience Classification objective is $L_{Exp}(n_{Exp},\tilde{y}_{S^{u}},y_{S^{u}})$:

$$
L_{task}(n_{task},\tilde{y},y) = -\frac{1}{n_{task}}\sum \big[\, y\cdot\log\tilde{y} + (1-y)\cdot\log(1-\tilde{y}) \,\big].
\tag{2}
$$
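Eq. (2) translates directly into code. This minimal sketch assumes predicted probabilities in [0, 1] and binary labels; the clipping constant `eps` is an added safeguard against log(0), not part of the equation, and `task_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def task_loss(y_pred: Sequence[float], y_true: Sequence[int], eps: float = 1e-12) -> float:
    """L_task = -(1/n) * sum[y*log(y~) + (1-y)*log(1-y~)] over n samples."""
    n = len(y_true)
    total = 0.0
    for p, y in zip(y_pred, y_true):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


print(task_loss([0.9, 0.2], [1, 0]))
```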

2.3.3 Individual-level: Attribute Validation

To emphasize attributes that co-occur on both sides, we design the attribute validation task at the individual level so that key attributes are better incorporated into the embeddings. The task uses embeddings from the previous layers to predict attributes of users and jobs. Skill is chosen as the key individual attribute, because in the person-job fit scenario a match between a job and a user profile depends heavily on the skills the user has and the job requires. The labels are the unified “skill_ids” extracted by a LinkedIn service from the skills section. Representations for attribute validation are obtained from the individual-level attention fusion layer in two steps: the skill section is removed from $\tilde{E}^{u}$ during pooling (denoted $\tilde{E}^{u}_{msk}$), and it is also masked when conducting self-attention:

$$
E'_{u,msk} = \tilde{E}^{u}_{msk} + \textbf{Attention}(Q=\tilde{E}^{u}_{msk},\, K=E^{u},\, V=E^{u}).
\tag{3}
$$

The predicted label $\tilde{y}_{Att}$ is generated by a single-layer MLP on $E'_{u,msk}$. A multi-label one-versus-all loss turns the prediction into a series of binary classification problems, and the Attribute Validation objective is denoted $L_{Att}(n_{Att},\tilde{y}_{Att},y_{Att})$. A similar objective can be defined for job representations.
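The masked pooling and label construction for Attribute Validation can be sketched as follows. Assumptions: mean pooling (the pooling operator is not specified), section embeddings held in a dictionary keyed by section name, skill ids already mapped to contiguous integers `0..num_skills-1`, and the one-versus-all loss written as per-skill binary cross-entropy on sigmoid logits, averaged over skills. The attention term of Eq. (3) is omitted for brevity, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def masked_mean_pool(section_embs: Dict[str, List[float]], masked: str = "Skills") -> List[float]:
    """Mean-pool all section embeddings except the masked one (tilde E^u_msk)."""
    kept = [emb for name, emb in section_embs.items() if name != masked]
    return [sum(col) / len(kept) for col in zip(*kept)]


def multi_hot(skill_ids: Sequence[int], num_skills: int) -> List[int]:
    """Turn a profile's skill ids into a 0/1 target vector over all skills."""
    target = [0] * num_skills
    for sid in skill_ids:
        target[sid] = 1
    return target


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def one_vs_all_loss(logits: Sequence[float], target: Sequence[int]) -> float:
    """Mean over skills of binary cross-entropy on sigmoid(logit)."""
    # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z).
    per_skill = [y * softplus(-z) + (1 - y) * softplus(z) for z, y in zip(logits, target)]
    return sum(per_skill) / len(per_skill)


sections = {"Summary": [1.0, 3.0], "Headline": [3.0, 1.0], "Skills": [9.0, 9.0]}
print(masked_mean_pool(sections))       # [2.0, 2.0]
print(multi_hot([0, 2], num_skills=4))  # [1, 0, 1, 0]
```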

2.3.4 Interaction-level: Application Classification

The Application Classification task predicts whether a user will apply for a job, in order to strengthen the interaction between user profile embeddings and job description embeddings. Notably, negative samples are randomly generated and account for 3/4 of the dataset. This is because the collected application data rely heavily on job recommendations: users only react to recommended jobs, which the system already considers suitable. In practice, when a job is recommended, the user can skip, dismiss, save, or apply for it. The “skip” action is excluded, “apply” and “save” are treated as positive, and “dismiss” as negative. The final output of TAROT is fed into a single-layer MLP to generate the predicted label $\tilde{y}_{App}$, which is trained with the corresponding objective $L_{App}(n_{App},\tilde{y}_{App},y_{App})$.
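One way to build the Application Classification samples is sketched below. The action-to-label mapping comes from the text; how the random negatives are drawn is not specified, so this sketch assumes uniformly random (user, job) pairs with no recorded interaction, generated at three per observed sample so that they make up 3/4 of the resulting set. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Mapping from Section 2.3.4: "skip" is excluded, "apply"/"save" positive, "dismiss" negative.
ACTION_LABELS = {"apply": 1, "save": 1, "dismiss": 0}


def build_application_samples(events: Sequence[Tuple[str, str, str]], all_jobs: Sequence[str],
                              rng: random.Random, neg_ratio: int = 3,
                              max_tries: int = 100_000) -> List[Tuple[str, str, int]]:
    """events are (user, job, action) triples; returns (user, job, label) samples."""
    observed = [(u, j, ACTION_LABELS[a]) for u, j, a in events if a in ACTION_LABELS]
    seen = {(u, j) for u, j, _ in events}  # includes skipped pairs, never reused as negatives
    users = sorted({u for u, _, _ in observed})
    negatives: List[Tuple[str, str, int]] = []
    tries = 0
    # neg_ratio=3 random negatives per observed sample -> 3/4 of the final set.
    while len(negatives) < neg_ratio * len(observed) and tries < max_tries:
        tries += 1
        pair = (rng.choice(users), rng.choice(all_jobs))
        if pair not in seen:
            seen.add(pair)  # avoid duplicate negatives
            negatives.append((pair[0], pair[1], 0))
    return observed + negatives


events = [("u1", "j1", "apply"), ("u1", "j2", "skip"), ("u2", "j3", "dismiss"), ("u2", "j4", "save")]
jobs = [f"j{i}" for i in range(50)]
samples = build_application_samples(events, jobs, random.Random(0))
print(len(samples))  # 12
```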

2.4 Pretraining and Downstream Evaluation

Text from job descriptions and user profiles is co-pretrained with the hierarchical tasks, and the overall objective is formulated as:

$$
L=\sum_{*}\lambda_{*}L_{*}, \quad \text{where } * \in \{MLM, Exp, Att, App\},
\tag{4}
$$

where $\lambda_{*}$ is the weight hyper-parameter for the corresponding pretraining task. We evaluate on two downstream tasks: job recommendation for users and candidate recommendation for jobs (recruiters).
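Eq. (4) is a weighted sum of the four task losses. The loss values and λ weights in this sketch are hypothetical, since the λ values are not given here; setting a weight to zero is one simple way to emulate the “w/o” ablation variants.

```python
from typing import Dict

TASKS = ("MLM", "Exp", "Att", "App")


def total_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """L = sum_* lambda_* L_*, as in Eq. (4)."""
    for task in TASKS:
        if task not in losses or task not in weights:
            raise KeyError(f"missing loss or weight for task {task!r}")
    return sum(weights[t] * losses[t] for t in TASKS)


losses = {"MLM": 2.1, "Exp": 0.4, "Att": 0.7, "App": 0.5}   # illustrative values
weights = {"MLM": 1.0, "Exp": 0.5, "Att": 0.5, "App": 1.0}  # hypothetical lambdas
print(total_loss(losses, weights))
# "w/o App"-style ablation: drop the task by zeroing its weight.
print(total_loss(losses, {**weights, "App": 0.0}))
```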

3 Experiments

| Model | JR AUC | JR Recall@3 | JR Precision@3 | JR NDCG@3 | JR MRR | CR AUC | CR Recall@3 | CR Precision@3 | CR NDCG@3 | CR MRR |
|---|---|---|---|---|---|---|---|---|---|---|
| PJFNN | - | - | - | - | - | - | - | - | - | - |
| PJFNN+BERT | +2.538% | +2.280% | +0.899% | +2.027% | +0.582% | +0.334% | +1.464% | +1.569% | +1.768% | +0.735% |
| PJFNN+TAROT | +4.477% | +3.941% | +4.780% | +3.896% | +3.831% | +6.977% | +8.176% | +9.765% | +13.494% | +9.007% |
| BPJFNN | +4.658% | -0.732% | +2.130% | -1.211% | -0.436% | +2.765% | +2.990% | -0.959% | +2.475% | +0.827% |
| BPJFNN+BERT | +5.057% | +0.929% | +3.739% | -0.211% | +1.334% | +3.229% | +4.637% | +1.831% | +4.537% | +2.987% |
| BPJFNN+TAROT | +6.163% | +4.110% | +6.152% | +1.003% | +3.177% | +7.385% | +10.189% | +8.718% | +14.555% | +7.721% |
| APJFNN | +9.679% | +3.941% | +6.294% | +0.684% | +3.637% | +6.198% | +7.444% | +12.119% | +11.314% | +6.985% |
| APJFNN+BERT | +10.694% | +4.899% | +7.856% | +2.237% | +4.680% | +6.866% | +9.457% | +15.606% | +12.080% | +7.675% |
| APJFNN+TAROT | +11.891% | +6.278% | +10.554% | +4.396% | +5.941% | +8.295% | +14.216% | +18.309% | +17.030% | +10.386% |

Table 1: Performance comparison on two downstream tasks (JR: Job Recommendation; CR: Candidate Recommendation). Results are relative improvements over PJFNN.

3.1 Experiment Settings

| Model | AUC | HR@1 | NDCG@5 | NDCG@25 | MRR |
|---|---|---|---|---|---|
| w/o MLM | -4.9% | -11.0% | -8.3% | -10.0% | -8.7% |
| w/o EXP | -5.1% | -6.8% | -4.5% | -8.0% | -8.7% |
| w/o ATT | -4.6% | -2.3% | -4.9% | -2.5% | +0.4% |
| w/o APP | -9.5% | -13.7% | -11.3% | -9.3% | -10.5% |

Table 2: Multitask ablation study on the Job Recommendation task, with the full TAROT as the baseline.
| Model | JR AUC | JR NDCG@5 | JR MRR | CR AUC | CR NDCG@5 | CR MRR |
|---|---|---|---|---|---|---|
| OF | - | - | - | - | - | - |
| OF+BERT | +0.3% | -0.7% | -1.9% | +1.8% | +0.6% | +0.6% |
| OF+TAROT | +6.0% | +12.9% | +6.0% | +5.5% | +4.3% | +4.5% |

Table 3: Improvement to online service features (JR: Job Recommendation; CR: Candidate Recommendation). “OF” refers to the online features used in the LinkedIn system.

Dataset. The pretraining data are collected from LinkedIn user activity records, with user profiles and job descriptions anonymized for security. We filter out incomplete user profiles; the resulting training data contain over 800k job application records involving 193k users and 331k jobs. There are two downstream tasks.

For the job recommendation task, the data consist of job and user profile pairs. “Skip” and “Dismiss” actions are labeled negative, and “Save” and “Apply” positive. This dataset contains 31k users and 54k jobs, with 150k samples. The candidate recommendation task recommends users to recruiters according to the jobs they post. When a user is recommended, the recruiter can contact or skip the candidate. This dataset contains 133k users and 19k jobs, with 150k samples. To prevent data leakage, the users in both downstream datasets are drawn from a different user group than the pretraining data.
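The job recommendation labeling and the leakage check described above can be sketched as follows. The label mapping comes from the text; the helper names and the `(user, job, action)` tuple format are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Job recommendation labels from Section 3.1.
JOB_REC_LABELS = {"skip": 0, "dismiss": 0, "save": 1, "apply": 1}


def label_job_rec(events: Sequence[Tuple[str, str, str]]) -> List[Tuple[str, str, int]]:
    """Map (user, job, action) events to (user, job, label) samples."""
    return [(u, j, JOB_REC_LABELS[a]) for u, j, a in events]


def check_user_disjoint(pretrain_users: Iterable[str], downstream_users: Iterable[str]) -> None:
    """Raise if any downstream user also appears in the pretraining data."""
    overlap = set(pretrain_users) & set(downstream_users)
    if overlap:
        raise ValueError(f"{len(overlap)} user(s) shared with pretraining data")


print(label_job_rec([("u9", "j1", "skip"), ("u9", "j2", "apply")]))
check_user_disjoint({"u1", "u2"}, {"u9"})  # passes silently: no shared users
```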

Compared Methods. We compare TAROT with variants of PJFNNs to examine the benefits of extracting semi-structured text. The compared methods fall into three types: (1) PJFNNs [5, 6], a series of models proven to be effective for person-job fit; (2) PJFNNs + BERT, which use BERT as a plugin for semantic embeddings; and (3) PJFNNs + TAROT, which use TAROT embeddings as the plugin in the same way as PJFNNs + BERT.

Implementation Details. We choose the small BERT model [20] with a hidden size of 512. Adam is used as the optimizer with a learning rate of $10^{-4}$, and hyper-parameters are tuned by grid search. AUC, top-3 recall (Recall@3), top-3 precision (Precision@3), Normalized Discounted Cumulative Gain (NDCG@3) and Mean Reciprocal Rank (MRR) are used to evaluate model performance. We take PJFNN as the reference, and all other results are reported as relative improvements over it.
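For reference, the evaluation metrics can be computed per query (one user's candidate jobs, or one job's candidate users) with binary relevance labels, as below. These are common textbook definitions assumed for illustration; the exact variants behind the tables (e.g., tie handling or how queries are averaged) are not stated. MRR is the mean of `reciprocal_rank` over queries, and `relative_improvement` gives the percentage change against a baseline such as PJFNN. All names are hypothetical.

```python
import math
from typing import List, Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Relevance labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    n_pos = sum(labels)
    return sum(_ranked_labels(scores, labels)[:k]) / n_pos if n_pos else 0.0


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    return sum(_ranked_labels(scores, labels)[:k]) / k


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_ranked_labels(scores, labels)) / ideal if ideal else 0.0


def reciprocal_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """1 / rank of the first relevant item; MRR averages this over queries."""
    for rank, rel in enumerate(_ranked_labels(scores, labels), start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_improvement(value: float, baseline: float) -> float:
    """Percentage change of a metric against a baseline model (e.g. PJFNN)."""
    return 100.0 * (value - baseline) / baseline


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(recall_at_k(scores, labels, 3))  # 1.0
print(auc(scores, labels))             # 0.75
```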

3.2 Downstream Performance Comparison

The results are shown in Table 1. Compared to PJFNNs alone, adding BERT embeddings improves most metrics, showing the benefit of incorporating semantic information from pretrained LLMs. TAROT substantially enhances recommendation performance on both tasks, indicating that embeddings from the hierarchical co-pretraining framework are more expressive and informative than those of the pretrained semantic plugin BERT. Moreover, TAROT not only performs well on the job recommendation task, which is highly correlated with the Application Classification pretraining task, but also obtains clear gains on the candidate recommendation task. Note that headhunters are not explicitly modeled in our framework: a headhunter posts numerous jobs, so headhunters are also absent at the individual level. Since the candidate recommendation task is headhunter-oriented, this implies that TAROT embeddings can help models generalize to unseen Person-Job fit downstream tasks even when there is no corresponding pretraining task.

3.3 Ablation Study

To evaluate the multitask design of our framework, we conduct ablation studies that remove individual pretraining tasks and observe performance on the downstream job recommendation task, since it is the task for which the complete set of corresponding pretraining tasks is available. We additionally report the top-1 hit rate (HR@1); results are shown in Table 2. “w/o App”, “w/o Att”, “w/o Exp” and “w/o MLM” denote pretraining without the corresponding task. Removing Experience Classification or Attribute Validation degrades performance on nearly all metrics (the only exception is a +0.4% MRR change without Attribute Validation), showing that both tasks benefit the downstream task. The worst results occur without Application Classification, indicating that it plays the most critical role because it strengthens information interaction between job descriptions and user profiles. Overall, these results support the effectiveness of our multitask co-pretraining framework.

3.4 Improvement to Online Service Features

We also combine TAROT embeddings with the features used in LinkedIn's online service. Table 3 shows that TAROT embeddings provide additional gains on top of the currently used features and are more effective than BERT embeddings, which suggests their practical value. Notably, individual-level embeddings can be precomputed and stored to speed up inference in online products.
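Because an individual-level embedding depends on a single profile or job, it can presumably be computed once and reused, while the cross-attention layer still runs per (user, job) pair. A minimal in-memory cache might look like this; the `EmbeddingCache` class and the `encode` callable standing in for the TAROT encoder are hypothetical.

```python
from typing import Callable, Dict, List


class EmbeddingCache:
    """Hypothetical memoizing wrapper around an individual-level encoder."""

    def __init__(self, encode: Callable[[str], List[float]]) -> None:
        self._encode = encode
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the encoder actually ran

    def get(self, entity_id: str) -> List[float]:
        if entity_id not in self._store:
            self.misses += 1
            self._store[entity_id] = self._encode(entity_id)
        return self._store[entity_id]


# Toy encoder for illustration only: embeds an id by its length.
cache = EmbeddingCache(lambda eid: [float(len(eid))])
cache.get("job-42")
cache.get("job-42")
print(cache.misses)  # 1
```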

4 Conclusion

In this paper, we propose TAROT to provide expressive embeddings for person-job fit applications. To fully leverage the text and interaction information in job descriptions and user profiles, we design a hierarchical multitask co-pretraining framework that better captures their semantic information and the correlations between them. To evaluate its effectiveness, we conduct comprehensive experiments on real LinkedIn data against several baselines. The experimental results show that our framework significantly improves downstream task performance and enhances the features used in LinkedIn's online service.

References

  • [1] Richard Layard, Stephen Nickell, and Richard Jackman, “The unemployment crisis,” 1994.
  • [2] Society For Human Resource Management, “Human capital benchmarking report,” 2016.
  • [3] Yingya Zhang, Cheng Yang, and Zhixiang Niu, “A research of job recommendation system based on collaborative filtering,” in 2014 Seventh International Symposium on Computational Intelligence and Design, 2014, vol. 1, pp. 533–538.
  • [4] Yao Lu, Sandy Ingram, and Denis Gillet, “A recommender system for job seeking and recruiting website,” May 2013, pp. 963–966.
  • [5] Chuan Qin, Hengshu Zhu, Tong Xu, Chen Zhu, Liang Jiang, Enhong Chen, and Hui Xiong, “Enhancing person-job fit for talent recruitment: An ability-aware neural network approach,” June 2018, pp. 25–34.
  • [6] Shuqing Bian, Wayne Xin Zhao, Yang Song, Tao Zhang, and Ji-Rong Wen, “Domain adaptation for person-job fit with transferable deep global match network,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, Nov. 2019, pp. 4810–4820, Association for Computational Linguistics.
  • [7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, June 2019, pp. 4171–4186, Association for Computational Linguistics.
  • [8] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
  • [9] Shuqing Bian, Xu Chen, Wayne Xin Zhao, Kun Zhou, Yupeng Hou, Yang Song, Tao Zhang, and Ji-Rong Wen, “Learning to match jobs with resumes from sparse interaction data using multi-view co-teaching network,” in Proceedings of the 29th ACM International Conference on Information and Knowledge Management, New York, NY, USA, 2020, CIKM ’20, p. 65–74, Association for Computing Machinery.
  • [10] Jiayi Liao, Xu Chen, and Lun Du, “Concept understanding in large language models: An empirical study,” 2023.
  • [11] Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, and Dongmei Zhang, “Text-to-image generation for abstract concepts,” 2023.
  • [12] Nils Reimers and Iryna Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3982–3992.
  • [13] Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li, “On the sentence embeddings from pre-trained language models,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 9119–9130.
  • [14] Lun Du, Fei Gao, Xu Chen, Ran Jia, Junshan Wang, Jiang Zhang, Shi Han, and Dongmei Zhang, “Tabularnet: A neural network architecture for understanding semantic structures of tabular data,” in KDD, 2021, pp. 322–331.
  • [15] Lun Du, Xu Chen, Fei Gao, Qiang Fu, Kunqing Xie, Shi Han, and Dongmei Zhang, “Understanding and improvement of adversarial training for network embedding from an optimization perspective,” in WSDM, 2022, pp. 230–240.
  • [16] Xu Chen, Yuanxing Zhang, Lun Du, Zheng Fang, Yi Ren, Kaigui Bian, and Kunqing Xie, “Tssrgcn: Temporal spectral spatial retrieval graph convolutional network for traffic flow forecasting,” in 2020 ICDM. IEEE, 2020, pp. 954–959.
  • [17] Xu Chen, Junshan Wang, and Kunqing Xie, “Trafficstream: A streaming traffic flow forecasting framework based on graph neural networks and continual learning,” in IJCAI, 2021.
  • [18] Xu Chen, Qiu Qiu, Changshan Li, and Kunqing Xie, “Graphad: A graph neural network for entity-wise multivariate time-series anomaly detection,” in ACM SIGIR, 2022, pp. 2297–2302.
  • [19] Zihang Dai, Guokun Lai, Yiming Yang, and Quoc Le, “Funnel-transformer: Filtering out sequential redundancy for efficient language processing,” June 2020.
  • [20] Vincent Micheli, Martin d’Hoffschmidt, and François Fleuret, “On the importance of pre-training data volume for compact language models,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 7853–7858, Association for Computational Linguistics.