License: arXiv.org perpetual non-exclusive license
arXiv:2401.10271v1 [cs.DB] 04 Jan 2024

Querying Triadic Concepts through Partial or Complete Matching of Triples

Pedro Henrique B. Ruas
Department of Computer Science and Engineering
University of Quebec in Outaouais
Gatineau, Quebec, Canada
Rokia Missaoui
Department of Computer Science and Engineering
University of Quebec in Outaouais
Gatineau, Quebec, Canada
Mohamed Hamza Ibrahim
Department of Computer Science and Engineering
University of Quebec in Outaouais
Gatineau, Quebec, Canada
http://w3.uqo.ca/missaoui/
Abstract

In this paper, we introduce a new method for querying triadic concepts through partial or complete matching of triples using an inverted index, to retrieve already computed triadic concepts that contain a set of terms in their extent, intent and/or modus. As opposed to the approximation approach described in (Ananias et al., 2021), this method (i) does not need to keep the initial triadic context or its three dyadic counterparts, (ii) avoids the application of derivation operators on the triple components through context exploration, and (iii) eliminates the requirement for a factorization phase to get triadic concepts as the answer to one-dimensional queries. Additionally, our solution introduces a novel metric for ranking the retrieved triadic concepts based on their similarity to a given query. Lastly, an empirical study illustrates the effectiveness and scalability of our approach against the approximation one. Our solution is not only more efficient but also scales better, making it suitable for big data scenarios.

Keywords Data mining · Triadic Concept Analysis · Querying Triadic Concepts · Triple Matching

1 Introduction

In information systems, users can be overwhelmed by data and even by patterns (knowledge), yet they are often interested in only a few specific knowledge nuggets. The scope for exploring patterns can vary widely among users and often evolves over time, with a common preference for an exploratory and iterative process that uncovers patterns using relational operations on lattices such as selection and projection (Kwuida et al., 2010).

In this particular context, Formal Concept Analysis (FCA) presents a valuable mathematical framework for representing and extracting knowledge from data. The inception of FCA dates back to the 1980s when Wille and Ganter first proposed it (Ganter and Wille, 1999) as a framework for constructing and exploring concept lattices and extracting association rules.

Despite its success in various applications, the classical approach may not always be sufficient, and some situations require an extension with an additional dimension to obtain a more complete characterization and representation of the data. (Lehmann and Wille, 1995) took the initiative to extend FCA in their work, introducing Triadic Concept Analysis (TCA) to describe a ternary relationship among object, attribute, and condition sets. An instance where such an extension is beneficial is in a social resource sharing system, commonly referred to as a folksonomy (Jäschke et al., 2008), where users, resources, and keywords (tags) establish a ternary relationship through users’ annotations. In their paper, (Lehmann and Wille, 1995) proposed a three-dimensional graphical representation known as a trilattice.

A few researchers have attempted to simplify the visualization and navigation through triadic concepts (Missaoui et al., 2020; Rudolph et al., 2015) by proposing navigation strategies and graphical representations based on the classical Hasse diagram.

In our previous work (Missaoui et al., 2020), we introduced the T-iPred algorithm, which is an adaptation of the iPred algorithm (Baixeries et al., 2009) used for efficiently computing links between concepts in FCA. This adaptation aims to represent the Hasse diagram of triadic concepts, and its graphical representation facilitates the exploration and discovery of hidden knowledge within triadic contexts.

However, when exploring triadic concepts through the Hasse diagram, it is common to encounter a diagram with hundreds or even thousands of triadic concepts. Indeed, even triadic contexts with a small number of objects, attributes, and conditions can produce a vast number of triadic concepts.

In this regard, manually exploring triadic concepts can become an impractical task due to the large number of concepts involved. It would require the user to navigate through the entire diagram to analyze the concepts of interest. In this study, we propose an approach based on an inverted index which allows users to make queries on all triadic concepts. Users can search for a triple $(A_1, A_2, A_3)$, and if the query corresponds to a triadic concept, the platform will return this concept. If the searched triple is not a triadic concept, a set of the most similar triadic concepts will be returned. Furthermore, our solution enables users to conduct searches through queries specifying only one dimension (objects or attributes or conditions), two dimensions (objects and attributes, objects and conditions, or attributes and conditions), or all three dimensions.

The structure of this article is as follows: Section 2 provides an overview of the theoretical foundation of Triadic Concept Analysis and inverted indexes, whereas Section 3 presents the related work. Section 4 presents the proposed approach to query triadic concepts. Section 5 describes an empirical study using both synthetic and real data sets. A conclusion and further investigations are given in Section 6.

2 Background

Within this section, we aim at introducing fundamental definitions of Formal Concept Analysis and Triadic Concept Analysis. Our primary focus will be on the triadic approach.

2.1 Formal Concept Analysis

Formal Concept Analysis (FCA) was introduced in (Ganter and Wille, 1999) as a branch of applied mathematics, which is based on a formalization of concept and concept hierarchy. It starts from a formal binary context $\mathbb{K} := (\mathcal{G}, \mathcal{M}, \mathcal{I})$, where $\mathcal{G}$, $\mathcal{M}$ and $\mathcal{I}$ are a set of objects, a set of attributes, and a binary relation between $\mathcal{G}$ and $\mathcal{M}$ respectively, to construct a concept (Galois) lattice whose nodes are formal concepts (maximal rectangles) described by an extent (set of objects) and an intent (set of attributes).

Given arbitrary subsets $A \subseteq \mathcal{G}$ and $B \subseteq \mathcal{M}$, the following derivation operators are defined:
$$A' = \{m \in \mathcal{M} \mid \forall g \in A,\ (g, m) \in \mathcal{I}\}, \quad A \subseteq \mathcal{G}$$
and
$$B' = \{g \in \mathcal{G} \mid \forall m \in B,\ (g, m) \in \mathcal{I}\}, \quad B \subseteq \mathcal{M}$$
where $A'$ is the set of attributes common to all objects of $A$ and $B'$ is the set of objects sharing all attributes from $B$.

The pair $c = (A, B)$ is called a formal concept of $\mathbb{K}$ with extent $A$ and intent $B$ if $A' = B$ and $B' = A$.
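The two derivation operators and the concept test can be illustrated with a minimal Python sketch. The toy context below (objects `g1`..`g3`, attributes `m1`..`m3`) and the function names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
# Hypothetical toy binary context (not from the paper):
# each object is mapped to the set of attributes it has.
CONTEXT = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m3"},
    "g3": {"m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def derive_objects(objs):
    """A': attributes shared by every object in objs."""
    result = set(ATTRIBUTES)
    for g in objs:
        result &= CONTEXT[g]
    return result


def derive_attributes(attrs):
    """B': objects that have every attribute in attrs."""
    return {g for g, ms in CONTEXT.items() if set(attrs) <= ms}


def is_formal_concept(extent, intent):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return (derive_objects(extent) == set(intent)
            and derive_attributes(intent) == set(extent))
```

In this toy context, `({"g1", "g2"}, {"m1", "m2"})` passes the test, whereas `({"g1"}, {"m1", "m2"})` does not, because `g2` also has both attributes.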

A partial order $\preceq$ is defined between two concepts $c_1 = (A_1, B_1)$ and $c_2 = (A_2, B_2)$ by $c_1 \preceq c_2 \iff A_1 \subseteq A_2$ (or, equivalently, $B_2 \subseteq B_1$). The set $\mathcal{C}$ of all concepts together with this partial order forms a concept lattice.

2.2 Triadic Concept Analysis

Triadic Concept Analysis (TCA) was initially introduced by Lehmann and Wille (Lehmann and Wille, 1995; Wille, 1995) as an extension of Formal Concept Analysis (Ganter and Wille, 1999). It serves to analyze data described by three sets: $K_1$ (objects), $K_2$ (attributes), and $K_3$ (conditions), along with a ternary relation $Y \subseteq K_1 \times K_2 \times K_3$. This structure is named a triadic context and denoted as $\mathbb{K} := (K_1, K_2, K_3, Y)$.
An example of such a context is illustrated in Table 1, representing the purchases of customers in $K_1 = \{1, 2, 3, 4, 5, 6\}$ from suppliers in $K_2 = \{\textbf{P}eter, \textbf{N}elson, \textbf{R}ick, \textbf{K}evin, \textbf{S}imon, \textbf{T}revor\}$ of products in $K_3 = \{\textbf{a}ccessories, \textbf{b}ooks, \textbf{c}omputers, \textbf{d}igital\ cameras\}$.

|   | P | N | R | K | S | T |
|---|---|---|---|---|---|---|
| 1 | abd | abd | ac | ab | a | a |
| 2 | ad | abcd | abd | ad | ad | a |
| 3 | abd | ad | ab | ab | a | a |
| 4 | abd | abd | ab | ab | ad | a |
| 5 | ad | ad | abd | abc | a | ab |
| 6 | abcd | abcd | abcd | abcd | abcd | abcd |

Table 1: A triadic context

The notation $(a_1, a_2, a_3) \in Y$ indicates that the object $a_1$ possesses the attribute $a_2$ under the condition $a_3$. For instance, the value $ac$ at the intersection of Row 1 and Column $R$ means that Customer 1 orders products $a$ and $c$ from Supplier $R$.

The triadic context $\mathbb{K} := (K_1, K_2, K_3, Y)$ can be converted into three dyadic contexts as follows.

$$\begin{aligned}
\mathbb{K}^{(1)} &:= (K_1, K_2 \times K_3, Y^{(1)}) \ \text{with}\ a_1 Y^{(1)} (a_2, a_3) \Leftrightarrow (a_1, a_2, a_3) \in Y \\
\mathbb{K}^{(2)} &:= (K_2, K_1 \times K_3, Y^{(2)}) \ \text{with}\ a_2 Y^{(2)} (a_1, a_3) \Leftrightarrow (a_1, a_2, a_3) \in Y \\
\mathbb{K}^{(3)} &:= (K_3, K_1 \times K_2, Y^{(3)}) \ \text{with}\ a_3 Y^{(3)} (a_1, a_2) \Leftrightarrow (a_1, a_2, a_3) \in Y
\end{aligned}$$

A Triadic Concept (TC) or closed tri-set within a triadic context $\mathbb{K}$ is a triple $(A_1, A_2, A_3)$ with $A_1 \subseteq K_1$, $A_2 \subseteq K_2$, $A_3 \subseteq K_3$, and $A_1 \times A_2 \times A_3 \subseteq Y$ that is maximal in $Y$. In other words, none of the three subsets can be expanded without violating the ternary relation $Y$. Thus, this triple represents a maximal cuboid filled with ones (or crosses). For instance, the tri-set $(5\,6, K\,R, a\,b)$ is not closed because $5\,6 \times K\,R \times a\,b \subsetneq 3\,4\,5\,6 \times K\,R \times a\,b \subseteq Y$.
We use $(A_1, A_2, A_3)$ or $A_1 \times A_2 \times A_3$ interchangeably to refer to a triadic concept.
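The maximality condition can be checked by brute force on Table 1. The following Python sketch is only an illustration of the definition, not the method used in the paper; the encoding of the table and the names `is_cuboid` and `is_triadic_concept` are assumptions made here. It relies on the fact that a cuboid is maximal exactly when no single element can be added to any one of its three components.

```python
from itertools import product

# Table 1: one row per customer, one cell per supplier (P, N, R, K, S, T),
# each cell listing the products bought.
SUPPLIERS = "PNRKST"
ROWS = {
    1: "abd abd ac ab a a",
    2: "ad abcd abd ad ad a",
    3: "abd ad ab ab a a",
    4: "abd abd ab ab ad a",
    5: "ad ad abd abc a ab",
    6: "abcd abcd abcd abcd abcd abcd",
}
# Ternary relation Y as a set of (customer, supplier, product) triples.
Y = {
    (obj, sup, prod)
    for obj, cells in ROWS.items()
    for sup, cell in zip(SUPPLIERS, cells.split())
    for prod in cell
}
# The three dimensions K1, K2, K3.
K = (set(ROWS), set(SUPPLIERS), set("abcd"))


def is_cuboid(a1, a2, a3):
    """True if A1 x A2 x A3 is entirely contained in Y."""
    return all(t in Y for t in product(a1, a2, a3))


def is_triadic_concept(a1, a2, a3):
    """True if (A1, A2, A3) is a cuboid of Y that cannot be enlarged."""
    comps = [set(a1), set(a2), set(a3)]
    if not is_cuboid(*comps):
        return False
    for i in range(3):
        for extra in K[i] - comps[i]:
            grown = list(comps)
            grown[i] = comps[i] | {extra}
            if is_cuboid(*grown):
                return False  # component i can still be expanded
    return True
```

With this encoding, `is_triadic_concept({5, 6}, "KR", "ab")` is false because Customer 3 can be added, while `is_triadic_concept({3, 4, 5, 6}, "KR", "ab")` is true.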

The terms extent, intent and modus refer to $A_1$, $A_2$ and $A_3$ of the concept, respectively. We propose to name the pair $(A_2, A_3)$ as the feature associated with $A_1$.

To compute triadic concepts, two derivation operators are introduced. Let $\mathbb{K} := (K_1, K_2, K_3, Y)$ be a triadic context, and let $\{i, j, k\} = \{1, 2, 3\}$ with $j < k$. Consider $X_i \subseteq K_i$ and $(X_j, X_k) \subseteq K_j \times K_k$ (we use $(X_j, X_k) \subseteq K_j \times K_k$ to indicate that $X_j \subseteq K_j$ and $X_k \subseteq K_k$). The $^{(i)}$-derivation (Lehmann and Wille, 1995) is defined as follows:

$$\begin{aligned}
X_i^{(i)} &:= \{(a_j, a_k) \in K_j \times K_k \mid (a_i, a_j, a_k) \in Y \ \forall a_i \in X_i\} \\
(X_j, X_k)^{(i)} &:= \{a_i \in K_i \mid (a_i, a_j, a_k) \in Y \ \text{for all}\ (a_j, a_k) \in X_j \times X_k\}
\end{aligned}$$

As an illustration, the application of the first derivation operator on the set $3456$ leads to two concept features $(K, ab)$ and $(R, ab)$ after the factorization of the pairs $(K, a)$, $(K, b)$, $(R, a)$ and $(R, b)$, i.e., after the computation of maximal rectangles involving attributes and conditions in this example. Similarly, by applying the second derivation operator on the pair $(KPR, ab)$, we find the extent $346$, meaning that only Customers 3, 4 and 6 buy Products $a$ and $b$ from Suppliers $K$, $P$ and $R$. Throughout this paper, we frequently employ simplified notations for sets. For instance, $1\,2\,5$ (or simply $125$) represents the set $\{1, 2, 5\}$, while $a\,b$ (or simply $ab$) denotes $\{a, b\}$.
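The two derivation operators can be sketched in Python directly from their definitions. This is an illustrative brute-force version over Table 1, with dimensions indexed from 0 instead of 1; the names `derive_set` and `derive_pair` are assumptions of this sketch. `derive_set` returns the raw set of pairs, and the factorization of those pairs into maximal rectangles (features) is not shown.

```python
from itertools import product

# Table 1: one row per customer, one cell per supplier (P, N, R, K, S, T).
SUPPLIERS = "PNRKST"
ROWS = {
    1: "abd abd ac ab a a",
    2: "ad abcd abd ad ad a",
    3: "abd ad ab ab a a",
    4: "abd abd ab ab ad a",
    5: "ad ad abd abc a ab",
    6: "abcd abcd abcd abcd abcd abcd",
}
Y = {
    (obj, sup, prod)
    for obj, cells in ROWS.items()
    for sup, cell in zip(SUPPLIERS, cells.split())
    for prod in cell
}
K = (set(ROWS), set(SUPPLIERS), set("abcd"))


def derive_set(xs, i):
    """X_i^(i): pairs (a_j, a_k) related in Y to every a_i in xs (i is 0-based)."""
    j, k = (d for d in range(3) if d != i)
    pairs = set(product(K[j], K[k]))
    for a in xs:
        pairs &= {(t[j], t[k]) for t in Y if t[i] == a}
    return pairs


def derive_pair(xj, xk, i):
    """(X_j, X_k)^(i): elements a_i related in Y to every pair of X_j x X_k."""
    j, k = (d for d in range(3) if d != i)

    def triple(ai, aj, ak):
        t = [None, None, None]
        t[i], t[j], t[k] = ai, aj, ak
        return tuple(t)

    return {
        ai for ai in K[i]
        if all(triple(ai, aj, ak) in Y for aj in xj for ak in xk)
    }
```

For example, `derive_pair(set("KPR"), set("ab"), 0)` yields the extent `{3, 4, 6}` mentioned above, and `derive_set({3, 4, 5, 6}, 0)` contains the pairs `(K, a)`, `(K, b)`, `(R, a)` and `(R, b)`.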

Once computed, the set of triadic concepts grouped under the same extent can be ordered using the order relation $\leq_1$ determined by the quasi-order $\lesssim_1$ to form a poset. This sorting creates a Hasse diagram where each node represents all the triadic concepts with the same extent. However, it is important to note that the generated nodes do not constitute a complete lattice since the intersection of extents is not necessarily an extent in the triadic setting.

We recall that for two elements $x$ and $y$ in a poset, if $x \leq y$ (resp. $x < y$), then $x$ is below (resp. strictly below) $y$. If $x < y$ and there is no element between $x$ and $y$, $x$ is called a lower cover of $y$, and $y$ an upper cover of $x$, and we write $x \prec y$.

Figure 1 presents the Hasse diagram generated by the T-iPred algorithm, where the value inside each node represents an extent while the pairs of values attached to a dotted line are the corresponding features. For instance, the node labelled $256$ encompasses the extent $256$ of the TCs $(256, R, abd)$ and $(256, NPR, ad)$.

Figure 1: The Hasse diagram of triadic concepts

2.3 Inverted index

An inverted index is a powerful data structure commonly used in information retrieval systems. Its primary purpose is to facilitate efficient and quick access to relevant information from large data collections. The main idea behind an inverted index is to map terms or keywords found in documents to the corresponding positions where these terms appear. Instead of scanning the entire document set for a specific term, the inverted index allows direct access to the documents containing that term, significantly reducing search time.
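As a minimal sketch of this idea, the Python code below builds a document-level inverted index (term to set of document identifiers, without positions) and answers conjunctive queries by intersecting posting sets. The function names and the whitespace tokenization are simplifying assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, terms):
    """Conjunctive (AND) query: ids of documents containing every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection.
DOCS = {
    1: "triadic concept analysis",
    2: "formal concept analysis",
    3: "inverted index",
}
INDEX = build_inverted_index(DOCS)
```

A query such as `search_all(INDEX, ["concept", "analysis"])` only touches the two posting sets involved and never scans the documents themselves.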

The benefits of utilizing an inverted index are manifold. Firstly, it greatly accelerates search operations, making it ideal for applications involving vast amounts of text data, such as search engines and document retrieval systems. Additionally, an inverted index supports diverse advanced search functionalities like Boolean queries, phrase searches, and ranking, enabling precise and context-aware retrieval (Zobel and Moffat, 2006).

In the context of inverted index-based information retrieval, re-rank algorithms have a crucial role in fine-tuning the ranking of retrieved documents to provide users with more relevant results. After using the inverted index to quickly identify documents containing the query terms, the initial ranking may not fully capture the user’s intent or context. Re-rank algorithms step in to reevaluate and adjust the document rankings based on additional features or criteria, improving the overall accuracy of the search results. By incorporating re-rank algorithms, information retrieval systems can enhance the user experience and deliver more accurate results.

3 Related work

To the best of our knowledge, there is only one related work to our proposed approach. In a recent study, (Ananias et al., 2021) introduced an approach to unveil patterns from triadic contexts by querying the computed triadic concepts. The proposed search strategy utilizes the diagram generated by T-iPred to showcase the query’s exact answer or the upper and lower covers for concepts that approximate the given query expressed as a triple of object, attribute, and condition sets. This methodology provides a powerful means to explore and extract valuable insights from triadic data structures.

In the approach proposed by (Ananias et al., 2021), the triadic context is first converted into three distinct dyadic contexts as indicated earlier. However, the computational cost and execution time of building the Cartesian product of two of the three dimensions of the triadic context can be prohibitive even for a context with just a few dozen attributes and/or conditions. Furthermore, the derivation operation is performed several times during the process of querying triadic concepts, which can result in excessive processing time. Another limitation concerns the number of triadic concepts returned. Because derivation operators are used, an approximate search often returns a single triadic concept. This limitation prevents the user from exploring other triadic concepts that are, in fact, similar to the specified query and hence could be relevant to the user.

In (Ananias et al., 2021), when the user asks for a one-dimensional query $(X_1, -, -)$, $X_1'$ is first computed then factorized to get maximal rectangles as the set $\mathcal{F}$ of individual features $F_i = (X_{2i}, X_{3i})$ associated with $X_1$. Then, the set of the smallest triadic concepts that have at least $X_1 \subseteq K_1$ in their extent is calculated. For a two-dimensional query $(X_1, X_2, -)$, Proposition 3 in (Wille, 1995) is used to compute $(A_1, A_2, A_3)$ by first calculating $A_3$ and then $A_1$ followed by $A_2$ (or $A_2$ followed by $A_1$) using the second derivation on pairs (see Section 2), leading to one or two triadic concepts.
The case of three-dimensional queries $(X_1, X_2, X_3)$ is converted into three two-dimensional queries whenever the triple is not a triadic concept.

Regarding the exploration of triadic concepts, the prototype known as “FCA Tools Bundle” (Kis et al., 2016) offers visualization and navigation capabilities through a set of triadic concepts by utilizing dyadic projections. It also aids in identifying concepts within a triadic context without computing all closed trisets. Nevertheless, the present paper investigates triple matching, which has not been examined in the mentioned prototype. Another approach to approximating triadic concepts is through OAC-triclusters, akin to OA-biclusters approximating formal concepts by relaxing the closed triset properties. However, no specific query formulation is pursued in the quest for triadic concepts that precisely or partially match a given input triple.

4 Partial or Complete Matching of Triples

In this section, we present a new approach for extracting knowledge from triadic contexts through partial or complete matching of triples. The process of querying triples within the TCA framework involves searching for patterns in the form of triadic concepts, utilizing our proposed approach that requires only the triadic concepts as input.

Our primary objective with this solution is to identify all the triadic concepts that match the user’s query either completely or partially. For instance, consider the triadic context of customers represented in Table 1. This context captures transactions between customers and suppliers over products. A specialized professional might be interested in determining if there exists a set of products bought by a specific group of customers from suppliers. For example, the triadic concept $(346, KPR, ab)$ denotes that customers 3, 4, and 6 purchased the same products a and b from the suppliers $K$, $P$, and $R$.

If one seeks concepts with an extent equal to $\{146\}$, the resulting set would be $(146, KNP, ab)$ and $(146, NP, abd)$. Conversely, specifying the set $\{1,3\}$ reveals that there is no concept with just these two objects in the Hasse diagram (see Figure 1). This indicates that Customers 1 and 3 likely made purchases alongside other customers. However, there are six triadic concepts in which the set $\{1,3\}$ is a subset of the extent: $(1346, KP, ab)$, $(1346, P, abd)$, $(13456, K, ab)$, $(123456, NP, ad)$, $(123456, KNPRST, a)$ and $(123456, \varnothing, abcd)$. These concepts provide valuable information about the purchases that the two clients share with other customers.

Another user might be interested in triadic concepts that partially match the elements in the query, in case there is no exact match. For instance, when searching for $\{1,3\}$, we could return the concepts that have only one of the two elements in the query. In this scenario, the following triadic concepts would be returned to the user: $(16, R, ac)$, $(146, KNP, ab)$, $(146, NP, abd)$, $(346, KPR, ab)$, $(3456, KR, ab)$, $(1246, N, abd)$, $(23456, R, ab)$.
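The three matching modes described above (exact extent, extent containing the query, and partial overlap) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the concept list contains only the triadic concepts quoted in this section, which need not be the full set of concepts of Table 1.

```python
# Triadic concepts quoted in the running example (a subset, not the full lattice).
# Each concept is (extent, intent, modus); every character is one element.
CONCEPTS = [
    ("146", "KNP", "ab"), ("146", "NP", "abd"), ("1346", "KP", "ab"),
    ("1346", "P", "abd"), ("13456", "K", "ab"), ("123456", "NP", "ad"),
    ("123456", "KNPRST", "a"), ("123456", "", "abcd"), ("16", "R", "ac"),
    ("346", "KPR", "ab"), ("3456", "KR", "ab"), ("1246", "N", "abd"),
    ("23456", "R", "ab"), ("246", "NR", "ab"),
]


def extent(concept):
    """Return the extent (first component) as a set of objects."""
    return set(concept[0])


def exact_extent(concepts, objects):
    """Concepts whose extent equals the queried set of objects."""
    return [c for c in concepts if extent(c) == set(objects)]


def extent_contains(concepts, objects):
    """Concepts whose extent is a superset of the queried objects."""
    return [c for c in concepts if set(objects) <= extent(c)]


def extent_overlap(concepts, objects, k):
    """Concepts whose extent shares exactly k objects with the query."""
    return [c for c in concepts if len(set(objects) & extent(c)) == k]


print(exact_extent(CONCEPTS, "146"))
print(exact_extent(CONCEPTS, "13"))
print(len(extent_contains(CONCEPTS, "13")))
print(len(extent_overlap(CONCEPTS, "13", 1)))
```

On this subset, the superset query for $\{1,3\}$ yields the six concepts listed above, and the overlap query with `k = 1` yields the seven partially matching concepts.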

Our approach supports both scenarios. The proposed algorithm uses an inverted index built from the triadic concepts, which enables not only an extremely efficient search but also greater flexibility in the number of triadic concepts returned for a given query.

However, this flexibility can lead to a significant number of triadic concepts being returned, and in some cases, we might have several dozen concepts. In this regard, we also propose the Re-rank algorithm to indicate the most similar triadic concepts to the user’s query. Each returned concept is assigned a score based on the similarity of the query with the triadic concepts in the inverted index.

Figure 2: Workflow of our proposed solution

Figure 2 presents a diagram of how the proposed solution works. The initial step in the diagram involves collecting all the triadic concepts. Subsequently, each constituent element of a triadic concept is examined, and the inverted index is built, mapping elements to the concepts in which they appear. When a user submits a query, the inverted index is searched for concepts containing one or more elements present in the query. The set of concepts that intersect with the query is referred to as “Relevant Triadic Concepts”. This set of triadic concepts is then ranked by similarity to the query and presented to the user.

For a better understanding of the proposed approach, we illustrate the first three steps of the workflow in Figure 3. On the left, we have the set of triadic concepts, each associated with a unique identifier. Then, in the center of the figure, each element (objects, attributes and conditions) composing the concepts is examined (terms), and an inverted index is created, indicating in which triadic concept (ID) each term occurs. This structure is used to perform matches based on a user’s query. For example, when executing the query $(16, R, c)$ and submitting it to the inverted index, the response is the document with ID 2, corresponding to the concept $(16, R, ac)$.

Figure 3: Process of creating the Inverted-Index from triadic concepts and retrieving a matched concept
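The index construction and lookup illustrated in Figure 3 can be sketched as below. This is a sketch under stated assumptions rather than the authors' code: keying each term by a `(dimension, element)` pair, the function names, and the concept IDs other than the ID 2 mentioned in the text are all hypothetical choices made for the illustration.

```python
from collections import defaultdict


def build_inverted_index(concepts):
    """Map each (dimension, element) term to the IDs of the concepts containing it.

    `concepts` maps a concept ID to a triple (extent, intent, modus) of sets.
    """
    index = defaultdict(set)
    for concept_id, triple in concepts.items():
        for dimension, component in enumerate(triple):
            for element in component:
                index[(dimension, element)].add(concept_id)
    return index


def lookup(index, query):
    """Return the IDs of the concepts that contain every element of the query."""
    postings = [index.get((dimension, element), set())
                for dimension, component in enumerate(query)
                for element in component]
    return set.intersection(*postings) if postings else set()


# Three concepts from the running example; IDs are assigned for illustration.
concepts = {
    1: (set("346"), set("KPR"), set("ab")),
    2: (set("16"), set("R"), set("ac")),
    3: (set("23456"), set("R"), set("ab")),
}
index = build_inverted_index(concepts)
print(lookup(index, (set("16"), set("R"), set("c"))))
```

A single pass over the concepts is enough to build the index, and answering a query only touches the posting sets of the terms it contains.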

The proposed metric is primarily based on the intuition that in the query $(X_1, X_2, X_3)$, the component $X_i$ with the greatest number of elements should be assigned a greater weight than the other dimensions, indicating a likely higher level of interest. For instance, when the user addresses the query $(-, R, a)$, the elements of the second and third dimensions of the query will carry equal weight in ranking the triadic concepts that include these elements. However, when entering the query $(-, R, abcd)$, the third dimension will hold a higher weight, as out of the five elements present in the query, four pertain to the conditions component (third dimension). Consequently, triadic concepts that exactly match the query’s elements in the conditions ($abcd$) should possess a higher score than concepts that solely possess the attribute $R$ and/or a subset of the elements forming the set of conditions. The Re-rank procedure is presented in Algorithm 1.
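The weighting intuition can be written out for the two queries above. The symbol $w_i$ is introduced here only for illustration; it denotes the share of the query elements that belong to dimension $i$, which is the factor applied to each dimension in Line 21 of Algorithm 1.

```latex
w_i = \frac{|X_i|}{|X_1| + |X_2| + |X_3|}, \qquad i \in \{1, 2, 3\}

% Query (-, R, a): two elements, one in each specified dimension
w_2 = \tfrac{1}{2}, \quad w_3 = \tfrac{1}{2}

% Query (-, R, abcd): five elements, four of them conditions
w_2 = \tfrac{1}{5}, \quad w_3 = \tfrac{4}{5}
```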

Algorithm 1 Re-rank algorithm

Input: $Invert$, $Query$, $Tolerance\ \Theta$
Output: TC-Set: Ranked relevant TCs.

1: $TCs \leftarrow getRelevantConcepts(Invert, Query, \Theta)$;
2: $X_1, X_2, X_3 \leftarrow Query$;
3: $rankedConcepts \leftarrow [\,]$
4: for $TC$ in $TCs$ do
5:    $A_1, A_2, A_3 \leftarrow TC$; $\Delta A_1 \leftarrow \Delta A_2 \leftarrow \Delta A_3 \leftarrow 0$;
6:    $\Delta A_1 \leftarrow 0$;
7:    $\Delta A_2 \leftarrow 0$;
8:    $\Delta A_3 \leftarrow 0$;
9:    if $|X_1| > 0$ then
10:      $\Delta A_1 = |X_1 \cap A_1|$;
11:      $\Delta A_1 \mathrel{+}= \frac{\Delta A_1}{\max(|X_1|, |A_1|)}$;
12:   end if
13:   if $|X_2| > 0$ then
14:      $\Delta A_2 = |X_2 \cap A_2|$;
15:      $\Delta A_2 \mathrel{+}= \frac{\Delta A_2}{\max(|X_2|, |A_2|)}$;
16:   end if
17:   if $|X_3| > 0$ then
18:      $\Delta A_3 = |X_3 \cap A_3|$;
19:      $\Delta A_3 \mathrel{+}= \frac{\Delta A_3}{\max(|X_3|, |A_3|)}$;
20:   end if
21:   $ranking = (\Delta A_1 \times \frac{|X_1|}{|query|}) + (\Delta A_2 \times \frac{|X_2|}{|query|}) + (\Delta A_3 \times \frac{|X_3|}{|query|})$
22:   $rankedConcepts.append(TC, ranking)$
23: end for
24: Return TC-Set;

In the first line of Algorithm 1, the procedure starts with three inputs: (i) the inverted index Invert of terms (objects, attributes and conditions) and their corresponding concepts, (ii) the query and (iii) the tolerance parameter $\Theta$. The purpose of the tolerance parameter is to allow the user to specify how many elements of the query might not be present in the returned triadic concepts. For instance, given the query $(156, -, -)$ and a tolerance parameter $\Theta = 0$, only the triadic concepts containing all three objects $1$, $5$ and $6$ should be returned. If the tolerance parameter is $\Theta = 1$, triadic concepts that partially match the query with one missing element (here an object) can also be returned, such as the concept $(56, K, abc)$.
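One possible reading of the retrieval step with a tolerance parameter is sketched below. The paper does not spell out `getRelevantConcepts`, so this is an assumption-laden illustration: the function body, the `(dimension, element)` index keys, the requirement that at least one query element is shared, and the concept IDs are all hypothetical.

```python
from collections import Counter


def get_relevant_concepts(index, query, tolerance=0):
    """Return IDs of concepts missing at most `tolerance` elements of the query.

    `index` maps (dimension, element) terms to sets of concept IDs;
    `query` is a triple of sets, with an empty set for an unspecified dimension.
    """
    terms = [(dimension, element)
             for dimension, component in enumerate(query)
             for element in component]
    hits = Counter()
    for term in terms:
        for concept_id in index.get(term, ()):
            hits[concept_id] += 1
    # Assumption: a relevant concept always shares at least one query element.
    required = max(len(terms) - tolerance, 1)
    return {cid for cid, count in hits.items() if count >= required}


# Two concepts from the running example, with IDs chosen for illustration.
concepts = {10: ("123456", "", "abcd"), 11: ("56", "K", "abc")}
index = {}
for cid, triple in concepts.items():
    for dimension, component in enumerate(triple):
        for element in component:
            index.setdefault((dimension, element), set()).add(cid)

query = (set("156"), set(), set())
print(get_relevant_concepts(index, query, tolerance=0))
print(get_relevant_concepts(index, query, tolerance=1))
```

With a tolerance of 0, only the concept whose extent contains objects 1, 5 and 6 is kept; with a tolerance of 1, the concept $(56, K, abc)$, which lacks object 1, is also retrieved.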

In Line 2, the query is separated into three variables, each representing one of the three components of the triple. In Line 3, the variable that stores the triadic concepts and the calculated score is initialized to an empty list. In Lines 4 to 23, all the triadic concepts returned after querying the inverted index undergo the Re-rank process. This process begins with the separation of the concept’s components (Line 5). In Lines 6 to 8, three variables that will store the overlap between the user’s query and the triadic concept in each dimension are initialized to 0.

In Line 9, it is checked whether the query includes elements from the first dimension (objects). If so, in Line 10, the cardinality of the intersection between the query’s objects and the elements that comprise the concept’s extent is computed. Subsequently, in Line 11, this value is added to the result of dividing the variable $\Delta A_1$ by the maximum of $|X_1|$ and $|A_1|$.

The same reasoning is repeated for the second and third components of each triadic concept (Lines 13 to 20). In Line 21, the final score is calculated by summing up all the deltas, each weighted by the cardinality of the corresponding component in the query relative to the total number of elements in the query. To better illustrate the procedure, consider the query $(3456, -, a)$. Out of the five elements present in the query, four are from the first component (objects). Consequently, triadic concepts with a larger intersection with the query’s elements in the first dimension will receive a higher score than concepts that only intersect in the third component (conditions). Finally, in Line 22, the triadic concept along with its score is appended to the variable that stores the output.
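The scoring steps of Algorithm 1 (Lines 9 to 21) can be sketched in Python as follows. The function names and the `triple` helper are hypothetical, and the final sort by decreasing score is an assumption, since Algorithm 1 as printed does not show an explicit sorting step.

```python
def overlap_score(query_part, concept_part):
    """One dimension of Lines 9-11: |X ∩ A| + |X ∩ A| / max(|X|, |A|)."""
    if not query_part:
        return 0.0
    common = len(query_part & concept_part)
    return common + common / max(len(query_part), len(concept_part))


def score(query, concept):
    """Line 21: weight each dimension's score by its share of the query elements."""
    total = sum(len(part) for part in query)
    if total == 0:
        return 0.0
    return sum(overlap_score(x, a) * len(x) / total
               for x, a in zip(query, concept))


def rerank(query, concepts):
    """Return (concept, score) pairs sorted by decreasing score (assumed order)."""
    scored = [(c, score(query, c)) for c in concepts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def triple(extent, intent, modus):
    """Build a triple of frozensets from strings of single-character elements."""
    return (frozenset(extent), frozenset(intent), frozenset(modus))


query = triple("", "KP", "")
candidates = [triple("146", "KNP", "ab"), triple("1346", "KP", "ab"),
              triple("346", "KPR", "ab")]
for concept, value in rerank(query, candidates):
    print(sorted(concept[0]), sorted(concept[1]), sorted(concept[2]), round(value, 2))
```

Under this reading of the algorithm, the sketch reproduces the similarity values reported in the examples of this section, which gives some confidence that the interpretation of Lines 10, 11 and 21 is the intended one.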

4.1 One-dimensional queries

One-dimensional queries carry out the process of matching concepts by focusing on a single dimension from the three available. When dealing with a triple where only one dimension is given, the procedure of conducting a one-dimensional query entails the identification of triadic concepts that exhibit the highest similarity to the elements provided within the query.

Formally, a one-dimensional query can be precisely defined as a triple in which solely one dimension is known, and this approach can be applied to any of the three dimensions. In this context, three distinct triples are established: for the extent $(X_1, -, -)$, the intent $(-, X_2, -)$, or the modus $(-, -, X_3)$.

4.1.1 Example

The list of concepts below contains the top 3 triadic concepts returned for the query $(-, KP, -)$.

  1. $(1346, KP, ab)$ - [3.0]

  2. $(146, KNP, ab)$ - [2.67]

  3. $(346, KPR, ab)$ - [2.67]

The first concept is the only one whose intent exactly matches the elements specified in the query and, thus, has the highest value for the similarity metric ($3.0$). The second and third concepts also have $KP$ in their intent. However, each one has an additional attribute ($N$ and $R$, respectively), and hence they share the same, lower similarity value.
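The two reported values follow directly from Lines 14, 15 and 21 of Algorithm 1. The derivation below is a worked check written for this example; only the second dimension contributes, since the query leaves the extent and the modus unspecified.

```latex
% Query (-, KP, -): |X_2| = |query| = 2, so the second dimension has weight 2/2 = 1
(1346, KP, ab):\quad \Delta A_2 = 2 + \frac{2}{\max(2, 2)} = 3,
  \qquad ranking = 3 \times \frac{2}{2} = 3.0

(146, KNP, ab):\quad \Delta A_2 = 2 + \frac{2}{\max(2, 3)} = \frac{8}{3},
  \qquad ranking = \frac{8}{3} \times \frac{2}{2} \approx 2.67
```

The concept $(346, KPR, ab)$ has an intent of the same size as $(146, KNP, ab)$ and therefore obtains the same value.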

For comparison, we executed the same query using the algorithm proposed by (Ananias et al., 2021) and obtained the unique TC $(1346, KP, ab)$. We observe that both solutions found the same concept as the closest answer to the query. However, in our method, the user has the option to explore more similar concepts along with their similarity scores.

4.2 Two-dimensional queries

A two-dimensional query facilitates either the identification of an existing concept or the detection of TCs that offer the closest match to that query. The objective is to uncover information linked to concepts that exhibit the highest similarity with any two components from the available three ones.

The formulation of a two-dimensional query involves a triple in which two dimensions are provided, while the third one remains unknown. Consequently, three distinct variants of this triple can be established: $(X_1, X_2, -)$, $(X_1, -, X_3)$, and $(-, X_2, X_3)$.

4.2.1 Example

The top 3 concepts for the two-dimensional query $(-, R, ab)$ are as follows:

  1. $(23456, R, ab)$ - [2.67]

  2. $(3456, KR, ab)$ - [2.5]

  3. $(246, NR, ab)$ - [2.5]

The concept in the first place is the only triadic concept that exactly matches the elements in the query, and for this reason, it received the highest similarity value ($2.67$). The second and third returned concepts also encompass the sought-after elements but with an additional element in the intent ($K$ and $N$ respectively), thus sharing the same value of $2.5$.
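As a worked check of these values, the query $(-, R, ab)$ has $|X_2| = 1$, $|X_3| = 2$ and $|query| = 3$, so Line 21 of Algorithm 1 weights the second and third dimensions by $1/3$ and $2/3$ respectively:

```latex
(23456, R, ab):\quad \Delta A_2 = 1 + \frac{1}{\max(1, 1)} = 2, \quad
  \Delta A_3 = 2 + \frac{2}{\max(2, 2)} = 3, \quad
  ranking = 2 \times \frac{1}{3} + 3 \times \frac{2}{3} = \frac{8}{3} \approx 2.67

(3456, KR, ab):\quad \Delta A_2 = 1 + \frac{1}{\max(1, 2)} = \frac{3}{2}, \quad
  \Delta A_3 = 3, \quad
  ranking = \frac{3}{2} \times \frac{1}{3} + 3 \times \frac{2}{3} = \frac{5}{2} = 2.5
```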

When we execute the same query by applying the approach from (Ananias et al., 2021), we get only the concept $(23456, R, ab)$ as a response, which once again is the top-1 concept returned by our solution. Despite its correct response, we consider that the number of returned concepts by the approximation approach can discourage or even hinder the exploration of concepts similar to a specific user query, requiring the user to make multiple queries to explore a larger set of options.

4.3 Three-dimensional queries

The final type of query for matching triadic concepts involves a query where the three components are known. The formulation of a three-dimensional query is then $(X_1, X_2, X_3)$ and aims at checking if the triple is a triadic concept or identifying the concepts that are the closest to it based on our similarity formula.

4.3.1 Example

The following three triadic concepts constitute the top 3 concepts returned for the three-dimensional query $(3, R, c)$.

  1. $(16, R, ac)$ - [1.17]

  2. $(23456, R, ab)$ - [1.07]

  3. $(3456, KR, ab)$ - [0.92]

As observed, there is no triadic concept that exactly matches the query. However, the most similar concept to this triple is $(16, R, ac)$, where the sought-after extent is absent, but there is an exact match for the attribute component ($R$) and one additional element $a$ to the requested value of $c$ for the condition component. The second concept lacks the condition $c$ but exactly possesses the sought-after $R$ attribute and the element $3$ in its object component along with four extra objects, compared to the requested triple. Lastly, there is the concept $(3456, KR, ab)$, where the condition $c$ is missing, but its intent contains the extra element $K$, and its extent also includes the three additional objects $4$, $5$, and $6$.
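The three scores can be checked by hand. Each dimension of the query $(3, R, c)$ contains one element, so every weight in Line 21 of Algorithm 1 equals $1/3$:

```latex
(16, R, ac):\quad \Delta A_1 = 0, \quad \Delta A_2 = 1 + \frac{1}{1} = 2, \quad
  \Delta A_3 = 1 + \frac{1}{2} = \frac{3}{2}, \quad
  ranking = \frac{1}{3}\left(0 + 2 + \frac{3}{2}\right) = \frac{7}{6} \approx 1.17

(23456, R, ab):\quad \Delta A_1 = 1 + \frac{1}{5} = \frac{6}{5}, \quad \Delta A_2 = 2, \quad
  \Delta A_3 = 0, \quad
  ranking = \frac{1}{3}\left(\frac{6}{5} + 2 + 0\right) = \frac{16}{15} \approx 1.07

(3456, KR, ab):\quad \Delta A_1 = 1 + \frac{1}{4} = \frac{5}{4}, \quad
  \Delta A_2 = 1 + \frac{1}{2} = \frac{3}{2}, \quad \Delta A_3 = 0, \quad
  ranking = \frac{1}{3}\left(\frac{5}{4} + \frac{3}{2} + 0\right) = \frac{11}{12} \approx 0.92
```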

Regarding the result generated by (Ananias et al., 2021) for the same query, we obtained the following triadic concepts: $(16, R, ac)$, $(123456, \varnothing, abcd)$, and $(23456, R, ab)$. Comparing the results, we can see that two triadic concepts are present in the output of both approaches: $(16, R, ac)$ and $(23456, R, ab)$.

However, the solution from (Ananias et al., 2021) also returned $(123456, \varnothing, abcd)$, a concept that was ranked fifth in our solution. In other words, our method identified two more relevant concepts to present to the user before the concept $(123456, \varnothing, abcd)$ inside the supremum (see Figure 1), which might not be so interesting to the user.

Furthermore, another crucial aspect to highlight is the significance of ranking the responses, in addition to using a similarity metric. The concepts returned by (Ananias et al., 2021) lack any order. As stated before, the search for a query answer in (Ananias et al., 2021) exploits derivation operators and sometimes includes concepts that are in the neighbourhood of the generated concepts. Concept $(123456, \varnothing, abcd)$ is indeed in the upper cover of $(23456, R, ab)$ and has the particularity that its extent and its modus cover their counterparts in the second concept.

5 Empirical study

The purpose of this section is to carry out a validation process and provide empirical insights on the execution times of each one of the two solutions: ours and the approximation approach (Ananias et al., 2021) using our software platform. The tests were conducted on both real and synthetic data sets.

Given that execution times might exhibit slight variations for the same context across different code runs, the two approaches were executed three times each, and the average execution time was calculated along with the standard deviation.
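The measurement protocol described above can be sketched as follows. This is an illustrative helper rather than the benchmarking code used in the study: the name `time_runs` is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics
import time


def time_runs(func, repeats=3):
    """Run `func` `repeats` times; return (mean, standard deviation) in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    # statistics.stdev is the sample standard deviation (needs at least 2 runs).
    return statistics.mean(durations), statistics.stdev(durations)


mean, std = time_runs(lambda: sum(range(100_000)))
print(f"{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms")
```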

Furthermore, we conducted a comparative analysis between the algorithms proposed in this study and the approximation solution. The primary goal is to assess the scalability of each approach and evaluate the execution time in large contexts. For all tests, random queries were created in order to compare the solutions.

Both algorithms were implemented in Python (3.11), and all empirical tests were carried out on a macOS 13.4.1-based system equipped with 16 GB of RAM and an Apple M2 Pro 10-core processor.

The experiments are aimed at validating our hypothesis that our solution, based on an inverted index, will exhibit superior scalability and, consequently, improved execution times compared to the approximation approach.

5.1 Mushroom dataset

The Mushroom Data Set (available at: https://archive.ics.uci.edu/ml/datasets/mushroom) is a well-known one in data mining validation tests. This dataset contains descriptions of hypothetical samples corresponding to 23 species of mushrooms, characterized by 22 variables, each with two to twelve modalities. In order to have a synthetic triadic context, we first converted the multivariate dataset into a formal context from which we selected 128 binary attributes, and then decomposed the latter group into a set of 32 attributes and a set of 4 corresponding conditions. For this experiment, the synthetic triadic context has 8416 objects, 32 attributes and 4 conditions, resulting in a total of 1859 triadic concepts. Table 2 presents the execution time for the three types of queries, namely one-, two-, and three-dimensional ones, along with the execution time for the data structure creation in both solutions.

Table 2: Mushroom dataset - Execution time comparison

| Operation (execution time) | Concept approximation | Our solution |
| --- | --- | --- |
| Create data structure | 12.2 s ± 481 ms | 65.8 ms ± 4.11 ms |
| One-dimensional query $(-, -, X_3)$ | 3 min 9 s ± 607 ms | 18.5 ms ± 2.7 ms |
| One-dimensional query $(-, X_2, -)$ | 1.41 s ± 9.47 ms | 0.607 ms ± 0.323 ms |
| One-dimensional query $(X_1, -, -)$ | 4.19 ms ± 680 ms | 26.9 ms ± 6.66 ms |
| Two-dimensional query $(-, X_2, X_3)$ | 210 ms ± 7.74 ms | 2.39 ms ± 0.280 ms |
| Two-dimensional query $(X_1, -, X_3)$ | 2.28 ms ± 0.307 ms | 8.11 ms ± 0.815 ms |
| Two-dimensional query $(X_1, X_2, -)$ | 48 ms ± 3.44 ms | 5.64 ms ± 0.922 ms |
| Three-dimensional query $(X_1, X_2, X_3)$ | 257 ms ± 9.95 ms | 28.8 ms ± 4.49 ms |

The first noticeable point is the execution time for creating the data structure used to perform the queries. In the approximation approach, the transformation of the triadic context into three dyadic contexts is required, which involves calculating the Cartesian product of two dimensions out of three in the original context. Thus, the Cartesian product between the set of objects and the set of attributes alone resulted in 269,312 cells in the dyadic context $\mathbb{K}^{(3)}$. Consequently, this demands a longer execution time, averaging 12.2 seconds for creating this data structure.

In contrast, for creating the inverted index, it is only necessary to traverse the list of triadic concepts once to get the concepts associated with each dimension element. In this way, we have an average execution time of 65.8 milliseconds for its creation.

Regarding the execution of queries, the run time for both algorithms was measured in milliseconds, with the exception of the first two queries in the solution of (Ananias et al., 2021). For the first one-dimensional query, all derivation operations are performed on the context $\mathbb{K}^{(3)}$ with 269,312 cells. Furthermore, the derivation of a single element can generate a large set of pairs, which subsequently undergoes a factorization process. This excessive amount of processing results in an average execution time of 3 minutes and 9 seconds, while our solution runs in an average of 18.5 milliseconds.

In the second one-dimensional query, the number of cells in the generated dyadic context is also a bottleneck. For this query, the derivations are performed on the context $\mathbb{K}^{(2)}$ with 33,664 cells, resulting in an average time of 1.41 seconds. On the other hand, our solution took only 0.607 milliseconds on average.
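The two cell counts quoted for the Mushroom context are consistent with products of the dimension sizes (8416 objects, 32 attributes, 4 conditions). The first product is stated in the text; reading the second count as objects times conditions is our interpretation, since the text does not name the two dimensions involved.

```latex
% Objects x attributes (stated in the text)
8416 \times 32 = 269{,}312

% Objects x conditions (assumed reading of the 33,664 cells)
8416 \times 4 = 33{,}664
```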

It is also important to highlight the number of triadic concepts returned by each solution. In all queries, our approach was able to identify a higher number of concepts with their similarity score. In the approximation approach, the size of the query answer depends on the query and the TC set without any hint from the user. Moreover, the returned concepts are not ranked, requiring manual and meticulous exploration from the user to identify the most relevant ones.

5.2 Groceries dataset

The Groceries database (available at: https://www.kaggle.com/heeraldedhia/groceries-dataset) contains 38765 transactions, 3898 customers, 167 products (items) and 728 distinct transaction dates. The dataset we are analyzing contains transactions made by customers who bought a set of items during a given month (rather than a specific date) between 2014 and 2015 and has a total of 17456 triadic concepts. Table 3 displays the execution times for each potential query type, namely one-dimensional, two-dimensional, and three-dimensional queries, as well as the time taken for data structure creation in both approaches.

Table 3: Groceries dataset - Execution time comparison

| Operation (execution time) | Concept approximation | Our solution |
| --- | --- | --- |
| Create data structure | 1 min 27 s ± 456 ms | 21.9 ms ± 3.09 ms |
| One-dimensional query $(-, -, X_3)$ | 58.2 s ± 232 ms | 23.9 ms ± 3.94 ms |
| One-dimensional query $(-, X_2, -)$ | 429 ms ± 18.6 ms | 4.53 ms ± 0.313 ms |
| One-dimensional query $(X_1, -, -)$ | 3.87 ms ± 0.873 ms | 2.05 ms ± 0.375 ms |
| Two-dimensional query $(-, X_2, X_3)$ | 48.8 ms ± 7.02 ms | 57.5 ms ± 8.56 ms |
| Two-dimensional query $(X_1, -, X_3)$ | 1.65 s ± 5.02 ms | 33.6 ms ± 5.59 ms |
| Two-dimensional query $(X_1, X_2, -)$ | 40.8 ms ± 8 ms | 6.02 ms ± 0.779 ms |
| Three-dimensional query $(X_1, X_2, X_3)$ | 6.74 s ± 9.88 ms | 92.9 ms ± 2.59 ms |

In this experiment, we can observe that the approximation approach, once again, requires more time for data structure creation. In this case, the fact that we have a larger number of objects, attributes, and conditions has a negative impact on the creation of dyadic contexts, requiring an average time of one minute and 27 seconds for its completion. In contrast, the creation of the inverted index is not significantly affected, even with a considerably large number of triadic concepts in this dataset, taking only 21.9 milliseconds to create the index.
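The low creation cost of the inverted index can be understood from the following minimal sketch, which builds such an index in a single pass over the already computed triadic concepts. The key layout `(dimension, term)` and the function name `build_inverted_index` are assumptions made for illustration, and the three toy concepts are hypothetical; the actual structure used in the experiments may differ.

```python
from collections import defaultdict

def build_inverted_index(concepts):
    """Map each (dimension, term) pair to the ids of the concepts containing it.

    `concepts` is a list of (extent, intent, modus) triples of sets.
    Dimension 0 is the extent, 1 the intent, 2 the modus.
    """
    index = defaultdict(set)
    for cid, components in enumerate(concepts):
        for dim, terms in enumerate(components):
            for term in terms:
                index[(dim, term)].add(cid)
    return index

# Hypothetical triadic concepts, for illustration only.
concepts = [
    ({"c1", "c2"}, {"milk"}, {"2014-01", "2014-02"}),
    ({"c1"}, {"milk", "bread"}, {"2014-01"}),
    ({"c2", "c3"}, {"bread"}, {"2014-02"}),
]
index = build_inverted_index(concepts)
print(sorted(index[(1, "milk")]))
```

In this sketch the work is proportional to the total number of terms appearing in the concepts, and no context (triadic or dyadic) is ever materialized.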

Using this dataset, the limitation of the approximation approach becomes evident: three queries took more than 1.5 seconds to return a response, the slowest being the first query in Table 3, which took 58.2 seconds on average. Once again, the size of the dyadic context $\mathbb{K}^{(2)}$ generated by the Cartesian product of objects and attributes (650,966 cells, i.e., 3,898 customers × 167 products), along with the derivation and factorization operations, makes the approach non-scalable and impractical for larger contexts.

This dataset led to a large set of 17,456 triadic concepts. Nevertheless, our solution retrieved the matching concepts efficiently using the inverted index, as discussed earlier.

6 Conclusion

In this paper, we introduce a novel approach to querying triadic concepts based on partial or complete matching of triples. This approach, based on using an inverted index, does not need to store or explore any context, nor rely on derivation or factorization operations to identify the triadic concepts most similar to a given query.
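As a recap of the idea, the minimal sketch below answers a partially specified triple with an inverted index and ranks the candidates. The similarity used here (the fraction of query terms covered, averaged over the specified dimensions) is a stand-in chosen for illustration and is not the metric proposed in this paper; the function names and toy concepts are likewise hypothetical.

```python
from collections import defaultdict

def build_index(concepts):
    """Inverted index: (dimension, term) -> ids of concepts containing the term."""
    index = defaultdict(set)
    for cid, components in enumerate(concepts):
        for dim, terms in enumerate(components):
            for term in terms:
                index[(dim, term)].add(cid)
    return index

def query(concepts, index, q):
    """Rank concepts sharing at least one term with the query triple `q`.

    `q` is a triple of sets; an empty set leaves that dimension open ('-').
    Score (an assumption, not the paper's metric): per specified dimension,
    the fraction of query terms found in the concept, averaged over the
    specified dimensions.
    """
    candidates = set()
    for dim, terms in enumerate(q):
        for term in terms:
            candidates |= index.get((dim, term), set())
    ranked = []
    for cid in candidates:
        scores = [len(terms & concepts[cid][dim]) / len(terms)
                  for dim, terms in enumerate(q) if terms]
        ranked.append((sum(scores) / len(scores), cid))
    # Highest score first; ties broken by concept id for determinism.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked

# Hypothetical triadic concepts (extent, intent, modus), for illustration only.
concepts = [
    ({"c1", "c2"}, {"milk"}, {"2014-01", "2014-02"}),
    ({"c1"}, {"milk", "bread"}, {"2014-01"}),
    ({"c2", "c3"}, {"bread"}, {"2014-02"}),
]
index = build_index(concepts)
result = query(concepts, index, ({"c1"}, {"milk", "bread"}, set()))
print(result)
```

Only index lookups and set intersections over the stored concepts are involved, so no derivation operator is applied and no context is explored.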

This solution not only shows higher efficiency than the most closely related approach in the literature but also indicates enhanced scalability, making it applicable in big data scenarios. To the best of our knowledge, there are no other algorithms or methods capable of partial or complete matching of triples without the need to apply derivation operators or to transform the triadic context into dyadic ones. Moreover, the query answer is close to the user's triple.

Currently, we are conducting additional tests and improving the code of the two compared approaches, as well as working on an open-source platform for exploring and mining patterns in Triadic Concept Analysis.

References

  • Kwuida et al. [2010] Léonard Kwuida, Rokia Missaoui, Beligh Ben Amor, Lahcen Boumedjout, and Jean Vaillancourt. Restrictions on concept lattices for pattern management. In Marzena Kryszkiewicz and Sergei A. Obiedkov, editors, Proceedings of the 7th International Conference on Concept Lattices and Their Applications, Sevilla, Spain, October 19-21, 2010, volume 672 of CEUR Workshop Proceedings, pages 235–246. CEUR-WS.org, 2010.
  • Ganter and Wille [1999] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer Berlin Heidelberg, Berlin, Heidelberg, June 1999. ISBN 978-3-540-62771-5.
  • Lehmann and Wille [1995] Fritz Lehmann and Rudolf Wille. A triadic approach to formal concept analysis. In Lecture Notes in Computer Science, volume 954, pages 32–43, 1995. ISBN 3540601619.
  • Jäschke et al. [2008] Robert Jäschke, Andreas Hotho, Christoph Schmitz, Bernhard Ganter, and Gerd Stumme. Discovering shared conceptualizations in folksonomies. Web Semantics, 6(1):38–53, 2008. ISSN 15708268.
  • Missaoui et al. [2020] Rokia Missaoui, Pedro H. B. Ruas, Léonard Kwuida, and Mark A. J. Song. Pattern discovery in triadic contexts. In Mehwish Alam, Tanya Braun, and Bruno Yun, editors, Ontologies and Concepts in Mind and Machine, pages 117–131, Cham, 2020. Springer International Publishing. ISBN 978-3-030-57855-8.
  • Rudolph et al. [2015] Sebastian Rudolph, Christian Săcărea, and Diana Troancă. Towards a navigation paradigm for triadic concepts. In Jaume Baixeries, Christian Sacarea, and Manuel Ojeda-Aciego, editors, Formal Concept Analysis, pages 252–267, Cham, 2015. Springer International Publishing. ISBN 978-3-319-19545-2.
  • Baixeries et al. [2009] Jaume Baixeries, Laszlo Szathmary, Petko Valtchev, and Robert Godin. Yet a faster algorithm for building the Hasse diagram of a concept lattice. In Sébastien Ferré and Sebastian Rudolph, editors, Formal Concept Analysis, pages 162–177, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-01815-2.
  • Wille [1995] Rudolf Wille. The basic theorem of triadic concept analysis. Order, 12(2):149–158, 1995.
  • Zobel and Moffat [2006] Justin Zobel and Alistair Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2):6–es, July 2006. ISSN 0360-0300.
  • Ananias et al. [2021] Kaio H.A. Ananias, Rokia Missaoui, Pedro H.B. Ruas, Luis E. Zarate, and Mark A.J. Song. Triadic concept approximation. Information Sciences, 572:126–146, 2021.
  • Kis et al. [2016] Levente Lorand Kis, Christian Sacarea, and Diana Troanca. FCA Tools Bundle - a tool that enables dyadic and triadic conceptual navigation. In FCA4AI@ECAI, pages 42–50, 2016.