-
Clustering Retail Products Based on Customer Behaviour
Authors:
Vladimír Holý,
Ondřej Sokol,
Michal Černý
Abstract:
The categorization of retail products is essential for business decision-making. It is common practice to classify products based on their quantitative and qualitative characteristics. In this paper, we use a purely data-driven approach: our clustering of products is based exclusively on customer behaviour. We propose a method for clustering retail products using market basket data. Our model is formulated as an optimization problem, which is solved by a genetic algorithm. Using simulated data, we demonstrate how the method behaves in different settings. An application to real data from a Czech drugstore company shows that our method yields results similar to the classification by experts. The number of clusters is a parameter of our algorithm. We demonstrate that if more clusters are allowed than there are original categories, the method yields additional information about the structure of the product categorization.
Submitted 8 May, 2024;
originally announced May 2024.
-
A Simple Measure of Product Substitutability Based on Common Purchases
Authors:
Ondřej Sokol,
Vladimír Holý
Abstract:
We propose a measure of product substitutability based on correlation of common purchases, which is fast to compute and easy to interpret. In an empirical study of a drugstore retail chain, we demonstrate its properties, compare it to a similarly simple measure of product complementarity, and use it to find small clusters of substitutes.
Submitted 28 January, 2022;
originally announced January 2022.
-
Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects
Authors:
Ondřej Sokol,
Vladimír Holý
Abstract:
The method of Holý, Sokol and Černý (Applied Soft Computing, 2017, Vol. 60, p. 752-762) clusters objects based on their incidence in a large number of given sets. The idea is to minimize the occurrence of multiple objects from the same cluster in the same set. In the current paper, we study computational aspects of the method. First, we prove that the problem of finding the optimal clustering is NP-hard. Second, to numerically find a suitable clustering, we propose to use the genetic algorithm augmented by a renumbering procedure, a fast task-specific local search heuristic and an initial solution based on a simplified model. Third, in a simulation study, we demonstrate that our improvements of the standard genetic algorithm significantly enhance its computational performance.
Submitted 2 February, 2021;
originally announced February 2021.
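The objective described in this abstract (minimizing the occurrence of multiple objects from the same cluster in the same set) can be illustrated with a minimal sketch. The exact cost function of the paper is not given in the abstract, so the penalty below is an assumption: for each set, it counts every object beyond the first that shares a cluster. The function name `joint_occurrence_penalty` and the toy product names are hypothetical.

```python
from collections import Counter


def joint_occurrence_penalty(baskets, cluster_of):
    """Assumed penalty: for each basket and each cluster, count the
    objects beyond the first that fall into that cluster."""
    penalty = 0
    for basket in baskets:
        # Number of distinct objects per cluster within this basket.
        counts = Counter(cluster_of[item] for item in set(basket))
        penalty += sum(c - 1 for c in counts.values())
    return penalty


# Hypothetical toy data: each basket holds one soap and one shampoo.
baskets = [
    {"soap_a", "shampoo_a"},
    {"soap_b", "shampoo_b"},
    {"soap_a", "shampoo_b"},
]

# Clustering that groups products rarely bought together.
by_type = {"soap_a": 0, "soap_b": 0, "shampoo_a": 1, "shampoo_b": 1}
# Clustering that groups products often bought together.
mixed = {"soap_a": 0, "shampoo_a": 0, "soap_b": 1, "shampoo_b": 1}

print(joint_occurrence_penalty(baskets, by_type))
print(joint_occurrence_penalty(baskets, mixed))
```

Under this assumed penalty, the clustering that puts the two soaps together and the two shampoos together scores lower than the mixed one, because no basket contains two products from the same cluster. A search procedure such as the genetic algorithm mentioned in the abstract would look for the assignment with the lowest penalty.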
-
The Role of Shopping Mission in Retail Customer Segmentation
Authors:
Ondřej Sokol,
Vladimír Holý
Abstract:
In retailing, it is important to understand customer behavior and determine customer value. A useful tool to achieve such goals is the cluster analysis of transaction data. Typically, a customer segmentation is based on the recency, frequency and monetary value of shopping or the structure of purchased products. We take a different approach and base our segmentation on the shopping mission - a reason why a customer visits the shop. Shopping missions include focused purchases of specific product categories and general purchases of various sizes. In an application to a Czech drugstore chain, we show that the proposed segmentation brings unique information about customers and should be used alongside the traditional methods.
Submitted 19 March, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
How Many Customers Does a Retail Store Have?
Authors:
Ondřej Sokol,
Vladimír Holý
Abstract:
The knowledge of the number of customers is the pillar of retail business analytics. In our setting, we assume that a portion of customers is monitored and easily counted due to the loyalty program while the rest is not monitored. The behavior of customers in both groups may significantly differ making the estimation of the number of unmonitored customers a non-trivial task. We identify shopping patterns of several customer segments which allows us to estimate the distribution of customers without the loyalty card using the maximum likelihood method. In a simulation study, we find that the proposed approach is quite precise even when the data sample is very small and its assumptions are violated to a certain degree. In an empirical study of a drugstore chain, we validate and illustrate the proposed approach in practice. The actual number of customers estimated by the proposed method is much higher than the number suggested by the naive estimate assuming the constant customer distribution. The proposed method can also be utilized to determine penetration of the loyalty program in the individual customer segments.
Submitted 5 April, 2020; v1 submitted 23 April, 2019;
originally announced April 2019.