Pruning-Based Extraction of Descriptions from Probabilistic Circuits
Authors:
Sieben Bocklandt,
Vincent Derkinderen,
Koen Vanderstraeten,
Wouter Pijpops,
Kurt Jaspers,
Wannes Meert
Abstract:
Concept learning is a general task with applications in various domains. As a motivating example we consider the application of music playlist generation, where a playlist is represented as a concept (e.g., `relaxing music') rather than as a fixed collection of songs. In this work we use a probabilistic circuit to learn a concept from positively labelled and unlabelled examples. While these circui…
▽ More
Concept learning is a general task with applications in various domains. As a motivating example we consider the application of music playlist generation, where a playlist is represented as a concept (e.g., `relaxing music') rather than as a fixed collection of songs. In this work we use a probabilistic circuit to learn a concept from positively labelled and unlabelled examples. While these circuits form an attractive tractable model for this task, it is challenging for a domain expert to inspect and analyse them, which impedes their use within certain applications. We propose to resolve this by converting a learned probabilistic circuit into a logic-based discriminative model that covers the high density regions of the circuit. That is, those regions the circuit classifies as certainly being part of the learned concept. As part of this approach we present two contributions: PUTPUT, an algorithm to prune low density regions from a probabilistic circuit while considering both the F1-score and a newly proposed description length that we call aggregated entropy. Our experiments demonstrate the effectiveness of our approach in providing discriminative models, outperforming competitors on the music playlist generation task and similar datasets.
△ Less
Submitted 5 June, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
Automatic Generation of Product Concepts from Positive Examples, with an Application to Music Streaming
Authors:
Kshitij Goyal,
Wannes Meert,
Hendrik Blockeel,
Elia Van Wolputte,
Koen Vanderstraeten,
Wouter Pijpops,
Kurt Jaspers
Abstract:
Internet based businesses and products (e.g. e-commerce, music streaming) are becoming more and more sophisticated every day with a lot of focus on improving customer satisfaction. A core way they achieve this is by providing customers with an easy access to their products by structuring them in catalogues using navigation bars and providing recommendations. We refer to these catalogues as product…
▽ More
Internet based businesses and products (e.g. e-commerce, music streaming) are becoming more and more sophisticated every day with a lot of focus on improving customer satisfaction. A core way they achieve this is by providing customers with an easy access to their products by structuring them in catalogues using navigation bars and providing recommendations. We refer to these catalogues as product concepts, e.g. product categories on e-commerce websites, public playlists on music streaming platforms. These product concepts typically contain products that are linked with each other through some common features (e.g. a playlist of songs by the same artist). How they are defined in the backend of the system can be different for different products. In this work, we represent product concepts using database queries and tackle two learning problems. First, given sets of products that all belong to the same unknown product concept, we learn a database query that is a representation of this product concept. Second, we learn product concepts and their corresponding queries when the given sets of products are associated with multiple product concepts. To achieve these goals, we propose two approaches that combine the concepts of PU learning with Decision Trees and Clustering. Our experiments demonstrate, via a simulated setup for a music streaming service, that our approach is effective in solving these problems.
△ Less
Submitted 15 January, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.