Search | arXiv e-print repository

Kernel KMeans clustering splits for end-to-end unsupervised decision trees

Authors: Louis Ohl, Pierre-Alexandre Mattei, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso

Abstract: Trees are convenient models for obtaining explainable predictions on relatively small datasets. Although there are many proposals for the end-to-end construction of such trees in supervised learning, learning a tree end-to-end for clustering without labels remains an open challenge. As most works focus on interpreting with trees the result of another clustering algorithm, we present here a novel end-to-end trained unsupervised binary tree for clustering: Kauri. This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids. We compare this model on multiple datasets with recent unsupervised trees and show that Kauri performs identically when using a linear kernel. For other kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.

Submitted 19 February, 2024; originally announced February 2024.

MSC Class: 62H30 ACM Class: G.3
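
The abstract above mentions maximising the kernel KMeans objective without defining centroids. A minimal sketch of why this is possible, using the standard identity that the within-cluster sum of squares equals trace(K) minus the sum over clusters of the within-cluster kernel sum divided by the cluster size. This is not the Kauri implementation; the function name `kernel_kmeans_objective` and the toy data are assumptions for illustration only.

```python
import numpy as np


def kernel_kmeans_objective(K, labels):
    """Centroid-free kernel KMeans objective (to be maximised).

    K      : (n, n) Gram matrix of the chosen kernel.
    labels : (n,) integer cluster assignment of each sample.

    Returns sum_k (1 / |C_k|) * sum_{i, j in C_k} K[i, j].
    The usual within-cluster sum of squares is trace(K) minus this value,
    so maximising it is equivalent to minimising the KMeans loss.
    """
    K = np.asarray(K, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        # Sum of the kernel block restricted to cluster k, scaled by its size.
        total += K[np.ix_(idx, idx)].sum() / idx.size
    return total


# Toy example with a linear kernel (hypothetical data).
X = np.array([[1.0], [1.0], [-1.0], [-1.0]])
K = X @ X.T
good = kernel_kmeans_objective(K, [0, 0, 1, 1])  # groups identical points
bad = kernel_kmeans_objective(K, [0, 1, 0, 1])   # mixes the two groups
```

Only the Gram matrix and the assignments are needed, so the same function works for any kernel for which `K` can be computed.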

arXiv:2309.02858 [pdf, other]

Generalised Mutual Information: a Framework for Discriminative Clustering

Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso

Abstract: In the last decade, recent successes in deep clustering majorly involved the Mutual Information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been largely discussed for improvements, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identified the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise the mutual information by changing its core distance, introducing the Generalised Mutual Information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisations when training as they are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep discriminative clustering context where the number of clusters is a priori unknown.

Submitted 6 September, 2023; originally announced September 2023.

Comments: Submitted for review at the IEEE Transactions on Pattern Analysis and Machine Intelligence. This article is an extension of an original NeurIPS 2022 article [arXiv:2210.06300]

MSC Class: 62H30 ACM Class: G.3
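
For readers unfamiliar with MI as a clustering objective, the quantity discussed in the abstract above can be estimated from a batch of cluster probabilities as the entropy of the averaged prediction minus the average entropy of each prediction. The sketch below illustrates only this standard estimate, not the GEMINI objectives themselves; the function name `mutual_information` and the `eps` smoothing constant are assumptions.

```python
import numpy as np


def mutual_information(probs, eps=1e-12):
    """Batch estimate of I(X; Y) for a discriminative clustering model.

    probs : (n, k) array, row i holding p(y | x_i) for k clusters.
    eps   : small constant (assumption) to avoid log(0).

    I(X; Y) = H(mean_i p(y | x_i)) - mean_i H(p(y | x_i)).
    """
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # estimated cluster proportions p(y)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional


# Confident, balanced assignments reach the maximum log(k) ...
mi_confident = mutual_information([[1.0, 0.0], [0.0, 1.0]])
# ... while uniform predictions carry no information about the input.
mi_uniform = mutual_information([[0.5, 0.5], [0.5, 0.5]])
```

The estimate depends only on the predicted probabilities, not on where the samples lie in the data space, which is the geometry-unaware behaviour that the abstract contrasts with GEMINIs.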

arXiv:2305.18277 [pdf, other]

3DTeethSeg'22: 3D Teeth Scan Segmentation and Labeling Challenge

Authors: Achraf Ben-Hamadou, Oussama Smaoui, Ahmed Rekik, Sergi Pujades, Edmond Boyer, Hoyeon Lim, Minchang Kim, Minkyung Lee, Minyoung Chung, Yeong-Gil Shin, Mathieu Leclercq, Lucia Cevidanes, Juan Carlos Prieto, Shaojie Zhuang, Guangshun Wei, Zhiming Cui, Yuanfeng Zhou, Tudor Dascalu, Bulat Ibragimov, Tae-Hoon Yong, Hong-Gi Ahn, Wan Kim, Jae-Hwan Han, Byungsun Choi, Niels van Nistelrooij , et al. (7 additional authors not shown)

Abstract: Teeth localization, segmentation, and labeling from intra-oral 3D scans are essential tasks in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, developing automated algorithms for teeth analysis presents significant challenges due to variations in dental anatomy, imaging protocols, and limited availability of publicly accessible data. To address these challenges, the 3DTeethSeg'22 challenge was organized in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2022, with a call for algorithms tackling teeth localization, segmentation, and labeling from intraoral 3D scans. A dataset comprising a total of 1800 scans from 900 patients was prepared, and each tooth was individually annotated by a human-machine hybrid algorithm. A total of 6 algorithms were evaluated on this dataset. In this study, we present the evaluation results of the 3DTeethSeg'22 challenge. The 3DTeethSeg'22 challenge code can be accessed at: https://github.com/abenhamadou/3DTeethSeg22_challenge

Submitted 29 May, 2023; originally announced May 2023.

Comments: 29 pages, MICCAI 2022 Singapore, Satellite Event, Challenge

arXiv:2302.03391 [pdf, other]

Sparse GEMINI for Joint Discriminative Clustering and Feature Selection

Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso

Abstract: Feature selection in clustering is a hard task which involves simultaneously the discovery of relevant clusters as well as relevant variables with respect to these clusters. While feature selection algorithms are often model-based through optimised model selection or strong assumptions on $p(\pmb{x})$, we introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI with a simple $\ell_1$ penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a clustering model $p_\theta(y|\pmb{x})$. We demonstrate the performance of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.

Submitted 7 February, 2023; originally announced February 2023.

MSC Class: 62H30 ACM Class: G.3

arXiv:2210.06300 [pdf, other]

Generalised Mutual Information for Discriminative Clustering

Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frederic Precioso

Abstract: In the last decade, recent successes in deep clustering majorly involved the mutual information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been largely discussed for improvements, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identified the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise the mutual information by changing its core distance, introducing the generalised mutual information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisations when training. Some of these metrics are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep clustering context where the number of clusters is a priori unknown.

Submitted 14 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: To be published in Neural Information Processing Systems 2022

MSC Class: 62H30 ACM Class: G.3

arXiv:2207.04130 [pdf, other]

Multi-view Attention for gestational age at birth prediction

Authors: Mathieu Leclercq, Martin Styner, Juan Carlos Prieto

Abstract: We present our method for gestational age at birth prediction for the SLCN (surface learning for clinical neuroimaging) challenge. Our method is based on a multi-view shape analysis technique that captures 2D renderings of a 3D object from different viewpoints. We render the brain features on the surface of the sphere and then the 2D images are analyzed via 2D CNNs and an attention layer for the regression task. The regression task achieves an MAE of 1.637 ± 1.3 on the native space and an MAE of 1.38 ± 1.14 on the template space. The source code for this project is available in our GitHub repository https://github.com/MathieuLeclercq/SLCN_challenge_UNC

Submitted 8 July, 2022; originally announced July 2022.

Comments: 7 pages, 5 figures, 1 table

arXiv:cs/0411082 [pdf, ps, other]

Support pour la reconfiguration d'implantation dans les applications a composants Java (Support for implementation reconfiguration in Java component-based applications)

Authors: Jakub Kornas, Matthieu Leclercq, Vivien Quema, Jean-Bernard Stefani

Abstract: Nowadays, numerous component models are used for various purposes: to build applications, middleware or even operating systems. Those models commonly support structure reconfiguration, that is, modification of an application's architecture at runtime. On the other hand, very few allow implementation reconfiguration, that is, runtime modification of the code of the components building the application. In this article we present the work we performed on JULIA, a Java-based implementation of the FRACTAL component model, in order for it to support implementation reconfigurations. We show how we overcame the limitations of the Java class loading mechanism to allow runtime modifications of components' implementation and interfaces. We also describe the integration of our solution with the JULIA ADL.

Submitted 24 November, 2004; originally announced November 2004.

Journal ref: DECOR04 (2004) 171-184

Showing 1–7 of 7 results for author: Leclercq, M