
Showing 1–22 of 22 results for author: Bouveyron, C

  1. arXiv:2309.02858

    stat.ML cs.AI cs.IT cs.LG stat.ME

    Generalised Mutual Information: a Framework for Discriminative Clustering

    Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso

    Abstract: In the last decade, recent successes in deep clustering have mostly involved the Mutual Information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been widely discussed as a source of improvement, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight h…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Submitted for review at the IEEE Transactions on Pattern Analysis and Machine Intelligence. This article is an extension of an original NeurIPS 2022 article [arXiv:2210.06300]

    MSC Class: 62H30 ACM Class: G.3

  2. arXiv:2304.08242

    cs.LG cs.CL cs.SI stat.ME

    The Deep Latent Position Topic Model for Clustering and Representation of Networks with Textual Edges

    Authors: Rémi Boutin, Pierre Latouche, Charles Bouveyron

    Abstract: Digital interactions leading to users sharing textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand those heterogeneous and complex data structures, clustering nodes into homogeneous groups and rendering a comprehensible visualisation of the data are both essential. To…

    Submitted 13 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: 29 pages including the appendix, 13 figures, 6 tables, journal paper

  3. arXiv:2302.07540

    stat.ML

    Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism

    Authors: Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei

    Abstract: Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we propose a novel approach to address this issue by…

    Submitted 15 February, 2023; originally announced February 2023.

  4. arXiv:2302.03391

    stat.ML cs.AI cs.LG stat.CO stat.ME

    Sparse GEMINI for Joint Discriminative Clustering and Feature Selection

    Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso

    Abstract: Feature selection in clustering is a hard task which involves simultaneously the discovery of relevant clusters as well as relevant variables with respect to these clusters. While feature selection algorithms are often model-based through optimised model selection or strong assumptions on $p(\pmb{x})$, we introduce a discriminative clustering model trying to maximise a geometry-aware generalisatio…

    Submitted 7 February, 2023; originally announced February 2023.

    MSC Class: 62H30 ACM Class: G.3

  5. arXiv:2210.06300

    stat.ML cs.AI cs.IT cs.LG stat.ME

    Generalised Mutual Information for Discriminative Clustering

    Authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frederic Precioso

    Abstract: In the last decade, recent successes in deep clustering have mostly involved the mutual information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been widely discussed as a source of improvement, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight h…

    Submitted 14 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: To be published in Neural Information Processing Systems 2022

    MSC Class: 62H30 ACM Class: G.3

  6. arXiv:2209.10097

    cs.SI stat.ME

    Embedded Topics in the Stochastic Block Model

    Authors: Rémi Boutin, Charles Bouveyron, Pierre Latouche

    Abstract: Communication networks such as emails or social networks are now ubiquitous and their analysis has become a strategic field. In many applications, the goal is to automatically extract relevant information by looking at the nodes and their connections. Unfortunately, most of the existing methods focus on analysing the presence or absence of edges and textual data is often discarded. However, all co…

    Submitted 25 July, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

  7. arXiv:2106.03821

    cs.SD cs.CL cs.CV eess.AS

    Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion

    Authors: Baptiste Pouthier, Laurent Pilati, Leela K. Gudupudi, Charles Bouveyron, Frederic Precioso

    Abstract: It is now well established from a variety of studies that there is a significant benefit from combining video and audio data in detecting active speakers. However, either of the modalities can potentially mislead audiovisual fusion by inducing unreliable or deceptive information. This paper outlines active speaker detection as a multi-objective learning problem to leverage the best of each modality…

    Submitted 15 September, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: In INTERSPEECH 2021

    Journal ref: Proc. Interspeech 2021, 2381-2385

  8. arXiv:2104.03083

    stat.ME

    Co-clustering of time-dependent data via Shape Invariant Model

    Authors: Alessandro Casa, Charles Bouveyron, Elena Erosheva, Giovanna Menardi

    Abstract: Multivariate time-dependent data, where multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. To model these data we need to account for relations among both time instants and variables and, at the same time, for subject heterogeneity. We propose a new co-clustering methodology for clustering individuals and variables simultane…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: 21 pages, 7 figures

  9. arXiv:2103.05928

    astro-ph.GA astro-ph.IM

    Unsupervised classification of SDSS galaxy spectra

    Authors: Didier Fraix-Burnet, C. Bouveyron, J. Moultaka

    Abstract: Defining templates of galaxy spectra is useful to quickly characterise new observations and organise databases from surveys. These templates are usually built from a pre-defined classification based on other criteria. Aims. We present an unsupervised classification of 702,248 spectra of galaxies and quasars with redshifts smaller than 0.25 that were retrieved from the Sloan Digital Sky Survey (SDSS…

    Submitted 10 March, 2021; originally announced March 2021.

    Journal ref: A&A 649, A53 (2021)

  10. arXiv:2102.01982

    stat.ME stat.CO stat.ML

    Unobserved classes and extra variables in high-dimensional discriminant analysis

    Authors: Michael Fop, Pierre-Alexandre Mattei, Charles Bouveyron, Thomas Brendan Murphy

    Abstract: In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on additional variables recorded after the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle po…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: 29 pages, 29 figures

  11. arXiv:2012.04620

    stat.ME

    A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering

    Authors: Nicolas Jouvin, Charles Bouveyron, Pierre Latouche

    Abstract: High-dimensional data clustering has become and remains a challenging task for modern statistics and machine learning, with a wide range of applications. We consider in this work the powerful discriminative latent mixture model, and we extend it to the Bayesian framework. Modeling data as a mixture of Gaussians in a low-dimensional discriminative subspace, a Gaussian prior distribution is introduc…

    Submitted 8 December, 2020; originally announced December 2020.

    Comments: The FisherEM package is available on CRAN, see https://github.com/nicolasJouvin/FisherEM for additional information

  12. Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

    Authors: Etienne Côme, Nicolas Jouvin, Pierre Latouche, Charles Bouveyron

    Abstract: Finding a set of nested partitions of a dataset is useful to uncover relevant structure at different scales, and is often handled with a data-dependent methodology. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Considering the integrated classification likelihood criterion as an objective function, this work applies to every discrete latent varia…

    Submitted 21 April, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Adv Data Anal Classif (2021)

  13. Greedy clustering of count data through a mixture of multinomial PCA

    Authors: Nicolas Jouvin, Pierre Latouche, Charles Bouveyron, Guillaume Bataillon, Alain Livartowski

    Abstract: Count data is becoming more and more ubiquitous in a wide range of applications, with datasets growing both in size and in dimension. In this context, an increasing amount of work is dedicated to the construction of statistical models directly accounting for the discrete nature of the data. Moreover, it has been shown that integrating dimension reduction to clustering can drastically improve perfo…

    Submitted 10 July, 2020; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: 34 pages, 11 figures, published in: Computational Statistics

  14. arXiv:1703.02834

    stat.ME math.ST stat.ML

    Exact Dimensionality Selection for Bayesian PCA

    Authors: Charles Bouveyron, Pierre Latouche, Pierre-Alexandre Mattei

    Abstract: We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood which allows one to infer an optimal number of components.…

    Submitted 21 May, 2019; v1 submitted 8 March, 2017; originally announced March 2017.

  15. Bayesian Variable Selection for Globally Sparse Probabilistic PCA

    Authors: Charles Bouveyron, Pierre Latouche, Pierre-Alexandre Mattei

    Abstract: Sparse versions of principal component analysis (PCA) have established themselves as simple, yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables is difficult since each axis has its own sparsity pattern and has to be interpreted separately. To ov…

    Submitted 20 September, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: An earlier version of this paper appeared in the Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016)

  16. The discriminative functional mixture model for a comparative analysis of bike sharing systems

    Authors: Charles Bouveyron, Etienne Côme, Julien Jacques

    Abstract: Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now proposed in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. The analysis of the mass of data generated by such systems is of particular interest to BSS providers to update system structures and policies. This work…

    Submitted 29 January, 2016; originally announced January 2016.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS861

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 4, 1726-1760

  17. Anomaly Detection Based on Confidence Intervals Using SOM with an Application to Health Monitoring

    Authors: Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jerome Lacaille

    Abstract: We develop an application of self-organizing maps (SOM) for the task of anomaly detection and visualization. To remove the effect of exogenous independent variables, we use a correction model which is more accurate than the usual one, since we apply different linear models in each cluster of context. We do not assume any particular probability distribution of the data and the detection method is based on the distance of…

    Submitted 30 June, 2015; originally announced August 2015.

    Journal ref: T. Villmann, F.M. Schleif, M. Kaden, M. Lange. 10th International Workshop on Self-Organizing Maps, Jul 2014, Mittweida, Germany. Springer, 295, pp.145-155, 2014, Advances in Self-Organizing Maps and Learning Vector Quantization AISC

  18. The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul

    Authors: Yacine Jernite, Pierre Latouche, Charles Bouveyron, Patrick Rivera, Laurent Jegou, Stéphane Lamassé

    Abstract: In the last two decades many random graph models have been proposed to extract knowledge from networks. Most of them look for communities or, more generally, clusters of vertices with homogeneous connection profiles. While the first models focused on networks with binary edges only, extensions now allow one to deal with valued networks. Recently, new models were also introduced in order to characteriz…

    Submitted 5 May, 2014; v1 submitted 21 December, 2012; originally announced December 2012.

    Comments: Published at http://dx.doi.org/10.1214/13-AOAS691 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS691

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 377-405

  19. arXiv:1204.4021

    stat.ME stat.AP

    Kernel discriminant analysis and clustering with parsimonious Gaussian process models

    Authors: Charles Bouveyron, Stéphane Girard, Mathieu Fauvel

    Abstract: This work presents a family of parsimonious Gaussian process models which allow one to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations int…

    Submitted 15 June, 2012; v1 submitted 18 April, 2012; originally announced April 2012.

  20. arXiv:1204.2067

    stat.ME math.ST

    Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

    Authors: Charles Bouveyron, Camille Brunet

    Abstract: The interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. Variable selection allows in particular to ease both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in…

    Submitted 10 April, 2012; originally announced April 2012.

  21. arXiv:1101.2374

    stat.ME stat.AP stat.CO stat.ML

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Authors: Charles Bouveyron, Camille Brunet

    Abstract: Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains but remains a difficult task from the standpoints of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the ori…

    Submitted 19 April, 2011; v1 submitted 12 January, 2011; originally announced January 2011.

    Journal ref: Statistics and Computing, 2011

  22. High-Dimensional Data Clustering

    Authors: Charles Bouveyron, Stéphane Girard, Cordelia Schmid

    Abstract: Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty is due to the fact that high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of dimension…

    Submitted 4 January, 2007; v1 submitted 4 April, 2006; originally announced April 2006.

    MSC Class: ACM-G3-Multivariate statistics

    Journal ref: Computational Statistics and Data Analysis 52, 1 (2007) 502-519