Search | arXiv e-print repository

Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning

Authors: G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

Abstract: We investigate the impact of dataset-specific hyperparameter, feature encoding, and architecture tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. This study addresses the practical need for a unified evaluation of models that fully considers hyperparameter optimization. Additionally, we propose a reduced search space for each model that allows for quick optimization, achieving nearly equivalent performance at a significantly lower cost. Our benchmark demonstrates that, for most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations. Furthermore, we confirm that diffusion-based models generally outperform other models on tabular data. However, this advantage is not significant when the entire tuning and training process is restricted to the same GPU budget for all models.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2404.03703 [pdf, other]

Mitigating analytical variability in fMRI results with style transfer

Authors: Elodie Germani, Elisa Fromont, Camille Maumet

Abstract: We propose a novel approach to improve the reproducibility of neuroimaging results by converting statistic maps across different functional MRI pipelines. We assume that pipelines can be considered as a style component of data and propose to use different generative models, including Diffusion Models (DM), to convert data between pipelines. We design a new DM-based unsupervised multi-domain image-to-image transition framework and constrain the generation of 3D fMRI statistic maps using the latent space of an auxiliary classifier that distinguishes statistic maps from different pipelines. We extend traditional sampling techniques used in DM to improve the transition performance. Our experiments demonstrate that our proposed methods are successful: pipelines can indeed be transferred, providing an important source of data augmentation for future medical studies.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2312.06231 [pdf, other]

Uncovering communities of pipelines in the task-fMRI analytical space

Authors: Elodie Germani, Elisa Fromont, Camille Maumet

Abstract: Analytical workflows in functional magnetic resonance imaging are highly flexible with limited best practices as to how to choose a pipeline. While it has been shown that the use of different pipelines might lead to different results, there is still a lack of understanding of the factors that drive these differences and of the stability of these differences across contexts. We use community detection algorithms to explore the pipeline space and assess the stability of pipeline relationships across different contexts. We show that there are subsets of pipelines that give similar results, especially those sharing specific parameters (e.g. number of motion regressors, software packages, etc.). Those pipeline-to-pipeline patterns are stable across groups of participants but not across different tasks. By visualizing the differences between communities, we show that the pipeline space is mainly driven by the size of the activation area in the brain and the scale of statistic values in statistic maps.

Submitted 18 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted at the 2024 IEEE International Conference on Image Processing

arXiv:2308.07250 [pdf, other]

LCE: An Augmented Combination of Bagging and Boosting in Python

Authors: Kevin Fauvel, Élisa Fromont, Véronique Masson, Philippe Faverdin, Alexandre Termier

Abstract: lcensemble is a high-performing, scalable and user-friendly Python package for the general tasks of classification and regression. The package implements Local Cascade Ensemble (LCE), a machine learning method that further enhances the prediction performance of the current state-of-the-art methods Random Forest and XGBoost. LCE combines their strengths and adopts a complementary diversification approach to obtain a better generalizing predictor. The package is compatible with scikit-learn, so it can interact with scikit-learn pipelines and model selection tools. It is distributed under the Apache 2.0 license, and its source code is available at https://github.com/LocalCascadeEnsemble/LCE.

Submitted 15 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2211.07205 [pdf, other]

Unique in the Smart Grid - The Privacy Cost of Fine-Grained Electrical Consumption Data

Authors: Antonin Voyez, Tristan Allard, Gildas Avoine, Pierre Cauchois, Elisa Fromont, Matthieu Simonin

Abstract: The collection of electrical consumption time series through smart meters grows with ambitious nationwide smart grid programs. This data is both highly sensitive and highly valuable: strong laws about personal data protect it while laws about open data aim at making it public after a privacy-preserving data publishing process. In this work, we study the uniqueness of large-scale real-life fine-grained electrical consumption time series and show its link to privacy threats. Our results show a worryingly high uniqueness rate in such datasets. In particular, we show that knowing 5 consecutive electric measures makes it possible to re-identify on average more than 90% of households in our 2.5M half-hourly electric time series dataset. Moreover, uniqueness remains high even when data is severely degraded. For example, when data is rounded to the nearest 100 watts, knowing 7 consecutive electric measures makes it possible to re-identify on average more than 40% of the households (same dataset). We also study the relationship between uniqueness and entropy, uniqueness and electric consumption, and electric consumption and temperatures, showing their strong correlation.

Submitted 14 November, 2022; originally announced November 2022.
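
As a rough illustration of the uniqueness notion described in the abstract above, the following Python sketch computes the share of households whose window of k consecutive measures is shared with no other household. The function name `uniqueness_rate`, the single fixed start index, the optional rounding step, and the toy data are all assumptions made for illustration; the paper's exact definition and averaging procedure may differ.

```python
from collections import Counter


def uniqueness_rate(series, start, k, round_to=None):
    """Fraction of series whose window series[start:start+k] is unique.

    series   : list of equal-length lists of numeric measures (one per household)
    start, k : window position and length (hypothetical parameters)
    round_to : if given, each measure is rounded to the nearest multiple
               of this value before comparison (e.g. 100 for 100 watts)
    """
    def degrade(value):
        # Optional degradation step: round to the nearest multiple of round_to.
        if round_to is None:
            return value
        return round(value / round_to) * round_to

    windows = [tuple(degrade(v) for v in s[start:start + k]) for s in series]
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(windows)


# Toy data: three households, four measures each (made-up values).
toy = [
    [110, 220, 330, 440],
    [110, 220, 330, 550],
    [990, 220, 330, 440],
]
rate_k3 = uniqueness_rate(toy, start=0, k=3)
rate_k4 = uniqueness_rate(toy, start=0, k=4)
```

On this toy data, a longer window separates more households, which is the qualitative effect the abstract reports for longer runs of consecutive measures.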

arXiv:2209.10099 [pdf, other]

On the benefits of self-taught learning for brain decoding

Authors: Elodie Germani, Elisa Fromont, Camille Maumet

Abstract: Context. We study the benefits of using a large public neuroimaging database composed of fMRI statistic maps, in a self-taught learning framework, for improving brain decoding on new tasks. First, we leverage the NeuroVault database to train, on a selection of relevant statistic maps, a convolutional autoencoder to reconstruct these maps. Then, we use this trained encoder to initialize a supervised convolutional neural network to classify tasks or cognitive processes of unseen statistic maps from large collections of the NeuroVault database. Results. We show that such a self-taught learning process always improves the performance of the classifiers but the magnitude of the benefits strongly depends on the number of samples available both for pre-training and finetuning the models and on the complexity of the targeted downstream task. Conclusion. The pre-trained model improves the classification performance and displays more generalizable features, less sensitive to individual differences.

Submitted 24 April, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

arXiv:2208.01515 [pdf, other]

UniRank: Unimodal Bandit Algorithm for Online Ranking

Authors: Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont

Abstract: We tackle a new emerging problem, which is finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, creating an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank} we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(L\frac{1}{\Delta}\log(T))$. Secondly, we show that by moving the focus towards the main question `\emph{Is user $i$ better than user $j$?}' this regret becomes $O(L\frac{\Delta}{\tilde{\Delta}^2}\log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Finally, some experimental results show that these theoretical results are corroborated in practice.

Submitted 2 August, 2022; originally announced August 2022.

Journal ref: Complex Feedback in Online Learning Workshop at the 39th International Conference on Machine Learning, Jul 2022, Baltimore, United States

arXiv:2109.07519 [pdf, other]

Discovering Useful Compact Sets of Sequential Rules in a Long Sequence

Authors: Erwan Bourrand, Luis Galárraga, Esther Galbrun, Elisa Fromont, Alexandre Termier

Abstract: We are interested in understanding the underlying generation process for long sequences of symbolic events. To do so, we propose COSSU, an algorithm to mine small and meaningful sets of sequential rules. The rules are selected using an MDL-inspired criterion that favors compactness and relies on a novel rule-based encoding scheme for sequences. Our evaluation shows that COSSU can successfully retrieve relevant sets of closed sequential rules from a long sequence. Such rules constitute an interpretable model that exhibits competitive accuracy for the tasks of next-element prediction and classification.

Submitted 30 December, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: 8 pages, published in the proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence

arXiv:2009.14085 [pdf, other]

Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection

Authors: Heng Zhang, Elisa Fromont, Sébastien Lefevre, Bruno Avignon

Abstract: Most deep learning object detectors are based on the anchor mechanism and resort to the Intersection over Union (IoU) between predefined anchor boxes and ground truth boxes to evaluate the matching quality between anchors and objects. In this paper, we question this use of IoU and propose a new anchor matching criterion guided, during the training phase, by the optimization of both the localization and the classification tasks: the predictions related to one task are used to dynamically assign sample anchors and improve the model on the other task, and vice versa. Despite the simplicity of the proposed method, our experiments with different state-of-the-art deep learning architectures on PASCAL VOC and MS COCO datasets demonstrate the effectiveness and generality of our Mutual Guidance strategy.

Submitted 29 September, 2020; originally announced September 2020.

Comments: Accepted by ACCV 2020
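
For readers unfamiliar with the Intersection over Union (IoU) measure that the abstract above takes as its starting point, here is a minimal Python sketch of the standard IoU between two axis-aligned boxes. The function name `iou` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions for illustration; the sketch shows only the baseline criterion the paper questions, not the Mutual Guidance strategy itself.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is a tuple (x_min, y_min, x_max, y_max); this layout is an
    assumed convention, not one taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Example: an anchor box partially overlapping a ground truth box.
anchor = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
overlap = iou(anchor, ground_truth)  # 1 / 7, since inter = 1 and union = 7
```

In anchor-based detectors, a score like this is typically compared against fixed thresholds to label anchors as positive or negative during training.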

arXiv:2009.13181 [pdf, other]

Position-Based Multiple-Play Bandits with Thompson Sampling

Authors: Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont

Abstract: Multiple-play bandits aim at displaying relevant items at relevant positions on a web page. We introduce a new bandit-based algorithm, PB-MHB, for online recommender systems which uses the Thompson sampling framework. This algorithm handles a display setting governed by the position-based model. Our sampling method does not require as input the probability that a user looks at a given position on the web page, which is, in practice, very difficult to obtain. Experiments on simulated and real datasets show that our method, with less prior information, delivers better recommendations than state-of-the-art algorithms.

Submitted 3 March, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Accepted at IDA 2021

arXiv:2009.12664 [pdf, other]

Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks

Authors: Heng Zhang, Elisa Fromont, Sébastien Lefevre, Bruno Avignon

Abstract: Multispectral images (e.g. visible and infrared) may be particularly useful when detecting objects with the same model in different environments (e.g. day/night outdoor scenes). To effectively use the different spectra, the main technical problem resides in the information fusion process. In this paper, we propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features by adding to the network architecture a particular module that cyclically fuses and refines each spectral feature. We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection. Our results show that implementing our Cyclic Fuse-and-Refine module in any network improves the performance on both datasets compared to other state-of-the-art multispectral object detection methods.

Submitted 26 September, 2020; originally announced September 2020.

Comments: Accepted by ICIP 2020

arXiv:2009.04796 [pdf, other]

doi 10.3390/math9233137

XCM: An Explainable Convolutional Neural Network for Multivariate Time Series Classification

Authors: Kevin Fauvel, Tao Lin, Véronique Masson, Élisa Fromont, Alexandre Termier

Abstract: Multivariate Time Series (MTS) classification has gained importance over the past decade with the increase in the number of temporal datasets in multiple domains. The current state-of-the-art MTS classifier is a heavyweight deep learning approach, which outperforms the second-best MTS classifier only on large datasets. Moreover, this deep learning approach cannot provide faithful explanations as it relies on post hoc model-agnostic explainability methods, which could prevent its use in numerous applications. In this paper, we present XCM, an eXplainable Convolutional neural network for MTS classification. XCM is a new compact convolutional neural network which extracts information relative to the observed variables and time directly from the input data. Thus, XCM architecture enables a good generalization ability on both large and small datasets, while allowing the full exploitation of a faithful post hoc model-specific explainability method (Gradient-weighted Class Activation Mapping) by precisely identifying the observed variables and timestamps of the input data that are important for predictions. We first show that XCM outperforms the state-of-the-art MTS classifiers on both the large and small public UEA datasets. Then, we illustrate how XCM reconciles performance and explainability on a synthetic dataset and show that XCM enables a more precise identification of the regions of the input data that are important for predictions compared to the current deep learning MTS classifier also providing faithful explainability. Finally, we present how XCM can outperform the current most accurate state-of-the-art algorithm on a real-world application while enhancing explainability by providing faithful and more informative explanations.

Submitted 7 December, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

Comments: Accepted for publication in Mathematics. Another machine learning method for multivariate time series classification providing faithful explanations is presented in arXiv:2005.03645

arXiv:2005.14501 [pdf, other]

A Performance-Explainability Framework to Benchmark Machine Learning Methods: Application to Multivariate Time Series Classifiers

Authors: Kevin Fauvel, Véronique Masson, Élisa Fromont

Abstract: Our research aims to propose a new performance-explainability analytical framework to assess and benchmark machine learning methods. The framework details a set of characteristics that systematize the performance-explainability assessment of existing machine learning methods. In order to illustrate the use of the framework, we apply it to benchmark the current state-of-the-art multivariate time series classifiers.

Submitted 19 November, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

Comments: In Proceedings of the IJCAI-PRICAI 2020 Workshop on Explainable Artificial Intelligence. An example of this framework in use is available in arXiv:2005.03645

arXiv:2005.03645 [pdf, other]

doi 10.1007/s10618-022-00823-6

XEM: An Explainable-by-Design Ensemble Method for Multivariate Time Series Classification

Authors: Kevin Fauvel, Élisa Fromont, Véronique Masson, Philippe Faverdin, Alexandre Termier

Abstract: We present XEM, an eXplainable-by-design Ensemble method for Multivariate time series classification. XEM relies on a new hybrid ensemble method that combines an explicit boosting-bagging approach to handle the bias-variance trade-off faced by machine learning models and an implicit divide-and-conquer approach to individualize classifier errors on different parts of the training data. Our evaluation shows that XEM outperforms the state-of-the-art MTS classifiers on the public UEA datasets. Furthermore, XEM provides faithful explainability-by-design and manifests robust performance when faced with challenges arising from continuous data collection (different MTS length, missing data and noise).

Submitted 15 February, 2022; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: Accepted for publication in Data Mining and Knowledge Discovery

arXiv:1906.00917 [pdf, other]

Learning Interpretable Shapelets for Time Series Classification through Adversarial Regularization

Authors: Yichang Wang, Rémi Emonet, Elisa Fromont, Simon Malinowski, Etienne Menager, Loïc Mosser, Romain Tavenard

Abstract: Time series classification can be successfully tackled by jointly learning a shapelet-based representation of the series in the dataset and classifying the series according to this representation. However, although the learned shapelets are discriminative, they are not always similar to pieces of a real series in the dataset. This makes it difficult to interpret the decision, i.e. difficult to analyze if there are particular behaviors in a series that triggered the decision. In this paper, we make use of a simple convolutional network to tackle the time series classification task and we introduce an adversarial regularization to constrain the model to learn more interpretable shapelets. Our classification results on all the usual time series benchmarks are comparable with the results obtained by similar state-of-the-art algorithms but our adversarially regularized method learns shapelets that are, by design, interpretable.

Submitted 12 June, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: submitted to CIKM2019

arXiv:1707.07958 [pdf, other]

Residual Conv-Deconv Grid Network for Semantic Segmentation

Authors: Damien Fourure, Rémi Emonet, Elisa Fromont, Damien Muselet, Alain Tremeau, Christian Wolf

Abstract: This paper presents GridNet, a new Convolutional Neural Network (CNN) architecture for semantic image segmentation (full scene labelling). Classical neural networks are implemented as one stream from the input to the output with subsampling operators applied in the stream in order to reduce the feature maps size and to increase the receptive field for the final prediction. However, for semantic image segmentation, where the task consists in providing a semantic class to each pixel of an image, feature maps reduction is harmful because it leads to a resolution loss in the output prediction. To tackle this problem, our GridNet follows a grid pattern allowing multiple interconnected streams to work at different resolutions. We show that our network generalizes many well known networks such as conv-deconv, residual or U-Net networks. GridNet is trained from scratch and achieves competitive results on the Cityscapes dataset.

Submitted 26 July, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

Comments: Accepted for publication at BMVC 2017

arXiv:0902.3373 [pdf]

Learning rules from multisource data for cardiac monitoring

Authors: Marie-Odile Cordier, Elisa Fromont, René Quiniou

Abstract: This paper formalises the concept of learning symbolic rules from multisource data in a cardiac monitoring context. Our sources, electrocardiograms and arterial blood pressure measures, describe cardiac behaviours from different viewpoints. To learn interpretable rules, we use an Inductive Logic Programming (ILP) method. We develop an original strategy to cope with the dimensionality issues caused by using this ILP technique on a rich multisource language. The results show that our method greatly improves the feasibility and the efficiency of the process while staying accurate. They also confirm the benefits of using multiple sources to improve the diagnosis of cardiac arrhythmias.

Submitted 19 February, 2009; originally announced February 2009.

Showing 1–17 of 17 results for author: Fromont, É