Search | arXiv e-print repository

Clustering Longitudinal Ordinal Data via Finite Mixture of Matrix-Variate Distributions

Authors: Francesco Amato, Julien Jacques, Isabelle Prim-Allaz

Abstract: In social sciences, studies are often based on questionnaires asking participants to express ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that an ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions, accounting simultaneously for within- and between-time dependence structures. The model is thus able to concurrently model the heterogeneity, the association among the responses and the temporal dependence structure. An EM algorithm is developed and presented for parameter estimation. An evaluation of the model through synthetic data shows its estimation abilities and its advantages when compared to competitors. A real-world application concerning changes in eating behaviours during the Covid-19 pandemic period in France will be presented.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2306.02484 [pdf, ps, other]

Post-Lie algebras of derivations and regularity structures

Authors: Jean-David Jacques, Lorenzo Zambotti

Abstract: Given a commutative algebra $A$, we exhibit a canonical structure of post-Lie algebra on the space $A\otimes Der(A)$ where $Der(A)$ is the space of derivations on $A$, in order to use the machinery given in [Guin & Oudom 2008] and [Ebrahimi-Fard & Lundervold & Munthe-Kaas 2015] and to define a Hopf algebra structure on the associated enveloping algebra with a natural action on $A$. We apply these results to the setting of [Linares & Otto & Tempelmayr 2023], giving a simpler and more efficient construction of their action and extending the recent work [Bruned & Katsetsiadis]. This approach gives an optimal setting to perform explicit computations in the associated structure group.

Submitted 14 March, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Revision of many details, the last section has been deleted for future work

MSC Class: 60L30; 60L70; 16S30; 16T05

arXiv:2209.04803 [pdf, other]

doi 10.1038/s41467-023-38423-7

A map of single-phase high-entropy alloys

Authors: Wei Chen, Antoine Hilhorst, Georges Bokas, Stéphane Gorsse, Pascal J. Jacques, Geoffroy Hautier

Abstract: High-entropy alloys have shown much interest and unusual materials properties. The stability of equimolar single-phase solid solution of five or more elements is likely to be rare and identifying the existence of such alloys has been very challenging because of the very large space of possible combinations. Herein, based on high-throughput density-functional theory calculations, we construct a chemical map of single-phase equimolar high entropy alloys by investigating over 650000 equimolar quinary alloys through a binary regular solid-solution model. We identify more than 30000 potential single-phase equimolar alloys (5% of the possible combinations) forming mainly in body-centered cubic structures. We unveil the chemistries that are likely to form high-entropy alloys, and identify the complex interplay among mixing enthalpy, intermetallics formation, and melting point that drives the formation of these solid solutions. We demonstrate the power of our method by predicting the existence of two new high entropy alloys, i.e. the body-centered cubic AlCoMnNiV and the face-centered cubic CoFeMnNiZn, which are successfully synthesized.

Submitted 20 April, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: 12 pages, 6 figures, 1 table (excluding SI)

arXiv:2205.14631 [pdf, ps, other]

doi 10.1145/3487553.3524927

Anchor Prediction: A Topic Modeling Approach

Authors: Jean Dupuy, Adrien Guille, Julien Jacques

Abstract: Networks of documents connected by hyperlinks, such as Wikipedia, are ubiquitous. Hyperlinks are inserted by the authors to enrich the text and facilitate the navigation through the network. However, authors tend to insert only a fraction of the relevant hyperlinks, mainly because this is a time consuming task. In this paper we address an annotation, which we refer to as anchor prediction. Even though it is conceptually close to link prediction or entity linking, it is a different task that requires developing a specific method to solve it. Given a source document and a target document, this task consists in automatically identifying anchors in the source document, i.e., words or terms that should carry a hyperlink pointing towards the target document. We propose a contextualized relational topic model, CRTM, that models directed links between documents as a function of the local context of the anchor in the source document and the whole content of the target document. The model can be used to predict anchors in a source document, given the target document, without relying on a dictionary of previously seen mention or title, nor any external knowledge graph. Authors can benefit from CRTM, by letting it automatically suggest hyperlinks, given a new document and the set of target documents to connect to. It can also benefit readers, by dynamically inserting hyperlinks between the documents they're reading.
Experiments conducted on several Wikipedia corpora (in English, Italian and German) highlight the practical usefulness of anchor prediction and demonstrate the relevancy of our approach.

Submitted 1 June, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

Comments: 14 pages, correct typo and \citep

arXiv:2203.16241 [pdf, other]

Biclustering Algorithms Based on Metaheuristics: A Review

Authors: Adan Jose-Garcia, Julie Jacques, Vincent Sobanski, Clarisse Dhaenens

Abstract: Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, finding significant biclusters is an NP-hard problem that can be formulated as an optimization problem. Therefore, different metaheuristics have been applied to biclustering problems because of their exploratory capability of solving complex optimization problems in reasonable computation time. Although various surveys on biclustering have been proposed, there is a lack of a comprehensive survey on the biclustering problem using metaheuristics. This chapter will present a survey of metaheuristics approaches to address the biclustering problem. The review focuses on the underlying optimization methods and their main search components: representation, objective function, and variation operators. A specific discussion on single versus multi-objective approaches is presented. Finally, some emerging research directions are presented.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 32 pages, 6 figures, 2 tables, chapter book

MSC Class: 6811

arXiv:2106.07222 [pdf, other]

Outlier detection in multivariate functional data through a contaminated mixture model

Authors: Martial Amovin-Assagba, Irène Gannaz, Julien Jacques

Abstract: In an industrial context, the activity of sensors is recorded at a high frequency. A challenge is to automatically detect abnormal measurement behavior. Considering the sensor measures as functional data, the problem can be formulated as the detection of outliers in a multivariate functional data set. Due to the heterogeneity of this data set, the proposed contaminated mixture model both clusters the multivariate functional data into homogeneous groups and detects outliers. The main advantage of this procedure over its competitors is that it does not require specifying the proportion of outliers. Model inference is performed through an Expectation-Conditional Maximization algorithm, and the BIC is used to select the number of clusters. Numerical experiments on simulated data demonstrate the high performance achieved by the inference algorithm. In particular, the proposed model outperforms the competitors. Its application to the real data which motivated this study allows abnormal behaviors to be correctly detected.

Submitted 8 March, 2022; v1 submitted 14 June, 2021; originally announced June 2021.
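
The abstract above mentions that the BIC is used to select the number of clusters. As a rough illustration of that general idea only (not the authors' implementation), the sketch below computes the Schwarz criterion in its "lower is better" form and picks the candidate with the smallest value. The function names, the sign convention, and the log-likelihood and parameter counts are all hypothetical placeholders, not values from the paper.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz criterion, 'lower is better' convention: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_n_clusters(fits, n_obs):
    """Return the K with the smallest BIC.

    `fits` maps a candidate K to a pair
    (maximised log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))


# Hypothetical fitted values, invented purely for illustration.
fits = {1: (-520.0, 4), 2: (-450.0, 9), 3: (-447.0, 14)}
best_k = select_n_clusters(fits, n_obs=200)
print(best_k)
```

Some software reports the criterion with the opposite sign (higher is better), in which case `min` would become `max`.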

arXiv:2105.13627 [pdf, other]

Simultaneous predictive bands for functional time series using minimum entropy sets

Authors: Nicolás Hernández, Jairo Cugliari, Julien Jacques

Abstract: Functional Time Series are sequences of dependent random elements taking values on some functional space. Most of the research on this domain is focused on producing a predictor able to forecast the value of the next function having observed a part of the sequence. For this, the Autoregressive Hilbertian process is a suitable framework. We address here the problem of constructing simultaneous predictive confidence bands for a stationary functional time series. The method is based on an entropy measure for stochastic processes, in particular functional time series. To construct predictive bands we use a functional bootstrap procedure that allows us to estimate the prediction law through the use of pseudo-predictions. Each pseudo-realisation is then projected into a space of finite dimension, associated to a functional basis. We use Reproducing Kernel Hilbert Spaces (RKHS) to represent the functions, considering then the basis associated to the reproducing kernel. Using a simple decision rule, we classify the points on the projected space among those belonging to the minimum entropy set and those that do not. We push back the minimum entropy set to the functional space and construct a band using the regularity property of the RKHS. The proposed methodology is illustrated through artificial and real-world data sets.

Submitted 4 May, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

arXiv:2104.13506 [pdf]

doi 10.1016/j.engfracmech.2021.107756

Combined numerical and experimental estimation of the fracture toughness and failure analysis of single lap shear test for dissimilar welds

Authors: Norberto Jimenez Mena, Thaneshan Sapanathan, Pascal J. Jacques, Aude Simar

Abstract: The single lap shear test is widely used to measure the strength of dissimilar welds even though such a test brings limited understanding of the intrinsic weld toughness. The present study proposes a numerical finite element (FE) analysis and experimental characterization of dissimilar joints presenting various microstructures (thickness of the intermetallic layer (IML) and hardness profile). For this purpose, Friction Melt Bonding (FMB) and Friction Stir Welding (FSW) were used to join aluminum AA6061 and Dual Phase steel (DP980). The FE simulations allowed calculating the evolution of the J-integral near this notch tip. It shows that crack initiation depends significantly on the plastic properties of the welded metallic alloys around the notch tip and the width of the welded zone, which both are significantly different for FSW and FMB processes. Nevertheless, a similar weld fracture toughness J_C of approximately 1 kJ·m⁻² is estimated from the analysis for both FMB and FSW. This is three orders of magnitude higher than the fracture toughness of the intermetallic layer, revealing that the plastic dissipation in the Al and steel plates around the crack tip has a major effect on the weld toughness.

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2103.16939 [pdf]

doi 10.1016/j.scriptamat.2021.113910

High temperature in situ SEM assessment followed by ex situ AFM and EBSD investigation of the nucleation and early growth stages of Fe-Al intermetallics

Authors: T. Sapanathan, I. Sabirov, P. Xia, M. A. Monclus, J. M. Molina-Aldareguia, P. J. Jacques, A. Simar

Abstract: A dedicated in situ heating setup in a scanning electron microscope (SEM) followed by an ex situ atomic force microscopy (AFM) and electron backscatter diffraction (EBSD) is used to characterize the nucleation and early growth stages of Fe-Al intermetallics (IMs) at 596 °C. A location tracking is used to interpret further characterization. Ex situ AFM observations reveal a slight shrinkage and out of plane protrusion of the IM at the onset of IM nucleation followed by directional growth. The formed interfacial IM compounds were identified by ex situ EBSD. It is now clearly demonstrated that the θ-phase nucleates first prior to the diffusion-controlled growth of the η-phase. The θ-phase prevails the intermetallic layer.

Submitted 31 March, 2021; originally announced March 2021.

Journal ref: Scripta Materialia, Volume 200, 15 July 2021, 113910

arXiv:2001.05727 [pdf, other]

Document Network Projection in Pretrained Word Embedding Space

Authors: Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques

Abstract: We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g. citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.

Submitted 16 January, 2020; originally announced January 2020.
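
The abstract above describes a first step in which each document is represented by a simple average of its word vectors. The sketch below illustrates only that averaging step, under stated assumptions: the toy two-dimensional embedding table, the tokenised document, and the function name are hypothetical, and the similarity-based adjustment that the abstract mentions is not shown because its form is not given here.

```python
def average_word_vectors(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings`.

    Tokens missing from the embedding table are skipped.
    Returns None when no token has a vector.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy pretrained embedding table (hypothetical values).
embeddings = {
    "graph": [1.0, 0.0],
    "network": [0.0, 1.0],
    "model": [1.0, 1.0],
}

# "unknown" has no vector, so only "graph" and "network" are averaged.
doc_vector = average_word_vectors(["graph", "network", "unknown"], embeddings)
print(doc_vector)
```

Skipping out-of-vocabulary tokens is a design choice made for this sketch; other strategies, such as a shared fallback vector, are equally plausible.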

arXiv:1912.03048 [pdf, other]

Document Network Embedding: Coping for Missing Content and Missing Links

Authors: Jean Dupuy, Adrien Guille, Julien Jacques

Abstract: Searching through networks of documents is an important task. A promising path to improve the performance of information retrieval systems in this context is to leverage dense node and content representations learned with embedding techniques. However, these techniques cannot learn representations for documents that are either isolated or whose content is missing. To tackle this issue, assuming that the topology of the network and the content of the documents correlate, we propose to estimate the missing node representations from the available content representations, and conversely. Inspired by recent advances in machine translation, we detail in this paper how to learn a linear transformation from a set of aligned content and node representations. The projection matrix is efficiently calculated in terms of the singular value decomposition. The usefulness of the proposed method is highlighted by the improved ability to predict the neighborhood of nodes whose links are unobserved based on the projected content representations, and to retrieve similar documents when content is missing, based on the projected node representations.

Submitted 6 December, 2019; originally announced December 2019.

arXiv:1601.07999 [pdf, ps, other]

doi 10.1214/15-AOAS861

The discriminative functional mixture model for a comparative analysis of bike sharing systems

Authors: Charles Bouveyron, Etienne Côme, Julien Jacques

Abstract: Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now proposed in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. The analysis of the mass of data generated by such systems is of particular interest to BSS providers to update system structures and policies. This work was motivated by interest in analyzing and comparing several European BSSs to identify common operating patterns in BSSs and to propose practical solutions to avoid potential issues. Our approach relies on the identification of common patterns between and within systems. To this end, a model-based clustering method, called FunFEM, for time series (or more generally functional data) is developed. It is based on a functional mixture model that allows the clustering of the data in a discriminative functional subspace. This model presents the advantage in this context to be parsimonious and to allow the visualization of the clustered systems. Numerical experiments confirm the good behavior of FunFEM, particularly compared to state-of-the-art methods. The application of FunFEM to BSS data from JCDecaux and the Transport for London Initiative allows us to identify 10 general patterns, including pathological ones, and to propose practical improvement strategies based on the system comparison. The visualization of the clustered data within the discriminative subspace turns out to be particularly informative regarding the system efficiency.
The proposed methodology is implemented in a package for the R software, named funFEM, which is available on the CRAN. The package also provides a subset of the data analyzed in this work.

Submitted 29 January, 2016; originally announced January 2016.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS861

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 4, 1726-1760

arXiv:1509.07344 [pdf, other]

Opinion mining from twitter data using evolutionary multinomial mixture models

Authors: Md. Abul Hasnat, Julien Velcin, Stéphane Bonnevay, Julien Jacques

Abstract: Image of an entity can be defined as a structured and dynamic representation which can be extracted from the opinions of a group of users or population. Automatic extraction of such an image has certain importance in political science and sociology related studies, e.g., when an extended inquiry from large-scale data is required. We study the images of two politically significant entities of France. These images are constructed by analyzing the opinions collected from a well known social media called Twitter. Our goal is to build a system which can be used to automatically extract the image of entities over time. In this paper, we propose a novel evolutionary clustering method based on the parametric link among Multinomial mixture models. First we propose the formulation of a generalized model that establishes parametric links among the Multinomial distributions. Afterward, we follow a model-based clustering approach to explore different parametric sub-models and select the best model. For the experiments, first we use synthetic temporal data. Next, we apply the method to analyze the annotated social media data. Results show that the proposed method is better than the state-of-the-art based on the common evaluation metrics. Additionally, our method can provide interpretation about the temporal evolution of the clusters.

Submitted 24 September, 2015; originally announced September 2015.

Comments: Submitted to the Annals of Applied Statistics

arXiv:1505.02324 [pdf, other]

Simultaneous Clustering and Model Selection for Multinomial Distribution: A Comparative Study

Authors: Md. Abul Hasnat, Julien Velcin, Stéphane Bonnevay, Julien Jacques

Abstract: In this paper, we study different discrete data clustering methods, which use the Model-Based Clustering (MBC) framework with the Multinomial distribution. Our study comprises several relevant issues, such as initialization, model estimation and model selection. Additionally, we propose a novel MBC method by efficiently combining the partitional and hierarchical clustering techniques. We conduct experiments on both synthetic and real data and evaluate the methods using accuracy, stability and computation time. Our study identifies appropriate strategies to be used for discrete data analysis with the MBC methods. Moreover, our proposed method is very competitive w.r.t. clustering accuracy and better w.r.t. stability and computation time.

Submitted 6 September, 2015; v1 submitted 9 May, 2015; originally announced May 2015.

Comments: Accepted in the International Symposium on Intelligent Data Analysis (IDA 2015)

arXiv:1110.5506 [pdf]

doi 10.1063/1.3136849

Schottky barrier lowering with the formation of crystalline Er silicide on n-Si upon thermal annealing

Authors: Nicolas Reckinger, Xiaohui Tang, Vincent Bayot, Dmitri A. Yarekha, Emmanuel Dubois, Sylvie Godey, Xavier Wallart, Guilhem Larrieu, Adam Laszcz, Jacek Ratajczak, Pascal J. Jacques, Jean-Pierre Raskin

Abstract: The evolution of the Schottky barrier height (SBH) of Er silicide contacts to n-Si is investigated as a function of the annealing temperature. The SBH is found to drop substantially from 0.43 eV for the as-deposited sample to reach 0.28 eV, its lowest value, at 450 °C. By x-ray diffraction, high resolution transmission electron microscopy, and x-ray photoelectron spectroscopy, the decrease in the SBH is shown to be associated with the progressive formation of crystalline ErSi2-x.

Submitted 25 October, 2011; originally announced October 2011.

Journal ref: Applied Physics Letters 94, 191913, 2009

Showing 1–15 of 15 results for author: Jacques, J