-
Semi-Supervised Learning for Bilingual Lexicon Induction
Authors:
Paul Garnier,
Gauthier Guinet
Abstract:
We consider the problem of aligning two sets of continuous word representations, corresponding to two languages, to a common space in order to infer a bilingual lexicon. It was recently shown that it is possible to infer such a lexicon, without using any parallel data, by aligning word embeddings trained on monolingual data. This line of work is called unsupervised bilingual induction. Wondering whether it was possible to gain experience in the progressive learning of several languages, we asked ourselves to what extent we could integrate the knowledge of a given set of languages when learning a new one, without having parallel data for the latter. In other words, while keeping the core problem of unsupervised learning in the last step, we allowed access to other corpora of idioms, hence the name semi-supervised. This led us to propose a novel formulation that treats lexicon induction as a ranking problem, for which we used recent tools from this field of machine learning. Our experiments on standard benchmarks, inferring dictionaries from English to more than 20 languages, show that our approach consistently outperforms the existing state of the art. In addition, we draw from this new scenario several relevant conclusions that allow a better understanding of the alignment phenomenon.
Submitted 10 February, 2024;
originally announced February 2024.
-
Event Detection in Time Series: Universal Deep Learning Approach
Authors:
Menouar Azib,
Benjamin Renard,
Philippe Garnier,
Vincent Génot,
Nicolas André
Abstract:
Event detection in time series is a challenging task due to the prevalence of imbalanced datasets, rare events, and time interval-defined events. Traditional supervised deep learning methods primarily employ binary classification, where each time step is assigned a binary label indicating the presence or absence of an event. However, these methods struggle to handle these specific scenarios effectively. To address these limitations, we propose a novel supervised regression-based deep learning approach that offers several advantages over classification-based methods. Our approach, with a limited number of parameters, can effectively handle various types of events within a unified framework, including rare events and imbalanced datasets. We provide theoretical justifications for its universality and precision and demonstrate its superior performance across diverse domains, particularly for rare events and imbalanced datasets.
Submitted 29 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data and Information Retrieval in NLP
Authors:
Menouar Azib,
Benjamin Renard,
Philippe Garnier,
Vincent Génot,
Nicolas André
Abstract:
Event detection in time series data is crucial in various domains, including finance, healthcare, cybersecurity, and science. Accurately identifying events in time series data is vital for making informed decisions, detecting anomalies, and predicting future trends. Despite extensive research exploring diverse methods for event detection in time series, with deep learning approaches being among the most advanced, there is still room for improvement and innovation in this field. In this paper, we present a new deep learning supervised method for detecting events in multivariate time series data. Our method combines four distinct novelties compared to existing deep-learning supervised methods. Firstly, it is based on regression instead of binary classification. Secondly, it does not require labeled datasets where each point is labeled; instead, it only requires reference events defined as time points or intervals of time. Thirdly, it is designed to be robust by using a stacked ensemble learning meta-model that combines deep learning models, ranging from classic feed-forward neural networks (FFNs) to state-of-the-art architectures like transformers. This ensemble approach can mitigate individual model weaknesses and biases, resulting in more robust predictions. Finally, to facilitate practical implementation, we have developed a Python package to accompany our proposed method. The package, called eventdetector-ts, can be installed through the Python Package Index (PyPI). In this paper, we present our method and provide a comprehensive guide on the usage of the package. We showcase its versatility and effectiveness through different real-world use cases from natural language processing (NLP) to financial security domains.
Submitted 18 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Evaluating Soccer Player: from Live Camera to Deep Reinforcement Learning
Authors:
Paul Garnier,
Théophane Gregoir
Abstract:
Scientifically evaluating soccer players represents a challenging Machine Learning problem. Unfortunately, most existing answers have very opaque algorithm training procedures; relevant data are scarcely accessible and almost impossible to generate. In this paper, we introduce a two-part solution: an open-source Player Tracking model and a new approach to evaluate these players based solely on Deep Reinforcement Learning, without training on human data or human guidance. Our tracking model was trained in a supervised fashion on datasets we will also release, and our Evaluation Model relies only on simulations of virtual soccer games. Combining these two architectures allows one to evaluate soccer players directly from a live camera without large-dataset constraints. We term our new approach Expected Discounted Goal (EDG), as it represents the number of goals a team can score or concede from a particular state. This approach leads to more meaningful results than existing ones that are based on real-world data, and could easily be extended to other sports.
Submitted 13 January, 2021;
originally announced January 2021.
-
A review on Deep Reinforcement Learning for Fluid Mechanics
Authors:
Paul Garnier,
Jonathan Viquerat,
Jean Rabault,
Aurélien Larcher,
Alexander Kuhnle,
Elie Hachem
Abstract:
Deep reinforcement learning (DRL) has recently been adopted in a wide range of physics and engineering domains for its ability to solve decision-making problems that were previously out of reach due to a combination of non-linearity and high dimensionality. In the last few years, it has spread in the field of computational mechanics, and particularly in fluid dynamics, with recent applications in flow control and shape optimization. In this work, we conduct a detailed review of existing DRL applications to fluid mechanics problems. In addition, we present recent results that further illustrate the potential of DRL in Fluid Mechanics. The coupling methods used in each case are covered, detailing their advantages and limitations. Our review also focuses on the comparison with classical methods for optimal control and optimization. Finally, several test cases are described that illustrate recent progress made in this field. The goal of this publication is to provide an understanding of DRL capabilities along with state-of-the-art applications in fluid dynamics to researchers wishing to address new problems with these methods.
Submitted 25 February, 2021; v1 submitted 12 August, 2019;
originally announced August 2019.
-
3D Quantum Cuts for Automatic Segmentation of Porous Media in Tomography Images
Authors:
Junaid Malik,
Serkan Kiranyaz,
Riyadh Al-Raoush,
Olivier Monga,
Patricia Garnier,
Sebti Foufou,
Abdelaziz Bouras,
Alexandros Iosifidis,
Moncef Gabbouj,
Philippe C. Baveye
Abstract:
Binary segmentation of volumetric images of porous media is a crucial step towards gaining a deeper understanding of the factors governing biogeochemical processes at minute scales. Contemporary work primarily revolves around primitive techniques based on global or local adaptive thresholding that have known common drawbacks in image segmentation. Moreover, the absence of a unified benchmark prohibits quantitative evaluation, which further clouds the impact of existing methodologies. In this study, we tackle the issue on both fronts. Firstly, by drawing parallels with natural image segmentation, we propose a novel, automatic segmentation technique, 3D Quantum Cuts (QCuts-3D), grounded in a state-of-the-art spectral clustering technique. Secondly, we curate and present a publicly available dataset of 68 multiphase volumetric images of porous media with diverse solid geometries, along with voxel-wise ground truth annotations for each constituent phase. We provide comparative evaluations between QCuts-3D and the current state of the art over this dataset across a variety of evaluation metrics. The proposed systematic approach achieves a 26% increase in AUROC while substantially reducing computational complexity relative to state-of-the-art competitors. Moreover, statistical analysis reveals that the proposed method exhibits significant robustness against the compositional variations of porous media.
Submitted 10 April, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.