Showing 1–13 of 13 results for author: Achab, M

Searching in archive cs.
  1. arXiv:2404.04291  [pdf, other]

    cs.LG

    Investigating Regularization of Self-Play Language Models

    Authors: Reda Alami, Abdalgader Abubaker, Mastane Achab, Mohamed El Amine Seddik, Salem Lahlou

    Abstract: This paper explores the effects of various forms of regularization in the context of language model alignment via self-play. While both reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) require to collect costly human-annotated pairwise preferences, the self-play fine-tuning (SPIN) approach replaces the rejected answers by data generated from the previous i…

    Submitted 4 April, 2024; originally announced April 2024.

  2. arXiv:2310.19821  [pdf, other]

    cs.LG stat.ML

    A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

    Authors: Reda Alami, Mohammed Mahfoud, Mastane Achab

    Abstract: In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no longer the case when provided additional environment-specific knowledge. In particular, in areas of high volatility like healthcare or finance, a naive…

    Submitted 24 October, 2023; originally announced October 2023.

  3. arXiv:2310.03767  [pdf, other]

    cs.LG cs.NI eess.SP

    Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study

    Authors: Fouzi Boukhalfa, Reda Alami, Mastane Achab, Eric Moulines, Mehdi Bennis

    Abstract: In today's era, autonomous vehicles demand a safety level on par with aircraft. Taking a cue from the aerospace industry, which relies on redundancy to achieve high reliability, the automotive sector can also leverage this concept by building redundancy in V2X (Vehicle-to-Everything) technologies. Given the current lack of reliable V2X technologies, this idea is particularly promising. By deployin…

    Submitted 4 October, 2023; originally announced October 2023.

  4. arXiv:2309.15298  [pdf, other]

    math.OC cs.LG

    Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization

    Authors: Mastane Achab

    Abstract: This paper extends the classic theory of convex optimization to the minimization of functions that are equal to the negated logarithm of what we term as a sum-log-concave function, i.e., a sum of log-concave functions. In particular, we show that such functions are in general not convex but still satisfy generalized convexity inequalities. These inequalities unveil the key importance of a certain…

    Submitted 26 September, 2023; originally announced September 2023.

  5. arXiv:2305.19992  [pdf, ps, other]

    stat.ML cs.LG

    A Nested Matrix-Tensor Model for Noisy Multi-view Clustering

    Authors: Mohamed El Amine Seddik, Mastane Achab, Henrique Goulart, Merouane Debbah

    Abstract: In this paper, we propose a nested matrix-tensor model which extends the spiked rank-one tensor model of order three. This model is particularly motivated by a multi-view clustering problem in which multiple noisy observations of each data point are acquired, with potentially non-uniform variances along the views. In this case, data can be naturally represented by an order-three tensor where the v…

    Submitted 31 May, 2023; originally announced May 2023.

  6. arXiv:2304.14421  [pdf, other]

    cs.LG stat.ML

    One-Step Distributional Reinforcement Learning

    Authors: Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Eric Moulines

    Abstract: Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the agent goes beyond the limit of the expected value, to capture the underlying probability distribution of the return across all time steps. The set of DistrRL algorithms has led to improved empirical performance. Neverth…

    Submitted 27 April, 2023; originally announced April 2023.

  7. arXiv:2112.15430  [pdf, other]

    cs.LG cs.AI math.OC

    Robustness and risk management via distributional dynamic programming

    Authors: Mastane Achab, Gergely Neu

    Abstract: In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods produ…

    Submitted 28 December, 2021; originally announced December 2021.

  8. arXiv:2002.05145  [pdf, other]

    stat.ML cs.LG

    Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

    Authors: Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

    Abstract: We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it. In the unrealistic case where the likelihood ratio $Φ(z)=dP/dP'(z)$ is known, one may…

    Submitted 19 February, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: 20 pages, 7 tables and figures

  9. arXiv:1810.06291  [pdf, other]

    stat.ML cs.LG

    Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach

    Authors: Mastane Achab, Anna Korba, Stephan Clémençon

    Abstract: Whereas most dimensionality reduction techniques (e.g. PCA, ICA, NMF) for multivariate data essentially rely on linear algebra to a certain extent, summarizing ranking data, viewed as realizations of a random permutation $Σ$ on a set of items indexed by $i\in \{1,\ldots,\; n\}$, is a great statistical challenge, due to the absence of vector space structure for the set of permutations…

    Submitted 30 August, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

  10. arXiv:1805.02908  [pdf, other]

    stat.ML cs.LG

    Profitable Bandits

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier

    Abstract: Originally motivated by default risk management applications, this paper investigates a novel problem, referred to as the profitable bandit problem here. At each step, an agent chooses a subset of the K possible actions. For each action chosen, she then receives the sum of a random number of rewards. Her objective is to maximize her cumulated earnings. We adapt and study three well-known strategie…

    Submitted 8 May, 2018; originally announced May 2018.

  11. arXiv:1707.08820  [pdf, other]

    stat.ML cs.LG

    Max K-armed bandit: On the ExtremeHunter algorithm and beyond

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

    Abstract: This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits can be reduc…

    Submitted 27 July, 2017; originally announced July 2017.

  12. arXiv:1607.06333  [pdf, other]

    stat.ML cs.LG

    Uncovering Causality from Multivariate Hawkes Integrated Cumulants

    Authors: Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-Francois Muzy

    Abstract: We design a new nonparametric method that allows one to estimate the matrix of integrated kernels of a multivariate Hawkes process. This matrix not only encodes the mutual influences of each nodes of the process, but also disentangles the causality relationships between them. Our approach is the first that leads to an estimation of this matrix without any parametric modeling and estimation of the…

    Submitted 29 May, 2017; v1 submitted 21 July, 2016; originally announced July 2016.

  13. arXiv:1510.04822  [pdf, other]

    stat.ML cs.LG

    SGD with Variance Reduction beyond Empirical Risk Minimization

    Authors: Massil Achab, Agathe Guilloux, Stéphane Gaïffas, Emmanuel Bacry

    Abstract: We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions, whose gradients depend on numerically expensive expectations. Our main motivation is the acceleration of the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but our algorithm can be used in different settings as well. The propos…

    Submitted 8 November, 2016; v1 submitted 16 October, 2015; originally announced October 2015.

    Comments: 17 pages