Showing 1–16 of 16 results for author: Ahmadian, A

Searching in archive cs.
  1. arXiv:2407.02552

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.19188

    cs.LG

    Averaging log-likelihoods in direct alignment

    Authors: Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

    Abstract: To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involvin…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.19185

    cs.LG

    Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

    Authors: Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist

    Abstract: Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.18682

    cs.CL cs.AI cs.LG

    The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

    Authors: Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

    Abstract: A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches…

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.01660

    cs.LG cs.AI stat.ML

    Self-Improving Robust Preference Optimization

    Authors: Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad Gheshlaghi Azar

    Abstract: Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimizati…

    Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2402.14740

    cs.LG

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

    Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that mos…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 27 pages, 7 figures, 2 tables

    ACM Class: I.2.7

  7. arXiv:2401.05987

    cs.SE

    Reconstruction as a service: a data space for off-site image reconstruction in magnetic particle imaging

    Authors: Anselm von Gladiss, Amir Shayan Ahmadian, Jan Jürjens

    Abstract: Magnetic particle imaging (MPI) is an emerging medical imaging modality which offers a unique combination of high temporal and spatial resolution, sensitivity and biocompatibility. For system-matrix (SM) based image reconstruction in MPI, a huge amount of calibration data needs to be acquired prior to reconstruction in a time-consuming procedure. Conventionally, the data is recorded on-site inside…

    Submitted 11 January, 2024; originally announced January 2024.

  8. arXiv:2312.10441

    cs.CR

    Disjunctive Policies for Database-Backed Programs

    Authors: Amir M. Ahmadian, Matvey Soloviev, Musard Balliu

    Abstract: When specifying security policies for databases, it is often natural to formulate disjunctive dependencies, where a piece of information may depend on at most one of two dependencies P1 or P2, but not both. A formal semantic model of such disjunctive dependencies, the Quantale of Information, was recently introduced by Hunt and Sands as a generalization of the Lattice of Information. In this paper…

    Submitted 26 April, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 21 pages, including references and appendix. Extended version of paper accepted to CSF 2024

  9. arXiv:2309.05444

    cs.CL cs.LG

    Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

    Authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

    Abstract: The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architectur…

    Submitted 11 September, 2023; originally announced September 2023.

  10. arXiv:2306.17366

    cs.LG cs.AI

    λ-models: Effective Decision-Aware Reinforcement Learning with Latent Models

    Authors: Claas A Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand

    Abstract: The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study…

    Submitted 29 February, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  11. arXiv:2305.19268

    cs.LG cs.AI

    Intriguing Properties of Quantization at Scale

    Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

    Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 32 pages, 14 figures

  12. arXiv:2301.01286

    cs.LG eess.IV

    Pseudo-Inverted Bottleneck Convolution for DARTS Search Space

    Authors: Arash Ahmadian, Louis S. P. Liu, Yue Fei, Konstantinos N. Plataniotis, Mahdi S. Hosseini

    Abstract: Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based neural architecture search method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-desig…

    Submitted 18 March, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

    Comments: 5 pages

  13. arXiv:2210.00418

    cs.LG cs.NE

    Subspace Learning for Feature Selection via Rank Revealing QR Factorization: Unsupervised and Hybrid Approaches with Non-negative Matrix Factorization and Evolutionary Algorithm

    Authors: Amir Moslemi, Arash Ahmadian

    Abstract: The selection of most informative and discriminative features from high-dimensional data has been noticed as an important topic in machine learning and data engineering. Using matrix factorization-based techniques such as nonnegative matrix factorization for feature selection has emerged as a hot topic in feature selection. The main goal of feature selection using matrix factorization is to extrac…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: 34 pages, 10 figures, 4 tables

    MSC Class: 68T05

  14. arXiv:1809.10830

    cs.IT

    Throughput Optimization in FDD MU-MISO Wireless Powered Communication Networks

    Authors: Arman Ahmadian

    Abstract: In this paper, we consider a frequency-division duplexing (FDD) multiple-user multiple-input-single-output (MU-MISO) wireless-powered communication network (WPCN) consisting of one hybrid data-and-energy access point (HAP) with multiple antennas which coordinates energy/information transfer to/from several single-antenna wireless devices (WD). Typically, in such a system, wireless energy transfer…

    Submitted 2 October, 2018; v1 submitted 27 September, 2018; originally announced September 2018.

  15. arXiv:1807.05670

    cs.IT

    Wireless Powered Communication Networks: TDD or FDD?

    Authors: Arman Ahmadian, Hyuncheol Park

    Abstract: In this paper, we compare two common modes of duplexing in wireless powered communication networks (WPCN); namely TDD and FDD. So far, TDD has been the most widely used duplexing technique due to its simplicity. Yet, TDD does not allow the energy transmitter to function continuously, which means to deliver the same amount of energy as that in FDD, the transmitter has to have a higher maximum trans…

    Submitted 16 July, 2018; originally announced July 2018.

  16. arXiv:1807.05543

    cs.IT

    Maximizing Ergodic Throughput in Wireless Powered Communication Networks

    Authors: Arman Ahmadian, Hyuncheol Park

    Abstract: This paper considers a single-antenna wireless-powered communication network (WPCN) over a flat-fading channel. We show that, by using our probabilistic harvest-and-transmit (PHAT) strategy, which requires the knowledge of instantaneous full channel state information (CSI) and fading probability distribution, the ergodic throughput of this system may be greatly increased relative to that achieved by…

    Submitted 15 July, 2018; originally announced July 2018.