RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

Abstract: Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on first-class citizen languages like English and Chinese. This not only captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art research transfer to a multilingual setting. In this work, we perform an exhaustive study to achieve a new state-of-the-art in aligning multilingual LLMs. We introduce a novel, scalable method for generating high-quality multilingual feedback data to balance data coverage. We establish the benefits of cross-lingual transfer and increased dataset size in preference training. Our preference-trained model achieves a 54.4% win-rate against Aya 23 8B, the current state-of-the-art multilingual LLM in its parameter class, and a 69.5% win-rate or higher against widely used models like Gemma-1.1-7B-it, Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3. As a result of our study, we expand the frontier of alignment techniques to 23 languages covering half of the world's population.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.19188 [pdf, other]

Averaging log-likelihoods in direct alignment

Authors: Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

Abstract: To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involving the log-likelihood of (dis)preferred completions according to the trained model. However, completions have various lengths, and the log-likelihood is not length-invariant. On the other hand, the cross-entropy loss used in supervised training is length-invariant, as batches are typically averaged token-wise. To reconcile these approaches, we introduce a principled approach for making direct alignment length-invariant. Formally, we introduce a new averaging operator, to be composed with the optimality operator giving the best policy for the underlying RL problem. It translates into averaging the log-likelihood within the loss. We empirically study the effect of such averaging, observing a trade-off between the length of generations and their scores.

Submitted 27 June, 2024; originally announced June 2024.
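
The length dependence described in the abstract above can be illustrated with a minimal sketch. It is not the paper's averaging operator or loss; it only compares a summed log-likelihood with a token-averaged one. The function names and the per-token probabilities are hypothetical values chosen for illustration.

```python
import math

def sequence_log_likelihood(token_probs):
    """Sum of per-token log-probabilities for one completion."""
    return sum(math.log(p) for p in token_probs)

def averaged_log_likelihood(token_probs):
    """Token-averaged log-likelihood (the summed value divided by the length)."""
    return sequence_log_likelihood(token_probs) / len(token_probs)

# Two hypothetical completions with the same per-token confidence
# but different lengths (assumed values, not taken from the paper).
short_completion = [0.5] * 4
long_completion = [0.5] * 16

print("summed:  ", sequence_log_likelihood(short_completion),
      sequence_log_likelihood(long_completion))
print("averaged:", averaged_log_likelihood(short_completion),
      averaged_log_likelihood(long_completion))
```

The summed score of the longer completion is four times more negative even though the model is equally confident at every token, while the averaged score is the same for both.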

arXiv:2406.19185 [pdf, other]

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Authors: Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist

Abstract: Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-based ones are not the only rewards of interest for LLMs (e.g., unit tests for code generation or textual entailment for summarization, among others). RL-finetuning is usually done with a variation of policy gradient, which calls for on-policy or near-on-policy samples, requiring costly generations. We introduce Contrastive Policy Gradient, or CoPG, a simple and mathematically principled new RL algorithm that can estimate the optimal policy even from off-policy data. It can be seen as an off-policy policy gradient approach that does not rely on importance sampling techniques and highlights the importance of using (the right) state baseline. We show this approach to generalize the direct alignment method IPO (identity preference optimization) and classic policy gradient. We experiment with the proposed CoPG on a toy bandit problem to illustrate its properties, as well as for finetuning LLMs on a summarization task, using a learned reward function considered as ground truth for the purpose of the experiments.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18682 [pdf, other]

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

Authors: Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

Abstract: A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms. We collect the first set of human annotated red-teaming prompts in different languages distinguishing between global and local harm, which serve as a laboratory for understanding the reliability of alignment techniques when faced with preference distributions that are non-stationary across geographies and languages. While this setting is seldom covered by the literature to date, which primarily centers on English harm mitigation, it captures real-world interactions with AI systems around the world. We establish a new precedent for state-of-the-art alignment techniques across 6 languages with minimal degradation in general performance. Our work provides important insights into cross-lingual transfer and novel optimization approaches to safeguard AI systems designed to serve global populations.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.01660 [pdf, other]

Self-Improving Robust Preference Optimization

Authors: Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad Gheshlaghi Azar

Abstract: Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimization (SRPO), a practical and mathematically principled offline RLHF framework that is completely robust to the changes in the task. The key idea of SRPO is to cast the problem of learning from human preferences as a self-improvement process, which can be mathematically expressed in terms of a min-max objective that aims at joint optimization of self-improvement policy and the generative policy in an adversarial fashion. The solution for this optimization problem is independent of the training task and thus it is robust to its changes. We then show that this objective can be re-expressed in the form of a non-adversarial offline loss which can be optimized using standard supervised optimization techniques at scale without any need for a reward model and online inference. We show the effectiveness of SRPO in terms of AI Win-Rate (WR) against human (GOLD) completions. In particular, when SRPO is evaluated on the OOD XSUM dataset, it outperforms the celebrated DPO by a clear margin of 15% after 5 self-revisions, achieving WR of 90%.

Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2402.14740 [pdf, other]

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Authors: Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Abstract: AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as the canonical method for the RL part of RLHF. However, it involves both high computational cost and sensitive hyperparameter tuning. We posit that most of the motivational principles that led to the development of PPO are less of a practical concern in RLHF and advocate for a less computationally expensive method that preserves and even increases performance. We revisit the formulation of alignment from human preferences in the context of RL. Keeping simplicity as a guiding principle, we show that many components of PPO are unnecessary in an RLHF context and that far simpler REINFORCE-style optimization variants outperform both PPO and newly proposed "RL-free" methods such as DPO and RAFT. Our work suggests that careful adaptation to LLMs alignment characteristics enables benefiting from online RL optimization at low cost.

Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 27 pages, 7 figures, 2 tables

ACM Class: I.2.7

arXiv:2401.05987 [pdf, ps, other]

Reconstruction as a service: a data space for off-site image reconstruction in magnetic particle imaging

Authors: Anselm von Gladiss, Amir Shayan Ahmadian, Jan Jürjens

Abstract: Magnetic particle imaging (MPI) is an emerging medical imaging modality which offers a unique combination of high temporal and spatial resolution, sensitivity and biocompatibility. For system-matrix (SM) based image reconstruction in MPI, a huge amount of calibration data needs to be acquired prior to reconstruction in a time-consuming procedure. Conventionally, the data is recorded on-site inside the scanning device, which significantly limits the time that the scanning device is available for patient care in a clinical setting. Due to its size, handling the calibration data can be challenging. To solve these issues of recording and handling the data, data spaces could be used, as it has been shown that the calibration data can be measured in dedicated devices off-site. We propose a data space aimed at improving the efficiency of SM-based image reconstruction in MPI. The data space consists of imaging facilities, calibration data providers and reconstruction experts. Its specifications follow the reference architecture model of international data spaces (IDS). Use-cases of image reconstruction in MPI are formulated. The stakeholders and tasks are listed and mapped to the terminology of IDS. The signal chain in MPI is analysed to identify a minimum information model which is used by the data space.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.10441 [pdf, ps, other]

Disjunctive Policies for Database-Backed Programs

Authors: Amir M. Ahmadian, Matvey Soloviev, Musard Balliu

Abstract: When specifying security policies for databases, it is often natural to formulate disjunctive dependencies, where a piece of information may depend on at most one of two dependencies P1 or P2, but not both. A formal semantic model of such disjunctive dependencies, the Quantale of Information, was recently introduced by Hunt and Sands as a generalization of the Lattice of Information. In this paper, we seek to contribute to the understanding of disjunctive dependencies in database-backed programs and introduce a practical framework to statically enforce disjunctive security policies. To that end, we introduce the Determinacy Quantale, a new query-based structure which captures the ordering of disjunctive information in databases. This structure can be understood as a query-based counterpart to the Quantale of Information. Based on this structure, we design a sound enforcement mechanism to check disjunctive policies for database-backed programs. This mechanism is based on a type-based analysis for a simple imperative language with database queries, which is precise enough to accommodate a variety of row- and column-level database policies flexibly while keeping track of disjunctions due to control flow. We validate our mechanism by implementing it in a tool, DiVerT, and demonstrate its feasibility on a number of use cases.

Submitted 26 April, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: 21 pages, including references and appendix. Extended version of paper accepted to CSF 2024

arXiv:2309.05444 [pdf, other]

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker

Abstract: The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning by only updating the lightweight experts, less than 1% of an 11B-parameter model. Furthermore, our method generalizes to unseen tasks as it does not depend on any prior task knowledge. Our research underscores the versatility of the mixture of experts architecture, showcasing its ability to deliver robust performance even when subjected to rigorous parameter constraints. Our code used in all the experiments is publicly available here: https://github.com/for-ai/parameter-efficient-moe.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2306.17366 [pdf, other]

$λ$-models: Effective Decision-Aware Reinforcement Learning with Latent Models

Authors: Claas A Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand

Abstract: The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works, most importantly the use of a latent model, are vital to achieving good performance for related algorithms. Furthermore, we show that the MuZero loss function is biased in stochastic environments and establish that this bias has practical consequences. Building on these findings, we present an overview of which decision-aware loss functions are best used in what empirical scenarios, providing actionable insights to practitioners in the field.

Submitted 29 February, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

arXiv:2305.19268 [pdf, other]

Intriguing Properties of Quantization at Scale

Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop of increased research focus on why certain emergent properties surface at scale, this work provides a useful counter-example. We posit that it is possible to optimize for a quantization friendly training recipe that suppresses large activation magnitude outliers. Here, we find that outlier dimensions are not an inherent product of scale, but rather sensitive to the optimization conditions present during pre-training. This both opens up directions for more efficient quantization, and poses the question of whether other emergent properties are inherent or can be altered and conditioned by optimization and architecture design choices. We successfully quantize models ranging in size from 410M to 52B with minimal degradation in performance.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 32 pages, 14 figures
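
Why large activation outliers matter for quantization, as the abstract above suggests, can be seen in a small sketch. It uses plain symmetric absmax int8 quantization, which is an assumption made for illustration and not necessarily the scheme studied in the paper; the activation values and function names are hypothetical.

```python
def quantize_absmax_int8(values):
    """Symmetric int8 quantization: the largest magnitude maps to 127.

    Assumes at least one non-zero value, otherwise the scale would be zero.
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 codes back to floating-point values."""
    return [q * scale for q in quantized]

def mean_abs_error(values):
    """Average absolute round-trip error of quantize followed by dequantize."""
    quantized, scale = quantize_absmax_int8(values)
    restored = dequantize(quantized, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

# Hypothetical activations; the second list adds one large-magnitude outlier.
well_behaved = [0.11, -0.42, 0.73, -0.95, 0.28, 0.64]
with_outlier = well_behaved + [60.0]

error_without_outlier = mean_abs_error(well_behaved)
error_with_outlier = mean_abs_error(with_outlier)
print(error_without_outlier, error_with_outlier)
```

A single outlier stretches the quantization scale, so the small values collapse onto a handful of integer codes and their round-trip error grows sharply.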

arXiv:2301.01286 [pdf, other]

Pseudo-Inverted Bottleneck Convolution for DARTS Search Space

Authors: Arash Ahmadian, Louis S. P. Liu, Yue Fei, Konstantinos N. Plataniotis, Mahdi S. Hosseini

Abstract: Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based neural architecture search method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. We introduce the Pseudo-Inverted Bottleneck Conv (PIBConv) block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with fewer layers, not only does it achieve higher accuracy with lower computational footprint (measured in GMACs) and parameter count, but GradCAM comparisons also show that our network can better detect distinctive features of target objects compared to DARTS. Code is available from https://github.com/mahdihosseini/PIBConv.

Submitted 18 March, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 5 pages

arXiv:2210.00418 [pdf]

Subspace Learning for Feature Selection via Rank Revealing QR Factorization: Unsupervised and Hybrid Approaches with Non-negative Matrix Factorization and Evolutionary Algorithm

Authors: Amir Moslemi, Arash Ahmadian

Abstract: The selection of the most informative and discriminative features from high-dimensional data has been recognized as an important topic in machine learning and data engineering. Using matrix factorization-based techniques such as nonnegative matrix factorization for feature selection has emerged as a hot topic in feature selection. The main goal of feature selection using matrix factorization is to extract a subspace which approximates the original space but in a lower dimension. In this study, rank revealing QR (RRQR) factorization, which is computationally cheaper than singular value decomposition (SVD), is leveraged in obtaining the most informative features as a novel unsupervised feature selection technique. This technique uses the permutation matrix of QR for feature selection, which is a property unique to this factorization method. Moreover, QR factorization is embedded into the non-negative matrix factorization (NMF) objective function as a new unsupervised feature selection method. Lastly, a hybrid feature selection algorithm is proposed by coupling RRQR, as a filter-based technique, and a Genetic algorithm as a wrapper-based technique. In this method, redundant features are removed using RRQR factorization and the most discriminative subset of features is selected using the Genetic algorithm. The proposed algorithm proves to be dependable and robust when compared against state-of-the-art feature selection algorithms in supervised, unsupervised, and semi-supervised settings. All methods are tested on seven available microarray datasets using KNN, SVM and C4.5 classifiers. In terms of evaluation metrics, the experimental results show that the proposed method is comparable with state-of-the-art feature selection.

Submitted 2 October, 2022; originally announced October 2022.

Comments: 34 pages, 10 figures, 4 tables

MSC Class: 68T05
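
The idea in the abstract above of reading a feature ranking off the QR permutation can be sketched with greedy column pivoting. This is a simplified stand-in written with Gram-Schmidt orthogonalization, not the authors' RRQR, NMF, or hybrid algorithm; the function name, the tolerance, and the toy data are hypothetical.

```python
import math

def pivoted_qr_feature_order(rows, k, tol=1e-12):
    """Return up to k feature (column) indices chosen by greedy column pivoting.

    At each step the column with the largest remaining norm is selected and
    its direction is removed from all columns that are still unselected.
    """
    n_features = len(rows[0])
    # residual[j] holds feature j after removing already-selected directions.
    residual = [[row[j] for row in rows] for j in range(n_features)]
    selected = []
    for _ in range(k):
        norms = [
            0.0 if j in selected else math.sqrt(sum(x * x for x in residual[j]))
            for j in range(n_features)
        ]
        pivot = max(range(n_features), key=lambda j: norms[j])
        if norms[pivot] < tol:
            break  # remaining features are numerically redundant
        selected.append(pivot)
        q = [x / norms[pivot] for x in residual[pivot]]
        for j in range(n_features):
            if j in selected:
                continue
            projection = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [b - projection * a for a, b in zip(q, residual[j])]
    return selected

# Toy data: 4 samples, 3 features; feature 2 is exactly twice feature 0.
samples = [
    [1.0, 0.0, 2.0],
    [2.0, 1.0, 4.0],
    [3.0, 0.0, 6.0],
    [4.0, 1.0, 8.0],
]
print(pivoted_qr_feature_order(samples, k=3))
```

On this toy matrix the pivot order starts with feature 2, then feature 1, and the search stops early because feature 0 carries no information beyond feature 2, so only two indices are returned even though three were requested.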

arXiv:1809.10830 [pdf, ps, other]

Throughput Optimization in FDD MU-MISO Wireless Powered Communication Networks

Authors: Arman Ahmadian

Abstract: In this paper, we consider a frequency-division duplexing (FDD) multiple-user multiple-input-single-output (MU-MISO) wireless-powered communication network (WPCN) consisting of one hybrid data-and-energy access point (HAP) with multiple antennas which coordinates energy/information transfer to/from several single-antenna wireless devices (WD). Typically, in such a system, wireless energy transfer (WET) requires such techniques as energy beamforming (EB) for efficient transfer of energy to the WDs. Yet, efficient EB can only be accomplished if channel state information (CSI) is available to the transmitter, which, in FDD systems is only achieved through uplink (UL) feedback. Therefore, while in our scheme we use the downlink (DL) channels for WET only, the UL channel frames are split into two phases: the CSI feedback phase during which the WDs feed CSI back to the HAP and the WIT phase where the HAP performs wireless information transmission (WIT) via space-division-multiple-access (SDMA). To ensure rate fairness among the WDs, this paper maximizes the minimum WIT data rate among the WDs. Using an iterative solution, the original optimization problem can be relaxed into two sub-problems whose convexity conditions are derived. Finally, the behavior of this system when the number of HAP antennas increases is analyzed. Simulation results verify the truthfulness of our analysis.

Submitted 2 October, 2018; v1 submitted 27 September, 2018; originally announced September 2018.

arXiv:1807.05670 [pdf]

Wireless Powered Communication Networks: TDD or FDD?

Authors: Arman Ahmadian, Hyuncheol Park

Abstract: In this paper, we compare two common modes of duplexing in wireless powered communication networks (WPCN); namely TDD and FDD. So far, TDD has been the most widely used duplexing technique due to its simplicity. Yet, TDD does not allow the energy transmitter to function continuously, which means to deliver the same amount of energy as that in FDD, the transmitter has to have a higher maximum transmit power. On the other hand, when regulations for power spectral density limits are not restrictive, using FDD may lead to higher throughput than that of TDD by allocating less bandwidth to energy and therefore leaving more bandwidth for data. Hence, the best duplexing technique to choose for a specific problem needs careful examination and evaluation.

Submitted 16 July, 2018; originally announced July 2018.

arXiv:1807.05543 [pdf, ps, other]

Maximizing Ergodic Throughput in Wireless Powered Communication Networks

Authors: Arman Ahmadian, Hyuncheol Park

Abstract: This paper considers a single-antenna wireless-powered communication network (WPCN) over a flat-fading channel. We show that, by using our probabilistic harvest-and-transmit (PHAT) strategy, which requires the knowledge of instantaneous full channel state information (CSI) and fading probability distribution, the ergodic throughput of this system may be greatly increased relative to that achieved by the harvest-then-transmit (HTT) protocol. To do so, instead of dividing every frame into the uplink (UL) and downlink (DL), the channel is allocated to the UL wireless information transmission (WIT) and DL wireless power transfer (WPT) based on the estimated channel power gain. In other words, based on the fading probability distribution, we will derive some thresholds that determine the association of a frame to the DL WPT or UL WIT. More specifically, if the channel gain falls below or goes over these thresholds, the channel will be allocated to WPT or WIT. Simulation results verify the performance of our proposed scheme.

Submitted 15 July, 2018; originally announced July 2018.

Showing 1–16 of 16 results for author: Ahmadian, A