
Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning

Authors: Mohammadreza Nakhaei, Aidan Scannell, Joni Pajarinen

Abstract: Offline reinforcement learning (RL) allows learning sequential behavior from fixed datasets. Since offline datasets do not cover all possible situations, many methods collect additional data during online fine-tuning to improve performance. In general, these methods assume that the transition dynamics remain the same during both the offline and online phases of training. However, in many real-world applications, such as outdoor construction and navigation over rough terrain, it is common for the transition dynamics to vary between the offline and online phases. Moreover, the dynamics may vary during the online fine-tuning. To address this problem of changing dynamics from offline to online RL, we propose a residual learning approach that infers dynamics changes to correct the outputs of the offline solution. During the online fine-tuning phase, we train a context encoder to learn a representation that is consistent within the current online learning environment while being able to predict transitions. Experiments in D4RL MuJoCo environments, modified to support dynamics changes upon environment resets, show that our approach can adapt to these dynamics changes and generalize to unseen perturbations in a sample-efficient way, whilst comparison methods cannot.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures, 1 table. Accepted at L4DC 2024
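The residual structure described in this abstract can be pictured with a short sketch. It assumes a frozen offline policy, a residual policy conditioned on a context vector z, and a context encoder that mean-pools features of recent transitions (s, a, s'). The names `encode_context`, `residual_action`, and `act`, the mean-pooling aggregator, the 0.1 residual scale, and the [-1, 1] action bounds are hypothetical illustrations, not the authors' implementation; training the encoder with the consistency and transition-prediction objectives is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical transition format: (state, action, next_state).
Transition = Tuple[Vector, Vector, Vector]


def encode_context(transitions: Sequence[Transition],
                   phi: Callable[[Vector, Vector, Vector], Vector]) -> Vector:
    """Mean-pool per-transition features phi(s, a, s') into one context vector z.

    Mean pooling is an assumed order-invariant aggregator; phi stands in for a
    learned feature network.
    """
    if not transitions:
        raise ValueError("need at least one transition to infer the context")
    feats = [phi(s, a, s_next) for s, a, s_next in transitions]
    return [sum(f[i] for f in feats) / len(feats) for i in range(len(feats[0]))]


def residual_action(a_offline: Vector, delta: Vector, scale: float = 0.1,
                    low: float = -1.0, high: float = 1.0) -> Vector:
    """Add a scaled residual correction to the offline action, clipped to [low, high]."""
    return [min(high, max(low, a + scale * d)) for a, d in zip(a_offline, delta)]


def act(state: Vector,
        offline_policy: Callable[[Vector], Vector],
        residual_policy: Callable[[Vector, Vector, Vector], Vector],
        z: Vector) -> Vector:
    """The frozen offline policy proposes an action; the context-conditioned residual corrects it."""
    a0 = offline_policy(state)
    delta = residual_policy(state, a0, z)
    return residual_action(a0, delta)


# Toy usage: phi is the state change, so z summarises the current environment's dynamics.
history = [([0.0], [0.5], [1.0]), ([1.0], [0.5], [3.0])]
z = encode_context(history, lambda s, a, s_next: [s_next[0] - s[0]])  # z == [1.5]
action = act([0.0], lambda s: [0.5], lambda s, a, ctx: [-ctx[0]], z)  # about [0.35]
```

Keeping the offline policy fixed and learning only a bounded correction means the agent starts online fine-tuning from the offline behaviour, and the context vector is what lets the same correction network respond differently after each dynamics change.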

arXiv:2406.02696

iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

Authors: Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen

Abstract: Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL (implicitly Quantized Reinforcement Learning), is straightforward, is compatible with any model-free RL algorithm, and outperforms other recently proposed representation learning methods on continuous control benchmarks from the DeepMind Control Suite.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 9 pages, 11 figures
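The abstract leaves the quantizer unspecified, so the sketch below assumes a finite-scalar-style scheme: each latent dimension is squashed with tanh and rounded to one of an odd number of evenly spaced levels in [-1, 1]. It pairs that with a mean-squared latent-state consistency loss between a dynamics model's prediction and the quantized encoding of the next observation. The linear `encode` and `dynamics` maps, the five levels, and the squared-error form are hypothetical choices for illustration; a real implementation would use neural networks, a straight-through gradient for the rounding step, and a model-free RL loss on top.

```python
import math
from typing import List

Vector = List[float]


def quantize(z: Vector, levels: int = 5) -> Vector:
    """Assumed finite-scalar quantizer: tanh-bound each dimension, then round to
    one of `levels` evenly spaced values in [-1, 1]."""
    if levels < 3 or levels % 2 == 0:
        raise ValueError("use an odd number of levels >= 3 so that 0 is a level")
    half = (levels - 1) / 2
    return [round(math.tanh(x) * half) / half for x in z]


def encode(obs: Vector) -> Vector:
    """Hypothetical linear encoder mapping a 2-D observation to a quantized 2-D latent."""
    return quantize([obs[0] + obs[1], obs[0] - obs[1]])


def dynamics(latent: Vector, action: Vector) -> Vector:
    """Hypothetical linear latent dynamics model predicting the next latent state."""
    return [latent[0] + action[0], latent[1] - action[0]]


def consistency_loss(obs: Vector, action: Vector, next_obs: Vector) -> float:
    """Mean squared error between the predicted and the encoded next latent state.

    During training the target branch, encode(next_obs), would typically be held
    fixed (stop-gradient or a target encoder); plain Python has no gradients.
    """
    pred = dynamics(encode(obs), action)
    target = encode(next_obs)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy usage: extreme inputs saturate to the outermost levels.
codes = quantize([0.0, 10.0, -10.0])  # [0.0, 1.0, -1.0]
```

With five levels every latent coordinate is one of {-1, -0.5, 0, 0.5, 1}, so the encoder's output space is a fixed discrete grid rather than a continuous region that could shrink toward a point.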

arXiv:1803.01373

A Successive Optimization Approach to Pilot Design for Multi-Cell Massive MIMO Systems

Authors: Hayder Al-Salihi, Trinh Van Chien, Tuan Anh Le, Mohammad Reza Nakhai

Abstract: In this letter, we introduce a novel pilot design approach that minimizes the total mean square error (MSE) of the minimum mean square error (MMSE) estimators of all base stations (BSs), subject to the transmit power constraints of individual users in the network, while tackling pilot contamination in multi-cell Massive MIMO systems. First, we decompose the original non-convex problem into distributed optimization sub-problems at individual BSs, where each BS can optimize its own pilot signals given knowledge of the pilot signals of the remaining BSs. We then introduce a successive optimization approach that transforms each optimization sub-problem into a linear matrix inequality (LMI) form, which is convex and can be solved by available optimization packages. Simulation results confirm the fast convergence of the proposed approach and show that it outperforms a benchmark scheme in estimation accuracy.

Submitted 4 March, 2018; originally announced March 2018.

Comments: Accepted, IEEE Communications Letters 2018
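A toy, scalar version of the two ideas in this abstract (the sum of per-BS MMSE errors as the objective, and BSs taking turns to optimize their own pilot variable while the others are held fixed) fits in a few lines. The simplifications are assumptions, not the letter's formulation: one user per cell, all users reusing the same pilot of length `tau`, each BS choosing only a pilot power on a grid in [0, `p_max`] rather than a pilot sequence, and grid search standing in for the LMI step. `beta[l][j]` is a hypothetical large-scale gain from cell j's user to BS l. For a scalar Gaussian channel, the MMSE error at BS l is beta[l][l] - p_l * tau * beta[l][l]^2 / (sum_j p_j * tau * beta[l][j] + noise_var).

```python
from typing import List

Matrix = List[List[float]]


def bs_mse(l: int, powers: List[float], beta: Matrix, tau: float, noise_var: float) -> float:
    """MMSE error of BS l's estimate of its own user's scalar channel h ~ N(0, beta[l][l]).

    All users share one pilot, so the received pilot power at BS l sums every
    user's contribution; the terms j != l are the pilot contamination.
    """
    received = sum(p * tau * beta[l][j] for j, p in enumerate(powers)) + noise_var
    return beta[l][l] - powers[l] * tau * beta[l][l] ** 2 / received


def total_mse(powers: List[float], beta: Matrix, tau: float, noise_var: float) -> float:
    """Network-wide objective: sum of the per-BS MMSE errors."""
    return sum(bs_mse(l, powers, beta, tau, noise_var) for l in range(len(powers)))


def successive_power_update(powers: List[float], beta: Matrix, tau: float, noise_var: float,
                            p_max: float, grid_points: int = 11, sweeps: int = 5) -> List[float]:
    """Each BS in turn picks its pilot power on a grid, holding the others fixed.

    If the starting powers lie on the grid, every update can keep the current
    value, so the total MSE never increases from one update to the next.
    """
    powers = list(powers)
    grid = [p_max * k / (grid_points - 1) for k in range(grid_points)]
    for _ in range(sweeps):
        for l in range(len(powers)):
            def objective(p: float, l: int = l) -> float:
                trial = powers[:l] + [p] + powers[l + 1:]
                return total_mse(trial, beta, tau, noise_var)
            powers[l] = min(grid, key=objective)
    return powers


# Toy usage with two cells and hypothetical gains.
beta = [[1.0, 0.3], [0.2, 1.0]]
start = [1.0, 1.0]
final = successive_power_update(start, beta, tau=1.0, noise_var=0.1, p_max=1.0)
```

The coordinate-wise structure mirrors the distributed decomposition: each BS only needs the other cells' current pilot choices to solve its own sub-problem.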

arXiv:1103.4406

Interference Alignment with Partially Coordinated Transmit Precoding

Authors: Aimal Khan Yousafzai, Mohammad Reza Nakhai

Abstract: In this paper, we introduce an efficient interference alignment (IA) algorithm exploiting partially coordinated transmit precoding to improve the number of concurrent interference-free transmissions, i.e., the multiplexing gain, in the multicell downlink. In the proposed coordination model, each base station simultaneously transmits to two users and each user is served by two base stations. First, we show that in a $K$-user system operating at the information-theoretic upper bound of degrees of freedom (DOF), generic IA is proper when $K \leq 3$, whereas the proposed partially coordinated IA is proper when $K \leq 5$. Then, we derive a non-iterative, i.e., one-shot, IA algorithm for the proposed scheme when $K \leq 5$. We show that, for a given latency, the backhaul data rate requirement of the proposed method grows linearly with $K$. Monte Carlo simulation results show that the proposed one-shot algorithm offers higher system throughput than iterative IA at practical SNR levels.

Submitted 22 March, 2011; originally announced March 2011.

Comments: 19 pages, 8 figures
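The $K \leq 3$ figure for generic IA is consistent with the standard properness count for a symmetric $K$-user MIMO interference channel, sketched below under the assumption of $M$ transmit and $N$ receive antennas per user and $d$ streams per user (the abstract does not state the antenna configuration). The system is proper when the free precoder and decoder variables, $K d (M + N - 2d)$, are at least as many as the zero-interference equations, $K(K-1)d^2$, which simplifies to $M + N \geq (K+1)d$. Operating at the DOF upper bound of an $M \times M$ system means $d = M/2$ per user, giving $2M \geq (K+1)M/2$, i.e. $K \leq 3$. The helper names are hypothetical, and the sketch covers only the generic-IA count, not the partially coordinated scheme.

```python
def is_proper(K: int, M: int, N: int, d: int) -> bool:
    """Symmetric K-user MIMO IA properness count: variables >= equations."""
    variables = K * d * (M + N - 2 * d)  # (M-d)d precoder + (N-d)d decoder variables per user
    equations = K * (K - 1) * d * d      # d*d zero-interference conditions per cross link
    return variables >= equations


def max_proper_users(M: int, k_limit: int = 20) -> int:
    """Largest K that is proper when every user gets d = M/2 streams (M x M links)."""
    if M % 2:
        raise ValueError("d = M/2 requires an even antenna count")
    d = M // 2
    return max(K for K in range(2, k_limit + 1) if is_proper(K, M, M, d))


# Toy usage: the bound does not depend on M once d = M/2.
limits = [max_proper_users(M) for M in (2, 4, 8)]  # [3, 3, 3]
```

Substituting $d = M/2$ makes both sides scale with $d^2$, which is why the answer is the same for every even $M$.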

Showing 1–4 of 4 results for author: Nakhaei, M