Showing 1–26 of 26 results for author: Farahmand, A

  1. arXiv:2407.08803  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    PID Accelerated Temporal Difference Algorithms

    Authors: Mark Bedaywi, Amin Rakhsha, Amir-massoud Farahmand

    Abstract: Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2406.17718  [pdf, other]

    cs.LG

    When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

    Authors: Claas Voelcker, Tyler Kastner, Igor Gilitschenski, Amir-massoud Farahmand

    Abstract: We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning. We also study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2403.05996  [pdf, other]

    cs.LG cs.AI

    Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence

    Authors: Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton

    Abstract: We show that deep reinforcement learning can maintain its ability to learn without resetting network parameters in settings where the number of gradient updates greatly exceeds the number of environment samples. Under such large update-to-data ratios, a recent study by Nikishin et al. (2022) suggested the emergence of a primacy bias, in which agents overfit early interactions and downplay later ex…

    Submitted 9 March, 2024; originally announced March 2024.

  4. arXiv:2311.18495  [pdf, other]

    cs.LG cs.CV

    Improving Adversarial Transferability via Model Alignment

    Authors: Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu

    Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measu…

    Submitted 30 November, 2023; originally announced November 2023.

  5. arXiv:2311.17855  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    Maximum Entropy Model Correction in Reinforcement Learning

    Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: We propose and theoretically analyze an approach for planning with an approximate model in reinforcement learning that can reduce the adverse impact of model error. If the model is accurate enough, it accelerates the convergence to the true value function too. One of its key components is the MaxEnt Model Correction (MoCo) procedure that corrects the model's next-state distributions based on a Max…

    Submitted 29 November, 2023; originally announced November 2023.

  6. arXiv:2308.06703  [pdf, other]

    cs.LG

    Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

    Authors: Avery Ma, Yangchen Pan, Amir-massoud Farahmand

    Abstract: Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation de…

    Submitted 28 November, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

    Comments: Accepted at TMLR (Featured Certification). Code: see https://github.com/averyma/opt-robust

  7. arXiv:2307.08507  [pdf, other]

    cs.LG

    Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

    Authors: Mete Kemertas, Allan D. Jepson, Amir-massoud Farahmand

    Abstract: We design a novel algorithm for optimal transport by drawing from the entropic optimal transport, mirror descent and conjugate gradients literatures. Our scalable and GPU parallelizable algorithm is able to compute the Wasserstein distance with extreme precision, reaching relative error rates of $10^{-8}$ without numerical stability issues. Empirically, the algorithm converges to high precision so…

    Submitted 31 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  8. arXiv:2307.01708  [pdf, other]

    cs.LG cs.AI

    Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

    Authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand

    Abstract: We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equiva…

    Submitted 3 December, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

  9. arXiv:2306.17366  [pdf, other]

    cs.LG cs.AI

    $λ$-models: Effective Decision-Aware Reinforcement Learning with Latent Models

    Authors: Claas A Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand

    Abstract: The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study…

    Submitted 29 February, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  10. arXiv:2211.13937  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    Operator Splitting Value Iteration

    Authors: Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS2022

  11. arXiv:2204.01464  [pdf, other]

    cs.LG cs.AI

    Value Gradient weighted Model-Based Reinforcement Learning

    Authors: Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand

    Abstract: Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often solely fitted to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the in…

    Submitted 20 June, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

  12. arXiv:2110.11265  [pdf, other]

    cs.LG math.DS

    Deep Reinforcement Learning for Online Control of Stochastic Partial Differential Equations

    Authors: Erfan Pirmorad, Faraz Khoshbakhtian, Farnam Mansouri, Amir-massoud Farahmand

    Abstract: In many areas, such as the physical sciences, life sciences, and finance, control approaches are used to achieve a desired goal in complex dynamical systems governed by differential equations. In this work we formulate the problem of controlling stochastic partial differential equations (SPDE) as a reinforcement learning problem. We present a learning-based, distributed control approach for online…

    Submitted 8 December, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

  13. arXiv:2010.01753  [pdf, other]

    cs.LG cs.AI

    The act of remembering: a study in partially observable reinforcement learning

    Authors: Rodrigo Toro Icarte, Richard Valenzano, Toryn Q. Klassen, Phillip Christoffersen, Amir-massoud Farahmand, Sheila A. McIlraith

    Abstract: Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observ…

    Submitted 4 October, 2020; originally announced October 2020.

  14. arXiv:2007.09569  [pdf, other]

    cs.AI cs.LG

    Understanding and Mitigating the Limitations of Prioritized Experience Replay

    Authors: Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

    Abstract: Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations. In this work, we take a deep look at the prioritized ER. In a supervised learning setting, we show the equivalence between the error-based prioriti…

    Submitted 11 June, 2022; v1 submitted 18 July, 2020; originally announced July 2020.

    Comments: Accepted to UAI2022

  15. arXiv:2004.01832  [pdf, ps, other]

    cs.LG stat.ML

    SOAR: Second-Order Adversarial Regularization

    Authors: Avery Ma, Fartash Faghri, Nicolas Papernot, Amir-massoud Farahmand

    Abstract: Adversarial training is a common approach to improving the robustness of deep neural networks against adversarial examples. In this work, we propose a novel regularization approach as an alternative. To derive the regularizer, we formulate the adversarial robustness problem under the robust optimization framework and approximate the loss function using a second-order Taylor series expansion. Our p…

    Submitted 7 February, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

  16. arXiv:2003.00030  [pdf, other]

    cs.AI

    Policy-Aware Model Learning for Policy Gradient Methods

    Authors: Romina Abachi, Mohammad Ghavamzadeh, Amir-massoud Farahmand

    Abstract: This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way the planner is going to use the model. This is in contrast to conventional model learning approaches, such as those based on maximum likelihood estimate, that…

    Submitted 3 January, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

  17. arXiv:2002.06195  [pdf, other]

    stat.ML cs.LG

    An implicit function learning approach for parametric modal regression

    Authors: Yangchen Pan, Ehsan Imani, Martha White, Amir-massoud Farahmand

    Abstract: For multi-valued functions---such as when the conditional distribution on targets given the inputs is multi-modal---standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. Further, paramet…

    Submitted 29 October, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to NeurIPS 2020

  18. arXiv:2002.05822  [pdf, other]

    cs.LG cs.AI stat.ML

    Frequency-based Search-control in Dyna

    Authors: Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand

    Abstract: Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. In particular, Dyna is an elegant model-based architecture integrating learning and planning that provides huge flexibility of using a model. One of the most important components in Dyna is called search-control, which refers to the process of generating state or state-action…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to ICLR 2020

  19. arXiv:1906.07791  [pdf, other]

    cs.LG cs.AI stat.ML

    Hill Climbing on Value Estimates for Search-control in Dyna

    Authors: Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White

    Abstract: Dyna is an architecture for model-based reinforcement learning (RL), where simulated experience from a model is used to update policies or value functions. A key component of Dyna is search-control, the mechanism to generate the state and action from which the agent queries the model, which remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtaine…

    Submitted 4 July, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: IJCAI 2019

  20. arXiv:1903.03495  [pdf, other]

    cs.AI

    Improving Skin Condition Classification with a Visual Symptom Checker Trained using Reinforcement Learning

    Authors: Mohamed Akrout, Amir-massoud Farahmand, Tory Jarmain, Latif Abid

    Abstract: We present a visual symptom checker that combines a pre-trained Convolutional Neural Network (CNN) with a Reinforcement Learning (RL) agent as a Question Answering (QA) model. This method increases the classification confidence and accuracy of the visual symptom checker, and decreases the average number of questions asked to narrow down the differential diagnosis. A Deep Q-Network (DQN)-based RL a…

    Submitted 7 August, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: Accepted for the Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2019

  21. arXiv:1811.06165  [pdf, other]

    cs.CV

    Improving Skin Condition Classification with a Question Answering Model

    Authors: Mohamed Akrout, Amir-massoud Farahmand, Tory Jarmain

    Abstract: We present a skin condition classification methodology based on a sequential pipeline of a pre-trained Convolutional Neural Network (CNN) and a Question Answering (QA) model. This method enables us to not only increase the classification confidence and accuracy of the deployed CNN system, but also enables the emulation of the conventional approach of doctors asking the relevant questions in refini…

    Submitted 14 November, 2018; originally announced November 2018.

    Journal ref: Medical Imaging meets NeurIPS Workshop (2018)

  22. arXiv:1806.06931  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

    Authors: Yangchen Pan, Amir-massoud Farahmand, Martha White, Saleh Nabi, Piyush Grover, Daniel Nikovski

    Abstract: Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationship among action dimensions. In particular, we propose the concept of action descriptors, wh…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: ICML2018

  23. arXiv:1702.01478  [pdf, other]

    cs.CV

    Attentional Network for Visual Object Detection

    Authors: Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-massoud Farahmand

    Abstract: We propose augmenting deep neural networks with an attention mechanism for the visual object detection task. When perceiving a scene, humans are capable of multiple fixation points, each attending to scene content at different locations and scales. However, such a mechanism is missing in the current state-of-the-art visual object detection methods. Inspired by the human vision system, we prop…

    Submitted 5 February, 2017; originally announced February 2017.

  24. arXiv:1509.07860  [pdf, ps, other]

    eess.SY

    Learning-Based Modular Indirect Adaptive Control for a Class of Nonlinear Systems

    Authors: Mouhacine Benosman, Amir-massoud Farahmand, Meng Xia

    Abstract: We study in this paper the problem of adaptive trajectory tracking control for a class of nonlinear systems with parametric uncertainties. We propose to use a modular approach, where we first design a robust nonlinear state feedback which renders the closed loop input-to-state stable (ISS), where the input is considered to be the estimation error of the uncertain parameters, and the state is consi…

    Submitted 25 September, 2015; originally announced September 2015.

    Comments: arXiv admin note: text overlap with arXiv:1507.05120

  25. arXiv:1407.0449  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

    Authors: Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

    Abstract: Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem in hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a…

    Submitted 1 July, 2014; originally announced July 2014.

    MSC Class: 68T05 (Primary); 93E35; 93E20; 90C40; 49L20 (Secondary) ACM Class: I.2.6; I.2.8

  26. arXiv:1207.5554  [pdf, other]

    cs.LG stat.ML

    Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

    Authors: Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand, Joelle Pineau, Doina Precup

    Abstract: We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We…

    Submitted 21 September, 2012; v1 submitted 23 July, 2012; originally announced July 2012.