Showing 1–15 of 15 results for author: Cayci, S

  1. arXiv:2406.04163  [pdf, ps, other]

    math.OC cs.LG eess.SY

    Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

    Authors: Johannes Müller, Semih Cayci

    Abstract: We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspon…

    Submitted 25 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 26 pages, 1 figure

    MSC Class: 37N40; 65K05; 90C05; 90C40; 90C53

  2. arXiv:2405.18221  [pdf, other]

    math.OC cs.LG stat.ML

    Recurrent Natural Policy Gradient for POMDPs

    Authors: Semih Cayci, Atilla Eryilmaz

    Abstract: In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address curse of dimensionality in non-Markovian reinforcement learning. We present finite-time and finite-width analyses for both the critic (recurrent temporal difference l…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2403.19448  [pdf, other]

    math.OC cs.LG eess.SY math.NA stat.ML

    Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients

    Authors: Johannes Müller, Semih Çaycı, Guido Montúfar

    Abstract: Kakade's natural policy gradient method has been studied extensively in the last years showing linear convergence with and without regularization. We study another natural gradient method which is based on the Fisher information matrix of the state-action distributions and has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient f…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 27 pages, 4 figures, under review

    MSC Class: 65K05; 90C05; 90C08; 90C40; 90C53

  4. arXiv:2402.12241  [pdf, other]

    cs.LG math.OC stat.ML

    Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

    Authors: Semih Cayci, Atilla Eryilmaz

    Abstract: We analyze recurrent neural networks trained with gradient descent in the supervised learning setting for dynamical systems, and prove that gradient descent can achieve optimality \emph{without} massive overparameterization. Our in-depth nonasymptotic analysis (i) provides sharp bounds on the network size $m$ and iteration complexity $τ$ in terms of the sequence length $T$, sample size $n$ and amb…

    Submitted 19 February, 2024; originally announced February 2024.

  5. arXiv:2306.11455  [pdf, other]

    cs.LG math.OC stat.ML

    Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

    Authors: Semih Cayci, Atilla Eryilmaz

    Abstract: In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In such instances, the existing RL methods may fail miserably due to frequent statistical outliers. In this work, we establish that temporal difference (TD) learni…

    Submitted 20 June, 2023; originally announced June 2023.

  6. arXiv:2212.14449  [pdf, ps, other]

    math.OC cs.GT cs.LG stat.ML

    Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games

    Authors: Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He

    Abstract: Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Moreover, learning algorithms typically work on…

    Submitted 9 June, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Accepted for publication at ICML 2023

  7. arXiv:2206.00833  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm

    Authors: Semih Cayci, Niao He, R. Srikant

    Abstract: Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., grad…

    Submitted 1 June, 2022; originally announced June 2022.

  8. arXiv:2202.09753  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Finite-Time Analysis of Natural Actor-Critic for POMDPs

    Authors: Semih Cayci, Niao He, R. Srikant

    Abstract: We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization, and a multi-step temporal differ…

    Submitted 19 July, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

  9. arXiv:2106.05165  [pdf, other]

    cs.LG math.OC stat.ML

    A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

    Authors: Semih Cayci, Yilin Zheng, Atilla Eryilmaz

    Abstract: In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general…

    Submitted 23 January, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  10. arXiv:2106.04096  [pdf, other]

    cs.LG math.OC stat.ML

    Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

    Authors: Semih Cayci, Niao He, R. Srikant

    Abstract: Natural policy gradient (NPG) methods with entropy regularization achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linea…

    Submitted 8 February, 2024; v1 submitted 8 June, 2021; originally announced June 2021.

  11. arXiv:2103.01391  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Sample Complexity and Overparameterization Bounds for Temporal Difference Learning with Neural Network Approximation

    Authors: Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

    Abstract: In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}. We consider two practically used algorithms, projection-free and max-norm regularized Neural TD learning, and establish the first convergence bounds for these algorithms. An interesting observation from our result…

    Submitted 5 August, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  12. arXiv:2007.00081  [pdf, other]

    cs.LG math.OC stat.ML

    Continuous-Time Multi-Armed Bandits with Controlled Restarts

    Authors: Semih Cayci, Atilla Eryilmaz, R. Srikant

    Abstract: Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science. Recently, restart strategies have gained significant attention for boosting the efficiency of time-constrained processes by expediting the completion times. In this work, we investigate the bandit problem with controlled restarts for time-constrained decision processe…

    Submitted 30 June, 2020; originally announced July 2020.

  13. arXiv:2006.06852  [pdf, other]

    cs.LG stat.ML

    Group-Fair Online Allocation in Continuous Time

    Authors: Semih Cayci, Swati Gupta, Atilla Eryilmaz

    Abstract: The theory of discrete-time online learning has been successfully applied in many problems that involve sequential decision-making under uncertainty. However, in many applications including contractual hiring in online freelancing platforms and server allocation in cloud computing systems, the outcome of each action is observed only after a random and action-dependent time. Furthermore, as a conse…

    Submitted 23 July, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Corrected figure captions. Added references

  14. arXiv:2003.00365  [pdf, other]

    cs.LG stat.ML

    Budget-Constrained Bandits over General Cost and Reward Distributions

    Authors: Semih Cayci, Atilla Eryilmaz, R. Srikant

    Abstract: We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications…

    Submitted 29 February, 2020; originally announced March 2020.

  15. arXiv:1811.10829  [pdf, other]

    cs.NI cs.LG

    Optimal Learning for Dynamic Coding in Deadline-Constrained Multi-Channel Networks

    Authors: Semih Cayci, Atilla Eryilmaz

    Abstract: We study the problem of serving randomly arriving and delay-sensitive traffic over a multi-channel communication system with time-varying channel states and unknown statistics. This problem deviates from the classical exploration-exploitation setting in that the design and analysis must accommodate the dynamics of packet availability and urgency as well as the cost of each channel use at the time…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: Submitted to IEEE/ACM Transactions on Networking