
Showing 1–24 of 24 results for author: Hsieh, P

  1. arXiv:2405.18655  [pdf, other]

    cs.LG cs.AI q-bio.GN

    CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data

    Authors: Ping-Han Hsieh, Ru-Xiu Hsiao, Katalin Ferenc, Anthony Mathelier, Rebekka Burkholz, Chien-Yu Chen, Geir Kjetil Sandve, Tatiana Belova, Marieke Lydia Kuijjer

    Abstract: Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modaliti…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.16194  [pdf, other]

    cs.LG cs.AI cs.RO

    Diffusion-Reward Adversarial Imitation Learning

    Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

    Abstract: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy learning to imitate expert behaviors and discriminator learning to distinguish the expert demonstrations from agent trajectories. Despit…

    Submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2403.18270  [pdf, other]

    cs.CV eess.IV

    Image Deraining via Self-supervised Reinforcement Learning

    Authors: He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai

    Abstract: The quality of images captured outdoors is often affected by the weather. One factor that interferes with sight is rain, which can obstruct the view of observers and computer vision applications that rely on those images. The work aims to recover rain images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain). We locate rain streak pixels from…

    Submitted 27 March, 2024; originally announced March 2024.

  4. arXiv:2403.12406  [pdf, other]

    cs.AI cs.LG

    Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion

    Authors: Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng

    Abstract: In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to rally-wise imitate the behaviors of human players from offline badminton matches has remained…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint

  5. arXiv:2312.12065  [pdf, other]

    cs.LG cs.AI

    PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping

    Authors: Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu

    Abstract: Proximal Policy Optimization algorithm employing a clipped surrogate objective (PPO-Clip) is a prominent exemplar of the policy optimization methods. However, despite its remarkable empirical success, PPO-Clip lacks theoretical substantiation to date. In this paper, we contribute to the field by establishing the first global convergence results of a PPO-Clip variant in both tabular and neural func…

    Submitted 19 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  6. arXiv:2310.11897  [pdf, other]

    cs.LG

    Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

    Authors: Yen-Ju Chen, Nai-Chieh Huang, Ching-Pei Lee, Ping-Chun Hsieh

    Abstract: Various acceleration approaches for Policy Gradient (PG) have been analyzed within the realm of Reinforcement Learning (RL). However, the theoretical understanding of the widely used momentum-based acceleration method on PG remains largely open. In response to this gap, we adapt the celebrated Nesterov's accelerated gradient (NAG) method to policy optimization in RL, termed \textit{Accelerated Pol…

    Submitted 6 June, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 69 pages, 17 figures

  7. arXiv:2310.11515  [pdf, ps, other]

    cs.LG

    Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs

    Authors: Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar

    Abstract: We consider the infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping. While the existing regression-based approaches have been theoretically shown to achieve nearly-optimal regret, they are computationally rather inefficient due to the need for a…

    Submitted 17 October, 2023; originally announced October 2023.

  8. arXiv:2309.15484  [pdf, other]

    cs.AI

    Towards Human-Like RL: Taming Non-Naturalistic Behavior in Deep RL via Adaptive Behavioral Costs in 3D Games

    Authors: Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu

    Abstract: In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength. While deep reinforcement learning agents have recently achieved superhuman performance in various video games, some of these unconstrained agents may exhibit actions, such as shaking and spinning, that are not typically observed in…

    Submitted 27 September, 2023; originally announced September 2023.

  9. arXiv:2212.05237  [pdf, other]

    cs.LG

    Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees

    Authors: Hsin-En Su, Yen-Ju Chen, Ping-Chun Hsieh, Xi Liu

    Abstract: We revisit the domain of off-policy policy optimization in RL from the perspective of coordinate ascent. One commonly-used approach is to leverage the off-policy policy gradient to optimize a surrogate objective -- the total discounted in expectation return of the target policy with respect to the state distribution of the behavior policy. However, this approach has been shown to suffer from the d…

    Submitted 10 December, 2022; originally announced December 2022.

    Comments: 47 pages, 4 figures

  10. arXiv:2212.03117  [pdf, ps, other]

    cs.LG

    Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

    Authors: Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu

    Abstract: Many real-world continuous control problems are in the dilemma of weighing the pros and cons, multi-objective reinforcement learning (MORL) serves as a generic framework of learning control policies for different preferences over objectives. However, the existing MORL methods either rely on multiple passes of explicit search for finding the Pareto front and therefore are not sample-efficient, or u…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 17 pages, 15 figures

  11. arXiv:2209.13210  [pdf, other]

    eess.IV cs.CV

    Neural Frank-Wolfe Policy Optimization for Region-of-Interest Intra-Frame Coding with HEVC/H.265

    Authors: Yung-Han Ho, Chia-Hao Kao, Wen-Hsiao Peng, Ping-Chun Hsieh

    Abstract: This paper presents a reinforcement learning (RL) framework that utilizes Frank-Wolfe policy optimization to solve Coding-Tree-Unit (CTU) bit allocation for Region-of-Interest (ROI) intra-frame coding. Most previous RL-based methods employ the single-critic design, where the rewards for distortion minimization and rate regularization are weighted by an empirically chosen hyper-parameter. Recently,…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by VCIP 2022. arXiv admin note: text overlap with arXiv:2203.05127

  12. arXiv:2203.04192  [pdf, ps, other]

    cs.LG stat.ML

    Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits

    Authors: Yu-Heng Hung, Ping-Chun Hsieh

    Abstract: Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration. NeuralRBMLE leverages th…

    Submitted 29 May, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

  13. arXiv:2110.13799  [pdf, other]

    cs.LG

    Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective

    Authors: Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu

    Abstract: Policy optimization is a fundamental principle for designing reinforcement learning algorithms, and one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been popularly used in deep reinforcement learning due to its simplicity and effectiveness. Despite its superior empirical performance, PPO-Clip has not been justified via theoretical p…

    Submitted 31 August, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: 33 pages, 1 figure

  14. arXiv:2110.02128  [pdf, other]

    cs.LG stat.ML

    NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

    Authors: Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai

    Abstract: Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless ba…

    Submitted 19 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in NeurIPS 2021

  15. arXiv:2106.04335  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

    Authors: Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu

    Abstract: Bayesian optimization (BO) conventionally relies on handcrafted acquisition functions (AFs) to sequentially determine the sample points. However, it has been widely observed in practice that the best-performing AF in terms of regret can vary significantly under different types of black-box functions. It has remained a challenge to design one AF that can attain the best performance over a wide vari…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: 21 pages, 8 figures

  16. arXiv:2102.11055  [pdf, ps, other]

    cs.LG

    Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization

    Authors: Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu

    Abstract: Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints. While the existing projection-based approaches ensure zero constraint violation, they could suffer from the zero-gradient problem due to the tight coupling of the policy grad…

    Submitted 2 August, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: Published in UAI 2021

  17. arXiv:2010.04091  [pdf, ps, other]

    cs.LG stat.ML

    Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

    Authors: Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

    Abstract: Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized linear bandits problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with…

    Submitted 8 October, 2020; originally announced October 2020.

  18. arXiv:2001.09595  [pdf, other]

    cs.LG cs.IR stat.ML

    Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning

    Authors: Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen

    Abstract: With the explosive growth of online products and content, recommendation techniques have been considered as an effective tool to overcome information overload, improve user experience, and boost business revenue. In recent years, we have observed a new desideratum of considering long-term rewards of multiple related recommendation tasks simultaneously. The consideration of long-term rewards is str…

    Submitted 27 January, 2020; originally announced January 2020.

  19. arXiv:1911.00902  [pdf, other]

    cs.NI

    Fresher Content or Smoother Playback? A Brownian-Approximation Framework for Scheduling Real-Time Wireless Video Streams

    Authors: Ping-Chun Hsieh, Xi Liu, I-Hong Hou

    Abstract: This paper presents a Brownian-approximation framework to optimize the quality of experience (QoE) for real-time video streaming in wireless networks. In real-time video streaming, one major challenge is to tackle the natural tension between the two most critical QoE metrics: playback latency and video interruption. To study this trade-off, we first propose an analytical model that precisely captu…

    Submitted 23 October, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

    Comments: MobiHoc 2020

  20. arXiv:1907.01287  [pdf, ps, other]

    cs.LG stat.ML

    Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits

    Authors: Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar

    Abstract: Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE -- a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs including both the parametric Exponential Family as well as the non-parametric sub-Gaussian/Exponential family, we show that RBMLE yields an index policy. To choose the bias-growth rate…

    Submitted 23 October, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: ICML 2020

  21. arXiv:1811.05932  [pdf, ps, other]

    cs.LG cs.SI stat.ML

    Streaming Network Embedding through Local Actions

    Authors: Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen

    Abstract: Recently, considerable research attention has been paid to network embedding, a popular approach to construct feature vectors of vertices. Due to the curse of dimensionality and sparsity in graphical datasets, this approach has become indispensable for machine learning tasks over large networks. The majority of existing literature has considered this technique under the assumption that the network…

    Submitted 14 November, 2018; originally announced November 2018.

  22. arXiv:1810.12418  [pdf, ps, other]

    cs.LG stat.ML

    Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

    Authors: Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar

    Abstract: Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a "reneging" phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedast…

    Submitted 15 May, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: To appear in ICML 2019. More rounds of experiments are performed before being taken average of compared to versions before

  23. arXiv:1701.03991  [pdf, other]

    cs.NI

    Throughput-Optimal Scheduling for Multi-Hop Networked Transportation Systems With Switch-Over Delay

    Authors: Ping-Chun Hsieh, Xi Liu, Jian Jiao, I-Hong Hou, Yunlong Zhang, P. R. Kumar

    Abstract: The emerging connected-vehicle technology provides a new dimension in developing more intelligent traffic control algorithms for signalized intersections in networked transportation systems. An important challenge for the scheduling problem in networked transportation systems is the switch-over delay caused by the guard time before any traffic signal change. The switch-over delay can result in sig…

    Submitted 18 January, 2017; v1 submitted 14 January, 2017; originally announced January 2017.

    Comments: 16 pages, 6 figures

  24. arXiv:1701.03831  [pdf, ps, other]

    cs.PF cs.NI

    Delay-Optimal Scheduling for Queueing Systems with Switching Overhead

    Authors: Ping-Chun Hsieh, I-Hong Hou, Xi Liu

    Abstract: We study the scheduling policies for asymptotically optimal delay in queueing systems with switching overhead. Such systems consist of a single server that serves multiple queues, and some capacity is lost whenever the server switches to serve a different set of queues. The capacity loss due to this switching overhead can be significant in many emerging applications, and needs to be explicitly addr…

    Submitted 13 January, 2017; originally announced January 2017.

    Comments: 37 pages

    MSC Class: 60K25; 68M20; 90B22