Search | arXiv e-print repository

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Authors: Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami, Toru Namerikawa

Abstract: Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to safety-critical problems especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define the safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions to construct conservative inputs not containing an exploring aspect used in the proposed method and prove that the safety in the above explained sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.

Submitted 20 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Accepted by the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD) 2022. The Version of Record is available at https://doi.org/10.1007/978-3-031-26412-2_9

arXiv:2107.12632 [pdf, other]

doi 10.1088/1748-0221/16/08/P08054

INTPIX4NA -- new integration-type silicon-on-insulator pixel detector for imaging application

Authors: R. Nishimura, S. Kishimoto, T. Sasaki, S. Mitsui, M. Shinya, Y. Arai, T. Miyoshi

Abstract: INTPIX4NA is an integration-type silicon-on-insulator pixel detector. This detector has a 14.1 x 8.7 mm^2 sensitive area, 425,984 (832 column x 512 row matrix) pixels and the pixel size is 17 x 17 um^2. This detector was developed for residual stress measurement using X-rays (the cos alpha method). The performance of INTPIX4NA was tested with the synchrotron beamlines of the Photon Factory (KEK), and the following results were obtained. The modulation transfer function, the index of the spatial resolution, was more than 50% at the Nyquist frequency (29.4 cycle/mm). The energy resolution analyzed from the collected charge counts is 35.3%--46.2% at 5.415 keV, 21.7%--35.6% at 8 keV, and 15.7%--19.4% at 12 keV. The X-ray signal can be separated from the noise even at a low energy of 5.415 keV at room temperature (approximately 25--27 degree Celsius). The maximum frame rate at which the signal quality can be maintained is 153 fps in the current measurement system. These results satisfy the required performance in the air and at room temperature (approximately 25--27 degree Celsius) condition that is assumed for the environment of the residual stress measurement.

Submitted 14 January, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: Accepted for publication at JINST (2022/01/14 Typo correction ver.)

Journal ref: 2021 JINST 16 P08054

arXiv:2103.03808 [pdf, ps, other]

doi 10.1080/18824889.2023.2278753

Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator

Authors: Mei Minami, Yuka Masumoto, Yoshihiro Okawa, Tomotake Sasaki, Yutaka Hori

Abstract: In many practical control applications, the performance level of a closed-loop system degrades over time due to the change of plant characteristics. Thus, there is a strong need for redesigning a controller without going through the system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is one of the promising approaches that enable model-free redesign of optimal controllers for nonlinear dynamical systems based only on the measurement of the closed-loop system. However, the learning process of RL usually requires a considerable number of trial-and-error experiments using the poorly controlled system that may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design a linear control law that attains some degree of control performance in a model-free manner, and then, train the nonlinear optimal control law with online RL by using the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed approach improves the transient learning performance and efficiency in hyperparameter tuning of RL.

Submitted 30 November, 2023; v1 submitted 5 March, 2021; originally announced March 2021.

Journal ref: SICE Journal of Control, Measurement, and System Integration, vol. 16, no. 1, pp. 349--362, 2023

arXiv:2103.03656 [pdf, ps, other]

Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction

Authors: Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane

Abstract: In reinforcement learning (RL) algorithms, exploratory control inputs are used during learning to acquire knowledge for decision making and control, while the true dynamics of a controlled object is unknown. However, this exploring property sometimes causes undesired situations by violating constraints regarding the state of the controlled object. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces utilizing a linear nominal model of the controlled object. Specifically, our proposed method automatically selects whether the exploratory input is used or not at each time depending on the state and its predicted value as well as adjusts the variance-covariance matrix used in the Gaussian policy for exploration. We also show that our exploration process adjustment method theoretically guarantees the satisfaction of the constraints with the pre-specified probability, that is, the satisfaction of a joint chance constraint at every time. Finally, we illustrate the validity and the effectiveness of our method through numerical simulation.

Submitted 5 March, 2021; originally announced March 2021.

Comments: Accepted to the 21st IFAC World Congress (IFAC-V 2020)

arXiv:1812.05501 [pdf, other]

doi 10.7566/JPSJ.88.044003

Bayesian Spectral Deconvolution Based on Poisson Distribution: Bayesian Measurement and Virtual Measurement Analytics (VMA)

Authors: Kenji Nagata, Yoh-ichi Mototake, Rei Muraoka, Takehiko Sasaki, Masato Okada

Abstract: In this paper, we propose a new method of Bayesian measurement for spectral deconvolution, which regresses spectral data into the sum of unimodal basis function such as Gaussian or Lorentzian functions. Bayesian measurement is a framework for considering not only the target physical model but also the measurement model as a probabilistic model, and enables us to estimate the parameter of a physical model with its confidence interval through a Bayesian posterior distribution given a measurement data set. The measurement with Poisson noise is one of the most effective system to apply our proposed method. Since the measurement time is strongly related to the signal-to-noise ratio for the Poisson noise model, Bayesian measurement with Poisson noise model enables us to clarify the relationship between the measurement time and the limit of estimation. In this study, we establish the probabilistic model with Poisson noise for spectral deconvolution. Bayesian measurement enables us to perform virtual and computer simulation for a certain measurement through the established probabilistic model. This property is called "Virtual Measurement Analytics(VMA)" in this paper. We also show that the relationship between the measurement time and the limit of estimation can be extracted by using the proposed method in a simulation of synthetic data and real data for XPS measurement of MoS$_2$.

Submitted 11 December, 2018; originally announced December 2018.

Comments: 8 pages, 8 figures

Showing 1–5 of 5 results for author: Sasaki, T