-
Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
Authors:
Hyungjun Park,
Daiki Min,
Jong-hyun Ryu,
Dong Gu Choi
Abstract:
Typical reinforcement learning (RL) methods show limited applicability to real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First, we devise two distance-based Q-value update schemes, the incentive update and the penalty update, within a distance-based incentive/penalty (DIP) update technique; these schemes enable the agent to decide discrete and continuous actions in the feasible region and to update the values of both types of actions. Second, we propose defining the penalty cost as a shadow price-weighted penalty. Compared with previous methods, this approach affords two advantages in efficiently inducing the agent not to select infeasible actions. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
Submitted 19 May, 2021; v1 submitted 21 November, 2020;
originally announced November 2020.
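The shadow price-weighted penalty in the abstract above can be illustrated with a small Python sketch. It assumes linear constraints of the form A a <= b on a continuous action vector, shadow prices supplied from elsewhere (for example, dual values of a relaxed optimization problem), and a tabular Q-table keyed by (state, action). The names `violations`, `shadow_price_penalty`, and `penalized_update`, and the exact update rule, are hypothetical illustrations rather than the authors' DIP update; the distance-based weighting of their schemes is not reproduced, and the feasible action is assumed to come from some external projection step.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical linear constraints A @ a <= b on a continuous action vector a.
Constraints = Tuple[List[List[float]], List[float]]
Action = Tuple[float, ...]


def violations(a: Sequence[float], cons: Constraints) -> List[float]:
    """Return how much each row A_i @ a <= b_i is violated (0.0 if satisfied)."""
    A, b = cons
    return [max(0.0, sum(aij * aj for aij, aj in zip(row, a)) - bi)
            for row, bi in zip(A, b)]


def shadow_price_penalty(a: Sequence[float], cons: Constraints,
                         shadow_prices: Sequence[float]) -> float:
    """Assumed form: penalty = sum_i lambda_i * violation_i."""
    return sum(lam * v for lam, v in zip(shadow_prices, violations(a, cons)))


def penalized_update(Q: Dict[Tuple[Hashable, Action], float], state: Hashable,
                     proposed: Action, feasible: Action, reward: float,
                     next_value: float, cons: Constraints,
                     shadow_prices: Sequence[float],
                     alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Hypothetical incentive/penalty-style update on a tabular Q."""
    target = reward + gamma * next_value
    # Incentive-style step: the executed feasible action moves toward the TD target.
    Q[(state, feasible)] += alpha * (target - Q[(state, feasible)])
    if proposed != feasible:
        # Penalty-style step: the infeasible proposal moves toward a penalized target.
        pen = shadow_price_penalty(proposed, cons, shadow_prices)
        Q[(state, proposed)] += alpha * (target - pen - Q[(state, proposed)])


# Usage: one constraint a <= 5 with shadow price 2.0; the agent proposed 7.0.
Q = defaultdict(float)
cons = ([[1.0]], [5.0])
penalized_update(Q, "s0", (7.0,), (5.0,), reward=1.0, next_value=0.0,
                 cons=cons, shadow_prices=[2.0])
# Q[("s0", (5.0,))] is 0.1; Q[("s0", (7.0,))] is about -0.3 (penalty 2.0 * 2.0 = 4.0).
```

Weighting each violation by its shadow price means that breaking a tight, valuable constraint costs the agent more than breaking a slack one, which is the intuition the sketch tries to convey.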
-
Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid
Authors:
Hyungjun Park,
Daiki Min,
Jong-hyun Ryu,
Dong Gu Choi
Abstract:
A microgrid is an innovative system that integrates distributed energy resources to supply electricity demand within electrical boundaries. This study proposes an approach for deriving a desirable microgrid operation policy that enables sophisticated control of the microgrid system, using a novel credit assignment technique, the delayed Q-update. The technique is designed to tackle the delayed-effect property of the microgrid, which prevents learning agents from deriving a well-fitted policy under sophisticated control. The proposed technique tracks the history of each charging period of the energy storage system (ESS) and retroactively assigns an adjusted value to the ESS charging control. Because of this retroactive adjustment, the operation policy derived using the proposed approach fits the real effects of ESS operation well. It therefore supports the search for a near-optimal operation policy in a microgrid environment under sophisticated control. To validate our technique, we simulate the operation policy on a real-world grid-connected microgrid system and demonstrate convergence to a near-optimal policy by comparing the performance measures of our policy with those of a benchmark policy and the optimal policy.
Submitted 20 October, 2020; v1 submitted 30 June, 2020;
originally announced June 2020.
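The retroactive credit assignment described in the abstract above can be sketched as a tabular Q-learning agent that defers the updates of charging steps until a discharge occurs. The class `DelayedCreditAgent`, its three-action set, the `discharge_value` argument, and the rule that splits that value across buffered charging steps in proportion to the energy each step charged are all hypothetical simplifications for illustration; they are not the delayed Q-update as specified by the authors.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Hashable, List, Tuple

Transition = Tuple[Hashable, str, float, Hashable, float]  # (s, a, r, s_next, energy)


@dataclass
class DelayedCreditAgent:
    """Hypothetical tabular agent that defers ESS charging updates."""
    alpha: float = 0.1
    gamma: float = 0.95
    actions: Tuple[str, ...] = ("charge", "idle", "discharge")
    Q: DefaultDict[Tuple[Hashable, str], float] = field(
        default_factory=lambda: defaultdict(float))
    pending: List[Transition] = field(default_factory=list)

    def _update(self, s, a, r, s_next) -> None:
        # Standard one-step Q-learning update.
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

    def observe(self, s, a, r, s_next, energy: float = 0.0,
                discharge_value: float = 0.0) -> None:
        if a == "charge":
            # Charging: record the transition but do not update it yet.
            self.pending.append((s, a, r, s_next, energy))
            return
        if a == "discharge" and self.pending:
            # Discharge: credit its value back to the buffered charging steps,
            # in proportion to the energy each step charged (assumed rule).
            total = sum(t[4] for t in self.pending)
            for ps, pa, pr, pn, e in self.pending:
                share = discharge_value * e / total if total > 0 else 0.0
                self._update(ps, pa, pr + share, pn)
            self.pending.clear()
        self._update(s, a, r, s_next)


# Usage: charging costs -1.0 now; the later discharge is worth 3.0.
agent = DelayedCreditAgent()
agent.observe("low", "charge", -1.0, "mid", energy=1.0)  # buffered, no update yet
agent.observe("mid", "discharge", 0.0, "low", discharge_value=3.0)
# Q[("low", "charge")] is now 0.1 * (-1.0 + 3.0) = 0.2
```

Without the deferral, a one-step update would see only the immediate cost of charging and would tend to learn that charging is undesirable; this is the kind of delayed effect that motivates retroactive credit assignment. Charging steps still pending at the end of an episode are simply left unupdated in this sketch.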
-
An intelligent financial portfolio trading strategy using deep Q-learning
Authors:
Hyungjun Park,
Min Kyu Sim,
Dong Gu Choi
Abstract:
Portfolio traders strive to identify dynamic portfolio allocation schemes so that their total budgets are efficiently allocated throughout the investment horizon. This study proposes a novel portfolio trading strategy in which an intelligent agent is trained to identify an optimal trading action by using deep Q-learning. We formulate a Markov decision process model for the portfolio trading process, and the model adopts a discrete combinatorial action space, which determines the trading direction at a prespecified trading size for each asset, to ensure practical applicability. Our novel portfolio trading strategy takes advantage of three features that help it outperform in real-world trading. First, a mapping function is devised to transform an initially found but infeasible action into the feasible action closest to the originally proposed ideal action. Second, by overcoming the dimensionality problem, this study establishes agent and Q-network models for deriving a multi-asset trading strategy in the predefined action space. Last, this study introduces a technique for deriving a well-fitted multi-asset trading strategy by designing the agent to simulate all feasible actions in each state. To validate our approach, we conduct backtests for two representative portfolios and demonstrate results superior to those of the benchmark strategies.
Submitted 28 November, 2019; v1 submitted 8 July, 2019;
originally announced July 2019.
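The mapping from an infeasible action to the closest feasible one, together with the enumeration of all feasible actions in a state, can be sketched in Python for a small discrete action space in which each asset is sold (-1), held (0), or bought (+1) in one fixed lot. The functions `is_feasible`, `feasible_actions`, and `map_to_feasible`, the feasibility rules (sale proceeds may fund purchases in the same step; an asset can be sold only if at least one lot is held), and the L1 distance used to measure closeness are assumptions made for this sketch, not the paper's definitions.

```python
from itertools import product
from typing import List, Sequence, Tuple

Action = Tuple[int, ...]  # per asset: -1 sell, 0 hold, +1 buy one fixed lot


def is_feasible(action: Action, cash: float, holdings: Sequence[int],
                prices: Sequence[float], lot: int) -> bool:
    # Net cash outflow; sale proceeds are assumed to fund purchases in the same step.
    cost = sum(a * p * lot for a, p in zip(action, prices))
    if cost > cash:
        return False
    # A sell is allowed only if at least one lot of that asset is held.
    return all(a >= 0 or h >= lot for a, h in zip(action, holdings))


def feasible_actions(cash: float, holdings: Sequence[int],
                     prices: Sequence[float], lot: int) -> List[Action]:
    # Brute-force enumeration of all 3**n combinations (small portfolios only).
    return [a for a in product((-1, 0, 1), repeat=len(prices))
            if is_feasible(a, cash, holdings, prices, lot)]


def map_to_feasible(proposed: Action, cash: float, holdings: Sequence[int],
                    prices: Sequence[float], lot: int) -> Action:
    # The all-hold action is feasible whenever cash >= 0, so candidates is non-empty.
    candidates = feasible_actions(cash, holdings, prices, lot)
    # Closest in L1 distance; ties go to the first candidate in enumeration order.
    return min(candidates,
               key=lambda a: sum(abs(x - y) for x, y in zip(a, proposed)))


# Usage: buying both assets costs 30 but only 15 in cash is available.
print(map_to_feasible((1, 1), cash=15.0, holdings=(0, 1),
                      prices=(10.0, 20.0), lot=1))  # → (1, 0)
```

Because the list returned by `feasible_actions` is exactly the set an agent would need in order to simulate every feasible action in a state, the same helper can serve both roles; its size grows as 3^n in the number of assets.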
-
Modelling the Intrusive feelings of advanced driver assistance systems based on vehicle activity log data: a case study for the lane keeping assistance system
Authors:
Kyudong Park,
Jiyoung Kwahk,
Sung H. Han,
Minseok Song,
Dong Gu Choi,
Hyeji Jang,
Dohyeon Kim,
Young Deok Won,
In Sub Jeong
Abstract:
Although the automotive industry has been among the sectors that best understand the importance of drivers' affect, design and research in the automotive field have long emphasized the visceral aspects of exterior and interior design. However, with the adoption of Advanced Driver Assistance Systems (ADAS), which endow vehicles with 'semi-autonomy', the scope of affective design should be expanded to include the behavioural aspects of the vehicle. In such a 'shared-control' system, wherein the vehicle can intervene in the human driver's operations, a certain degree of 'intrusive feelings' is unavoidable. For example, when the Lane Keeping Assistance System (LKAS), one of the most popular examples of ADAS, operates the steering wheel in a dangerous situation, the driver may feel interrupted or surprised because of the abrupt torque generated by LKAS. This kind of unpleasant experience can lead to prolonged negative feelings such as irritation, anxiety, and distrust of the system. Therefore, there is an increasing need to investigate drivers' affective responses to the vehicle's dynamic behaviour. In this study, four types of intrusive feelings caused by LKAS were identified and proposed as quantitative performance indicators for designing affectively satisfactory LKAS behaviour. A metric and a statistical data analysis method were also proposed to quantitatively measure the intrusive feelings from vehicle sensor log data.
Submitted 30 April, 2019; v1 submitted 18 September, 2018;
originally announced September 2018.