Risk-Sensitive Portfolio Management by Using C51 Algorithm
Journal article
Publication Details
Author list: Harnpadungkij, Thammasorn; Chaisangmongkon, Warasinee; Phunchongharn, Phond
Publication year: 2022
Journal: Chiang Mai Journal of Science (0125-2526)
Volume number: 49
Issue number: 5
Start page: 1458
End page: 1482
Number of pages: 25
ISSN: 0125-2526
Languages: English-Great Britain (EN-GB)
Abstract
Financial trading has become one of the most popular problems for reinforcement learning in recent years. An important challenge is that investment is a multi-objective problem: professional investors do not act solely on expected profit but also carefully consider the potential risk of a given investment. To handle this challenge, previous studies have explored various risk-sensitive rewards, for example, the Sharpe ratio computed over a fixed window of past returns. This work proposes a new approach to the profit-to-risk tradeoff by applying distributional reinforcement learning to build a risk-aware policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, selects the action with the highest Sharpe ratio computed from the probability mass function of the return. This produces a significantly higher Sharpe ratio and lower maximum drawdown without sacrificing profit compared to the C51 algorithm with a purely profit-based policy. Moreover, it outperforms other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe-ratio reward function. Beyond the policy itself, we also study the effect of double networks and the choice of exploration strategy with our approach to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration strategy for C51-Sharpe and that the use of double networks has no significant impact on performance. Our study provides statistical evidence of the effectiveness of risk-sensitive policies implemented with distributional reinforcement learning algorithms along with an optimized training process. © 2022, Chiang Mai University. All rights reserved.
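The core idea of the abstract — scoring each action by a Sharpe ratio derived from its C51 return distribution rather than by expected return alone — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `c51_sharpe_action`, the atom support, and the toy probabilities are all assumptions, and only the mean and standard deviation are read off the categorical distribution, as the abstract describes.

```python
import numpy as np

def c51_sharpe_action(probs, atoms, eps=1e-8):
    """Pick the action whose categorical return distribution (C51-style)
    has the highest Sharpe ratio (mean / std of the return atoms).

    probs : (n_actions, n_atoms) probability mass per return atom
    atoms : (n_atoms,) support values (discretized returns)
    """
    mean = probs @ atoms                           # E[Z] per action
    second_moment = probs @ (atoms ** 2)           # E[Z^2] per action
    std = np.sqrt(np.maximum(second_moment - mean ** 2, 0.0))
    sharpe = mean / (std + eps)                    # per-action Sharpe ratio
    return int(np.argmax(sharpe))

# Toy example: 3 actions over a 5-atom support of discretized returns.
atoms = np.linspace(-1.0, 1.0, 5)
probs = np.array([
    [0.1, 0.2, 0.4, 0.2, 0.1],   # zero mean, symmetric spread
    [0.0, 0.1, 0.2, 0.3, 0.4],   # high mean, high spread
    [0.0, 0.0, 0.2, 0.6, 0.2],   # same mean as above, lower spread
])
print(c51_sharpe_action(probs, atoms))  # → 2 (best risk-adjusted return)
```

Note that actions 1 and 2 here have the same expected return (0.5), so a purely profit-based C51 policy would be indifferent between them, while the Sharpe-based policy prefers the lower-variance action 2 — exactly the risk-sensitive behavior the paper targets.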
Keywords
algorithmic trading