Risk-Sensitive Portfolio Management by Using C51 Algorithm

Journal article


Authors/Editors


Strategic research themes


Publication details

Author list: Harnpadungkij, Thammasorn; Chaisangmongkon, Warasinee; Phunchongharn, Phond

Publication year (AD): 2022

Journal: Chiang Mai Journal of Science (0125-2526)

Volume number: 49

Issue number: 5

First page: 1458

Last page: 1482

Number of pages: 25

ISSN: 0125-2526

URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85138979901&doi=10.12982%2fCMJS.2022.094&partnerID=40&md5=cffc72503363ff224b65a5510e351332

Language: English-Great Britain (EN-GB)




Abstract

Financial trading has become one of the most popular applications of reinforcement learning in recent years. An important challenge is that investment is a multi-objective problem: professional investors do not act solely on expected profit but also carefully weigh the potential risk of a given investment. To handle this challenge, previous studies have explored various risk-sensitive rewards, for example, the Sharpe ratio computed over a fixed window of past returns. This work proposes a new approach to the profit-to-risk tradeoff by applying distributional reinforcement learning to build a risk-aware policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, selects the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a significantly higher Sharpe ratio and lower maximum drawdown, without sacrificing profit, compared to the C51 algorithm with a purely profit-based policy. Moreover, it outperforms other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe-ratio reward function. Beyond the policy itself, we also studied the effect of using double networks and the choice of exploration strategy with our approach to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration strategy for C51-Sharpe and that the use of a double network has no significant impact on performance. Our study provides statistical evidence of the efficiency of a risk-sensitive policy implemented using distributional reinforcement learning algorithms together with an optimized training process. © 2022, Chiang Mai University. All rights reserved.
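The action-selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `c51_sharpe_action` and the toy numbers are hypothetical, and the sketch assumes only a C51-style output, i.e. a probability mass over a fixed support of return atoms for each action.

```python
import numpy as np

def c51_sharpe_action(probs, atoms, eps=1e-8):
    """Pick the action whose categorical return distribution has the
    highest Sharpe ratio (mean / standard deviation of the return).

    probs: (n_actions, n_atoms) probability mass per action over the atoms
    atoms: (n_atoms,) support of the return distribution
    """
    mean = probs @ atoms                    # E[Z] per action
    var = probs @ (atoms ** 2) - mean ** 2  # E[Z^2] - E[Z]^2
    std = np.sqrt(np.maximum(var, 0.0))
    sharpe = mean / (std + eps)             # risk-adjusted score
    return int(np.argmax(sharpe))

# Toy example: action 0 has higher expected return but much higher
# variance; action 1 wins on risk-adjusted (Sharpe) terms.
probs = np.array([[0.4, 0.0, 0.6],   # action 0: mean 0.2, std ~0.98
                  [0.0, 0.9, 0.1]])  # action 1: mean 0.1, std 0.3
atoms = np.array([-1.0, 0.0, 1.0])
print(c51_sharpe_action(probs, atoms))  # 1 (greedy-on-mean would pick 0)
```

The contrast with a purely profit-based C51 policy is visible in the toy example: `argmax` over the expected return alone selects action 0, while the Sharpe-based rule prefers the lower-variance action 1.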


Keywords

algorithmic trading


Last updated 2023-10-17 at 07:37