A Reinforcement Learning Approach to Bitcoin Trading: Proximal Policy Optimization with Trend-Following and Risk-Aware Reward Design

Victor Vladareanu

doi:10.47738/jdmdc.v3i2.64

Published: May 22, 2026

Keywords:

Reinforcement Learning

Victor Vladareanu

Robotics and Mechatronics Department, Institute of Solid Mechanics of the Romanian Academy, Bucharest, Romania

This study proposes a reinforcement-learning-based trading strategy for Bitcoin that uses Proximal Policy Optimization (PPO) with a trend-following, risk-aware reward design. The model is developed within a custom trading environment that incorporates multiple technical indicators, including trend, momentum, and volatility features, to capture market dynamics. A continuous action space enables flexible portfolio allocation between cash and Bitcoin, so the agent learns dynamic position sizing rather than discrete buy or sell decisions. The reward function encourages profit generation while penalizing excessive risk, trading activity, and drawdowns. The model is evaluated on historical Bitcoin data and compared with a Buy and Hold baseline on total return, Sharpe ratio, maximum drawdown, trading frequency, and transaction costs. The results show that although the PPO strategy does not outperform Buy and Hold in total return, it achieves superior risk-adjusted performance, with a higher Sharpe ratio and more stable portfolio growth. However, the model trades frequently, and the resulting transaction costs reduce overall profitability. These findings suggest that reinforcement learning is a promising approach for developing adaptive, risk-sensitive trading strategies, although further improvements are required in trading efficiency and cost management.
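
The reward and evaluation ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the reward formula, so the additive penalty form, the coefficient values, the proportional cost rate, the 365-period annualization, the zero risk-free rate, and all function names below are assumptions chosen for illustration only.

```python
import numpy as np


def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return float(np.max((running_peak - values) / running_peak))


def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    returns = np.asarray(returns, dtype=float)
    if returns.size < 2:
        return 0.0
    std = returns.std(ddof=1)
    if std == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)


def rebalance(value, old_weight, new_weight, asset_return, cost_rate=0.001):
    """Shift the Bitcoin weight, pay a proportional cost, apply the price move.

    new_weight plays the role of the continuous action: the fraction of the
    portfolio held in Bitcoin, clipped to [0, 1]; the rest stays in cash.
    cost_rate is a hypothetical proportional fee on the traded amount.
    """
    new_weight = float(np.clip(new_weight, 0.0, 1.0))
    turnover = abs(new_weight - old_weight)
    cost = cost_rate * turnover * value
    new_value = (value - cost) * (1.0 + new_weight * asset_return)
    return new_value, turnover, cost


def risk_aware_reward(prev_value, new_value, turnover, peak_value,
                      recent_returns, risk_coef=0.1, trade_coef=0.01,
                      drawdown_coef=0.1):
    """Log return minus penalties for volatility, trading, and drawdown.

    The additive form and the coefficients are illustrative assumptions.
    """
    log_return = np.log(new_value / prev_value)
    volatility = np.std(recent_returns) if len(recent_returns) > 1 else 0.0
    drawdown = max(0.0, (peak_value - new_value) / peak_value)
    return float(log_return
                 - risk_coef * volatility
                 - trade_coef * turnover
                 - drawdown_coef * drawdown)
```

In a setup like this, a larger `trade_coef` or `cost_rate` would discourage the frequent rebalancing that the abstract identifies as the main drag on profitability, at the price of slower reaction to trend changes.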

[1] V. Vladareanu, “A Reinforcement Learning Approach to Bitcoin Trading: Proximal Policy Optimization with Trend-Following and Risk-Aware Reward Design,” J. Digit. Mark. Digit. Curr., vol. 3, no. 2, pp. 131–143, May 2026.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Issue

Vol. 3 No. 2 (2026): Regular Issue June 2026

Section

Articles

3048-0981 (Online)
Organizer / Collaboration	:	Faculty of Economics Universitas Negeri Jakarta, Indonesia
Published by	:	Bright Publisher
Website	:	jdmdc.com
