A Reinforcement Learning Approach to Bitcoin Trading: Proximal Policy Optimization with Trend-Following and Risk-Aware Reward Design
This study proposes a reinforcement-learning-based trading strategy for Bitcoin that uses Proximal Policy Optimization (PPO) with a trend-following, risk-aware reward design. The model is developed within a custom trading environment that incorporates multiple technical indicators, covering trend, momentum, and volatility, to capture market dynamics. A continuous action space lets the agent set the portfolio allocation between cash and Bitcoin, so it learns dynamic position sizing rather than discrete buy or sell decisions. The reward function encourages profit while penalizing excessive risk, trading activity, and drawdowns. The model is evaluated on historical Bitcoin data and compared with a Buy-and-Hold baseline on total return, Sharpe ratio, maximum drawdown, trading frequency, and transaction costs. The PPO strategy does not outperform Buy-and-Hold on total return, but it achieves better risk-adjusted performance, with a higher Sharpe ratio and more stable portfolio growth. However, the model trades frequently, and the resulting transaction costs reduce overall profitability. These findings suggest that reinforcement learning is a promising approach for developing adaptive, risk-sensitive trading strategies, although further work is needed to improve trading efficiency and cost management.
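To make the mechanics described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It illustrates how a continuous action can be read as a target Bitcoin weight, how a reward might combine profit with drawdown and trading penalties, and how two of the evaluation metrics are commonly computed. The function names, the fee rate, the penalty coefficients, and the annualization factor are all assumptions chosen for illustration. The abstract does not specify the trend-following term, so it is omitted here.

```python
import math
from statistics import mean, stdev


def rebalance(cash, btc, price, target_weight, fee_rate=0.001):
    """Move the portfolio toward `target_weight` in BTC (0 = all cash, 1 = all BTC).

    Returns (cash, btc, fee). `fee_rate` is an assumed proportional cost
    charged on the traded notional.
    """
    target_weight = min(max(target_weight, 0.0), 1.0)  # clip the continuous action
    value = cash + btc * price
    trade_value = target_weight * value - btc * price  # > 0 buys BTC, < 0 sells
    fee = abs(trade_value) * fee_rate
    if trade_value >= 0:
        # Buy: the fee reduces the amount of BTC received.
        cash -= trade_value
        btc += (trade_value - fee) / price
    else:
        # Sell: the fee reduces the cash received.
        cash += -trade_value - fee
        btc += trade_value / price
    return cash, btc, fee


def risk_aware_reward(prev_value, new_value, peak_value, weight_change,
                      drawdown_penalty=0.1, turnover_penalty=0.01):
    """Log return minus penalties for drawdown and for changing the allocation.

    Both penalty coefficients are hypothetical; `new_value` is assumed to be
    net of transaction fees already.
    """
    log_return = math.log(new_value / prev_value)
    drawdown = max(0.0, (peak_value - new_value) / peak_value)
    return (log_return
            - drawdown_penalty * drawdown
            - turnover_penalty * abs(weight_change))


def sharpe_ratio(returns, periods_per_year=365):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = stdev(returns)
    if sd == 0:
        return 0.0
    return mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst
```

In this sketch, a portfolio holding only cash that is asked for a 50% Bitcoin weight buys half its value in Bitcoin and pays the fee out of the Bitcoin received. Penalizing `abs(weight_change)` is one simple way to discourage the high trading frequency that the abstract identifies as a drag on profitability.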

This work is licensed under a Creative Commons Attribution 4.0 International License.