This is the first post in a 3-part series where I explore how rule-based logic compares to reinforcement learning (RL) when building trading strategies. We’ll start simple by testing a handful of models on AAPL — one of the most traded stocks in the world — using two years of daily price data from Yahoo Finance.
You’ll see how different models behave under the same conditions, and we’ll use metrics like Sharpe ratio, drawdown, and total return to evaluate each approach.
In Part II, we scale these models across the entire S&P 500 to see which strategies consistently outperform.
In Part III, we build an interactive Streamlit app to explore and compare results.
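To set the stage, here's a minimal sketch of the data setup, using the yfinance package to pull two years of daily AAPL bars; the exact date range and column handling in the original notebook may differ:

```python
import yfinance as yf

# Two years of daily OHLCV bars for AAPL from Yahoo Finance.
# (Illustrative parameters; the notebook's exact date range may differ.)
data = yf.download("AAPL", period="2y", interval="1d", auto_adjust=True)

# The strategies below only need the closing price and volume.
prices = data["Close"].squeeze()
volume = data["Volume"].squeeze()
```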
Strategy Overview
| Model # | Type | Description |
|---|---|---|
| Model 0 | Rule-Based | Buy when price > 10-day SMA, sell otherwise |
| Model 1 | RL | SMA + Volume as state inputs |
| Model 2 | RL | SMA + Day of Week |
| Model 3 | RL | SMA + Volume + Day of Week |
| Model 4 | RL | SMA only (Q-learning) |
| Model 5 | Rule-Based | Buy if 3-day return < -3% (Mean Reversion) |
| Model 6 | RL | Uses 1d/3d return, volatility, and RSI |
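The two rule-based baselines are easy to state in code. Here is a hedged sketch in pandas; the function names, the long/flat position encoding, and the P&L convention are my own, and the notebook's execution details may differ:

```python
import pandas as pd

def model0_signal(prices: pd.Series, window: int = 10) -> pd.Series:
    """Model 0: long while price sits above its 10-day SMA, flat otherwise."""
    sma = prices.rolling(window).mean()
    return (prices > sma).astype(int)            # 1 = long, 0 = flat

def model5_signal(prices: pd.Series, lookback: int = 3, threshold: float = -0.03) -> pd.Series:
    """Model 5: mean reversion -- long after a 3-day return below -3%, flat otherwise."""
    ret = prices.pct_change(lookback)
    return (ret < threshold).astype(int)

def strategy_pnl(prices: pd.Series, signal: pd.Series) -> pd.Series:
    """Daily dollar P&L: yesterday's position times today's price change (no look-ahead)."""
    return signal.shift(1) * prices.diff()
```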
What is Q-Learning?
Q-learning is a form of reinforcement learning in which an agent interacts with an environment by taking actions, observing rewards, and updating a Q-table: a lookup of how valuable each action is in each state. Over time, it learns which trades lead to positive outcomes and which don't, without being explicitly told how to trade.
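Concretely, the Q-table maps each (state, action) pair to an estimated value and is updated after every step with the standard Q-learning rule. A minimal sketch follows; the hyperparameters, action set, and epsilon-greedy exploration shown here are typical choices, not necessarily the ones used in the models above:

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]                                  # 0 = stay flat, 1 = hold the stock
q_table = defaultdict(lambda: [0.0, 0.0])         # state -> estimated value of each action

alpha, gamma, epsilon = 0.1, 0.95, 0.1            # learning rate, discount, exploration rate

def choose_action(state):
    # Epsilon-greedy: usually exploit the best-known action, occasionally explore.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[state][a])

def update(state, action, reward, next_state):
    # Q-learning update: move Q(s, a) toward reward + discounted best future value.
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
```

In the trading models, the state would be a small tuple of discretized features taken from the strategy table (price vs. SMA, volume bucket, day of week, recent returns, volatility, RSI), and a natural reward is the next day's profit or loss of the chosen position.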
Backtest Results on AAPL
Here’s how each model performed over the two-year backtest:
| Model | Total Return ($) | Sharpe Ratio | Max Drawdown (%) | # Trades |
|---|---|---|---|---|
| Model 0 (Rule SMA) | 63.74 | 0.2373 | -15.11 | 75 |
| Model 1 (RL) | 53.27 | 0.0299 | -51.25 | 33 |
| Model 2 (RL) | 46.51 | -0.0897 | -72.98 | 37 |
| Model 3 (RL) | 16.38 | -0.0613 | -70.97 | 37 |
| Model 4 (RL) | 15.06 | -0.0279 | -54.21 | 18 |
| Model 5 (Rule Mean Reversion) | -0.49 | -0.0045 | -20.21 | 44 |
| Model 6 (RL Mean Reversion) | -3.08 | -0.0086 | -11.59 | 34 |
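For context, the three evaluation metrics can be computed from a daily dollar P&L series along these lines; the annualization convention, starting-capital figure, and dollar-based return here are assumptions on my part and may not match the notebook exactly:

```python
import numpy as np
import pandas as pd

def evaluate(pnl: pd.Series, capital: float = 100.0) -> dict:
    """Summarize a daily dollar P&L series with the metrics used in the table above."""
    pnl = pnl.dropna()
    equity = capital + pnl.cumsum()           # equity curve from an assumed starting capital

    total_return = equity.iloc[-1] - capital  # total dollar gain/loss over the backtest

    # Sharpe ratio of daily P&L; annualizing with sqrt(252) is one common convention,
    # and the notebook's exact convention is an assumption here.
    sharpe = np.sqrt(252) * pnl.mean() / pnl.std() if pnl.std() > 0 else 0.0

    # Max drawdown: worst peak-to-trough decline of the equity curve, in percent.
    running_max = equity.cummax()
    max_drawdown = ((equity - running_max) / running_max).min() * 100

    return {
        "total_return": round(total_return, 2),
        "sharpe": round(sharpe, 4),
        "max_drawdown_pct": round(max_drawdown, 2),
    }
```

With the signal helpers sketched earlier, a call like `evaluate(strategy_pnl(prices, model0_signal(prices)))` would produce one row of the table above, up to the assumptions noted.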
Key Takeaways
- Model 0 (SMA Rule) outperforms every other model on both total return and Sharpe ratio; simple and effective.
- Model 1 (RL) posts the best return among the RL models ($53.27), but its -51% max drawdown means the risk is high.
- Model 6 (RL Mean Reversion) didn't top the return chart, but it had the smallest max drawdown (-11.59%), highlighting its potential for building risk-adjusted strategies.
- Most RL models struggled without proper feature tuning, often underperforming rules.
Final Thoughts
Reinforcement Learning isn’t a silver bullet. Without careful design and feature engineering, it can easily underperform basic logic. That said, some RL models (like Model 1 and Model 6) show potential when tuned correctly — especially for managing risk and volatility.
In Part II, we'll scale this same logic across the full S&P 500 and uncover which models rise to the top.
GitHub Link to notebook: GitHub