Rule-Based vs Reinforcement Learning: Which Strategy Performs Best on AAPL?

This is the first post in a 3-part series where I explore how rule-based logic compares to reinforcement learning (RL) when building trading strategies. We’ll start simple by testing a handful of models on AAPL — one of the most traded stocks in the world — using two years of daily price data from Yahoo Finance.
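For reference, grabbing that data takes only a couple of lines with the `yfinance` package (a minimal sketch; the variable names are my own, not from the notebook):

```python
import yfinance as yf

# Pull roughly two years of daily OHLCV data for AAPL from Yahoo Finance
data = yf.download("AAPL", period="2y", interval="1d")
prices = data["Close"]  # daily closing prices feed the strategies below
```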

You’ll see how different models behave under the same conditions, and we’ll use metrics like Sharpe ratio, maximum drawdown, and total return to evaluate each approach.

In Part II, we scale these models across the entire S&P 500 to see which strategies consistently outperform.

In Part III, we build an interactive Streamlit app to explore and compare results.


Strategy Overview

| Model # | Type | Description |
|---------|------|-------------|
| Model 0 | Rule-Based | Buy when price > 10-day SMA, sell otherwise |
| Model 1 | RL | SMA + Volume as state inputs |
| Model 2 | RL | SMA + Day of Week |
| Model 3 | RL | SMA + Volume + Day of Week |
| Model 4 | RL | SMA only (Q-learning) |
| Model 5 | Rule-Based | Buy if 3-day return < -3% (Mean Reversion) |
| Model 6 | RL | Uses 1d/3d return, volatility, and RSI |
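
To make the rule-based rows concrete, here is a minimal sketch of the Model 0 and Model 5 entry signals (the pandas implementation and function names are illustrative, not copied from the notebook):

```python
import pandas as pd

def model0_signal(close: pd.Series) -> pd.Series:
    """Model 0: long while price is above its 10-day SMA, flat otherwise."""
    sma10 = close.rolling(window=10).mean()
    return (close > sma10).astype(int)   # 1 = long, 0 = flat

def model5_signal(close: pd.Series) -> pd.Series:
    """Model 5 (mean reversion): buy when the trailing 3-day return falls below -3%."""
    ret_3d = close.pct_change(periods=3)
    return (ret_3d < -0.03).astype(int)  # 1 = long, 0 = flat
```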

What is Q-Learning?

Q-learning is a form of Reinforcement Learning where an agent interacts with an environment by taking actions, observing rewards, and updating a Q-table — a memory structure that tells it the best action to take in each situation. Over time, it learns which trades lead to positive outcomes and which don’t — without being explicitly told how to trade.
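
For readers new to it, tabular Q-learning boils down to a single update applied after every step. The sketch below shows that update with an epsilon-greedy policy; the action set, hyperparameters, and state encoding are illustrative assumptions rather than the exact choices used in these models:

```python
import numpy as np
from collections import defaultdict

ACTIONS = [0, 1]                         # 0 = stay flat, 1 = hold the stock (illustrative)
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate

# Q-table: maps a discretized state (e.g. a tuple of indicator buckets) to action values
Q = defaultdict(lambda: np.zeros(len(ACTIONS)))

def choose_action(state):
    """Epsilon-greedy policy: usually exploit the best known action, occasionally explore."""
    if np.random.rand() < epsilon:
        return int(np.random.choice(ACTIONS))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Core Q-learning update: move Q(s, a) toward reward + discounted best future value."""
    best_next = np.max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```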


Backtest Results on AAPL

Here’s how each model performed over the two-year backtest:

| Model | Total Return ($) | Sharpe Ratio | Max Drawdown (%) | # Trades |
|-------|------------------|--------------|------------------|----------|
| Model 0 (Rule SMA) | 63.74 | 0.2373 | -15.11 | 75 |
| Model 1 (RL) | 53.27 | 0.0299 | -51.25 | 33 |
| Model 2 (RL) | 46.51 | -0.0897 | -72.98 | 37 |
| Model 3 (RL) | 16.38 | -0.0613 | -70.97 | 37 |
| Model 4 (RL) | 15.06 | -0.0279 | -54.21 | 18 |
| Model 5 (Rule Mean Reversion) | -0.49 | -0.0045 | -20.21 | 44 |
| Model 6 (RL Mean Reversion) | -3.08 | -0.0086 | -11.59 | 34 |
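
For context, metrics like these can be computed along the following lines from a strategy's daily portfolio values (a minimal sketch; the 252-day annualization and zero risk-free rate are my assumptions, not necessarily what the notebook uses):

```python
import numpy as np
import pandas as pd

def evaluate(portfolio_values: pd.Series) -> dict:
    """Summarize a backtest from its daily portfolio value series."""
    daily_returns = portfolio_values.pct_change().dropna()

    total_return = portfolio_values.iloc[-1] - portfolio_values.iloc[0]   # in dollars

    # Sharpe ratio, annualized over ~252 trading days, risk-free rate assumed 0
    sharpe = np.sqrt(252) * daily_returns.mean() / daily_returns.std()

    # Max drawdown: worst peak-to-trough decline, expressed in percent
    running_max = portfolio_values.cummax()
    max_drawdown = ((portfolio_values - running_max) / running_max).min() * 100

    return {"total_return": total_return, "sharpe": sharpe, "max_drawdown": max_drawdown}
```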

Key Takeaways

  • Model 0 (SMA Rule) still outperforms in both return and Sharpe ratio — simple and effective.
  • Model 1 (RL) shows some promise with solid returns, though its maximum drawdown of over 50% points to high risk.
  • Model 6 (RL Mean Reversion) didn’t top the return chart, but had the lowest drawdown, highlighting its potential for building risk-adjusted strategies.
  • Most RL models struggled without proper feature tuning, often underperforming the simple rule-based baselines.

Final Thoughts

Reinforcement Learning isn’t a silver bullet. Without careful design and feature engineering, it can easily underperform basic logic. That said, some RL models (like Model 1 and Model 6) show potential when tuned correctly — especially for managing risk and volatility.

In Part II, we’ll scale this same logic across the full S&P 500 and uncover which models rise to the top.

GitHub Link to notebook: GitHub
