Can deep reinforcement learning discover patterns that simpler strategies miss?
This is the final technical post in my roulette simulation series.
- Start from the beginning: The House Always Wins? I Taught 3 AI Agents to Beat Roulette Anyway
- See the full project code: GitHub – rossautomatedsolutions/roulette-simulator
After testing Q-learning and risk-aware agents, I wanted to push further.
Instead of using a lookup table to store state-action values, I trained a neural network to approximate the Q-function. This approach is known as Deep Q-Learning (DQN).
While roulette is a simple game, this was a valuable experiment in scaling reinforcement learning — especially as the state space grows more complex.
## What Changed

### 1. State Representation
Instead of discrete buckets, I passed a 5-dimensional continuous state vector into the neural network:
- Normalized bankroll (0.0 to ~2.0)
- Drawdown from peak
- Win streak (scaled)
- Loss streak (scaled)
- Last win (1.0 for win, 0.0 for loss, 0.5 for unknown)
This allows much finer representation of game conditions.
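The state vector above can be sketched as a small helper function. The scaling constants (`start_bankroll`, `streak_scale`) are illustrative assumptions, not the exact values from the project:

```python
import numpy as np

def build_state(bankroll, peak, win_streak, loss_streak, last_win,
                start_bankroll=1000.0, streak_scale=10.0):
    """Build the 5-dimensional continuous state vector.

    `start_bankroll` and `streak_scale` are assumed values for
    illustration; `last_win` is True, False, or None (unknown).
    """
    return np.array([
        bankroll / start_bankroll,                    # normalized bankroll
        (peak - bankroll) / max(peak, 1e-8),          # drawdown from peak
        min(win_streak / streak_scale, 1.0),          # scaled win streak
        min(loss_streak / streak_scale, 1.0),         # scaled loss streak
        1.0 if last_win is True else 0.0 if last_win is False else 0.5,
    ], dtype=np.float32)
```

Because every feature lives roughly in [0, 1], the network sees inputs on a comparable scale, which helps training stability.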
### 2. Action Space
Same as before:
- Outside bets: red, black, even, odd, high, low
- Straight number bets: 0–36
- Fixed $20 per bet
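One way to flatten this into a discrete action index for the network is to put the 6 outside bets first, followed by the 37 straight numbers. This indexing scheme is an assumption for illustration, not necessarily the project's exact layout:

```python
# Hypothetical flat indexing for the 43-action space:
# indices 0-5 are outside bets, 6-42 are straight numbers 0-36.
OUTSIDE_BETS = ["red", "black", "even", "odd", "high", "low"]
NUM_ACTIONS = len(OUTSIDE_BETS) + 37  # 43 total
BET_SIZE = 20  # fixed $20 per bet

def decode_action(index):
    """Map a flat action index back to a human-readable bet."""
    if index < len(OUTSIDE_BETS):
        return ("outside", OUTSIDE_BETS[index])
    return ("straight", index - len(OUTSIDE_BETS))  # number 0-36
```

The DQN's output layer then has one Q-value per index, and the chosen index decodes directly to a bet.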
### 3. DQN Framework
- PyTorch model with 2 hidden layers
- Experience replay buffer
- Epsilon-greedy exploration
- Target network updates
- MSE loss on predicted Q-values
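A minimal sketch of the network, target network, and epsilon-greedy selection might look like the following. The hidden-layer width (64) is an assumption; the post only specifies two hidden layers:

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Two-hidden-layer MLP mapping the 5-dim state to 43 Q-values.

    Hidden size of 64 is an assumed value for illustration.
    """
    def __init__(self, state_dim=5, num_actions=43, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(state, policy_net, epsilon, num_actions=43):
    """Epsilon-greedy: random bet with probability epsilon, else argmax Q."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(policy_net(state).argmax().item())

policy_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())  # hard target sync
```

During training, the target network is periodically re-synced from the policy network, which keeps the MSE regression targets from chasing a moving estimate.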
## Training
The agent trained over 1000 episodes, each capped at 1000 spins or until bankroll depletion. Reward shaping was similar to prior agents:
- Reward = win/loss payout
- Penalties for drawdowns and losing streaks
- Bonuses for growth and longevity
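The reward shaping described above can be sketched as a single function. Every coefficient here is an assumption chosen to illustrate the structure, not the project's actual tuning:

```python
def shaped_reward(payout, drawdown, loss_streak, bankroll,
                  start_bankroll=1000.0):
    """Illustrative reward shaping mirroring the bullets above.

    All penalty/bonus coefficients are assumed values, not the
    project's exact constants.
    """
    reward = float(payout)          # base reward: win/loss payout
    reward -= 50.0 * drawdown       # penalize drawdown from peak
    if loss_streak >= 5:
        reward -= 5.0               # penalize long losing streaks
    if bankroll > start_bankroll:
        reward += 2.0               # bonus for bankroll growth
    reward += 0.1                   # small longevity bonus per spin survived
    return reward
```

Shaping like this nudges the agent toward capital preservation rather than pure expected payout, which is why the trained agents tend to survive longer even when they cannot overcome the house edge.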
## Evaluation (100 Test Runs)
| Strategy | Avg Reward | Median Reward | Avg Spins | % Profitable |
|---|---|---|---|---|
| DQN Agent | -$608 | -$4,270 | 433 | ~16%* |
| Flat Betting | -$626 | -$1,120 | 775 | 8% |
*Approximate — profitability varied across runs.
## Takeaways
- The DQN agent performed comparably to Q-learning but was far more flexible
- It learned to survive longer and occasionally go on strong profit runs
- It still did not reliably beat the house (as expected — the house edge is real)
- Results improved with longer training and better feature engineering
## Conclusion: What Worked, What Didn’t
Over the course of this series, I tested:
- Classic systems (Flat, Martingale, Reverse)
- Tabular Q-learning
- Risk-aware Q-learning
- Deep Q-learning (DQN)
None of them beat roulette long-term, but I learned a lot:
- Simple systems fail predictably
- Learning agents adapt, but need time and structure
- Risk management and reward shaping deeply affect agent behavior
- A small house edge is still incredibly powerful over time
## What’s Next?
This was never about “winning at roulette” — it was a sandbox to explore:
- Agent design
- Simulation strategy
- Reward modeling
- Visual storytelling
If anything, it proved how subtle and devastating the house edge is — and how even sophisticated agents struggle to survive in truly negative-sum environments.
But this framework could readily be extended to more complex, less purely random domains (e.g., options trading, portfolio selection, or blackjack strategy modeling).