Can deep reinforcement learning discover patterns that simpler strategies miss?
This is the final technical post in my roulette simulation series.
- Start from the beginning: The House Always Wins? I Taught 3 AI Agents to Beat Roulette Anyway
- See the full project code: GitHub – rossautomatedsolutions/roulette-simulator
After testing Q-learning and risk-aware agents, I wanted to push further.
Instead of using a lookup table to store state-action values, I trained a neural network to approximate the Q-function. This approach is known as Deep Q-Learning (DQN).
While roulette is a simple game, this was a valuable experiment in scaling reinforcement learning — especially as the state space grows more complex.
## What Changed

### 1. State Representation
Instead of discrete buckets, I passed a 5-dimensional continuous state vector into the neural network:
- Normalized bankroll (0.0 to ~2.0)
- Drawdown from peak
- Win streak (scaled)
- Loss streak (scaled)
- Last win (1.0 for win, 0.0 for loss, 0.5 for unknown)
This allows much finer representation of game conditions.
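The state vector above can be sketched as a small helper function. The scaling constants (`start_bankroll`, `streak_scale`) are illustrative assumptions, not the exact values from the project:

```python
import numpy as np

def build_state(bankroll, peak, win_streak, loss_streak, last_win,
                start_bankroll=1000.0, streak_scale=10.0):
    """Build the 5-dimensional continuous state vector.

    `start_bankroll` and `streak_scale` are assumed values for
    illustration; `last_win` is True, False, or None (unknown).
    """
    return np.array([
        bankroll / start_bankroll,                    # normalized bankroll
        (peak - bankroll) / max(peak, 1e-8),          # drawdown from peak
        min(win_streak / streak_scale, 1.0),          # scaled win streak
        min(loss_streak / streak_scale, 1.0),         # scaled loss streak
        1.0 if last_win is True else 0.0 if last_win is False else 0.5,
    ], dtype=np.float32)
```

Because every feature lives roughly in [0, 1], the network sees inputs on a comparable scale, which helps training stability.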
### 2. Action Space
Same as before:
- Outside bets: red, black, even, odd, high, low
- Straight number bets: 0–36
- Fixed $20 per bet
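One way to flatten this into a discrete action index for the network is to put the 6 outside bets first, followed by the 37 straight numbers. This indexing scheme is an assumption for illustration, not necessarily the project's exact layout:

```python
# Hypothetical flat indexing for the 43-action space:
# indices 0-5 are outside bets, 6-42 are straight numbers 0-36.
OUTSIDE_BETS = ["red", "black", "even", "odd", "high", "low"]
NUM_ACTIONS = len(OUTSIDE_BETS) + 37  # 43 total
BET_SIZE = 20  # fixed $20 per bet

def decode_action(index):
    """Map a flat action index back to a human-readable bet."""
    if index < len(OUTSIDE_BETS):
        return ("outside", OUTSIDE_BETS[index])
    return ("straight", index - len(OUTSIDE_BETS))  # number 0-36
```

The DQN's output layer then has one Q-value per index, and the chosen index decodes directly to a bet.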
### 3. DQN Framework
- PyTorch model with 2 hidden layers
- Experience replay buffer
- Epsilon-greedy exploration
- Target network updates
- MSE loss on predicted Q-values
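A minimal sketch of the network, target network, and epsilon-greedy selection might look like the following. The hidden-layer width (64) is an assumption; the post only specifies two hidden layers:

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Two-hidden-layer MLP mapping the 5-dim state to 43 Q-values.

    Hidden size of 64 is an assumed value for illustration.
    """
    def __init__(self, state_dim=5, num_actions=43, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(state, policy_net, epsilon, num_actions=43):
    """Epsilon-greedy: random bet with probability epsilon, else argmax Q."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(policy_net(state).argmax().item())

policy_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())  # hard target sync
```

During training, the target network is periodically re-synced from the policy network, which keeps the MSE regression targets from chasing a moving estimate.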
## Training
The agent trained over 1000 episodes, each capped at 1000 spins or until bankroll depletion. Reward shaping was similar to prior agents:
- Reward = win/loss payout
- Penalties for drawdowns and losing streaks
- Bonuses for growth and longevity
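The reward shaping described above can be sketched as a single function. Every coefficient here is an assumption chosen to illustrate the structure, not the project's actual tuning:

```python
def shaped_reward(payout, drawdown, loss_streak, bankroll,
                  start_bankroll=1000.0):
    """Illustrative reward shaping mirroring the bullets above.

    All penalty/bonus coefficients are assumed values, not the
    project's exact constants.
    """
    reward = float(payout)          # base reward: win/loss payout
    reward -= 50.0 * drawdown       # penalize drawdown from peak
    if loss_streak >= 5:
        reward -= 5.0               # penalize long losing streaks
    if bankroll > start_bankroll:
        reward += 2.0               # bonus for bankroll growth
    reward += 0.1                   # small longevity bonus per spin survived
    return reward
```

Shaping like this nudges the agent toward capital preservation rather than pure expected payout, which is why the trained agents tend to survive longer even when they cannot overcome the house edge.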
## Evaluation (100 Test Runs)
| Strategy | Avg Reward | Median Reward | Avg Spins | % Profitable |
|---|---|---|---|---|
| DQN Agent | -$608 | -$4,270 | 433 | ~16%* |
| Flat Betting | -$626 | -$1,120 | 775 | 8% |
*Approximate — profitability varied across runs.
## Takeaways
- The DQN agent performed comparably to Q-learning but was far more flexible
- It learned to survive longer and occasionally go on strong profit runs
- It still did not reliably beat the house (as expected — the house edge is real)
- Results improved with longer training and better feature engineering
## Conclusion: What Worked, What Didn’t
Over the course of this series, I tested:
- Classic systems (Flat, Martingale, Reverse)
- Tabular Q-learning
- Risk-aware Q-learning
- Deep Q-learning (DQN)
None of them beat roulette long-term, but I learned a lot:
- Simple systems fail predictably
- Learning agents adapt, but need time and structure
- Risk management and reward shaping deeply affect agent behavior
- A small house edge is still incredibly powerful over time
## What’s Next?
This was never about “winning at roulette” — it was a sandbox to explore:
- Agent design
- Simulation strategy
- Reward modeling
- Visual storytelling
If anything, it proved how subtle and devastating the house edge is — and how even sophisticated agents struggle to survive in truly negative-sum environments.
But this framework could readily be extended to more complex, less purely random domains (e.g., options trading, portfolio selection, or blackjack strategy modeling).