From Q-Tables to Neural Nets – A DQN Roulette Agent

Can deep reinforcement learning discover patterns that simpler strategies miss?

This is the final technical post in my roulette simulation series.


After testing Q-learning and risk-aware agents, I wanted to push further.

Instead of using a table to map state-action values, I trained a neural network to approximate that mapping. This approach is known as Deep Q-Learning (DQN).

While roulette is a simple game, this was a valuable experiment in scaling reinforcement learning — especially as the state space grows more complex.


What Changed

1. State Representation

Instead of discrete buckets, I passed a 5-dimensional continuous state vector into the neural network:

  • Normalized bankroll (0.0 to ~2.0)
  • Drawdown from peak
  • Win streak (scaled)
  • Loss streak (scaled)
  • Last win (1.0 for win, 0.0 for loss, 0.5 for unknown)

This allows a much finer-grained representation of game conditions than discrete state buckets.
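As a sketch, the five features above can be assembled like this (function name and scaling constants are my own assumptions, not the exact ones used in the project):

```python
import numpy as np

def make_state(bankroll, start_bankroll, peak, win_streak, loss_streak, last_win):
    """Build the 5-dimensional continuous state vector (hypothetical sketch).

    last_win: 1.0 for a win, 0.0 for a loss, 0.5 for unknown (first spin).
    """
    return np.array([
        bankroll / start_bankroll,            # normalized bankroll (~0.0 to ~2.0)
        (peak - bankroll) / max(peak, 1e-8),  # drawdown from peak, in [0, 1]
        min(win_streak, 10) / 10.0,           # win streak, scaled and capped
        min(loss_streak, 10) / 10.0,          # loss streak, scaled and capped
        last_win,                             # last outcome flag
    ], dtype=np.float32)
```

Keeping every feature in roughly the same [0, 2] range helps the network train without one input dominating the gradients.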

2. Action Space

Same as before:

  • Outside bets: red, black, even, odd, high, low
  • Straight number bets: 0–36
  • Fixed $20 per bet
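That gives 43 discrete actions in total: 6 outside bets plus 37 straight-number bets. A minimal enumeration with a payout helper might look like this (European wheel assumed; names are illustrative, not the project's actual identifiers):

```python
# Hypothetical enumeration of the 43-action space: 6 outside bets + 37 straight numbers.
OUTSIDE_BETS = ["red", "black", "even", "odd", "high", "low"]
STRAIGHT_BETS = [f"straight_{n}" for n in range(37)]  # pockets 0-36
ACTIONS = OUTSIDE_BETS + STRAIGHT_BETS
BET_SIZE = 20  # fixed $20 per bet

RED = {1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36}

def payout(action_idx, outcome):
    """Net payout for a $20 bet given the winning pocket (0-36).

    Outside bets pay 1:1; straight bets pay 35:1. Zero loses all outside bets."""
    name = ACTIONS[action_idx]
    if name.startswith("straight_"):
        return 35 * BET_SIZE if outcome == int(name.split("_")[1]) else -BET_SIZE
    wins = {
        "red": outcome in RED,
        "black": outcome not in RED and outcome != 0,
        "even": outcome != 0 and outcome % 2 == 0,
        "odd": outcome % 2 == 1,
        "high": 19 <= outcome <= 36,
        "low": 1 <= outcome <= 18,
    }[name]
    return BET_SIZE if wins else -BET_SIZE
```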

3. DQN Framework

  • PyTorch model with 2 hidden layers
  • Experience replay buffer
  • Epsilon-greedy exploration
  • Target network updates
  • MSE loss on predicted Q-values
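The components above fit together as sketched below: a 2-hidden-layer PyTorch network, a replay deque, epsilon-greedy selection, and an MSE update against a periodically synced target network. Hidden size, buffer capacity, and learning rate are assumptions for illustration:

```python
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Two hidden layers mapping the 5-dim state to 43 action values."""
    def __init__(self, state_dim=5, n_actions=43, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

policy, target = QNet(), QNet()
target.load_state_dict(policy.state_dict())  # target starts as a copy
buffer = deque(maxlen=50_000)                # experience replay buffer
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_action(state, epsilon):
    """Epsilon-greedy: random bet with prob epsilon, else argmax Q."""
    if random.random() < epsilon:
        return random.randrange(43)
    with torch.no_grad():
        return policy(torch.as_tensor(state).float()).argmax().item()

def train_step(batch_size=64, gamma=0.99):
    """One MSE update of predicted Q toward the bootstrapped target."""
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s    = torch.tensor([t[0] for t in batch], dtype=torch.float32)
    a    = torch.tensor([t[1] for t in batch])
    r    = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s2   = torch.tensor([t[3] for t in batch], dtype=torch.float32)
    done = torch.tensor([t[4] for t in batch], dtype=torch.float32)
    q = policy(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(s2).max(1).values     # frozen target network
    target_q = r + gamma * q_next * (1 - done)
    loss = nn.functional.mse_loss(q, target_q)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Every few hundred steps the target network is refreshed with `target.load_state_dict(policy.state_dict())`, which keeps the bootstrap targets from chasing a moving network.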

Training

The agent trained over 1000 episodes, each capped at 1000 spins or until bankroll depletion. Reward shaping was similar to prior agents:

  • Reward = win/loss payout
  • Penalties for drawdowns and losing streaks
  • Bonuses for growth and longevity
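A shaping function along these lines is sketched below; the specific weights and thresholds are my own placeholder values, not the ones used in the actual experiments:

```python
def shaped_reward(payout, bankroll, peak, start_bankroll, loss_streak):
    """Hypothetical reward shaping: base payout plus risk penalties and survival bonuses."""
    reward = payout / 20.0                     # base win/loss payout, scaled by bet size
    drawdown = (peak - bankroll) / max(peak, 1e-8)
    reward -= 2.0 * drawdown                   # penalize deep drawdowns
    if loss_streak >= 5:
        reward -= 0.5                          # penalize long losing streaks
    if bankroll > start_bankroll:
        reward += 0.1                          # bonus for growth above the start
    reward += 0.01                             # small longevity bonus per surviving spin
    return reward
```

Tilting the reward this way trades raw expected value for survival, which is why shaped agents tend to last more spins even when their average return stays negative.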

Evaluation (100 Test Runs)

| Strategy | Avg Reward | Median Reward | Avg Spins | % Profitable |
|---|---|---|---|---|
| DQN Agent | -$608 | -$4,270 | 433 | ~16%* |
| Flat Betting | -$626 | -$1,120 | 775 | 8% |

*Approximate — profitability varied across runs.


Takeaways

  • The DQN agent performed comparably to Q-learning but was far more flexible
  • It learned to survive longer and occasionally go on strong profit runs
  • It still did not reliably beat the house (as expected — the house edge is real)
  • Results improved when training time increased or feature engineering improved

Conclusion: What Worked, What Didn’t

Over the course of this series, I tested:

  • Classic systems (Flat, Martingale, Reverse)
  • Tabular Q-learning
  • Risk-aware Q-learning
  • Deep Q-learning (DQN)

None of them beat roulette long-term, but I learned a lot:

  • Simple systems fail predictably
  • Learning agents adapt, but need time and structure
  • Risk management and reward shaping deeply affect agent behavior
  • A small house edge is still incredibly powerful over time

What’s Next?

This was never about “winning at roulette” — it was a sandbox to explore:

  • Agent design
  • Simulation strategy
  • Reward modeling
  • Visual storytelling

If anything, it proved how subtle and devastating the house edge is — and how even sophisticated agents struggle to survive in truly negative-sum environments.

But this framework could easily be extended to more complex, less random games (e.g., options trading, portfolio selection, or blackjack strategy modeling).
