Can an agent learn how to bet more intelligently than a fixed system?
This post is part of a multi-part series on simulating and analyzing roulette betting strategies with Python and reinforcement learning.
- Start from the beginning: The House Always Wins? I Taught 3 AI Agents to Beat Roulette Anyway
- Explore the code: GitHub – rossautomatedsolutions/roulette-simulator
In earlier posts, I analyzed classical systems such as Flat Betting and Martingale; all of them failed in the long run.
Now I turned to something more dynamic: a reinforcement learning agent that learns how to bet from the feedback of its own wins and losses.
This was my first “intelligent” system — built using tabular Q-learning.
What Is Q-Learning?
Q-learning is a reinforcement learning algorithm that learns the expected reward of taking an action in a given state. Over time, the agent builds a Q-table that maps:
(State, Action) → Value
The agent starts by exploring randomly and eventually settles into the most profitable decisions.
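The core update is compact enough to show directly. The sketch below is a minimal tabular Q-learning loop body, not the actual simulator code; the constants, action list, and function names are illustrative.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate: how far each update moves the estimate
GAMMA = 0.95   # discount factor: how much future reward matters
EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

Q = defaultdict(float)  # the Q-table: maps (state, action) -> estimated value
ACTIONS = ["red", "black", "even", "odd"]  # illustrative subset of bets

def choose_action(state):
    # Epsilon-greedy: explore a random action with probability EPSILON,
    # otherwise exploit the action with the highest current Q-value.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Standard Q-learning update: nudge Q(s, a) toward the observed reward
    # plus the discounted value of the best action in the next state.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

With enough spins, the epsilon-greedy exploration fills in the table and the `max` in `choose_action` starts steering the agent toward whatever the shaped rewards have made look profitable.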
My Approach
I created a custom RouletteEnv environment with a small, discrete state space:
- Bankroll (bucketed by $100)
- Previous spin result (win/loss)
- Last outcome (bucketed into 6 bins)
Action space included outside bets (red, black, even, etc.) and straight number bets (0–36). Each action bet a flat $20.
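To make tabular learning feasible, the continuous quantities have to be discretized. Here is a hypothetical sketch of how such a state could be assembled; the bucket widths follow the post ($100 bankroll buckets, 6 outcome bins), but the function names and bin edges are my own illustration, not the repo's code.

```python
def bankroll_bucket(bankroll: float) -> int:
    # Bucket the bankroll into $100-wide bins, per the post.
    return int(bankroll // 100)

def outcome_bin(net_result: float, bet: float = 20.0) -> int:
    # Map the last spin's net result into one of 6 coarse bins,
    # from a full flat-bet loss (bin 0) up to a large straight-bet
    # win (bin 5). Edges are illustrative; straight bets pay 35:1.
    edges = [-bet, 0, bet, 5 * bet, 17 * bet]
    for i, edge in enumerate(edges):
        if net_result <= edge:
            return i
    return len(edges)

def make_state(bankroll: float, last_won: bool, last_result: float) -> tuple:
    # The discrete state the Q-table is keyed on.
    return (bankroll_bucket(bankroll), int(last_won), outcome_bin(last_result))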
Reward shaping included:
- Payout from win/loss
- Bonus for surviving 100+ spins
- Penalty for going bust
- Bonus for exceeding 1.5x bankroll
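The four shaping terms above can be sketched as a single reward function. The bonus and penalty magnitudes here are illustrative placeholders, not values from the actual simulator.

```python
SURVIVAL_BONUS = 50    # awarded once the agent passes 100 spins (assumed size)
BUST_PENALTY = -500    # applied when the bankroll hits zero (assumed size)
GROWTH_BONUS = 100     # awarded for exceeding 1.5x starting bankroll (assumed size)

def shaped_reward(payout: float, spins: int, bankroll: float,
                  start_bankroll: float) -> float:
    # Base signal: the raw payout from the spin's win or loss.
    reward = payout
    if spins == 100:
        reward += SURVIVAL_BONUS   # one-time bonus for surviving 100+ spins
    if bankroll <= 0:
        reward += BUST_PENALTY     # going bust is punished hard
    if bankroll > 1.5 * start_bankroll:
        reward += GROWTH_BONUS     # reward meaningful bankroll growth
    return reward
```

The balance between these terms matters a lot: a bust penalty that dwarfs the payouts pushes the agent toward outside bets, while a large growth bonus nudges it back toward risky straight numbers.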
Training Results (1000 Episodes)
During training:
- The agent learned to favor safer bets early (e.g., red, black)
- It avoided large losing streaks more effectively than classic systems
- It sometimes found profitable sequences using number-based bets
Sample Evaluation Results (100 Runs)
| Strategy | Avg Reward | Median Reward | Avg Spins | % Profitable |
|---|---|---|---|---|
| Q-Learning Agent | $8,522 | -$1,180 | 375 | 33% |
| Flat Betting | -$626 | -$1,120 | 775 | 8% |
While the agent didn’t consistently profit, it had higher upside potential than any fixed strategy I tested. It learned to take calculated risks — occasionally producing 5-digit returns — but also suffered busts.
Takeaway
Q-learning produced mixed results. It didn’t beat the house consistently, but it clearly learned non-random behavior.
- The agent outperformed Flat Betting in 33% of runs
- It was more volatile — but not completely reckless
- Reward shaping and state design had a big impact on behavior
This opened the door to more nuanced training techniques and smarter agents. In the next post, I added risk-awareness to encourage survival alongside reward.
Continue to Post 4: Smarter ≠ Safer – Risk-Aware Q-Learning