Teaching AI to Bet – The Q-Learning Roulette Agent

Can an agent learn how to bet more intelligently than a fixed system?

This post is part of a multi-part series on simulating and analyzing roulette betting strategies with Python and reinforcement learning.


In earlier posts, I analyzed classical systems like Flat Betting and Martingale. They all failed in the long run.

In this post, I explore something more dynamic: a reinforcement learning agent that learns how to bet based on feedback from wins and losses.

This was my first “intelligent” system — built using tabular Q-learning.


What Is Q-Learning?

Q-learning is a reinforcement learning algorithm that learns the expected reward of taking an action in a given state. Over time, the agent builds a Q-table that maps:

(State, Action) → Value

The agent starts by exploring randomly and eventually settles into the most profitable decisions.
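The core of tabular Q-learning is a single update rule. Here is a minimal sketch of that rule; the variable names, learning rate, and discount factor are illustrative assumptions, not values from my implementation:

```python
from collections import defaultdict

# Q[(state, action)] -> estimated long-run value of taking `action` in `state`.
# Unseen pairs default to 0.0, which is what "starting from scratch" means here.
Q = defaultdict(float)

ALPHA = 0.1    # learning rate: how far each observation moves the estimate
GAMMA = 0.95   # discount factor: how much future reward matters

def q_update(state, action, reward, next_state, actions):
    """One tabular Q-learning step: nudge Q[(state, action)] toward the
    observed reward plus the discounted value of the best next action."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Repeating this update over thousands of spins is what gradually turns random exploration into a preference for the bets that paid off.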


My Approach

I created a custom RouletteEnv environment with a small, discrete state space:

  • Bankroll (bucketed by $100)
  • Previous spin result (win/loss)
  • Last outcome (bucketed into 6 bins)

The action space included outside bets (red, black, even, etc.) and straight number bets (0–36). Each action placed a flat $20 bet.
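To make the state and action spaces concrete, here is a simplified sketch of how they might be encoded. The names and the exact encoding are my illustration of the description above, not the actual `RouletteEnv` code, and it shows only two of the three state components:

```python
# Illustrative action set: six even-money outside bets plus straight bets on 0-36.
OUTSIDE_BETS = ["red", "black", "even", "odd", "low", "high"]
NUMBER_BETS = list(range(37))
ACTIONS = OUTSIDE_BETS + NUMBER_BETS

BET_SIZE = 20  # every action wagers a flat $20

def encode_state(bankroll, last_spin_won):
    """Discretize into a small, tabular-friendly state:
    bankroll bucketed in $100 increments, plus the previous spin's
    win/loss flag."""
    return (bankroll // 100, last_spin_won)
```

Keeping the state space this coarse is what makes a plain Q-table feasible: with only a few dozen bankroll buckets and a handful of flags, every (state, action) pair can be visited many times during training.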

Reward shaping included:

  • Payout from win/loss
  • Bonus for surviving 100+ spins
  • Penalty for going bust
  • Bonus for exceeding 1.5x bankroll
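A shaped reward along these lines might look like the sketch below. The bonus and penalty magnitudes are placeholder assumptions; only the structure mirrors the list above:

```python
def shaped_reward(payout, spins_survived, bankroll, start_bankroll):
    """Illustrative reward shaping: base payout plus survival/growth
    bonuses and a bust penalty. The magnitudes here are assumptions."""
    reward = payout                      # base: win/loss from the spin
    if bankroll <= 0:
        reward -= 100                    # penalty for going bust
    if spins_survived >= 100:
        reward += 10                     # bonus for surviving 100+ spins
    if bankroll >= 1.5 * start_bankroll:
        reward += 50                     # bonus for exceeding 1.5x bankroll
    return reward
```

The design tension is visible even in this toy version: the bust penalty and survival bonus pull the agent toward safe outside bets, while the growth bonus rewards the occasional high-variance number bet.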

Training Results (1000 Episodes)

During training:

  • The agent learned to favor safer bets early (e.g., red, black)
  • It avoided large losing streaks more effectively than classic systems
  • It sometimes found profitable sequences using number-based bets
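The explore-then-exploit behavior described above comes from an epsilon-greedy action selector. A minimal sketch, with an assumed fixed epsilon (many implementations decay it over episodes):

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # learned (state, action) values

def choose_action(state, actions, epsilon=0.1):
    """Epsilon-greedy selection: with probability epsilon try a random bet,
    otherwise place the bet with the highest learned value in this state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Early in training the Q-table is mostly zeros, so the greedy branch is effectively arbitrary and the agent samples widely; as values accumulate, the same rule increasingly concentrates on the safer outside bets it found to pay off.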

Sample Evaluation Results (100 Runs)

| Strategy         | Avg Reward | Median Reward | Avg Spins | % Profitable |
|------------------|-----------:|--------------:|----------:|-------------:|
| Q-Learning Agent | $8,522     | -$1,180       | 375       | 33%          |
| Flat Betting     | -$626      | -$1,120       | 775       | 8%           |

While the agent didn’t consistently profit, it had higher upside potential than any fixed strategy I tested. It learned to take calculated risks — occasionally producing 5-digit returns — but also suffered busts.


Takeaway

Q-learning produced mixed results. It didn’t beat the house consistently, but it clearly learned non-random behavior.

  • The agent outperformed Flat Betting in 33% of runs
  • It was more volatile — but not completely reckless
  • Reward shaping and state design had a big impact on behavior

This opened the door to more nuanced training techniques and smarter agents. In the next post, I added risk-awareness to encourage survival alongside reward.


Continue to Post 4: Smarter ≠ Safer – Risk-Aware Q-Learning
