Teaching AI to Bet – The Q-Learning Roulette Agent

Can an agent learn how to bet more intelligently than a fixed system?

This post is part of a multi-part series on simulating and analyzing roulette betting strategies with Python and reinforcement learning.


In earlier posts, I analyzed classical systems like Flat Betting and Martingale. They all failed in the long run.

In this post, I explore something more dynamic: a reinforcement learning agent that learns how to bet based on feedback from wins and losses.

This was my first “intelligent” system — built using tabular Q-learning.


What Is Q-Learning?

Q-learning is a reinforcement learning algorithm that learns the expected reward of taking an action in a given state. Over time, the agent builds a Q-table that maps:

(State, Action) → Value

The agent starts by exploring randomly and eventually settles into the most profitable decisions.
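The core of tabular Q-learning is a single update rule. Here is a minimal sketch of that rule; the variable names, learning rate, and discount factor are illustrative assumptions, not values from my implementation:

```python
from collections import defaultdict

# Q[(state, action)] -> estimated long-run value of taking `action` in `state`.
# Unseen pairs default to 0.0, which is what "starting from scratch" means here.
Q = defaultdict(float)

ALPHA = 0.1    # learning rate: how far each observation moves the estimate
GAMMA = 0.95   # discount factor: how much future reward matters

def q_update(state, action, reward, next_state, actions):
    """One tabular Q-learning step: nudge Q[(state, action)] toward the
    observed reward plus the discounted value of the best next action."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Repeating this update over thousands of spins is what gradually turns random exploration into a preference for the bets that paid off.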


My Approach

I created a custom RouletteEnv environment with a small, discrete state space:

  • Bankroll (bucketed by $100)
  • Previous spin result (win/loss)
  • Last outcome (bucketed into 6 bins)

The action space included outside bets (red, black, even, etc.) and straight number bets (0–36). Each action placed a flat $20 bet.
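To make the state and action spaces concrete, here is a simplified sketch of how they might be encoded. The names and the exact encoding are my illustration of the description above, not the actual `RouletteEnv` code, and it shows only two of the three state components:

```python
# Illustrative action set: six even-money outside bets plus straight bets on 0-36.
OUTSIDE_BETS = ["red", "black", "even", "odd", "low", "high"]
NUMBER_BETS = list(range(37))
ACTIONS = OUTSIDE_BETS + NUMBER_BETS

BET_SIZE = 20  # every action wagers a flat $20

def encode_state(bankroll, last_spin_won):
    """Discretize into a small, tabular-friendly state:
    bankroll bucketed in $100 increments, plus the previous spin's
    win/loss flag."""
    return (bankroll // 100, last_spin_won)
```

Keeping the state space this coarse is what makes a plain Q-table feasible: with only a few dozen bankroll buckets and a handful of flags, every (state, action) pair can be visited many times during training.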

Reward shaping included:

  • Payout from win/loss
  • Bonus for surviving 100+ spins
  • Penalty for going bust
  • Bonus for exceeding 1.5x bankroll
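A shaped reward along these lines might look like the sketch below. The bonus and penalty magnitudes are placeholder assumptions; only the structure mirrors the list above:

```python
def shaped_reward(payout, spins_survived, bankroll, start_bankroll):
    """Illustrative reward shaping: base payout plus survival/growth
    bonuses and a bust penalty. The magnitudes here are assumptions."""
    reward = payout                      # base: win/loss from the spin
    if bankroll <= 0:
        reward -= 100                    # penalty for going bust
    if spins_survived >= 100:
        reward += 10                     # bonus for surviving 100+ spins
    if bankroll >= 1.5 * start_bankroll:
        reward += 50                     # bonus for exceeding 1.5x bankroll
    return reward
```

The design tension is visible even in this toy version: the bust penalty and survival bonus pull the agent toward safe outside bets, while the growth bonus rewards the occasional high-variance number bet.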

Training Results (1000 Episodes)

During training:

  • The agent learned to favor safer bets early (e.g., red, black)
  • It avoided large losing streaks more effectively than classic systems
  • It sometimes found profitable sequences using number-based bets
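The explore-then-exploit behavior described above comes from an epsilon-greedy action selector. A minimal sketch, with an assumed fixed epsilon (many implementations decay it over episodes):

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # learned (state, action) values

def choose_action(state, actions, epsilon=0.1):
    """Epsilon-greedy selection: with probability epsilon try a random bet,
    otherwise place the bet with the highest learned value in this state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Early in training the Q-table is mostly zeros, so the greedy branch is effectively arbitrary and the agent samples widely; as values accumulate, the same rule increasingly concentrates on the safer outside bets it found to pay off.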

Sample Evaluation Results (100 Runs)

| Strategy         | Avg Reward | Median Reward | Avg Spins | % Profitable |
|------------------|-----------:|--------------:|----------:|-------------:|
| Q-Learning Agent | $8,522     | -$1,180       | 375       | 33%          |
| Flat Betting     | -$626      | -$1,120       | 775       | 8%           |

While the agent didn’t consistently profit, it had higher upside potential than any fixed strategy I tested. It learned to take calculated risks — occasionally producing 5-digit returns — but also suffered busts.


Takeaway

Q-learning produced mixed results. It didn’t beat the house consistently, but it clearly learned non-random behavior.

  • The agent outperformed Flat Betting in 33% of runs
  • It was more volatile — but not completely reckless
  • Reward shaping and state design had a big impact on behavior

This opened the door to more nuanced training techniques and smarter agents. In the next post, I added risk-awareness to encourage survival alongside reward.


Continue to Post 4: Smarter ≠ Safer – Risk-Aware Q-Learning
