Smarter ≠ Safer – Risk-Aware Q-Learning

What happens when an agent learns not just to win, but to protect itself from loss?

This post is part of my series exploring how to simulate, analyze, and optimize roulette strategies using reinforcement learning.


After building a Q-learning agent that could learn from wins and losses, I wanted to take things a step further.

What if the agent didn’t just chase reward, but also learned to avoid risk?

I created a risk-aware roulette agent, using enhanced state features and reward shaping to emphasize survival, bankroll preservation, and streak awareness.


Key Additions

Expanded State Space

The agent’s state included:

  • Bankroll (bucketed)
  • Drawdown from peak bankroll
  • Current win/loss streak
  • Last spin result (win or loss)

This gave the agent memory of context — not just the current bankroll, but whether it was trending up or spiraling down.
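The state features above can be sketched as a discretization function. This is a minimal illustration, not the post's exact implementation; the bucket size, drawdown bands, and streak clipping limits are assumptions I've chosen for the example.

```python
# Hypothetical sketch of the expanded state encoding. Bucket size,
# drawdown bands, and streak clipping are illustrative assumptions.

def encode_state(bankroll, peak_bankroll, streak, last_won,
                 bucket_size=100):
    """Discretize the agent's context into a hashable Q-table key."""
    # Bucketed bankroll keeps the Q-table small.
    bankroll_bucket = int(bankroll // bucket_size)

    # Drawdown from the session's peak, as a coarse 10%-wide band.
    drawdown = 0.0 if peak_bankroll == 0 else 1 - bankroll / peak_bankroll
    drawdown_band = min(int(drawdown * 10), 9)

    # Clip streaks so rare long runs don't blow up the state space.
    streak_clipped = max(-5, min(5, streak))

    return (bankroll_bucket, drawdown_band, streak_clipped, int(last_won))
```

For example, `encode_state(850, 1000, -3, False)` yields `(8, 1, -3, 0)`: an 800-bucket bankroll, a 10-20% drawdown band, a three-loss streak, and a losing last spin.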

Reward Shaping

In addition to win/loss payouts, the agent received:

  • Penalty for large drawdowns (>50% of starting bankroll)
  • Penalty for long losing streaks
  • Bonus for surviving every 100 spins
  • Bonus for growing bankroll >1.5x

This shaped the agent toward stable, long-lived behavior.
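The shaping terms above can be folded into a single reward function along these lines. The thresholds (50% drawdown, 100-spin survival, 1.5x growth) come from the post; the penalty and bonus magnitudes are placeholder assumptions for illustration.

```python
# Illustrative reward-shaping sketch. Thresholds follow the post;
# the magnitudes (-50, -10/loss, +25, +50) are assumed placeholders.

def shaped_reward(payout, bankroll, start_bankroll, peak_bankroll,
                  loss_streak, spin_count):
    """Augment the raw win/loss payout with risk-aware terms."""
    reward = payout

    # Penalty for drawing down more than 50% of the starting bankroll.
    if peak_bankroll - bankroll > 0.5 * start_bankroll:
        reward -= 50

    # Penalty that scales with long losing streaks.
    if loss_streak >= 5:
        reward -= 10 * loss_streak

    # Survival bonus every 100 spins.
    if spin_count % 100 == 0:
        reward += 25

    # Bonus for growing the bankroll past 1.5x its starting value.
    if bankroll > 1.5 * start_bankroll:
        reward += 50

    return reward
```

The net effect is that the Q-values reflect more than the bet's payout: an action that wins a little while deep in a drawdown still scores poorly, which is exactly what pushes the agent toward capital preservation.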


Results

After 1000 training episodes, I evaluated the risk-aware agent on 100 new roulette sessions.

| Strategy | Avg Reward | Median Reward | Avg Spins | % Profitable |
| --- | --- | --- | --- | --- |
| Risk-Aware Agent | -$14,300 | -$9,978 | 404 | 0% |
| Flat Betting | -$32,660 | -$29,917 | 73 | 41% |

Takeaways

  • The risk-aware agent lost less money than flat betting on average
  • However, it never produced profitable outcomes
  • By avoiding all risk, it eliminated all upside

In trying to prevent failure, the agent also stopped itself from succeeding.

This highlights a core trade-off: protecting capital versus pursuing growth. A viable strategy needs both, and this version had only one.


What’s Next?

The next step was to bring in deep learning — by moving from a Q-table to a neural network (DQN) that could handle continuous states and learn richer patterns.

It’s more complex, but it opens the door to more powerful policies — and possibly, something closer to intelligent betting.


Continue to Post 5: From Q-Tables to Neural Nets – A DQN Roulette Agent
