Card Counting & Adaptive Betting

Basic strategy alone cannot beat the house edge in blackjack. The only way to shift the odds is through betting variation — wagering more when the deck is favorable, and less when it is not. This is the essence of card counting.

This experiment introduces card counting into the RL framework and asks: can an agent discover when to bet more, without being told the rules of card counting?

Environment Extensions

The environment is expanded to resemble a casino shoe:

  • Six decks dealt with 75% penetration before shuffling.
  • Standard rules: dealer stands on soft 17, blackjack pays 3:2.
  • Betting actions added: flat bet (1 unit) or increase bet (2 units).
  • Hi-Lo count feature included in the state, discretized into count “buckets.”
  • The state is extended to:
# State includes both play and betting features
state = obs  # (player_total, dealer_upcard, usable_ace, bucket, allow_double)
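To make the count feature concrete, here is a minimal sketch of how a Hi-Lo running count might be converted into the discrete "bucket" the state uses. The bucket boundaries and the rank encoding (1 = ace, 11-13 = face cards) are illustrative assumptions, not taken from the notebook.

```python
# Hi-Lo values: +1 for 2-6, 0 for 7-9, -1 for tens, faces, and aces
# Ranks are encoded as ints: 1 = ace, 11/12/13 = J/Q/K (an assumption for this sketch)
HI_LO = {r: +1 for r in range(2, 7)}
HI_LO.update({r: 0 for r in range(7, 10)})
HI_LO.update({r: -1 for r in (10, 11, 12, 13, 1)})

def true_count(running_count: int, cards_remaining: int) -> float:
    """True count = running count divided by decks remaining in the shoe."""
    decks_remaining = max(cards_remaining / 52.0, 0.5)  # floor avoids division blow-up
    return running_count / decks_remaining

def count_bucket(tc: float) -> int:
    """Discretize the true count into coarse buckets (boundaries are illustrative)."""
    if tc <= -2:
        return 0  # unfavorable deck
    if tc < 2:
        return 1  # roughly neutral
    if tc < 4:
        return 2  # favorable
    return 3      # very favorable
```

Coarse buckets keep the tabular state space small: the agent only needs enough resolution to distinguish "bet more" situations from neutral ones.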

Q-Learning with Betting

The action space now includes both play decisions and betting amounts. The agent updates its Q-values incrementally, but importantly, it respects which actions are valid in the current state.

# Bootstrap only over legal next actions; `default` handles terminal states,
# where next_valid is empty and the target is the reward alone
target = r + gamma * max((Q[next_state][na] for na in next_valid), default=0.0)
Q[state][a] += alpha * (target - Q[state][a])

This ensures the update only considers legal moves (for example, Double Down is only available on the first action).
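The same valid-action masking applies when the agent acts, not just when it updates. A minimal epsilon-greedy sketch, restricted to legal actions, might look like this (the action names and Q-table layout are assumptions for illustration):

```python
import random
from collections import defaultdict

# Illustrative combined action set: play decisions plus the two bet sizes
ACTIONS = ("stand", "hit", "double", "bet_1", "bet_2")

# Tabular Q: every new state starts with a zero value for each action
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state, valid_actions, epsilon=0.1):
    """Epsilon-greedy selection restricted to the legal actions in this state."""
    if random.random() < epsilon:
        return random.choice(valid_actions)  # explore among legal moves only
    return max(valid_actions, key=lambda a: Q[state][a])
```

Restricting both exploration and the greedy argmax to `valid_actions` keeps the agent from ever wasting updates on moves it could never take, such as doubling after the first card is drawn.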

Learned Policy

The playing strategy continues to look like basic strategy, with the expected differences between hard and soft hands. For example, with a soft 18 the agent hits against strong dealer upcards but stands otherwise.

The key difference appears in betting: when the count bucket is high (a favorable deck), the agent selects the larger bet more often.

Results

After training on hundreds of thousands of hands, the results show:

  • Flat betting EV: -7.7%
  • Adaptive betting EV: -5.4%

Outcome frequencies with adaptive betting:

  • Win rate ≈ 42%
  • Push rate ≈ 8%
  • Loss rate ≈ 49%
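Because adaptive betting varies the wager, EV is best measured per unit wagered rather than per hand. A small sketch of that metric, assuming each simulated hand is recorded as a (bet, net return) pair:

```python
def expected_value(results):
    """EV per unit wagered: total net return divided by total amount bet.

    `results` is a list of (bet, net_return) pairs from simulated hands,
    e.g. (2, 3) for a 2-unit bet that won a 3:2 blackjack payout.
    """
    total_bet = sum(bet for bet, _ in results)
    total_net = sum(net for _, net in results)
    return total_net / total_bet
```

Dividing by total amount bet (not hand count) is what lets a counting agent show a better EV even when its per-hand win rate barely moves: the wins are concentrated on the larger bets.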

Takeaways

  • By incorporating the count bucket into the state, the agent learns to increase bets when the deck is favorable.
  • Expected value improves significantly compared to flat betting, though it remains negative overall.
  • The experiment shows reinforcement learning can independently rediscover the key principle of card counting — varying bet size to exploit deck composition.

Continue to Notebook 4: Deep RL for Blackjack →
