Teaching a Machine to Beat Blackjack: An RL Journey

Blackjack is one of the most widely studied games in both casinos and academia. The rules are simple, but the strategy is subtle, and the possibility of card counting has made it a fascinating testbed for probability, game theory, and decision-making.

The question at the heart of this project: can a machine, using reinforcement learning (RL), teach itself how to play blackjack — and even rediscover professional strategies such as card counting?

This project is structured as a series of four experiments, each building on the last:

  1. Notebook 1 – From Zero to Basic Strategy:
    A toy blackjack environment with only two actions (Hit/Stand). The agent starts with no knowledge and learns strategy from scratch using Monte Carlo Control.
  2. Notebook 2 – Scaling Up to Casino Blackjack:
    A six-deck shoe, real casino rules (dealer stands on soft 17, blackjack pays 3:2), and the addition of Double Down as an action. Q-learning replaces Monte Carlo to handle the larger state/action space.
  3. Notebook 3 – Card Counting & Adaptive Betting:
    A multi-deck shoe with penetration, card counting features added to the state, and variable betting actions. The agent begins to learn when to bet more in favorable decks.
  4. Notebook 4 – Deep RL for Blackjack:
    Tabular methods give way to function approximation. A Deep Q-Network is used to generalize across the massive state/action space, combining play and betting decisions at scale.
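To make the starting point of Notebook 1 concrete, here is a minimal sketch of Monte Carlo Control on a simplified blackjack: hard totals only (no aces), an infinite-deck draw, no splits or doubles, and just Hit/Stand. All names here are illustrative, not the notebooks' actual code.

```python
import random
from collections import defaultdict

HIT, STAND = 0, 1

def draw(rng):
    # Infinite-deck approximation: ranks 2-9 once, ten-valued cards
    # four times (10, J, Q, K). Aces are omitted for simplicity.
    return min(rng.randint(2, 13), 10)

def play_episode(Q, epsilon, rng):
    """Play one hand with epsilon-greedy actions; return visited
    (state, action) pairs and the terminal reward (+1 / 0 / -1)."""
    dealer_up = draw(rng)
    player = draw(rng) + draw(rng)
    trajectory = []
    while True:
        state = (player, dealer_up)
        if rng.random() < epsilon:
            action = rng.choice((HIT, STAND))
        else:
            action = HIT if Q[state][HIT] > Q[state][STAND] else STAND
        trajectory.append((state, action))
        if action == STAND:
            break
        player += draw(rng)
        if player > 21:
            return trajectory, -1.0  # player busts
    dealer = dealer_up + draw(rng)
    while dealer < 17:               # dealer draws to 17
        dealer += draw(rng)
    if dealer > 21 or player > dealer:
        return trajectory, 1.0
    return trajectory, -1.0 if dealer > player else 0.0

def mc_control(episodes=200_000, epsilon=0.1, seed=0):
    """Every-visit Monte Carlo Control with an incremental-mean update."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0, 0.0])       # state -> [Q(hit), Q(stand)]
    visits = defaultdict(lambda: [0, 0])
    for _ in range(episodes):
        trajectory, reward = play_episode(Q, epsilon, rng)
        for state, action in trajectory:
            visits[state][action] += 1
            Q[state][action] += (reward - Q[state][action]) / visits[state][action]
    return Q
```

After training, the greedy policy read off `Q` already recovers obvious pieces of basic strategy, e.g. `Q[(20, 10)]` shows standing on 20 far outvalues hitting.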

Each notebook will be presented as a separate blog post, with code, results, and analysis. Together they form a complete journey — from a naïve beginner agent to an RL system that approaches professional-level play.

All of the code for this series can be found on my GitHub.
