Building a Buffett-Style Shareholder Letter Scoring Pipeline

I recently finished the first version of a philosophy-oriented NLP research pipeline focused on shareholder letters.

The idea behind philo_nlp is simple: long-form corporate communication often reflects an underlying philosophy. In this first implementation, the reference philosophy is the one expressed in Warren Buffett's Berkshire Hathaway shareholder letters.

The project is split into three repositories.

Berkshire_Letters builds the Buffett reference corpus by downloading and normalizing Berkshire shareholder letters into a reusable NLP dataset.
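To make the normalization step concrete, here is a minimal sketch of what cleaning one extracted letter might look like. The function name normalize_letter and the specific rules (Unicode folding, rejoining words split across line breaks, collapsing whitespace) are my assumptions for illustration, not necessarily what Berkshire_Letters does.

```python
import re
import unicodedata


def normalize_letter(raw: str) -> str:
    """Turn raw extracted letter text into clean, blank-line-separated paragraphs.

    Hypothetical helper for illustration only.
    """
    # Fold compatibility characters (ligatures, non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split across a line break: "share-\nholders" -> "shareholders".
    # Caveat: this also joins genuine compounds such as "long-\nterm".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks and collapse all other whitespace.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Keeping paragraph breaks while flattening everything else is a deliberate choice here: later stages can then work per paragraph or per letter without re-parsing the layout.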

shareholder_letters_downloader expands the process to other companies. It downloads shareholder letters, extracts text, performs quality checks, and generates structured sentiment and keyword features.
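As a rough illustration of the quality-check and keyword-feature stage, the sketch below computes a word count, a simple length-based quality flag, and keyword rates per 1,000 words. The keyword list, the 500-word threshold, and the name letter_features are all assumptions made for this example; sentiment scoring is left out.

```python
import re

# Hypothetical keyword list; the real feature set is not described in this post.
KEYWORDS = ("intrinsic value", "float", "long-term", "owner")


def letter_features(text: str, min_words: int = 500) -> dict:
    """Return a word count, a length-based quality flag, and keyword rates."""
    lowered = text.lower()
    # Tokens start with a letter and may contain apostrophes or hyphens.
    words = re.findall(r"[a-z][a-z'-]*", lowered)
    n = len(words)
    # Assumed quality rule: very short extractions are probably failed downloads.
    features = {"word_count": n, "passes_quality": n >= min_words}
    for kw in KEYWORDS:
        # Whole-phrase matches only, so "owner" does not count "owners".
        hits = len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
        # Rate per 1,000 words keeps long and short letters comparable.
        features[f"kw_{kw}"] = 1000.0 * hits / n if n else 0.0
    return features
```

Normalizing keyword counts by letter length matters because letters from different companies can differ in length by an order of magnitude.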

philo_nlp consumes those structured datasets and performs the actual similarity scoring and ranking.
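For readers who want a feel for the scoring step, here is a simplified stand-in: cosine similarity between raw term-count vectors, followed by a sort. This is not necessarily the measure philo_nlp uses, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    # A Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(reference: str, letters: dict[str, str]) -> list[tuple[str, float]]:
    """Score each letter against the reference text and sort, highest first."""
    ref = term_counts(reference)
    scores = {name: cosine_similarity(ref, term_counts(text)) for name, text in letters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_by_similarity(reference_text, {"Company A": text_a, "Company B": text_b}) returns (name, score) pairs with the closest match first. Raw term counts are dominated by common words, so a more realistic version would likely weight terms or compare the structured features instead.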

The workflow looks like this:

Berkshire reference corpus
        ↓
Shareholder letter pipeline
        ↓
Structured NLP datasets
        ↓
Buffett-style scoring and ranking

For the first end-to-end run, I tested:

Berkshire Hathaway
Markel
Brookfield
Amazon
Danaher
Costco
Apple
Meta
Alphabet

The pipeline produced a ranked Buffett-style screening output. Markel and Brookfield ranked highly, which is directionally intuitive for a first-pass semantic comparison against the Berkshire reference profile.

The most important result is not the ranking itself but the fact that the entire workflow now runs end to end:

download letters
→ extract text
→ build features
→ score similarity
→ rank companies

This is not investment advice or a production investment model. It is research infrastructure designed to make shareholder-letter analysis reproducible and extensible.

Future work may include multi-year company histories, richer Buffett reference profiles, explainability, and broader universe coverage.

Repository links:
