Overview
The first step in any machine learning project is gathering the necessary data. In this post, we outline how we collect and structure our options trading dataset. We focus on intraday price movements of options contracts and their underlying assets, so that the later analysis rests on complete, consistently structured data.
Data Sources & Collection Process
- We retrieve options chain data and historical price bars from an API (Alpaca, Yahoo Finance, etc.).
- We structure the dataset to include strike price, expiration date, option type (call/put), moneyness, Greeks, and implied volatility.
- We store both underlying asset prices and options contract prices to compare price movements.
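To make the structure above concrete, here is a minimal sketch of what one row of such a dataset could look like. The class name, field names, and sample values are illustrative assumptions, not the actual schema used in this project, and moneyness is assumed to be defined as spot divided by strike (other conventions exist).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OptionRecord:
    """One row of the options dataset (hypothetical field names)."""
    symbol: str
    strike: float
    expiration: date
    option_type: str           # "call" or "put"
    underlying_price: float    # spot price of the underlying at the same timestamp
    implied_volatility: float
    delta: float               # one of the Greeks; others would be added the same way

    @property
    def moneyness(self) -> float:
        # Assumed convention: spot / strike (S/K).
        return self.underlying_price / self.strike

    @property
    def in_the_money(self) -> bool:
        # A call is in the money when spot > strike; a put when spot < strike.
        if self.option_type == "call":
            return self.underlying_price > self.strike
        return self.underlying_price < self.strike


# Made-up values, purely for illustration.
record = OptionRecord(
    symbol="SPY",
    strike=500.0,
    expiration=date(2025, 1, 17),
    option_type="call",
    underlying_price=510.0,
    implied_volatility=0.18,
    delta=0.62,
)
print(round(record.moneyness, 3), record.in_the_money)
```

Keeping the underlying price on the same record as the contract fields is one way to make moneyness a derived value rather than a stored one, so it cannot drift out of sync with the prices it depends on.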
Key Functions & Outputs
- fetch_full_day_intraday_data(): Downloads intraday price data for the underlying stock (e.g., SPY).
- fetch_full_day_option_ohlc(): Retrieves historical OHLC data for multiple options contracts.
- Data is saved as CSV files for further processing.
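The CSV step can be sketched as follows. This is not the implementation behind the functions listed above: save_bars_to_csv, load_bars_from_csv, the column layout, and the sample bars are all hypothetical, and only the Python standard library is used so the example does not depend on any particular data provider.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column layout for one OHLC bar.
FIELDS = ["timestamp", "open", "high", "low", "close", "volume"]


def save_bars_to_csv(bars, path):
    """Write a list of OHLC bar dicts to a CSV file and return the row count."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(bars)
    return len(bars)


def load_bars_from_csv(path):
    """Read the bars back; note that every value comes back as a string."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))


# Made-up one-minute bars, purely for illustration.
sample_bars = [
    {"timestamp": "2025-01-02T09:30:00", "open": 589.0, "high": 589.4,
     "low": 588.7, "close": 589.1, "volume": 12000},
    {"timestamp": "2025-01-02T09:31:00", "open": 589.1, "high": 589.6,
     "low": 589.0, "close": 589.5, "volume": 9500},
]

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "SPY_intraday.csv"
    written = save_bars_to_csv(sample_bars, out_path)
    reloaded = load_bars_from_csv(out_path)

print(written, reloaded[0]["timestamp"])
```

Because CSV stores everything as text, numeric columns need to be converted back (for example with float()) when the files are loaded for later processing.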
This foundational step ensures that our data is complete and structured for in-depth analysis.