When working with financial data, efficiency matters. Traders, analysts, and data scientists rely on various storage formats—CSV, Parquet, and SQLite—for backtesting trading strategies. But which format offers the best performance?
In this post, I benchmark three widely used storage formats while backtesting Apple Inc. (AAPL) stock data. The goal? To determine which format provides the fastest execution time while ensuring accuracy across different trading strategies.
The Experiment
I tested three popular trading strategies:
🔹 Simple Moving Average (SMA) Crossover (a minimal sketch follows this list)
🔹 Bollinger Bands + Relative Strength Index (RSI)
🔹 Moving Average Convergence Divergence (MACD) + Average True Range (ATR)
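To make the strategies concrete, here is a minimal sketch of the simplest one, the SMA crossover, assuming the price data is loaded into a pandas DataFrame with a `Close` column. The window lengths and signal convention are illustrative; the exact logic in the notebook may differ.

```python
import pandas as pd

def sma_crossover_signals(prices: pd.DataFrame, fast: int = 50, slow: int = 200) -> pd.Series:
    """Return +1 while the fast SMA is above the slow SMA, else -1.

    Window lengths are illustrative defaults, not necessarily the
    ones used in the benchmark.
    """
    fast_sma = prices["Close"].rolling(fast).mean()
    slow_sma = prices["Close"].rolling(slow).mean()
    # Long when the fast average is above the slow one, short/flat otherwise.
    return (fast_sma > slow_sma).astype(int).replace(0, -1)
```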
Each strategy was applied to AAPL’s historical price data (2013-2024). The backtests were executed using three different storage formats:
1️⃣ CSV – A widely used, human-readable format.
2️⃣ Parquet – A highly efficient, columnar storage format optimized for speed.
3️⃣ SQLite – A lightweight, file-based database for structured data storage.
I measured the execution time for each combination to determine the most efficient approach.
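For context, here is a minimal sketch of how such a load-time comparison can be set up with pandas and sqlite3. The file names (`aapl.csv`, `aapl.parquet`, `aapl.db`) and the table name are placeholders, and the full benchmark also includes the strategy computation, so this only shows the loading side.

```python
import sqlite3
import time

import pandas as pd

def load_csv(path="aapl.csv"):
    return pd.read_csv(path, parse_dates=["Date"], index_col="Date")

def load_parquet(path="aapl.parquet"):
    return pd.read_parquet(path)

def load_sqlite(path="aapl.db", table="prices"):
    with sqlite3.connect(path) as conn:
        return pd.read_sql(f"SELECT * FROM {table}", conn,
                           parse_dates=["Date"], index_col="Date")

# Time each loader on the same data.
for name, loader in [("CSV", load_csv), ("Parquet", load_parquet), ("SQLite", load_sqlite)]:
    start = time.perf_counter()
    df = loader()
    print(f"{name}: {time.perf_counter() - start:.5f} s, {len(df)} rows")
```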
Benchmark Results
| Strategy | CSV (sec) | Parquet (sec) | SQLite (sec) |
|---|---|---|---|
| SMA | 0.08333 | 0.01298 | 0.06184 |
| Bollinger Bands + RSI | 0.02931 | 0.01848 | 0.02877 |
| MACD + ATR | 0.02202 | 0.01333 | 0.04004 |
✅ All strategies produced identical results across all formats.
Key Takeaways
1️⃣ Parquet is the fastest format for all three strategies, significantly outperforming CSV and SQLite.
2️⃣ CSV is the slowest, most likely because it is plain text: every read has to parse strings and infer types, with no compression or columnar layout.
3️⃣ SQLite offers structured storage but isn’t always the fastest for numeric-heavy financial data.
4️⃣ All formats provided identical results, confirming that precision was maintained across different storage options.
Final Thoughts
For large-scale financial data analysis, Parquet is the best choice due to its superior performance. CSV remains useful for simple tasks, while SQLite is great for structured queries.
If you’re working with stock data and need efficient storage and retrieval, switching to Parquet could significantly speed up your workflow.
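If you want to try this yourself, converting an existing CSV to Parquet is a one-liner with pandas. Note that `to_parquet` needs a Parquet engine such as pyarrow installed, and the file names here are placeholders:

```python
import pandas as pd

# One-time conversion from CSV to Parquet (requires pyarrow or fastparquet).
df = pd.read_csv("aapl.csv", parse_dates=["Date"], index_col="Date")
df.to_parquet("aapl.parquet")

# Subsequent reads are typically much faster and preserve dtypes.
df = pd.read_parquet("aapl.parquet")
```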
See the Full Analysis
📌 View the full notebook and code on GitHub: GitHub Repository Link
Let me know in the comments if you’ve run similar tests or if you have other data storage recommendations!
Please note: this post was mostly written by ChatGPT.