Part 4: Statistical Modeling & Machine Learning

Introduction

In this section, we apply logistic regression and clustering techniques to better understand trade profitability. We analyze whether specific attributes—such as time of entry, moneyness, and strike distance—affect the likelihood of hitting a profit target.

We also use Cohen’s d to measure effect sizes and apply survival analysis to estimate how long trades last.
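For reference, here is a minimal sketch of the two techniques named above, written in standard-library Python. It uses the common pooled-standard-deviation form of Cohen's d and, as an assumed choice of survival method, the Kaplan-Meier estimator. The inputs (per-trade metric values for trades that did and did not hit the target, holding times, and a "closed" flag marking whether each exit was observed) are hypothetical and only illustrate the calculation.

```python
import math
from collections import Counter


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


def kaplan_meier(durations, closed):
    """Kaplan-Meier estimate of the probability a trade is still open after time t.

    durations: how long each trade was observed (e.g. minutes held).
    closed: True if the trade's exit was observed, False if censored
            (still open when the data ended).
    Returns [(t, S(t))] at each time where at least one trade closed.
    """
    exits = Counter(durations)                                   # all departures at t
    events = Counter(t for t, c in zip(durations, closed) if c)  # observed closes at t
    at_risk, survival, curve = len(durations), 1.0, []
    for t in sorted(exits):
        if events[t]:
            survival *= 1.0 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # closed and censored trades both leave the risk set
    return curve


# Hypothetical inputs for illustration only.
hit_group = [0.12, 0.08, 0.15, 0.10]   # metric for trades that hit the target
miss_group = [0.02, -0.05, 0.01, 0.04]  # same metric for trades that did not
print("Cohen's d:", round(cohens_d(hit_group, miss_group), 2))
print(kaplan_meier([30, 45, 45, 60, 90], [True, True, False, True, False]))
```

Censored trades (still open at the end of the data) stay in the risk set up to their last observed time but never reduce the survival estimate, which is what distinguishes Kaplan-Meier from simply averaging holding times.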


Logistic Regression for Profit Prediction

To estimate the probability that an option reaches its profit target, we fit a logistic regression model. Its predictors include strike distance, time of entry, and DTE (days to expiration).
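No fitting code is shown here, so the following is a minimal, standard-library-only sketch of how a logistic regression is estimated by maximum likelihood (the `Method: MLE` reported in the results), using Newton-Raphson on synthetic data. The feature names `strike_distance` and `dte`, the coefficients used to simulate outcomes, and the helpers `sigmoid`, `solve`, and `fit_logit` are all hypothetical. The summary format suggests the original analysis used statsmodels' `Logit`, which also maximizes the likelihood.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: feature rows (an intercept column is prepended here).
    y: 0/1 outcomes, 1 meaning the profit target was reached.
    Returns coefficients [const, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Synthetic trades with two hypothetical features.
rng = random.Random(42)
X, y = [], []
for _ in range(4000):
    strike_distance = rng.uniform(-2.0, 2.0)  # hypothetical: % from spot
    dte = rng.randint(0, 5)                   # hypothetical: days to expiration
    true_eta = 0.5 - 0.8 * strike_distance + 0.1 * dte
    X.append([strike_distance, dte])
    y.append(1 if rng.random() < sigmoid(true_eta) else 0)

beta = fit_logit(X, y)
print("const, strike_distance, dte:", [round(b, 3) for b in beta])
```

The inverse of the same information matrix X'WX at convergence gives the coefficient covariance, which is where the standard errors in a regression summary come from.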

Below are the results of our logistic regression. In the output, the predictors appear under generic labels (x1 to x9) and the binary outcome as y:

Logistic Regression Results

                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                87601
Model:                          Logit   Df Residuals:                    87591
Method:                           MLE   Df Model:                            9
Date:                Wed, 19 Mar 2025   Pseudo R-squ.:                 0.01823
Time:                        21:45:38   Log-Likelihood:                -38455.
converged:                       True   LL-Null:                       -39169.
Covariance Type:            nonrobust   LLR p-value:                 7.703e-302
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5223      0.287      5.296      0.000       0.959       2.086
x1             0.0316      0.003      9.591      0.000       0.025       0.038
x2             0.0150      0.012      1.238      0.216      -0.009       0.039
x3             0.1101      0.034      3.252      0.001       0.044       0.176
x4             0.2114      0.032      6.538      0.000       0.148       0.275
x5             0.3045      0.034      8.938      0.000       0.238       0.371
x6             0.4284      0.034     12.700      0.000       0.362       0.494
x7            -0.4325      0.019    -22.583      0.000      -0.470      -0.395
x8            -0.1146      0.286     -0.401      0.688      -0.675       0.446
x9             0.4324      0.286      1.512      0.130      -0.128       0.993
==============================================================================
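Logit coefficients are changes in log-odds, which are easier to read as odds ratios. As a sketch, the standard-library snippet below takes the rounded coefficients and standard errors from the summary and recomputes Wald z-scores, two-sided p-values, and 95% odds-ratio intervals. Because the inputs are rounded, the recomputed z and p values differ slightly from the printed ones (for example, x2 gives z = 0.0150 / 0.012 = 1.25 rather than 1.238). The helper name `wald_summary` is hypothetical.

```python
import math

# Rounded coefficients and standard errors copied from the summary above.
results = {
    "const": (1.5223, 0.287), "x1": (0.0316, 0.003), "x2": (0.0150, 0.012),
    "x3": (0.1101, 0.034), "x4": (0.2114, 0.032), "x5": (0.3045, 0.034),
    "x6": (0.4284, 0.034), "x7": (-0.4325, 0.019), "x8": (-0.1146, 0.286),
    "x9": (0.4324, 0.286),
}
Z_975 = 1.959964  # standard normal 97.5th percentile


def wald_summary(coef, se):
    """Wald z, two-sided p-value, exp(coef), and its 95% interval."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(coef - Z_975 * se), math.exp(coef + Z_975 * se))
    return z, p, math.exp(coef), ci


for name, (coef, se) in results.items():
    z, p, odds_ratio, (lo, hi) = wald_summary(coef, se)
    print(f"{name:>5}  z={z:8.3f}  p={p:.3g}  OR={odds_ratio:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

For the predictors, exp(coef) is the multiplicative change in the odds of hitting the target per one-unit increase; for `const` it is the baseline odds rather than a ratio.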

From these results:

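  • x1, x3, x4, x5, and x6 have positive, statistically significant coefficients (p ≤ 0.001), so higher values of these variables raise the odds of hitting the profit target. x2 (p = 0.216), x8 (p = 0.688), and x9 (p = 0.130) are not significant at the 5% level.
  • x7 is the only significant negative coefficient (-0.4325) and has the largest z-statistic in the model (-22.58). Each one-unit increase in x7 multiplies the odds of hitting the target by about exp(-0.4325) ≈ 0.65, a reduction of roughly 35%.
  • The pseudo R-squared (0.01823) is low. The predictors are jointly significant (LLR p-value of 7.703e-302), but they explain only a small part of the variation in outcomes, so additional features or a more flexible model would be needed for reliable trade-level predictions.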
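The two fit statistics in the last point come straight from the log-likelihoods in the summary. statsmodels reports McFadden's pseudo R-squared, 1 - LL / LL_null, and the likelihood-ratio test statistic is 2 (LL - LL_null). A short sketch using the printed (integer-rounded) values:

```python
# Values reported in the Logit summary above (log-likelihoods are rounded).
ll_model = -38455.0  # Log-Likelihood of the fitted model
ll_null = -39169.0   # LL-Null: intercept-only model
df_model = 9         # number of predictors (Df Model)

mcfadden_r2 = 1.0 - ll_model / ll_null  # what the summary calls Pseudo R-squ.
llr_stat = 2.0 * (ll_model - ll_null)   # likelihood-ratio chi-square statistic

print(f"McFadden pseudo R^2: {mcfadden_r2:.5f}")
print(f"LLR chi-square: {llr_stat:.0f} on {df_model} degrees of freedom")
```

This prints 0.01823 and 1428. A chi-square of about 1,428 on 9 degrees of freedom is why the LLR p-value is so small, while the pseudo R-squared shows that the predictors improve the log-likelihood by less than 2% over an intercept-only model. Both readings are consistent across 87,601 trades.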

Profit Target Achievement Summary

We analyze the percentage of trades that achieve each profit target level.
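One way to compute these percentages, sketched under the assumption that each trade's peak return while open (best option price relative to entry) is known, is to count the share of trades whose peak meets each target. The helper `hit_rates` and the sample `peak_returns` values are hypothetical; the article may define "achieving" a target differently (for example, on closing prices).

```python
def hit_rates(peak_returns, targets):
    """Fraction of trades whose best return while open reached each target.

    peak_returns: maximum favorable return per trade as a fraction
                  (0.12 means the option was up 12% at its best point).
    targets: profit targets as fractions (0.05 means 5%).
    """
    n = len(peak_returns)
    return {t: sum(r >= t for r in peak_returns) / n for t in targets}


targets = [pct / 100 for pct in range(5, 55, 5)]                 # 5%, 10%, ..., 50%
peak_returns = [0.03, 0.07, 0.12, 0.18, 0.26, 0.41, 0.55, 0.09]  # hypothetical
for target, rate in hit_rates(peak_returns, targets).items():
    print(f"{target:.0%} target: {rate:.2%} of trades")
```

Because any trade that reaches a higher target must have passed every lower one, rates computed this way can only stay flat or fall as the target rises.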

Profit Target    % of Trades Achieving
5%               83.55%
10%              75.17%
15%              68.33%
20%              62.53%
25%              57.67%
30%              53.41%
35%              49.82%
40%              46.94%
45%              44.25%
50%              41.92%

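The hit rate declines steadily as the target rises, from 83.55% at 5% to 41.92% at 50%. The drop is steepest at the low end (8.38 percentage points between the 5% and 10% targets) and flattens at the high end (2.33 points between 45% and 50%). Most trades reach small targets, and roughly four in ten still reach a 50% gain.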
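The flattening can be made explicit by dividing each hit rate by the rate at the previous level, which estimates the chance of reaching the next target given that the previous one was reached. This sketch assumes the percentages in the table are measured on the same set of trades at every level; the variable names are illustrative.

```python
# Share of trades reaching each profit target (%), from the table above.
hit_pct = {5: 83.55, 10: 75.17, 15: 68.33, 20: 62.53, 25: 57.67,
           30: 53.41, 35: 49.82, 40: 46.94, 45: 44.25, 50: 41.92}

levels = sorted(hit_pct)
conditional = {}
for lower, upper in zip(levels, levels[1:]):
    # P(reach upper | reached lower): any trade that reached the higher
    # target must already have passed the lower one.
    conditional[upper] = hit_pct[upper] / hit_pct[lower]
    print(f"{lower}% -> {upper}%: {conditional[upper]:.3f}")
```

On these figures the conditional rate climbs from about 0.900 (5% to 10%) to about 0.947 (45% to 50%): a trade that has already reached one target is somewhat more likely to go on to the next 5% step than a fresh trade is to reach the first one.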


Clustering & Sensitivity Analysis

By grouping similar trades into clusters, we can compare composite scores across setups and see which groups are most associated with profitable trades.
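The clustering algorithm behind these groups is not specified. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) on standardized features, so that a feature measured in minutes does not swamp one measured in percent. The features (strike distance in percent and entry time in minutes after the open), the sample trades, and the helper names `standardize` and `kmeans` are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so no single unit dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical trades: (strike distance in %, entry minutes after the open).
trades = [(0.2, 5), (0.3, 10), (0.25, 7), (2.0, 300), (2.2, 320), (1.9, 310)]
labels, centroids = kmeans(standardize(trades), k=2)
print(labels)
```

A per-cluster score (such as a composite score, however it is defined) can then be averaged over each label to rank the clusters.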

Top Trade Cluster Composite Scores

Contract Symbol        Composite Score
SPY240118C00472000     7.3
SPY240118C00473000     7.3
SPY240118C00474000     7.3
SPY240118C00475000     7.3
SPY240118C00476000     7.3

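Higher composite scores are associated with a higher probability of reaching profit targets. The five top-scoring contracts share the same score (7.3) and are all SPY calls expiring January 18, 2024, at consecutive strikes from $472 to $476, so this cluster effectively describes one setup across neighboring strikes.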
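The contract symbols in the table above follow the OCC option symbology: the root ticker, the expiration as YYMMDD, C or P for call or put, and the strike multiplied by 1,000 as eight digits (the full 21-character form pads the root to six characters with spaces). A small parser turns such symbols back into fields such as expiry and strike that can feed clustering; the names `parse_occ_symbol` and `OptionContract` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Root (1-6 chars), expiry YYMMDD, right C/P, strike x 1000 as 8 digits.
OCC_PATTERN = re.compile(r"^([A-Z0-9]{1,6})(\d{6})([CP])(\d{8})$")


@dataclass(frozen=True)
class OptionContract:
    root: str
    expiry: date
    right: str     # "C" for call, "P" for put
    strike: float


def parse_occ_symbol(symbol: str) -> OptionContract:
    """Parse a compact or space-padded OCC option symbol."""
    match = OCC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not an OCC-style option symbol: {symbol!r}")
    root, yymmdd, right, strike_thousandths = match.groups()
    expiry = datetime.strptime(yymmdd, "%y%m%d").date()
    return OptionContract(root, expiry, right, int(strike_thousandths) / 1000)


top_cluster = ["SPY240118C00472000", "SPY240118C00473000", "SPY240118C00474000",
               "SPY240118C00475000", "SPY240118C00476000"]
for contract in map(parse_occ_symbol, top_cluster):
    print(contract.root, contract.expiry.isoformat(), contract.right, contract.strike)
```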


Conclusion

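  • Lower profit targets (5-15%) are easier to achieve than higher ones (30-50%): 68% to 84% of trades reach the lower targets, versus 42% to 53% for the higher ones.
  • Logistic regression identifies several statistically significant predictors, including strike distance and entry time, although its low pseudo R-squared (0.018) means these factors explain only a small part of trade outcomes.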
  • Clustering reveals top trade setups with high composite scores.

This analysis provides a data-driven approach to trade selection and risk management for options traders.
