The ASTA team
\[P(y=1\, |\, x)=\alpha+\beta x\] is simple, but often inappropriate. If \(\beta\) is positive and \(x\) sufficiently large, then the probability exceeds \(1\).
Instead we consider the odds that the person is able to complete the task \[\mathtt{Odds}(y=1\, |\, x)=\frac{P(y=1\, |\, x)}{P(y=0\, |\, x)}= \frac{P(y=1\, |\, x)}{1-P(y=1\, |\, x)}\] which can take any positive value. For example, a probability of 0.75 corresponds to odds \(0.75/0.25=3\).
The logistic model is defined as: \[\mathtt{logit}(P(y=1\, |\, x))=\log(\mathtt{Odds}(y=1\, |\, x))=\alpha+\beta x\] The function \(\mathtt{logit}(p)=\log(\frac{p}{1-p})\) - i.e. log of odds - is termed the logistic transformation.
Remark that the log odds can be any real number, where zero corresponds to \(P(y=1\,|\, x)=0.5\). Solving \(\alpha+\beta x=0\) shows that at age \(x_0=-\alpha/\beta\) you have a fifty-fifty chance of solving the task.
import numpy as np
from scipy.special import logit, expit
p = np.arange(0.1, 1.0, 0.2) # stop is exclusive, so use 1.0
p
## array([0.1, 0.3, 0.5, 0.7, 0.9])
np.round(logit(p), 3)  # log-odds of each probability (rounded for display)
## array([-2.197, -0.847, 0. , 0.847, 2.197])
expit(logit(p))  # the inverse transformation recovers the probabilities
## array([0.1, 0.3, 0.5, 0.7, 0.9])
As shown above, expit() applied to the transformed values recovers the original probabilities.

Plot of the logit function and the inverse logit:
import matplotlib.pyplot as plt
p = np.arange(0.001, 0.999, 0.005)
fit = plt.plot(p, logit(p), label='logit(p)')
plt.xlabel('p')
plt.ylabel('logit(p)')
plt.show()

x = np.arange(-7, 7, 0.1)
fig = plt.plot(x, expit(x), label='inverse logit (expit)')
plt.xlabel('x')
plt.ylabel('expit(x)')
plt.show()

Interpretation of \(\beta\):
What happens to the odds if we increase age by 1 year?
Consider the so-called odds ratio: \[\frac{\mathtt{Odds}(y=1\, |\, x+1)}{\mathtt{Odds}(y=1\, |\, x)}= \frac{\exp(\alpha+\beta(x+1))}{\exp(\alpha+\beta x)}=\exp(\beta)\] where we see that \(\exp(\beta)\) equals the odds at age \(x+1\) relative to the odds at age \(x\).
This means that when age increases by 1 year, the relative change \[ \frac{\exp(\alpha+\beta(x+1))-\exp(\alpha+\beta x)}{\exp(\alpha+\beta x)}\] in odds is given by \(100(\exp(\beta)-1)\%\).
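As a small numerical illustration (the values of \(\alpha\) and \(\beta\) below are made up for the example):

alpha, beta = -3.5, 0.1              # hypothetical parameter values, only for illustration
def odds(x):
    return np.exp(alpha + beta * x)  # Odds(y=1 | x) under the logistic model
odds(41) / odds(40)                  # odds ratio for a one-year increase; equals exp(beta)
100 * (np.exp(beta) - 1)             # relative change in odds in percent, about 10.5%
-alpha / beta                        # age with a fifty-fifty chance: x0 = -alpha/beta = 35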
Examples of logistic curves for \(P(y=1|x)\). The black curve has a positive \(\beta\)-value (=10), whereas the red has a negative \(\beta\) (=-3).
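A short sketch reproducing curves of this shape (the intercepts are chosen arbitrarily here so that both curves cross 0.5 inside the plotted range):

x = np.arange(0, 1, 0.01)
plt.plot(x, expit(-5 + 10 * x), color='black', label='beta = 10')  # positive beta: increasing curve
plt.plot(x, expit(1.5 - 3 * x), color='red', label='beta = -3')    # negative beta: decreasing curve
plt.xlabel('x')
plt.ylabel('P(y=1|x)')
plt.legend()
plt.show()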
In addition we note that \(\mathtt{logit}(P(y=1\,|\,x))=0\) exactly when \(\alpha+\beta x=0\).
This means that at age \(x=-\frac{\alpha}{\beta}\) you have a 50% chance of performing the task.
We shall investigate if income is a good predictor of whether or not you have a credit card.
In the dataset, n denotes the number of persons with a given income (Income), and credit how many of these carry a credit card.

import pandas as pd
creInc = pd.read_csv("https://asta.math.aau.dk/datasets?file=income-credit.csv", sep=',')
creInc.head(6)
## Income n credit
## 0 12 1 0
## 1 13 1 0
## 2 14 8 2
## 3 15 14 2
## 4 16 9 0
## 5 17 8 2
import statsmodels.api as sm
import statsmodels.formula.api as smf
modelFit = smf.glm(formula='credit + I(n - credit) ~ Income',
data=creInc,
family=sm.families.Binomial()).fit()
modelFit.summary()

| Dep. Variable: | ['credit', 'I(n - credit)'] | No. Observations: | 24 |
|---|---|---|---|
| Model: | GLM | Df Residuals: | 22 |
| Model Family: | Binomial | Df Model: | 1 |
| Link Function: | Logit | Scale: | 1.0000 |
| Method: | IRLS | Log-Likelihood: | -27.417 |
| Date: | Mon, 03 Nov 2025 | Deviance: | 39.276 |
| Time: | 18:11:46 | Pearson chi2: | 32.3 |
| No. Iterations: | 5 | Pseudo R-squ. (CS): | 0.6698 |
| Covariance Type: | nonrobust | | |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -3.5179 | 0.710 | -4.953 | 0.000 | -4.910 | -2.126 |
| Income | 0.1054 | 0.026 | 4.030 | 0.000 | 0.054 | 0.157 |
The response has the form credit + I(n - credit), i.e. for each income group the number of persons holding a credit card (successes) and the number not holding one (failures).
We need to use the function glm (generalized linear model).
The argument family=sm.families.Binomial() tells the function that the data has binomial variation. Leaving out this argument will lead Python to believe that data follows a normal distribution (linear regression).
The params attribute extracts the coefficients (parameter estimates) from the model:

modelFit.params
## Intercept -3.517947
## Income 0.105409
## dtype: float64
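The full table of estimates, standard errors, z-values and confidence intervals can also be extracted as a data frame, for instance via summary2():

modelFit.summary2().tables[1]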
## Coef. Std.Err. z P>|z| [0.025 0.975]
## Intercept -3.517947 0.710336 -4.952513 7.326117e-07 -4.910179 -2.125714
## Income 0.105409 0.026157 4.029788 5.582714e-05 0.054141 0.156677
Our model for the dependence of the odds of having a credit card on income \(x\) is \[\mathtt{logit}(P(y=1\, |\, x))=\alpha+\beta x\] The hypothesis of no relation between income and the ability to obtain a credit card corresponds to \[H_0:\ \ \beta=0\] with the alternative \(\beta\not=0\). Inspecting the summary reveals that \(\hat{\beta}=0.1054\) is more than 4 standard errors away from zero.
With a z-score equal to 4.03 we get the tail probability
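A quick check of this value, using the two-sided tail of the standard normal distribution:

from scipy.stats import norm
2 * norm.sf(4.03)  # two-sided tail probability for z = 4.03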
## np.float64(5.577685288105094e-05)
This is highly significant, as also reflected by the p-value in the coefficient table.
From the summary:
\(\hat{\beta}=0.10541\) and hence \(\exp(\hat{\beta})-1=0.11\). If income increases by 1000 euro, then the odds increase by 11%.
The standard error of \(\hat{\beta}\) is \(0.02616\), and hence a 95% confidence interval for the log-odds ratio is \(\hat{\beta}\pm 1.96\times 0.02616=(0.054;0.157)\).
The corresponding interval for the odds ratio is \(\exp((0.054;0.157))=(1.056;1.170)\),
i.e. the increase in odds is - with confidence 95% - between 5.6% and 17%.
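The same calculation can be done directly from the fitted model (a short sketch using the conf_int method):

np.exp(modelFit.params)      # estimated odds ratio per 1000 euro increase in income
np.exp(modelFit.conf_int())  # 95% confidence interval on the odds-ratio scale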
Plot: the estimated probability of having a credit card as a function of income.
We generalize the model to the case where we have \(k\) predictors \(x_1,x_2,\ldots,x_k\), some of which might be dummies for a factor. \[\mathtt{logit}(P(y=1\, |\, x_1,x_2,\ldots,x_k))= \alpha+\beta_1 x_1+\cdots+\beta_k x_k\] The interpretation of the \(\beta\)-values is unaltered: if we fix \(x_2,\ldots,x_k\) and increase \(x_1\) by one unit, then the relative change in odds is given by \(\exp(\beta_1)-1\).
The Wisconsin Breast Cancer Database contains 683 observations of 10 variables relating to the examination of breast tumors.
Nine clinical variables with a score between 0 and 10.
The binary variable Class with levels benign/malignant.
By default Python orders the levels lexicographically and chooses the first level as reference (\(y=0\)). Hence benign is the reference, and we model the odds of malignant.
We shall work with only 4 of the predictors, two of which have been discretized.
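Assuming the data has already been read into a pandas DataFrame named BC, the first rows look as follows:

BC.head(6)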
## nuclei cromatin Size.low Size.medium Shape.low Class
## 0 1 3 True False True benign
## 1 10 3 False True False benign
## 2 2 3 True False True benign
## 3 4 3 False False False benign
## 4 1 3 True False True benign
## 5 10 9 False False False malignant
First we fit the model mainEffects with main effects of all predictors (written out explicitly in the formula below). Then we fit the model noEffects with no predictors (intercept only).
BC['Class_numeric'] = (BC['Class'] == 'malignant').astype(int)
BC = BC.rename(columns={
'Size.low': 'Size_low',
'Size.medium': 'Size_medium',
'Shape.low': 'Shape_low'
})
mainEffects = smf.glm('Class_numeric ~ nuclei + cromatin + Size_low + Size_medium + Shape_low', data=BC, family=sm.families.Binomial()).fit()
noEffects = smf.glm('Class_numeric ~ 1', data=BC, family=sm.families.Binomial()).fit()

First we want to test whether there is any effect of the predictors, i.e. the null hypothesis \[H_0:\ \ \beta_1=\beta_2=\beta_3=\beta_4=\beta_5=0\]
We test the hypothesis that all \(\beta_i=0\) using a \(\chi^2\)-test.
from scipy.stats import chi2
ll_noEffects = noEffects.llf
ll_mainEffects = mainEffects.llf
test_stat = -2 * (ll_noEffects - ll_mainEffects)
df = mainEffects.df_model - noEffects.df_model
p_value = 1 - chi2.cdf(test_stat, df)
print(f"Test statistic: {test_stat}")## Test statistic: 749.2859276261477
## Degrees of freedom: 5
## P-value: 0.0
mainEffects is a much better model.
The test statistic is the drop in deviance between the two models (749.29), which should be small if the null hypothesis is true.
It is evaluated in a chi-square distribution with 5 degrees of freedom (the number of parameters set to zero under the null hypothesis).
The 95%-critical value for the \(\chi^2(5)\) distribution is 11.07 and the p-value is in practice zero.
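These numbers can be verified directly with scipy (a small check; chi2 and test_stat are defined above):

chi2.ppf(0.95, 5)       # 95% critical value of the chi-square(5) distribution, about 11.07
chi2.sf(test_stat, 5)   # upper tail probability of the observed statistic, effectively zero

The table of estimated coefficients, standard errors, z-values and confidence intervals for mainEffects is: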
## Coef. Std.Err. z P>|z| [0.025 0.975]
## Intercept -0.7090 0.8570 -0.8274 0.4080 -2.3887 0.9706
## Size_low[T.True] -3.6154 0.8081 -4.4739 0.0000 -5.1992 -2.0315
## Size_medium[T.True] -2.3773 0.7188 -3.3073 0.0009 -3.7861 -0.9685
## Shape_low[T.True] -2.1490 0.6054 -3.5496 0.0004 -3.3356 -0.9624
## nuclei 0.4403 0.0823 5.3483 0.0000 0.2790 0.6017
## cromatin 0.5058 0.1444 3.5025 0.0005 0.2228 0.7888
For each predictor \(p\) we can test the hypothesis: \[H_0:\ \ \beta_p=0\]
Judging from the z-values, there is a clear effect of all 5 predictors, which of course is also supported by the p-values.

We add the predicted probabilities pred (here obtained with mainEffects.predict) to our dataframe BC; pred is the final model's estimate of the probability of malignant.

BC['pred'] = mainEffects.predict(BC)  # predicted probability of malignant
BC[['Class', 'pred']].head(6)
## Class pred
## 0 benign 0.010817
## 1 benign 0.944507
## 2 benign 0.016702
## 3 benign 0.928883
## 4 benign 0.010817
## 5 malignant 0.999738
Not good for the patients in rows 1 and 3 above: they are benign, but the model predicts a high probability of malignant.
We may classify by rounding BC['pred']:

benign (probability BC['pred'] less than 0.5)
malignant (probability BC['pred'] greater than 0.5)

BC['pred_class'] = np.where(BC['pred'] > 0.5, "pred_malignant", "pred_benign")
tab = pd.crosstab(BC['Class'], BC['pred_class'])
tab
## pred_class pred_benign pred_malignant
## Class
## benign 433 11
## malignant 11 228
11+11=22 patients are misclassified.
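The misclassification count can also be read off the cross table programmatically, e.g.:

misclassified = tab.loc['benign', 'pred_malignant'] + tab.loc['malignant', 'pred_benign']
misclassified  # 22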
The five smallest predicted probabilities among the malignant cases:

malignant_preds = BC.loc[BC['Class'] == 'malignant', 'pred']
malignant_preds_sorted = malignant_preds.sort_values().head(5)
malignant_preds_sorted
## 440 0.034703
## 216 0.036964
## 63 0.088544
## 474 0.189870
## 55 0.205024
## Name: pred, dtype: float64
There is a woman with a malignant tumor whose predicted probability of malignancy is only 3.5%.
If we instead refer all women with a predicted probability of malignancy above 5% to further investigation, then we only miss two of the malignant cases.
BC['pred_class'] = np.where(BC['pred'] > 0.05, "pred_malignant", "pred_benign")
tab = pd.crosstab(BC['Class'], BC['pred_class'])
tab
## pred_class pred_benign pred_malignant
## Class
## benign 394 50
## malignant 2 237
The expense is that the number of false positives increases from 11 to 50. With a threshold of 0.1 instead:
BC['pred_class'] = np.where(BC['pred'] > 0.1, "pred_malignant", "pred_benign")
tab = pd.crosstab(BC['Class'], BC['pred_class'])
tab
## pred_class pred_benign pred_malignant
## Class
## benign 417 27
## malignant 3 236