The ASTA team
Picture of a “lot” of capacitors.
The word *lot* is used to identify several components produced in a single run, where a run is a production series limited to a given time interval and with fixed production parameters.
Peter Koch has tested 269 of the capacitors in the displayed lot.
First of all, we will check the assumption that our measurements have a log-normal error.
The QQ-plot (WMM, Section 8.8) supports normality of ln_Error.
There are several tests of normality.
Two of these are considered in WMM section 10.11:
Consider a sample \(X_1,\ldots,X_n\) and an estimate of \(\sigma\) - the standard deviation of the population:
\[
S_0=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(X_i-\bar{X}\big)^2}
\]
\(S_0\) is always a good estimator of the population standard deviation \(\sigma\) - no matter the form of the population distribution.
Next consider
\[
S_1=\sqrt{\frac{\pi}{2}}\,\frac{1}{n}\sum_{i=1}^{n}\lvert X_i-\bar{X}\rvert
\]
This is a good estimator of \(\sigma\) if the population is normal, since in that case \(E\lvert X-\mu\rvert=\sqrt{2/\pi}\,\sigma\). Otherwise it will under- or overestimate \(\sigma\), depending on the form of the population distribution.
Hence we expect that the ratio
\[
U=\frac{S_0}{S_1}
\]
is close to 1, when the population is normal.
For large values of \(n\) a normal approximation yields that
\[
Z=\frac{\sqrt{n}\,(U-1)}{0.2261}\;\overset{\text{approx.}}{\sim}\;N(0,1)
\]
that is, if \(-2\leq z_{obs}\leq 2\) (the exact 5% critical value being \(1.96\approx 2\)), we do not reject normality when testing at level 5%.
mln_E <- mean(ln_Error)                           # sample mean of ln_Error
s0 <- sqrt(mean((ln_Error - mln_E)^2))            # S0: rms deviation
s1 <- sqrt(pi/2) * mean(abs(ln_Error - mln_E))    # S1: scaled mean absolute deviation
u <- s0/s1                                        # ratio U, close to 1 under normality
z_obs <- sqrt(length(ln_Error)) * (u - 1)/0.2261  # standardized test statistic
z_obs
## [1] -1.628122
Hence there is no evidence of non-normality.
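As an illustration of how this ratio test behaves when the null hypothesis is true, here is a self-contained sketch on simulated standard-normal data (not the lecture's capacitor data; the variable names are mine):

```r
# Sketch: the ratio test of normality on simulated normal data
set.seed(1)
x <- rnorm(269)                                  # synthetic sample, same size as the lot
s_rms <- sqrt(mean((x - mean(x))^2))             # rms deviation
s_abs <- sqrt(pi/2) * mean(abs(x - mean(x)))     # scaled mean absolute deviation
z <- sqrt(length(x)) * (s_rms/s_abs - 1)/0.2261  # standardized statistic
z
```

For normal data the statistic should typically fall in \((-2, 2)\), so normality is (correctly) not rejected in most simulated samples.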
The \(\chi^2\) goodness-of-fit test is a general method for investigating whether a sample has a specific distribution.
The first example in WMM is concerned with the problem of whether a die is balanced.
That is, all sides have probability 1/6 of showing up.
Rolling the die 120 times we expect each side to show up \(120\cdot\tfrac{1}{6}=20\) times.
Actually we observe
Distance measure between observed (\(O_i\)) and expected (\(E_i\)) frequencies:
\[
X^2=\sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i}
\]
If the die is balanced, then \(X^2\) approximately follows a \(\chi^2\)-distribution with \(k-1=5\) degrees of freedom, where \(k=6\) is the number of possible outcomes.
For the actual data we compare \(X^2_{obs}\) with the 95% quantile of the \(\chi^2(5)\)-distribution:
## [1] 11.0705
At 5% significance the critical value is 11.07; since \(X^2_{obs}\) does not exceed it, there is no evidence that the die is unbalanced.
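The whole computation can be sketched in a few lines of R. Note that the observed counts below are illustrative stand-ins, not the counts from the lecture:

```r
# Sketch: chi-square test for a balanced die
obs  <- c(20, 22, 17, 18, 19, 24)   # hypothetical frequencies of sides 1-6, n = 120
expd <- rep(120/6, 6)               # expected frequency 20 under balance
X2   <- sum((obs - expd)^2 / expd)  # distance between observed and expected
X2
qchisq(0.95, df = 6 - 1)            # 5% critical value: 11.0705
```

The die is judged unbalanced only if `X2` exceeds the critical value.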
We assume that ln_Error is a sample from a normal distribution and divide the population distribution into 10 bins with equal probabilities p=10%.
The number of bins could be changed, but a rule of thumb requires the expected frequency in each bin to be at least 5.
Area in each bin of the red population curve is 0.1, and since the sample size is 269 we obtain an expected frequency of \(269\cdot 0.1=26.9\) in each bin.
Observed frequencies:
## bin1 bin2 bin3 bin4 bin5 bin6 bin7 bin8 bin9 bin10
## 25 37 25 19 28 30 21 25 25 34
\(X^2\) statistic:
## [1] 10.21933
The degrees of freedom is the number of bins minus 1, minus the number of estimated parameters (here \(\mu\) and \(\sigma\)), i.e. df = 10 - 1 - 2 = 7.
Observed statistic:
## [1] 10.21933
95% quantile of the \(\chi^2(7)\)-distribution:
## [1] 14.06714
p-value:
## [1] 0.1764812
We do not reject normality at level 5%.
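The binned test above can be sketched as follows; since the lecture's ln_Error data are not reproduced here, the sketch runs on simulated normal data:

```r
# Sketch: chi-square test of normality with 10 equiprobable bins
set.seed(3)
x    <- rnorm(269)                                  # synthetic stand-in for ln_Error
brks <- qnorm(seq(0, 1, by = 0.1), mean(x), sd(x))  # bin edges with 10% probability each
obs  <- table(cut(x, brks))                         # observed bin frequencies
expd <- length(x) * 0.1                             # expected frequency in each bin
X2   <- sum((obs - expd)^2 / expd)                  # chi-square distance
1 - pchisq(X2, df = 10 - 1 - 2)                     # p-value with df = 7
```

The outer break points are \(\pm\infty\) (from `qnorm(0)` and `qnorm(1)`), so every observation lands in exactly one bin.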
As mentioned, there are multiple tests of normality.
We introduce one other test: Shapiro-Wilk, which is standard in R (the function `shapiro.test`).
We do not treat the details, but the test statistic behaves somewhat like a correlation for the QQ-plot: if the "correlation" is far from 1, we reject normality.
##
## Shapiro-Wilk normality test
##
## data: ln_Error
## W = 0.99255, p-value = 0.1971
With p-value = 19.71%, we do not reject normality when testing at level 5%.
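The output above comes from R's built-in `shapiro.test`. A self-contained sketch on simulated data (the lecture applies the same call to ln_Error):

```r
# Sketch: Shapiro-Wilk test via R's built-in shapiro.test
set.seed(2)
st <- shapiro.test(rnorm(269))  # synthetic normal sample of the same size
st
```

A W statistic close to 1 and a large p-value mean no evidence against normality.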
In lecture 1 we discussed
Generally it is relevant to decompose the production variation into 2 components: the variation between lots and the variation within a lot.
As we have one lot only, we cannot identify the variation between lots.
Our actual data are thus composed of the measurement error and the lot error.
In lecture 1 we developed a linear calibration eliminating the systematic measurement error.
Applying this calibration to the actual data yields
We are now left with a sample which has mean \(\mu_l\) (the systematic lot error) and variance \(\sigma_m^2+\sigma_l^2\), where we have assumed that the random measurement error and the random lot error are independent.
Estimate of \(\mu_l\)
## [1] -0.02686793
That is, the systematic lot error is around -2.7%.
Estimate of \(\sigma_m^2+\sigma_l^2\)
## [1] 0.0003892828
that is, \(s_m^2+s_l^2=3.9\cdot 10^{-4}\).
In lecture 1 we estimated \(s_m^2=0.29\cdot 10^{-6}\), and hence \(s_l^2=3.9\cdot 10^{-4}-0.29\cdot 10^{-6}\approx 3.9\cdot 10^{-4}\), i.e. \(s_l\approx 0.020\).
3 sigma limits for the correct lot values: \(-0.027\pm 3\cdot 0.020\approx(-0.086,\;0.032)\),
clearly respecting the 10% tolerance.
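The computation can be sketched in R, with the numbers copied from the estimates above and from lecture 1:

```r
# Sketch: within-lot sd and 3 sigma limits for the correct lot values
s2_total <- 0.0003892828          # estimate of sigma_m^2 + sigma_l^2 (from above)
s2_m     <- 0.29e-06              # measurement variance, estimated in lecture 1
s_l      <- sqrt(s2_total - s2_m) # lot standard deviation, about 0.020
mu_l     <- -0.02686793           # systematic lot error (from above)
lim <- mu_l + c(-3, 3) * s_l      # 3 sigma limits
lim
```

Both limits lie inside \(\pm 0.10\), i.e. inside the 10% tolerance.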
Peter has also tested 311 capacitors with nominal value 470 nF
cap470 <- read.table(url("https://asta.math.aau.dk/datasets?file=capacitor_lot_470_nF2.txt"))[, 1]
hist(cap470, breaks = 15, col = "greenyellow")
Consulting Peter, it turned out that his box of capacitors contained components from 2 different lots.
We ln-transform and calibrate:
ln_Error <- log(cap470/470)
ln_Error_corrected <- (ln_Error-ab[1])/ab[2]
hist(ln_Error_corrected, breaks = 15, col = "gold")
Range of the calibrated errors:
## [1] -0.08888934  0.08323081
We assume that the ln_Error_corrected sample comes from a mixture of two normal distributions: \(N(\mu_1,\sigma^2)\) with probability \(p\) (lot 1) and \(N(\mu_2,\sigma^2)\) with probability \(1-p\) (lot 2).
So we have 4 unknown parameters: \((\mu_1,\mu_2,\sigma,p)\).
How to estimate these we entrust to the R package mclust.
library(mclust)
fit <- Mclust(ln_Error_corrected, G = 2, modelNames = "E")  # 2 clusters with "E"qual variances
pr <- fit$parameters$pro[1]
pr
## [1] 0.728314
The chance of coming from lot1 is around 73%.
Estimated means \((\mu_1,\mu_2)\):
## 1 2
## -0.05174452 0.05406515
Estimated common standard deviation \(\sigma\):
## [1] 0.01692654
The estimate of \(\sigma\) is 1.7%. For the 220 nF lot we estimated 2.0%, which is comparable.
The 3 sigma limits do not completely respect the tolerance of 10% - e.g. for lot 2: \(0.054\pm 3\cdot 0.017=(0.003,\;0.105)\). However, in the sample the minimum is -8.9% and the maximum 8.3%.
This indicates that the variation between lots is much greater than the variation within lots, which is also clearly illustrated by the histogram/density plots.
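The comparison of between-lot and within-lot variation can be made concrete with the numbers from the mclust fit above:

```r
# Sketch: between-lot vs within-lot variation (numbers from the fit above)
mu  <- c(-0.05174452, 0.05406515)  # estimated lot means
sig <- 0.01692654                  # common within-lot standard deviation
r <- diff(mu) / sig                # separation of the lot means in sd units
r
```

The two lot means are more than 6 within-lot standard deviations apart, so the lots are clearly separated.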