Data from Peter Koch

Peter has made 100 independent measurements of the capacitance of each of 4 of the displayed capacitors and of one additional capacitor. The nominal values are 47, 47, 100, 150 and 150 nF, all with a stated tolerance of 1%.

load(url("https://asta.math.aau.dk/datasets?file=cap_1pct.RData"))
head(capDat, 4)

##   capacity nomval   sample
## 1    45.69     47 s_1_nF47
## 2    45.71     47 s_1_nF47
## 3    45.69     47 s_1_nF47
## 4    45.71     47 s_1_nF47

Here we see the first 4 capacitance measurements of the first capacitor, which has nominal value 47 nF.

Remark: The measured values are consistently below the nominal value minus the 1% tolerance: \(47 - 0.47 = 46.53\).

table(capDat$sample)

## 
##  s_1_nF47  s_2_nF47 s_3_nF100 s_4_nF150 s_5_nF150 
##       100       100       100       100       100

Transformation

Linearisation:

\[\begin{align} f(x) &\approx f(x_0) + f'(x_0)(x - x_0) \\ \\ x_0 &= 1 \\ f(x) &= \log x \\ f'(x) &= 1/x \\ \\ x &= m/n \\ \log \left ( \frac{m}{n} \right ) &\approx \log 1 + \frac{1}{1} \left ( \frac{m}{n} - 1 \right ) \\ &= \frac{m - n}{n} \end{align}\]

\[ \log \left ( \frac{m}{n} \right ) \approx \frac{m - n}{n} \]

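n <- 47
# 100 candidate measured values within +/- 5% of the nominal value n
m <- seq(47-5*0.01*47, 47+5*0.01*47, length.out = 100)
# Exact log ratio (red) versus its first-order approximation (blue)
plot(m, log(m/n), col = "red", type = "l")
lines(m, (m - n)/n, col = "blue", type = "l")
legend("topleft", legend = c("log(m/n)", "(m-n)/n"), lty = 1, col = c("red", "blue"))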
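To put a number on how close the two curves are, here is a minimal R sketch that recomputes the same grid (±5% around n = 47) and measures the largest gap. It also subtracts the second-order Taylor term \(-(m/n-1)^2/2\) to show that this term accounts for almost all of the gap.

```r
n <- 47
m <- seq(47 - 5 * 0.01 * 47, 47 + 5 * 0.01 * 47, length.out = 100)
x <- m / n - 1                      # relative deviation from the nominal value

gap <- log(m / n) - x               # exact log ratio minus first-order approximation
max_gap <- max(abs(gap))            # largest error over the +/- 5% range

# What is left after also removing the second-order term -x^2/2
third_order <- max(abs(gap - (-x^2 / 2)))

c(max_gap = max_gap, remainder = third_order)
```

The worst case is at the lower end, m/n = 0.95, where the approximation is off by about 0.0013 on the log scale. For deviations of about 3%, as in these data, the gap is roughly \(0.03^2/2 \approx 0.00045\).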

Transformation

\[ \log \left ( \frac{m}{n} \right ) \approx \frac{m - n}{n} \]

Instead of the raw measurements we will consider:

\(\mbox{lnError = ln(measuredValue/nominalValue)}\)

Note that, by the linear approximation,

\(\mbox{lnError}\approx\mbox{measuredValue/nominalValue - 1 = }\) \(\mbox{(measuredValue-nominalValue)/nominalValue}\)

which is the error relative to the nominal value.

That is, lnError can be interpreted as the relative error.
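As a quick numerical check of this interpretation, the sketch below takes the four raw values shown in the printed head and tail of the data and compares lnError with the exact relative error. The difference is roughly \(-x^2/2\), i.e. about 0.0004 for relative errors near 3%.

```r
# Raw values copied from the printed head() and tail() of capDat
measured <- c(45.69, 45.71, 145.7, 145.6)
nominal  <- c(47,    47,    150,   150)

ln_error  <- log(measured / nominal)          # lnError as defined above
rel_error <- (measured - nominal) / nominal   # exact relative error

round(cbind(measured, nominal, ln_error, rel_error, diff = ln_error - rel_error), 5)
```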

Transformed data

capDat = within(capDat, lnError <- log(capacity/nomval))
head(capDat, 2)

##   capacity nomval   sample     lnError
## 1    45.69     47 s_1_nF47 -0.02826815
## 2    45.71     47 s_1_nF47 -0.02783051

tail(capDat, 2)

##     capacity nomval    sample     lnError
## 499    145.7    150 s_5_nF150 -0.02908558
## 500    145.6    150 s_5_nF150 -0.02977216

The resolution of Peter’s capacitance meter is 2 decimals in the 47 nF range and 1 decimal in the 150 nF range, so only a limited number of distinct values (3-8) are observed for each capacitor. Consequently, box plots and histograms are uninformative.

Model considerations

The measurements are more than 2.7% below the nominal value. This must be due to a systematic error of the meter.

As mentioned earlier, there are in addition two further sources of error in this case:

\(\mbox{ln(measuredValue / nominalValue) = systematicError + productionError + measurementError}\)

Statistical model

\[\mbox{ln(measuredValue / nominalValue) = systematicError + productionError + measurementError}\]

We formulate the model:

\(Y_{ij}=\mu+A_i+\varepsilon_{ij}\)

where

\(Y_{ij}\) is the log error measurement
\(\mu\) is the systematic error on the meter
\(A_i\) is the random production error
\(\varepsilon_{ij}\) is the random measurement error
\(i=1,\ldots,k=5\) indexes the 5 samples (capacitors)
\(j=1,\ldots,n=100\) indexes the observations within each sample
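To make the model concrete, here is a minimal simulation sketch in R. It plugs in the estimates derived further on in this section (\(\hat\mu\approx-0.0288\), \(\hat\sigma_\alpha^2\approx 11.64\times10^{-6}\), \(\hat\sigma^2\approx 0.29\times10^{-6}\)) as if they were the true parameters; the seed and the names `sim` and `A` are arbitrary choices for illustration.

```r
set.seed(1)
k <- 5; n <- 100
mu          <- -0.0288            # systematic error of the meter
sigma_alpha <- sqrt(11.64e-6)     # sd of the production error A_i
sigma       <- sqrt(0.29e-6)      # sd of the measurement error eps_ij

A <- rnorm(k, mean = 0, sd = sigma_alpha)    # one production error per capacitor
sim <- data.frame(sample = factor(rep(seq_len(k), each = n)))
# Y_ij = mu + A_i + eps_ij: A_i is shared by all n measurements of capacitor i
sim$y <- mu + A[as.integer(sim$sample)] + rnorm(k * n, mean = 0, sd = sigma)

# Within-sample spread reflects sigma, between-sample spread reflects sigma_alpha
tapply(sim$y, sim$sample, sd)
tapply(sim$y, sim$sample, mean)
```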

Assumptions

This is the model treated in WMM chapter 13.11, where it is assumed that

\(A_i\) is normally distributed with mean 0 and variance \(\sigma_\alpha^2\), which represents the production error
\(\varepsilon_{ij}\) is normally distributed with mean 0 and variance \(\sigma^2\), which represents the measurement error
all \(A_i\) and \(\varepsilon_{ij}\) are mutually independent

Estimation of systematic error

The systematic error is simply estimated by the mean

\(\hat\mu=\bar y_{..}\)

muhat <- mean(capDat$lnError)
muhat

## [1] -0.0288375

The meter systematically reports values that are estimated to be 2.88% too low.

Estimation of random error

Notation from WMM chapter 13.3:

\(SSA=n\sum_i (\bar y_{i.}-\bar y_{..})^2\) (related to production error)
\(SSE=\sum_{ij} ( y_{ij}-\bar y_{i.})^2\) (related to measurement error)

Theorem 13.4 states:

\(E(SSA)=(k-1)\sigma^2+n(k-1)\sigma_\alpha^2\)
\(E(SSE)=k(n-1)\sigma^2\)
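Theorem 13.4 can be checked by simulation. The sketch below (a minimal illustration, not taken from WMM) computes SSA and SSE directly from the definitions above for a balanced k × n layout and averages them over many data sets simulated from the random-effects model. The variance values are illustrative, chosen to match the estimates found for Peter’s data; \(\mu\) is left out because it cancels in both sums of squares.

```r
set.seed(42)
k <- 5; n <- 100
sigma2_alpha <- 11.64e-6   # illustrative production-error variance
sigma2       <- 0.29e-6    # illustrative measurement-error variance

# SSA and SSE for a balanced layout: y is a k x n matrix, one row per sample
ss <- function(y) {
  row_means  <- rowMeans(y)
  grand_mean <- mean(y)
  c(SSA = ncol(y) * sum((row_means - grand_mean)^2),
    SSE = sum((y - row_means)^2))      # row_means is recycled down the columns
}

sims <- replicate(2000, {
  A <- rnorm(k, mean = 0, sd = sqrt(sigma2_alpha))
  y <- matrix(rnorm(k * n, mean = 0, sd = sqrt(sigma2)), nrow = k) + A  # row i gets A[i]
  ss(y)
})

theory <- c(SSA = (k - 1) * sigma2 + n * (k - 1) * sigma2_alpha,
            SSE = k * (n - 1) * sigma2)
rbind(simulated = rowMeans(sims), theorem = theory)
```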

Fit

fit <- lm(lnError ~ sample, data = capDat)
anova(fit)

## Analysis of Variance Table
## 
## Response: lnError
##            Df    Sum Sq    Mean Sq F value    Pr(>F)    
## sample      4 0.0046576 0.00116440  4067.4 < 2.2e-16 ***
## Residuals 495 0.0001417 0.00000029                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

from which we extract the sums of squares:

SS <- anova(fit)$`Sum Sq`
SSA <- SS[1]
SSE <- SS[2]

SSA = 0.00466 and SSE = 0.000142

Solution

Solving the equations

\(SSA = E(SSA)\) and \(SSE = E(SSE)\)

yields

\(\hat\sigma_\alpha^2=\frac{1}{n}(\frac{SSA}{k-1}-\hat\sigma^2)\) = \(11.64 \times 10^{-6}\)
\(\hat\sigma^2=\frac{SSE}{k(n-1)}\) = \(0.29 \times 10^{-6}\)
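The arithmetic behind these two estimates, as a small R sketch using the sums of squares read off the ANOVA table (rounded as printed). Note that \(\hat\sigma^2\) must be computed first, since it enters the estimate of \(\sigma_\alpha^2\).

```r
k <- 5; n <- 100
SSA <- 0.0046576   # sample sum of squares from anova(fit)
SSE <- 0.0001417   # residual sum of squares from anova(fit)

sigma2_hat       <- SSE / (k * (n - 1))                  # from SSE = E(SSE)
sigma2_alpha_hat <- (SSA / (k - 1) - sigma2_hat) / n     # from SSA = E(SSA)

c(sigma2 = sigma2_hat, sigma2_alpha = sigma2_alpha_hat) * 1e6   # in units of 1e-6
```

If \(SSA/(k-1)\) were smaller than \(\hat\sigma^2\), this moment estimate of \(\sigma_\alpha^2\) would come out negative; a common convention is then to report it as 0. Here \(SSA/(k-1)\) is several thousand times larger, so the issue does not arise.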

Summing up

the meter has an estimated systematic error of \(-2.88\)%
the estimated standard deviation of the measurement error of the meter is \(\sqrt{0.29 \times 10^{-6}}\) = 0.054%
the estimated standard deviation of the production error is \(\sqrt{11.64 \times 10^{-6}}\) = 0.34%. So the 3-sigma limit is 1.02%, which is in accordance with the tolerance of 1%. Note that this estimate is uncertain, as it is based on only 4 degrees of freedom.

The estimated variance on log error

\(0.29 \times 10^{-6} + 11.64 \times 10^{-6} = 11.93 \times 10^{-6}\)

is clearly dominated by the production error.
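The summary figures above follow from the two variance estimates by square roots and ratios. A short R sketch (values in percent on the log scale, which by the linearisation is approximately the relative scale):

```r
sigma2       <- 0.29e-6    # measurement-error variance (log scale)
sigma2_alpha <- 11.64e-6   # production-error variance (log scale)

sd_meter_pct <- 100 * sqrt(sigma2)          # sd of measurement error, percent
sd_prod_pct  <- 100 * sqrt(sigma2_alpha)    # sd of production error, percent
three_sigma  <- 3 * sd_prod_pct             # compare with the 1% tolerance
prod_share   <- sigma2_alpha / (sigma2 + sigma2_alpha)  # fraction of total variance

round(c(sd_meter_pct = sd_meter_pct, sd_prod_pct = sd_prod_pct,
        three_sigma = three_sigma, prod_share = prod_share), 3)
```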

Test of no random effect

We have the possibility of testing the hypothesis

\(H_0\): \(\sigma_\alpha=0\)

This is equivalent to

\(E(SSA/(k-1))=E(SSE/(k(n-1)))=\sigma^2\)

Under \(H_0\) the statistic

\(F=\frac{\frac{SSA}{k-1}}{\frac{SSE}{k(n-1)}}\)

has an F-distribution with degrees of freedom \((k-1,k(n-1))\)

In our case \(f_{obs}=4067.4\), which is highly significant (p-value \(< 2.2 \times 10^{-16}\), i.e. effectively 0).
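The F statistic and its p-value can be recomputed from the sums of squares with base R's `pf()`. A minimal sketch; because the sums of squares are rounded as printed, the result differs slightly from the 4067.4 in the table.

```r
k <- 5; n <- 100
SSA <- 0.0046576; SSE <- 0.0001417        # from the ANOVA table

f_obs   <- (SSA / (k - 1)) / (SSE / (k * (n - 1)))
p_value <- pf(f_obs, df1 = k - 1, df2 = k * (n - 1), lower.tail = FALSE)
crit    <- qf(0.95, df1 = k - 1, df2 = k * (n - 1))  # 5% critical value

c(f_obs = f_obs, critical = crit, p_value = p_value)
```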

Lognormal variation

In the preceding analysis we assumed normal errors after a log transformation.

Let \(X\) be a positive random variable and \(Y=\ln(X)\).

We say that \(X\) has a lognormal distribution if \(Y\) has a normal distribution with - say - mean \(\mu\) and standard deviation \(\sigma\).

Density plots:
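The density plots themselves are not reproduced here. A sketch that draws lognormal densities with base R's `dlnorm()` for a few illustrative values of \(\sigma\) (with \(\mu=0\); these parameter values are arbitrary, not those of the original figure) and checks the change-of-variables formula \(f_X(x)=f_Y(\ln x)/x\):

```r
mu <- 0
sigmas <- c(0.25, 0.5, 1)                    # illustrative sd's of Y = ln(X)
x <- seq(0.01, 4, length.out = 400)

dens <- sapply(sigmas, function(s) dlnorm(x, meanlog = mu, sdlog = s))
matplot(x, dens, type = "l", lty = 1, col = 1:3, xlab = "x", ylab = "density")
legend("topright", legend = paste("sigma =", sigmas), lty = 1, col = 1:3)

# Density of X = exp(Y) via change of variables reproduces dlnorm()
f_change <- dnorm(log(x), mean = mu, sd = 0.5) / x
all.equal(f_change, dlnorm(x, meanlog = mu, sdlog = 0.5))
```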

Moments of lognormal

If \(Y=ln(X)\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then Theorem 6.7 of WMM states:

\(E(X)=\exp(\mu+\sigma^2/2)\)
\(Var(X)=\exp(2\mu+\sigma^2)(\exp(\sigma^2)-1)\)
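A numerical check of the two formulas, as a minimal R sketch: the moments are computed by integrating over the normal variable \(Y\) (\(E(X)=E(e^Y)\), \(E(X^2)=E(e^{2Y})\)) with `integrate()`, for the illustrative parameters \(\mu=0.1\), \(\sigma=0.3\).

```r
mu <- 0.1; sigma <- 0.3                         # illustrative parameters of Y = ln(X)
lo <- mu - 12 * sigma; hi <- mu + 12 * sigma    # effectively the whole range of Y

m1 <- integrate(function(y) exp(y)     * dnorm(y, mu, sigma), lo, hi)$value  # E(X)
m2 <- integrate(function(y) exp(2 * y) * dnorm(y, mu, sigma), lo, hi)$value  # E(X^2)

by_integration <- c(mean = m1, var = m2 - m1^2)
by_theorem     <- c(mean = exp(mu + sigma^2 / 2),
                    var  = exp(2 * mu + sigma^2) * (exp(sigma^2) - 1))
rbind(by_integration, by_theorem)
```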

If we are interested in relative variation, it is common to look at the coefficient of variation

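\(CV(X)=\frac{\sigma}{\mu}\), where \(\mu\) and \(\sigma\) here denote the mean and standard deviation of \(X\) itself (not of \(\ln X\))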

If, e.g., CV = 0.05 and the measurements are approximately normally distributed, then about 95% of them lie within

\(\mu\pm 2\sigma=\mu\pm 2*0.05\mu=\mu(1\pm 0.1)\)

i.e. most observations are within 10% of the mean.
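The 95% figure relies on approximate normality, under which about 95% of the probability lies within two standard deviations of the mean. A small R sketch checks this for CV = 0.05, exactly with `pnorm()` and by simulation around an illustrative mean of 47:

```r
set.seed(7)
cv <- 0.05
mean_x <- 47                                   # illustrative mean

p_exact <- pnorm(2) - pnorm(-2)                # P(|X - mu| <= 2 sigma) for normal X
x <- rnorm(1e5, mean = mean_x, sd = cv * mean_x)
p_sim <- mean(abs(x / mean_x - 1) <= 2 * cv)   # share within mu * (1 +/- 0.1)

c(exact = p_exact, simulated = p_sim)
```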

CV of Lognormal

If \(Y=ln(X)\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), we calculate CV for \(X\) as

\(CV(X)=\frac{\sqrt{Var(X)}}{E(X)}=\sqrt{\exp(\sigma^2)-1}\)

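In Peter’s data we estimated the variance of the production error on the log scale to \(11.64 \times 10^{-6}\), which means that the estimated CV of the capacitance due to production variation is

CV = \(\sqrt{\exp\left (11.64 \times 10^{-6} \right ) -1} = 0.34\)%.

Likewise, the estimated measurement-error variance \(0.29 \times 10^{-6}\) gives CV = \(\sqrt{\exp\left (0.29 \times 10^{-6} \right ) -1} = 0.054\)%, i.e., if we correct for the systematic error of the meter, then our measurements are extremely precise.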
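The CV formula is easy to evaluate, and to check against the moment formulas, in R. A minimal sketch; the helper `cv_lognormal` is a name introduced here for illustration, and it uses `expm1()`, which computes \(\exp(s)-1\) accurately for the very small variances involved:

```r
# CV of X when ln(X) ~ N(mu, s2); it does not depend on mu
cv_lognormal <- function(s2) sqrt(expm1(s2))

cv_prod  <- cv_lognormal(11.64e-6)   # production-error variance (log scale)
cv_meter <- cv_lognormal(0.29e-6)    # measurement-error variance (log scale)
round(100 * c(production = cv_prod, meter = cv_meter), 4)   # in percent

# Same value from the moment formulas, for an arbitrary mu
mu <- -0.0288; s2 <- 11.64e-6
ex <- exp(mu + s2 / 2)
vx <- exp(2 * mu + s2) * (exp(s2) - 1)
sqrt(vx) / ex
```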

Linear calibration

In our previous analysis, we assumed that the systematic error of the meter did not depend on the nominal value.

To check this assumption consider the model

\(Y=\mbox{ln(measuredValue)}\) depends linearly on \(x= \mbox{ln(nominalValue)}\):
\(Y=\alpha+\beta x+\varepsilon\)

where we previously assumed the slope \(\beta\) to be equal to 1.

Linear calibration fit

fit <- lm(log(capacity) ~ log(nomval), data = capDat)
summary(fit)

## 
## Call:
## lm(formula = log(capacity) ~ log(nomval), data = capDat)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0064121 -0.0010784  0.0007315  0.0013879  0.0050839 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0300145  0.0011907  -25.21   <2e-16 ***
## log(nomval)  1.0002636  0.0002648 3776.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003101 on 498 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.426e+07 on 1 and 498 DF,  p-value: < 2.2e-16

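The slope is very close to 1. Note that the printed t value (3776.74) tests \(\beta = 0\), not \(\beta = 1\). The relevant statistic is \(t = (1.0002636 - 1)/0.0002648 \approx 1.0\), so the slope is not significantly different from 1.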
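The t value printed by `summary()` tests \(\beta = 0\). A minimal sketch of the test of \(\beta = 1\), using only the estimate and standard error printed above and the 498 residual degrees of freedom:

```r
beta_hat <- 1.0002636   # slope estimate from summary(fit)
se_beta  <- 0.0002648   # its standard error
df_res   <- 498         # residual degrees of freedom

t_one <- (beta_hat - 1) / se_beta          # t statistic for H0: beta = 1
p_one <- 2 * pt(-abs(t_one), df = df_res)  # two-sided p-value
c(t = t_one, p = p_one)
```

With the data loaded, an equivalent route is to add `offset(log(nomval))` to the model formula, so that the reported coefficient of `log(nomval)` estimates \(\beta - 1\) and its t value tests \(\beta = 1\) directly.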

Clearly, it is a bit dubious to assume a linear relationship, as we only have 3 nominal values.

Calibrated values

If we stick to the linear calibration model, it is sensible to correct our measured errors according to the calibration of the meter:

\[\mbox{measuredError}=\alpha+\beta *\mbox{correctError}\]
\[\mbox{correctError}=(\mbox{measuredError}-\alpha)/\beta\]

ab = coef(fit)
ab

## (Intercept) log(nomval) 
## -0.03001454  1.00026359

capDat$lnError_c = (capDat$lnError - ab[1])/ab[2]

Calibrated data

head(capDat)

##   capacity nomval   sample     lnError   lnError_c
## 1    45.69     47 s_1_nF47 -0.02826815 0.001745930
## 2    45.71     47 s_1_nF47 -0.02783051 0.002183452
## 3    45.69     47 s_1_nF47 -0.02826815 0.001745930
## 4    45.71     47 s_1_nF47 -0.02783051 0.002183452
## 5    45.70     47 s_1_nF47 -0.02804930 0.001964715
## 6    45.69     47 s_1_nF47 -0.02826815 0.001745930

The calibrated data now show that the production error of component s_1_nF47 is around 0.2%, well below the 1% tolerance.
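The calibrated values can be reproduced by hand from the printed coefficients and converted back to a relative error in percent via \(\exp(\cdot)-1\). A minimal sketch using the three distinct lnError values among the first rows shown above:

```r
ab <- c(-0.03001454, 1.00026359)                      # intercept and slope from coef(fit)
ln_error <- c(-0.02826815, -0.02783051, -0.02804930)  # lnError values of s_1_nF47 (rows 1, 2, 5)

ln_error_c <- (ln_error - ab[1]) / ab[2]   # calibrated log error
rel_pct    <- 100 * (exp(ln_error_c) - 1)  # calibrated relative error in percent

round(cbind(ln_error_c, rel_pct), 5)
```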

Statistics and electronics - lecture 1

Sources of variation
