Statistics and electronics - lecture 1

The ASTA team

Sources of variation

We shall study 2 types of variation

Data from Peter Koch

Peter has made 100 independent measurements of the capacity of each of 5 capacitors: 4 of the displayed ones and one additional. The nominal values are 47, 47, 100, 150 and 150, all with a stated tolerance of 1%.

load(url("https://asta.math.aau.dk/datasets?file=cap_1pct.RData"))
head(capDat, 4)
##   capacity nomval   sample
## 1    45.69     47 s_1_nF47
## 2    45.71     47 s_1_nF47
## 3    45.69     47 s_1_nF47
## 4    45.71     47 s_1_nF47

Here we see the first 4 capacity measurements of the first capacitor with nominal value 47.

table(capDat$sample)
## 
##  s_1_nF47  s_2_nF47 s_3_nF100 s_4_nF150 s_5_nF150 
##       100       100       100       100       100

Transformation

Linearisation:

\[\begin{align} f(x) &\approx f(x_0) + f'(x_0)(x - x_0) \\ \\ x_0 &= 1 \\ f(x) &= \log x \\ f'(x) &= 1/x \\ \\ x &= m/n \\ \log \left ( \frac{m}{n} \right ) &\approx \log 1 + \frac{1}{1} \left ( \frac{m}{n} - 1 \right ) \\ &= \frac{m - n}{n} \end{align}\]

\[ \log \left ( \frac{m}{n} \right ) \approx \frac{m - n}{n} \]

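n <- 47  # nominal value
# measured values within +/- 5% of the nominal value
m <- seq(47-5*0.01*47, 47+5*0.01*47, length.out = 100)
# exact log ratio (red) against its linear approximation (blue)
plot(m, log(m/n), col = "red", type = "l")
lines(m, (m - n)/n, col = "blue", type = "l")
legend("topleft", legend = c("log(m/n)", "(m-n)/n"), lty = 1, col = c("red", "blue"))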
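As a rough numerical check of the linearisation, the sketch below measures the largest gap between log(m/n) and (m-n)/n over the same ±5% range used in the plot. The name `max_gap` is introduced here for illustration only.

```r
# Nominal value and a grid of measured values within +/- 5% of it
n <- 47
m <- seq(0.95 * n, 1.05 * n, length.out = 100)

# Largest absolute difference between the log ratio and the relative error
max_gap <- max(abs(log(m / n) - (m - n) / n))
max_gap
```

The gap is largest at the lower end of the range, where it is roughly 0.0013, so for deviations of a few percent the two quantities are practically interchangeable.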

Transformation

\[ \log \left ( \frac{m}{n} \right ) \approx \frac{m - n}{n} \]

Instead of the raw measurement we will consider:

\(\mbox{lnError = ln(measuredValue/nominalValue)}\)

Note that, by the linear approximation above,

\(\mbox{lnError} \approx \mbox{measuredValue/nominalValue} - 1 = \mbox{(measuredValue - nominalValue)/nominalValue}\)

which is the error relative to the nominal value.

That is, lnError can be interpreted as the relative error.

Transformed data

capDat = within(capDat, lnError <- log(capacity/nomval))
head(capDat, 2)
##   capacity nomval   sample     lnError
## 1    45.69     47 s_1_nF47 -0.02826815
## 2    45.71     47 s_1_nF47 -0.02783051
tail(capDat, 2)
##     capacity nomval    sample     lnError
## 499    145.7    150 s_5_nF150 -0.02908558
## 500    145.6    150 s_5_nF150 -0.02977216

Model considerations

In this case we have, as mentioned earlier, two further sources of error: the production error of the component and the measurement error of the meter.

Statistical model

\[\mbox{ln(measuredValue / nominalValue) = systematicError + productionError + measurementError}\]

We formulate the model:

where

Assumptions

This is the model treated in WMM chapter 13.11, where it is assumed that
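In the usual notation for a balanced one-way random effects model, the decomposition above can be written as follows. The symbols \(Y_{ij}\), \(\mu\), \(A_i\), \(\varepsilon_{ij}\), \(\sigma_\alpha^2\) and \(\sigma^2\) are standard textbook notation assumed here, with \(k = 5\) capacitors and \(n = 100\) measurements each:

\[\begin{align} Y_{ij} &= \mu + A_i + \varepsilon_{ij}, \quad i = 1, \ldots, k, \quad j = 1, \ldots, n \\ A_i &\sim N(0, \sigma_\alpha^2) \quad \mbox{(production error)} \\ \varepsilon_{ij} &\sim N(0, \sigma^2) \quad \mbox{(measurement error)} \end{align}\]

Here \(Y_{ij}\) is the lnError of measurement \(j\) on capacitor \(i\), \(\mu\) is the systematic error, and all \(A_i\) and \(\varepsilon_{ij}\) are assumed independent.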

Estimation of systematic error

The systematic error is simply estimated by the mean

muhat <- mean(capDat$lnError)
muhat
## [1] -0.0288375

The meter systematically reports a value that is estimated to be 2.88% too low.
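Since the mean is taken on the log scale, the figure of 2.88% relies on the linear approximation. As a small sketch, the corresponding exact relative error can be recovered by exponentiating; the value of `muhat` is copied from the output above and `rel_error` is a name introduced for illustration.

```r
# Estimated systematic error on the log scale (from the output above)
muhat <- -0.0288375

# Exact relative error implied by the log-scale estimate
rel_error <- exp(muhat) - 1
round(100 * rel_error, 2)  # in percent
```

This gives about -2.84%, close to the log-scale value of -2.88%.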

Estimation of random error

Notation from WMM chapter 13.3:

Theorem 13.4 states:
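For a balanced one-way random effects model with \(k\) groups of \(n\) observations, the standard result on the expected sums of squares is given below. The symbols \(\sigma^2\) (measurement variance) and \(\sigma_\alpha^2\) (production variance) are notation assumed here:

\[\begin{align} E(SSA) &= (k-1)\sigma^2 + n(k-1)\sigma_\alpha^2 \\ E(SSE) &= k(n-1)\sigma^2 \end{align}\]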

Fit

fit <- lm(lnError ~ sample, data = capDat)
anova(fit)
## Analysis of Variance Table
## 
## Response: lnError
##            Df    Sum Sq    Mean Sq F value    Pr(>F)    
## sample      4 0.0046576 0.00116440  4067.4 < 2.2e-16 ***
## Residuals 495 0.0001417 0.00000029                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

where we read

SS <- anova(fit)$`Sum Sq`
SSA <- SS[1]
SSE <- SS[2]

Solution

Solving the equations

yields
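A minimal sketch of the method-of-moments solution, assuming the estimating equations are obtained by setting SSA and SSE equal to their expected values in the balanced one-way random effects model. The sums of squares are copied from the ANOVA table; the names `sigma2_hat` and `sigma2_alpha_hat` are introduced here for illustration.

```r
k <- 5      # number of capacitors
n <- 100    # measurements per capacitor
SSA <- 0.0046576   # between-sample sum of squares (ANOVA table)
SSE <- 0.0001417   # residual sum of squares (ANOVA table)

# Measurement variance: SSE = k(n-1) * sigma^2
sigma2_hat <- SSE / (k * (n - 1))

# Production variance: SSA = (k-1) * sigma^2 + n(k-1) * sigma_alpha^2
sigma2_alpha_hat <- (SSA / (k - 1) - sigma2_hat) / n

c(measurement = sigma2_hat, production = sigma2_alpha_hat)
```

With these inputs the production variance comes out at roughly 11.64e-6 and the measurement variance at roughly 0.29e-6.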

Summing up

The estimated variance on log error

is clearly dominated by the production error.

Test of no random effect

We have the possibility of testing the hypothesis

This is equivalent to

Under \(H_0\) the statistic

has an F-distribution with degrees of freedom \((k-1,k(n-1))\)

In the actual case \(f_{obs}=4067.4\), which is highly significant (p-value < 2.2e-16).
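As a sketch, the observed statistic and its p-value can be recomputed from the sums of squares in the ANOVA table, assuming the statistic is the ratio of the two mean squares as in that table. The names `f_obs` and `p_value` are introduced here for illustration.

```r
k <- 5      # number of capacitors
n <- 100    # measurements per capacitor
SSA <- 0.0046576   # between-sample sum of squares (ANOVA table)
SSE <- 0.0001417   # residual sum of squares (ANOVA table)

# Ratio of the two mean squares
f_obs <- (SSA / (k - 1)) / (SSE / (k * (n - 1)))

# Upper tail probability of the F-distribution with (k-1, k(n-1)) df
p_value <- pf(f_obs, df1 = k - 1, df2 = k * (n - 1), lower.tail = FALSE)
c(f_obs = f_obs, p_value = p_value)
```

Because the sums of squares are rounded, `f_obs` only matches the table value of 4067.4 approximately. The p-value is far below 2.2e-16, the threshold shown in the ANOVA table.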

Lognormal variation

In the preceding we assumed normal errors after a log transformation.

Let \(X\) be a random variable and \(Y=ln(X)\).

We say that \(X\) has a lognormal distribution if \(Y\) has a normal distribution with - say - mean \(\mu\) and standard deviation \(\sigma\).

Density plots:
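A sketch of how such density plots could be produced with `dlnorm`; the parameter values (\(\mu = 0\) and \(\sigma\) equal to 0.25, 0.5 and 1) are illustrative choices, not taken from the source.

```r
# Lognormal densities with meanlog = 0 and three illustrative values of sdlog
x <- seq(0.01, 4, length.out = 400)
sdlogs <- c(0.25, 0.5, 1)
cols <- c("red", "blue", "darkgreen")

# First density sets up the plot; the remaining ones are added as lines
plot(x, dlnorm(x, meanlog = 0, sdlog = sdlogs[1]), type = "l",
     col = cols[1], ylab = "density")
for (i in 2:3) {
  lines(x, dlnorm(x, meanlog = 0, sdlog = sdlogs[i]), col = cols[i])
}
legend("topright", legend = paste("sdlog =", sdlogs), lty = 1, col = cols)
```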

Moments of lognormal

If \(Y=ln(X)\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then Theorem 6.7 of WMM states:
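For reference, the mean and variance of a lognormal variable have the following well-known form, written here in the notation of this section:

\[\begin{align} E(X) &= \exp(\mu + \sigma^2/2) \\ Var(X) &= \exp(2\mu + \sigma^2)\,(\exp(\sigma^2) - 1) \end{align}\]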

If we are interested in relative variation, it is common to look at the coefficient of variation

If e.g. CV = 0.05, then 95% of our measurements are within

i.e. most observations are within 10% of the mean.
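Written out, with \(\mu_X\) and \(\sigma_X\) denoting the mean and standard deviation of the measurements (symbols assumed here) and assuming roughly normal variation, the coefficient of variation and the approximate 95% range are:

\[\begin{align} CV &= \frac{\sigma_X}{\mu_X} \\ \mu_X \pm 2\sigma_X &= \mu_X (1 \pm 2 \cdot CV) = \mu_X (1 \pm 0.1) \quad \mbox{for } CV = 0.05 \end{align}\]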

CV of Lognormal

If \(Y=ln(X)\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), we calculate CV for \(X\) as

In Peter’s data we estimated the variance of the log error to be \(11.64 \times 10^{-6}\), which means that the estimated CV of the capacity measurement is

i.e., if we correct for the systematic error of the meter, then our measurements are extremely precise.
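A short sketch of this calculation, using the standard lognormal identity CV = sqrt(exp(sigma^2) - 1), which follows from the mean and variance of the lognormal distribution. The variance is copied from the text; `sigma2` and `cv` are names introduced here for illustration.

```r
# Estimated variance of the log error (from the text)
sigma2 <- 11.64e-6

# The CV of a lognormal variable depends only on the log-scale variance
cv <- sqrt(exp(sigma2) - 1)
cv

# For small variances the CV is close to the log-scale standard deviation
sqrt(sigma2)
```

Both values are about 0.0034, i.e. a coefficient of variation of roughly 0.34%.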

Linear calibration

In our previous analysis, we assumed that the systematic error of the meter did not depend on the nominal value.

To check this assumption consider the model

where we have previously assumed the slope \(\beta\) to be equal to 1.
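The formula `log(capacity) ~ log(nomval)` used in the fit corresponds to a model of the following form, where the symbols \(\alpha\) for the intercept and \(\varepsilon\) for the error term are assumed here:

\[ \log(\mbox{measuredValue}) = \alpha + \beta \log(\mbox{nominalValue}) + \varepsilon \]

With \(\beta = 1\) this reduces to \(\mbox{lnError} = \alpha + \varepsilon\), a systematic error that does not depend on the nominal value.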

Linear calibration fit

fit <- lm(log(capacity) ~ log(nomval), data = capDat)
summary(fit)
## 
## Call:
## lm(formula = log(capacity) ~ log(nomval), data = capDat)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0064121 -0.0010784  0.0007315  0.0013879  0.0050839 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0300145  0.0011907  -25.21   <2e-16 ***
## log(nomval)  1.0002636  0.0002648 3776.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.003101 on 498 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.426e+07 on 1 and 498 DF,  p-value: < 2.2e-16

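The slope is very close to 1. Note that the t value in the table (3776.74) tests the hypothesis that the slope is 0. For the hypothesis that the slope is 1, the statistic is (1.0002636 - 1)/0.0002648 ≈ 1.0, so the slope is not significantly different from 1.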

Clearly, it is a bit dubious to assume a linear relationship, as we only have 3 nominal values.
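The t value printed by `summary` refers to the hypothesis that the slope is 0. As a sketch, a test of slope equal to 1 can be computed from the estimate and standard error in the coefficient table; the numbers are copied from that output, and `t_one` and `p_one` are names introduced here for illustration.

```r
est <- 1.0002636   # estimated slope (from the summary output)
se  <- 0.0002648   # its standard error
df  <- 498         # residual degrees of freedom

# t statistic and two-sided p-value for H0: slope = 1
t_one <- (est - 1) / se
p_one <- 2 * pt(-abs(t_one), df)
c(t = t_one, p = p_one)
```

The statistic is close to 1, with a p-value of roughly 0.32. This calculation treats all 500 measurements as independent, which ignores the grouping by capacitor.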

Calibrated values

If we stick to the linear calibration model, it is sensible to correct our measured errors according to the calibration of the meter:

ab = coef(fit)
ab
## (Intercept) log(nomval) 
## -0.03001454  1.00026359
capDat$lnError_c = (capDat$lnError - ab[1])/ab[2]

Calibrated data

head(capDat)
##   capacity nomval   sample     lnError   lnError_c
## 1    45.69     47 s_1_nF47 -0.02826815 0.001745930
## 2    45.71     47 s_1_nF47 -0.02783051 0.002183452
## 3    45.69     47 s_1_nF47 -0.02826815 0.001745930
## 4    45.71     47 s_1_nF47 -0.02783051 0.002183452
## 5    45.70     47 s_1_nF47 -0.02804930 0.001964715
## 6    45.69     47 s_1_nF47 -0.02826815 0.001745930

The calibrated data now show that the production error on component s_1_nF47 is in the vicinity of 0.2%, well below the tolerance of 1%.