The ASTA team
Capacitors come with a nominal value for the capacitance.
We shall study two sources of variation: the production variation between capacitors and the measurement variation of the meter.
Peter has made 100 independent measurements of the capacitance of each of 5 capacitors: the 4 displayed ones and one additional.
Nominal values are 47, 47, 100, 150, 150 nF.
All have a stated tolerance of 1%.
## capacity nomval sample
## 1 45.69 47 s_1_nF47
## 2 45.71 47 s_1_nF47
## 3 45.69 47 s_1_nF47
## 4 45.71 47 s_1_nF47
Here we see the first 4 measurements of the first capacitor with nominal value 47nF.
##
## s_1_nF47 s_2_nF47 s_3_nF100 s_4_nF150 s_5_nF150
## 100 100 100 100 100
Instead of considering the raw errors \[\text{measuredValue - nominalValue},\] we will consider the relative error \[\frac{\text{measuredValue - nominalValue}}{\text{nominalValue}}.\]
A tolerance of 0.01 means that the relative error should be within \(\pm 0.01\).
Instead of looking at the relative error, we may look at the following approximation: \[\text{lnError} = \ln\Big( \frac{\text{measuredValue}}{\text{nominalValue}} \Big) \approx \ \frac{\text{measuredValue} - \text{nominalValue}}{\text{nominalValue}} \]
This is illustrated below with a nominal value of \(n=47\) and measured values of \(47 \pm 5\%\).
The approximation can be justified theoretically.
Recall the linear approximation of a function: \[ f(x) \approx f(x_0) + f'(x_0)(x - x_0) \]
If we take \[\begin{align}
x_0 &= 1 \\
f(x) &= \ln x \\
f'(x) &= 1/x,
\end{align}\]
we get \[\ln(x) \approx \ln(x_0) +\frac{1}{x_0}\cdot(x-x_0) = x-1.\]
Suppose \(x=m/n\). Then
\[
\ln \left ( \frac{m}{n} \right ) \approx \frac{m}{n} - 1 = \frac{m - n}{n}
\]
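As a quick numeric sanity check (a Python sketch, not part of the original R analysis), the approximation is accurate to within roughly \((\text{relative error})^2/2\) for errors of a few percent:

```python
import math

# Compare lnError = ln(m/n) with the relative error (m - n)/n
# for a nominal value n = 47 and measured values within +/- 5%.
n = 47.0
for m in [47 * 0.95, 47 * 0.99, 47 * 1.01, 47 * 1.05]:
    ln_error = math.log(m / n)
    rel_error = (m - n) / n
    # The two agree up to a term of order (relative error)^2 / 2.
    assert abs(ln_error - rel_error) < rel_error ** 2
```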
We store this quantity as the lnError variable in the capDat dataset.
## capacity nomval sample lnError
## 1 45.69 47 s_1_nF47 -0.02826815
## 2 45.71 47 s_1_nF47 -0.02783051
## capacity nomval sample lnError
## 499 145.7 150 s_5_nF150 -0.02908558
## 500 145.6 150 s_5_nF150 -0.02977216
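The lnError column can be reproduced directly from capacity and nomval (the R code presumably computes log(capacity/nomval)); a small Python check against the first two rows shown above:

```python
import math

# Recompute lnError = ln(capacity / nomval) for the first two rows
# of capDat shown above.
rows = [(45.69, 47), (45.71, 47)]
ln_errors = [math.log(capacity / nomval) for capacity, nomval in rows]

assert abs(ln_errors[0] - (-0.02826815)) < 1e-8
assert abs(ln_errors[1] - (-0.02783051)) < 1e-8
```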
## sample min Q1 median Q3 max
## 1 s_1_nF47 -0.02958221 -0.02832287 -0.02804930 -0.02804930 -0.02783051
## 2 s_2_nF47 -0.02914399 -0.02783051 -0.02761176 -0.02761176 -0.02717441
## 3 s_3_nF100 -0.03521276 -0.03399638 -0.03386707 -0.03366020 -0.03334998
## 4 s_4_nF150 -0.02565975 -0.02446352 -0.02429269 -0.02429269 -0.02360987
## 5 s_5_nF150 -0.03045921 -0.02977216 -0.02908558 -0.02908558 -0.02908558
## mean sd n missing
## 1 -0.02832518 0.0005062160 100 0
## 2 -0.02786346 0.0005171088 100 0
## 3 -0.03398306 0.0005057586 100 0
## 4 -0.02453879 0.0005870180 100 0
## 5 -0.02947702 0.0005543930 100 0
All measurements are more than 2.3% below the nominal value.
This must be due to a systematic error on the meter.
We have the model: \[\ln\Big(\frac{\text{measuredValue}}{\text{nominalValue}}\Big) = \text{systematicError} + \text{productionError} + \text{measurementError}\]
We may write the model mathematically as \[Y_{ij}=\mu+A_i+\varepsilon_{ij}\] where \(Y_{ij}\) is the \(j\)th log error measured on capacitor \(i\), \(\mu\) is the systematic error of the meter, \(A_i\) is the production error of capacitor \(i\), and \(\varepsilon_{ij}\) is the random measurement error.
We make the following assumptions: the production errors \(A_i\) are independent with \(A_i \sim N(0,\sigma_\alpha^2)\), the measurement errors \(\varepsilon_{ij}\) are independent with \(\varepsilon_{ij} \sim N(0,\sigma^2)\), and the \(A_i\) are independent of the \(\varepsilon_{ij}\).
This is called a random effects model, see [WMMY] Chapter 13.11.
The total sample mean \(\bar y_{..}\) of the log errors is our estimate of the systematic error \(\mu\):
## [1] -0.0288375
We now try to estimate the variance \(\sigma_\alpha^2\) of the production error and the variance of the random measurement error \(\sigma^2\).
We need two types of sum of squares:
SSA (sum of squares between groups) measures how much the sample means for the individual capacitors \(\bar y_{i.}\) deviate from the total sample mean \(\bar y_{..}\) \[SSA=n\sum_i (\bar y_{i.}-\bar y_{..})^2\]
SSE (sum of squares within groups) measures how much the individual measurements deviate from the sample mean of the capacitor they were measured on: \[SSE=\sum_{ij} ( y_{ij}-\bar y_{i.})^2\]
Intuitively, \(SSA\) is closely related to the variance of the production error \(\sigma_\alpha^2\), while \(SSE\) is closely related to the variance of the random measurement error \(\sigma^2\).
## Analysis of Variance Table
##
## Response: lnError
## Df Sum Sq Mean Sq F value Pr(>F)
## sample 4 0.0046576 0.00116440 4067.4 < 2.2e-16 ***
## Residuals 495 0.0001417 0.00000029
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0.004657588
## [1] 0.0001417076
\[E(SSA)=(k-1)\sigma^2+n(k-1)\sigma_\alpha^2\] \[E(SSE)=k(n-1)\sigma^2\] Equating the observed sums of squares to their expectations and solving gives the estimates \[\hat\sigma^2=\frac{SSE}{k(n-1)}=MSE, \qquad \hat\sigma_\alpha^2=\frac{MSA-MSE}{n},\] where \(MSA=SSA/(k-1)\) and \(MSE=SSE/(k(n-1))\) are the mean squares from the ANOVA table.
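Plugging the sums of squares from the ANOVA table into these moment equations reproduces the estimates quoted below; a Python sketch (with \(k=5\) capacitors and \(n=100\) measurements each, taken from the output above):

```python
# Method-of-moments estimates of the variance components,
# using SSA and SSE from the ANOVA table (k = 5 capacitors,
# n = 100 measurements per capacitor).
SSA, SSE = 0.004657588, 0.0001417076
k, n = 5, 100

MSA = SSA / (k - 1)                  # mean square between capacitors
MSE = SSE / (k * (n - 1))            # mean square within capacitors
sigma2_hat = MSE                     # estimate of sigma^2
sigma_alpha2_hat = (MSA - MSE) / n   # estimate of sigma_alpha^2

assert abs(sigma2_hat - 2.86e-7) < 1e-9
assert abs(sigma_alpha2_hat - 1.16e-5) < 1e-7
```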
The meter has an estimated systematic error of \(\hat \mu = -2.88\%\).
The estimated standard deviation of the meter is \(\hat \sigma = \sqrt{2.86 \cdot 10^{-7}} = 0.0534\%\).
The estimated standard deviation of the production error is \(\hat \sigma_\alpha = \sqrt{1.16\cdot 10^{-5}} = 0.341 \%\).
Since \(99.7\%\) (practically all) of a normal distribution lies within \(\pm 3\sigma\) of its mean, the production error falls within \[\pm 3 \cdot 0.341\% = 1.02 \%\] of the nominal value, which is in accordance with the stated tolerance of 1%.
The total estimated variance of the log error is \[\hat \sigma_\alpha^2 + \hat \sigma^2 = 1.16\cdot 10^{-5} + 2.86 \cdot 10^{-7}=1.19\cdot 10^{-5}.\]
Note that especially the estimate \(\hat \sigma_\alpha\) is quite uncertain, since we only have measurements from 5 capacitors.
In the preceding analysis, we assumed that the log-transformed errors had a normal distribution.
Let \(X\) be a random variable and \(Y=\ln(X)\).
We say that \(X\) has a lognormal distribution if \(Y\) has a normal distribution with, say, mean \(\mu\) and standard deviation \(\sigma\).
Here are some plots of the density of the lognormal distribution:
Suppose \(X\) has a log-normal distribution, so that \(Y=\ln(X)\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
Then the mean and variance are given by (Theorem 6.7 of [WMMY]): \[E(X)=\exp(\mu+\sigma^2/2)\] \[Var(X)=\exp(2\mu+\sigma^2)(\exp(\sigma^2)-1)\]
The coefficient of variation is then \[CV(X)=\frac{\sqrt{Var(X)}}{E(X)}= \frac{\sqrt{\exp(2\mu+\sigma^2)(\exp(\sigma^2)-1)}}{\exp(\mu+\sigma^2/2)}=\sqrt{\exp(\sigma^2)-1}\]
In Peter’s data we estimated the variance of the ln error to \(\hat\sigma_\alpha^2 = 1.16\cdot 10^{-5},\) which means that the estimated CV of the capacity measurement is \[\widehat{CV}(X)=\sqrt{\exp\left ( 1.16\cdot 10^{-5} \right ) -1} = 0.341\%.\]
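The CV computation can be checked numerically (a Python sketch; the variance estimate is taken from the analysis above):

```python
import math

# CV(X) = sqrt(exp(sigma^2) - 1) for a lognormal X, evaluated at the
# estimated production-error variance sigma_alpha^2 = 1.16e-5.
sigma_alpha2 = 1.16e-5
cv = math.sqrt(math.exp(sigma_alpha2) - 1)

# For small sigma^2 the CV is essentially sqrt(sigma^2) = sigma itself.
assert abs(cv - 0.00341) < 1e-5
assert abs(cv - math.sqrt(sigma_alpha2)) < 1e-7
```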
In our previous analysis, we assumed that the systematic error of the meter did not depend on the nominal value. \[ \ln\Big(\frac{\text{measuredValue}}{\text{nominalValue}}\Big) = \text{meterError} + \text{randomError} \]
To check this assumption consider the linear model \[\ln(\text{measuredValue}) =\alpha+\beta \cdot \ln(\text{nominalValue})+\varepsilon.\]
Note that the previously considered model corresponds to \(\beta=1\).
##
## Call:
## lm(formula = log(capacity) ~ log(nomval), data = capDat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0064121 -0.0010784 0.0007315 0.0013879 0.0050839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0300145 0.0011907 -25.21 <2e-16 ***
## log(nomval) 1.0002636 0.0002648 3776.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.003101 on 498 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.426e+07 on 1 and 498 DF, p-value: < 2.2e-16
The slope looks close to 1.
We may test the null-hypothesis \(H_0: \beta = 1\). \[t_{obs} = \frac{1.0002636-1}{0.0002648} = 0.995.\] This yields a p-value of around \(32\%\).
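With 498 degrees of freedom the t distribution is practically a standard normal, so the p-value can be approximated via the normal CDF (a Python sketch; estimate and standard error are taken from the regression output above):

```python
import math

# Test H0: beta = 1 using the estimate and standard error from the
# regression output; with df = 498 the t distribution is close to N(0, 1).
beta_hat, se = 1.0002636, 0.0002648
t_obs = (beta_hat - 1) / se

def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_value = 2 * (1 - std_normal_cdf(t_obs))  # two-sided test

assert abs(t_obs - 0.995) < 0.001
assert abs(p_value - 0.32) < 0.01
```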
If we stick to the linear calibration model, it is sensible to correct our measured errors according to the calibration of the meter.
We have the calibration model (applied on the log scale, as in the regression above): \[\text{measuredValue}=\alpha+\beta \cdot \text{nominalValue}\]
We compute the calibrated values \[\text{calibratedValue}=(\text{measuredValue}-\alpha)/\beta\]
We estimate the coefficients \(\alpha\) and \(\beta\) and calibrate the measurements.
## (Intercept) log(nomval)
## -0.03001454 1.00026359
## capacity nomval sample lnError lnError_c
## 1 45.69 47 s_1_nF47 -0.02826815 0.001745930
## 2 45.71 47 s_1_nF47 -0.02783051 0.002183452
## 3 45.69 47 s_1_nF47 -0.02826815 0.001745930
## 4 45.71 47 s_1_nF47 -0.02783051 0.002183452
## 5 45.70 47 s_1_nF47 -0.02804930 0.001964715
## 6 45.69 47 s_1_nF47 -0.02826815 0.001745930
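The printed lnError_c values are consistent with applying the calibration \((\text{value}-\hat\alpha)/\hat\beta\) to lnError using the estimated coefficients; a Python check against the first two rows shown above:

```python
# Apply the calibration (value - alpha) / beta to the log errors,
# using the estimated coefficients printed above.
alpha, beta = -0.03001454, 1.00026359

def calibrate(ln_error):
    return (ln_error - alpha) / beta

assert abs(calibrate(-0.02826815) - 0.001745930) < 1e-8
assert abs(calibrate(-0.02783051) - 0.002183452) < 1e-8
```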