---
title: "Quality Control-2"
author: "The ASTA team"
output:
  pdf_document:
    fig_caption: no
    highlight: tango
    number_sections: yes
    toc: yes
  slidy_presentation:
    fig_caption: no
    highlight: tango
---

## OC curves

Usual setup:

* $m$ samples with sample size $n$.
* Process mean $\mu$ and standard deviation $\sigma$.
* Sample means have standard deviation $\frac{\sigma}{\sqrt{n}}$.
* By default we use the $3\sigma$ rule.

  Assume that the mean is shifted by $c\times\sigma$.

  What is the probability of NOT getting an immediate alarm?

  This is also called a **type II error**.
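The no-alarm probability can be computed directly from the normal distribution: with $3\sigma$ limits and a shift of $c\sigma$, the standardized control limits become $3-c\sqrt{n}$ and $-3-c\sqrt{n}$. A small sketch (the function name `beta_noalarm` is our own):

```{r}
# Type II error (no immediate alarm) for a 3-sigma xbar chart
# when the process mean shifts by c standard deviations
beta_noalarm <- function(c, n) pnorm(3 - c*sqrt(n)) - pnorm(-3 - c*sqrt(n))
round(beta_noalarm(c = 2, n = c(1, 5, 10)), 4)
```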

```{r message=FALSE,warning=FALSE}
library(qcc)
data(pistonrings)
diam    <- pistonrings$diameter
phaseI  <- matrix(diam[1:125],25,byrow=TRUE)
phaseII <- matrix(diam[126:200],15,byrow=TRUE)
xbcc    <- qcc(phaseI, std.dev = "UWAVE-SD", type = "xbar", plot = FALSE,
               newdata = phaseII, title = "xbar chart: pistonrings")
```

## OC curves - example

```{r}
oc.curves(xbcc)
```
 
For the actual sample size ($n=5$) and a process shift of $c=2$, the error probability is around 7%.

  If we increase the sample size to $n=10$, then we are almost certain to
  detect a shift of $2\sigma$ immediately.

## CUSUM chart

  The CUSUM chart is generally better than the xbar chart for
  detecting small shifts in the mean of a process.

  Consider the standardized residuals $$z_i=\frac{\sqrt{n}(x_i-\hat{\mu}_0)}{\hat{\sigma}}$$
  where

* $\hat{\mu}_0$ is an estimate of the process mean.
* $\hat{\sigma}$ is an estimate of the process standard deviation.


  For an in-control process the cumulative sum (CUSUM) of residuals should vary around zero.

  The CUSUM chart is designed to detect a drift away from zero. To that end we define two processes controlling for downward ($D$)
  and upward ($U$) drift, respectively:
  \begin{align*}
    D(i)&=\max\{0,D(i-1)-k-z_i\}\\
    U(i)&=\max\{0,U(i-1)+z_i-k\}
  \end{align*}
  where $k$ is a positive number.
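The recursions are easy to code by hand. A minimal sketch (the function name `cusumDU` and the default $k=0.5$ are our own choices, not qcc's implementation):

```{r}
# Hand-rolled CUSUM recursions D and U for standardized residuals z
cusumDU <- function(z, k = 0.5) {
  D <- U <- numeric(length(z))
  Dprev <- Uprev <- 0
  for (i in seq_along(z)) {
    D[i] <- Dprev <- max(0, Dprev - k - z[i])  # accumulates downward drift
    U[i] <- Uprev <- max(0, Uprev + z[i] - k)  # accumulates upward drift
  }
  list(D = D, U = U)
}
cusumDU(c(1, 1, 1))$U
```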

## Interpretation of CUSUM chart
  Interpretation of
  \begin{align*}
    D(i)&=\max\{0,D(i-1)-k-z_i\}\\
    U(i)&=\max\{0,U(i-1)+z_i-k\}
  \end{align*}

* If $z_i < -k$, i.e. $z_i$ is more than $k$ below
    zero, then $D$
    increases. If this happens several times in a row, then $D$
    grows "big".
* If $z_i > k$, i.e. $z_i$ is more than $k$ above zero,
    then $U$
    increases. If this happens several times in a row, then $U$
    grows "big".

  The process is considered out of control if $D$ or $U$ exceeds a limit
  $h$.

  In `qcc`, $k$ and $h$ are specified by the arguments:

* `se.shift` specifies the shift to detect, measured in standard errors (default 1); $k$ is half of `se.shift`.
* `decision.interval` is $h$, which has a default value of 5.


## CUSUM chart example
```{r}
h <- cusum(phaseI, newdata = phaseII, title = "CUSUM chart: pistonrings")
```

The chart includes a plot of

* $U$, controlling for positive drift, which is clearly present in phase II.
* $D$, controlling for negative drift.

## EWMA chart
  The Exponentially Weighted Moving Average (EWMA) is a statistic for monitoring a process; it averages the data in a way that gives the most weight to recent observations.

  The EWMA is formally defined by
  $$M_t=\lambda x_t+(1-\lambda)M_{t-1},\ \ t=1,2,\ldots,T$$
  where

* $M_0$ is the mean of some historical data.
* $x_t$ is the measurement at time $t$.
* $T$ is the length of the sampling period.
* $0 < \lambda \leq 1$ is a smoothing parameter, where $\lambda=1$ corresponds to "no memory".

  
  The influence of $x_t$ on $M_{t+s}$ is of the order $(1-\lambda)^s$,
  i.e. exponentially decreasing, which explains the term
  "Exponentially Weighted".
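The recursion is straightforward to compute. A sketch (here `M0` defaults to the mean of the supplied series, standing in for the historical mean):

```{r}
# EWMA recursion: M_t = lambda*x_t + (1-lambda)*M_{t-1}
ewma_stat <- function(x, lambda = 0.2, M0 = mean(x)) {
  M <- numeric(length(x))
  Mprev <- M0
  for (t in seq_along(x)) M[t] <- Mprev <- lambda*x[t] + (1 - lambda)*Mprev
  M
}
ewma_stat(c(1, 2, 3), lambda = 0.5, M0 = 0)  # 0.500 1.250 2.125
```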
  
## EWMA chart

Estimated variance for the EWMA process

* $s_M^2=\frac{\lambda}{2-\lambda}s^2$
* $s$ is the standard deviation from historical data.

 
$$
\begin{aligned}
      \text{UCL:} \quad &M_0+k s_M\\
      \text{CL :} \quad &M_0\\
      \text{LCL:} \quad &M_0-k s_M
\end{aligned}
$$

The conventional choice of $k$ is 3.
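Putting the formulas together (a sketch; the values `M0 = 74` and `s = 0.01` below are illustrative numbers of our own, roughly matching the pistonring diameters):

```{r}
# EWMA control limits from the asymptotic variance s_M^2 = lambda/(2-lambda) * s^2
ewma_limits <- function(M0, s, lambda = 0.2, k = 3) {
  sM <- sqrt(lambda / (2 - lambda)) * s
  c(LCL = M0 - k*sM, CL = M0, UCL = M0 + k*sM)
}
ewma_limits(M0 = 74, s = 0.01)
```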

## EWMA chart example

```{r}
h <- ewma(phaseI, newdata = phaseII, title = "EWMA chart: pistonrings")
```

* Observed values are plotted as "+" and predicted values as dots.
* Default value of smoothing is $0.2$. May be set by the argument
  `lambda`.
* Default value of $k$ is $3$. May be set by the argument
  `nsigmas`.

## Multivariate charts

* In some situations, it is important to watch the covariation between
two or more variables.
* We shall limit our considerations to two variables $y$ and $x$.
* We assume a linear regression equation for $y$
  depending on $x$.
* We have data to estimate the line and the expected deviation
  from the line.
* We have estimates of mean and standard deviation for $x$.

An outlier would deviate from the line and/or the $x$-mean, so we calculate

* $Zy_x$ denoting the standardized residual from the
  regression line.
* $Zx$ denoting the standardized deviation of $x$ from its
  expected mean.

We expect both $Zy_x$ and $Zx$ to lie within the limits
$\pm2$. In order to get an overall test statistic, we calculate
$$T^2=\frac{m-1}{m-2}Zy_x^2+Zx^2$$
where $m$ is the sample size.

## Multivariate charts

  Alternatively, one might interchange the roles of $x$ and $y$, but
  that does not matter, since it holds that
  $$T^2=\frac{m-1}{m-2}Zy_x^2+Zx^2=\frac{m-1}{m-2}Zx_y^2+Zy^2$$

*   If the process is in control, then $T^2$ has a chi-square
    distribution with 2 degrees of freedom.
*   Only large values of $T^2$ are critical for the process, i.e. we only need
    an upper limit.
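Under the chi-square reference distribution, an upper limit matching the usual $3\sigma$-type coverage probability 0.9973 is obtained as a quantile (a sketch):

```{r}
# Upper limit for T^2 from the chi-square(2) distribution at 3-sigma coverage
qchisq(0.9973, df = 2)  # about 11.83
```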

```{r}
load(url("https://asta.math.aau.dk/datasets?file=T2Example.RData"))
```

```{r}
head(X$X1, 3)
```

## Multivariate chart example
```{r}
head(X$X2, 3)
```

* `X` is a list
* `X$X1` is a matrix, where the rows are samples of variable `X1`
* Actual sample size is 4 and the number of samples is 20.

Similarly `X$X2` has samples for variable `X2`

## Multivariate chart example

```{r}
h <- mqcc(X, type = "T2")
```

When the parameters are estimated from phase I samples, then the reference distribution is not chi-square, but rather a scaled F-distribution.

## Multivariate chart example

```{r}
ellipseChart(h, show.id = TRUE)
```

Observation 10 deviates a lot from the regression line.

The acceptance area is an ellipse.

## Acceptance sampling

  Set-up:

* A production process where we produce lots of size $N$.
* From the lot we take a sample of size $n$.
* If the number of defective items exceeds the number $c$, then
    the lot is rejected.

  We term this an $(N,n,c)$ sampling plan.

  Possible reasons for acceptance sampling:

* Testing is destructive
* The cost of 100\% inspection is very high
* 100\% inspection takes too long

## Sampling distributions

* Let $X$ denote the number of defective items in the sample.
* Let $p$ denote the fraction of defective items in the lot.
* The correct sampling distribution of $X$ is the so-called
  **hypergeometric** distribution with parameters $(N,n,p)$.


* If $N\gg n$, then the sampling distribution of $X$ is well approximated
  by the simpler **binomial** distribution with parameters $(n,p)$.

* If $N\gg n$ and $p$ is small, then the sampling distribution of
  $X$ is well approximated
  by the much simpler **Poisson** distribution with parameter $np$.
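The quality of these approximations can be checked directly with R's distribution functions. A sketch for the plan $(N,n,c)=(1000,100,2)$ used below and an assumed lot fraction $p=0.02$:

```{r}
# P(accept) = P(X <= c) under the three sampling distributions
N <- 1000; n <- 100; cc <- 2; p <- 0.02
c(hyper   = phyper(cc, m = N*p, n = N - N*p, k = n),  # 20 defective, 980 good
  binom   = pbinom(cc, size = n, prob = p),
  poisson = ppois(cc, lambda = n*p))
```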


## OC curve of a sampling plan

For a given sampling plan $(N,n,c)$ the probability of accepting the lot depends on

* the fraction $p$ of defective items in the lot
* the assumed sampling distribution

  We can use the function `OC2c` in the package
  `AcceptanceSampling` to determine these probabilities.

  Sampling plan: $(1000,100,2)$
```{r}
library(AcceptanceSampling)
OCbin <- OC2c(100, 2) #default binomial
OCpoi <- OC2c(100, 2, type = "poisson")
OChyp <- OC2c(100, 2, type = "hypergeom", N = 1000)
```

## OC curve of a sampling plan

```{r}
par(mfrow=c(2,2)) #division of plot window
plot(1:10, type = "n", axes = FALSE, xlab = "", ylab = "")
text(4, 5, "(N,n,c)\n(1000,100,2)", cex = 2)
xl <- c(0, 0.1)
plot(OChyp, xlim = xl, main = "hyper")
plot(OCbin, xlim = xl, main = "binom")
plot(OCpoi, xlim = xl, main = "poiss")
```

## Find sampling plan

Suppose that $N$ is fixed. If we specify two points on the OC curve,
then this determines $(n,c)$.

* PRP: Producer Risk Point with coordinates $(p_1,q_1)$: $p_1$ is a
    fraction of defectives that the producer finds acceptable, e.g. $p_1=0.01$. The corresponding probability $q_1$ of acceptance should then be high,
    e.g. $q_1=0.95$.

* CRP: Consumer Risk Point with coordinates $(p_2,q_2)$: $p_2>p_1$ is a
    fraction of defectives that the consumer finds unacceptable,
    e.g. $p_2=0.05$. The corresponding probability $q_2$ of acceptance should
    then be low, e.g. $q_2=0.01$.


```{r}
find.plan(PRP = c(.01,.95), CRP = c(.05,.01))[1:2]
```
Default assumption is binomial sampling.

## Find sampling plan
```{r}
plan <- find.plan(c(.01,.95), c(.05,.01), type = "hyp", N = 200)[1:2]
plan
OChyp <- OC2c(plan$n, plan$c, type = "hyp", N = 200, pd = c(.01,.05))
attr(OChyp, "paccept")
```

We cannot have an exact match of the required values $(0.95,0.01)$
since $n$ and $c$ must be integers.

## Double sampling

Let $0\leq c_1 < r_1$ be integers.

Furthermore, $c_2$ is an integer such that $c_1<c_2$.

* $x_1$: number of defectives in an initial sample of size $n_1$.
* If $x_1\leq c_1$ accept the lot.
* If $r_1\leq x_1$ reject the lot.
* If $c_1< x_1<r_1$: Take a second sample of size $n_2$ and let
    $x_2$ be the number of defectives.
* If $x_1+x_2\leq c_2$ accept the lot. Otherwise reject.

  This is known as a **double sampling** plan.
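The acceptance probability of such a plan can be written down directly: either accept on the first sample, or be undecided and accept on the combined count. A sketch under binomial sampling (the function name `pAccept2` is our own; the defaults match the plan used on the next slide):

```{r}
# P(accept) for a double sampling plan (n1, n2, c1, c2, r1) under binomial sampling
pAccept2 <- function(p, n1 = 125, n2 = 125, c1 = 1, c2 = 4, r1 = 4) {
  pa <- pbinom(c1, n1, p)                  # accepted on the first sample
  for (x1 in (c1 + 1):(r1 - 1))            # undecided: second sample needed
    pa <- pa + dbinom(x1, n1, p) * pbinom(c2 - x1, n2, p)
  pa
}
pAccept2(0.01)
```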

## OC curve of a double sampling plan
  Determining the OC curve of a double sampling plan requires input of
  $n=c(n_1,n_2)$, $c=c(c_1,c_2)$ and $r=c(r_1,r_2)$, where
  $r_2=c_2+1$.

```{r}
x <- OC2c(c(125,125), c(1,4), c(4,5), pd = seq(0,0.1,0.001))
x
```

## OC curve of a double sampling plan

```{r}
plot(x)
```
