---
title: "Quality Control"
author: "The ASTA team"
output:
  pdf_document:
    fig_caption: no
    highlight: tango
    number_section: yes
    toc: yes
  slidy_presentation:
    fig_caption: no
    highlight: tango
  html_document:
    toc: yes
---

```{r include=FALSE}
library(qcc)
library(mosaic)
```

## Outline

* **Quality control**

* **Continuous process variable**

* **Binomial process variable**

* **Poisson process variable**


# Quality control

## Quality control chart
Control charts are used to routinely monitor quality.

Two major types:

* **univariate control**: a graphical display (chart) of one quality
  characteristic
* **multivariate control**: a graphical display of a statistic that
  summarizes or represents more than one quality characteristic

The control chart shows

* the value of the quality characteristic versus the sample number
  or versus time
* a **center line** (CL) that represents the mean value for
  the in-control process
* an **upper control limit** (UCL) and a **lower control limit** (LCL)


The control limits are chosen so that almost all of the data points
will fall within these limits **as long as the process remains in-control**.

## Example

```{r message=FALSE}
library(qcc)
data(pistonrings)
head(pistonrings,3)
```

Piston rings for an automotive engine are produced by a forging
process. The inside diameter of the rings manufactured by the process
is measured on 25 samples(`sample=1,2,..,25`), each of size 5, for the
control phase I (`trial=TRUE`),
when preliminary samples from a process being considered 'in-control'
are used to construct control charts. Then, further 15 samples, again
each of size 5, are obtained for phase II (`trial=FALSE`).

Reference:

Montgomery, D.C. (1991) Introduction to Statistical Quality Control,
2nd ed, New York, John Wiley & Sons, pp. 206-213


## Example

```{r echo=FALSE}
phaseI  <- pistonrings$diameter[1:125]
phaseII <- pistonrings$diameter[126:200]
h <- qcc(phaseI, std.dev = "SD", type = "xbar.one",
         newdata = phaseII, title = "qq chart: pistonrings")
```

We shall treat different methods for determining LCL,CL and UCL. In
that respect, it is crucial that we have

* **phase I data**, where the process is in-control.
* These data are used to determine LCL,CL and UCL.

## The simple six sigma model

Assume that measurements

* is a sample, i.e\ they are independent
* they have a normal distribution
* we know the mean $\mu_0$ and standard deviation $\sigma_0$.

  In this case we dont need phase I data.

* CL=$\mu_0$.
* LCL=$\mu_0-k\sigma_0$.
* UCL=$\mu_0+k\sigma_0$.

 The only parameter to determine is $k$.

 We dont want to give a lot of false warnings, and a popular choise is

* k=3, known as the 3*sigma rule.
* The probability of a measurement outside the control limits is
   then 0.27%, when the proces is in-control.

This means that the span of allowable variation is $6\sigma_0$.

The
concept ``Six Sigma'' has become a mantra in many industrial communities.

## Average Run Length (ARL)

  Let $\mathtt{pOut}$ denote the probability that a measurement is
  outside the control limits. On average this means that we need
  $1/\mathtt{pOut}$ observations before we get an outlier.

  This is known as the *the Average Run Length*:
  $$AVL=\frac{1}{\mathtt{pOut}}$$
  An in-control process with the 3*sigma rule has AVL
```{r}
round(1/(2*pdist("norm", -3, plot = FALSE)))
```
  An in-control process with AVL=500 has k*sigma rule, where k equals
```{r}
-qdist("norm", (1/2)*(1/500), plot = FALSE)
```

## Types of quality control charts.

Depending on the type of control variable, there are various
types of control charts.

+-------------+--------------+---------------+-------------+
| chart       | distribution | statistic     | example     |
+=============+==============+===============+=============+
| xbar        | normal       | mean          | means of a continuous process variable |
+-------------+--------------+---------------+-------------+
| S           | normal       | standard      | standard deviations of a \ |
|             |              | deviation     | continuous process variable |
+-------------+--------------+---------------+-------------+
| R           | normal       | range         | ranges of a continuous\ |
|             |              |               | process variable |
+-------------+--------------+---------------+-------------+
| p           | binomial     | proportion    | percentage of faulty items |
+-------------+--------------+---------------+-------------+
| c           | poisson      | count         | number of faulty items\ |
|             |              |               | during a workday |
+-------------+--------------+---------------+-------------+


#Continuous process variable

## Continuous process variable

Phase I data:

*  $m$ samples with $n$ measurements in each sample.
* For sample $i=1,2,\ldots m$ calculate mean $\bar{x}_i$ and
standard deviation $s_i$.
* Calculate $$\bar{x}=\frac{1}{m}\sum_{i=1}^m \bar{x}_i \quad\text{and}\quad
\bar{s}=\frac{1}{m}\sum_{i=1}^m s_i$$

  When the sample is normal, it can be shown that $\bar{s}$ is a biased
  estimate of the true standard deviation $\sigma$:

* $E(\bar{s})=c_4(n)\sigma$
* $c_4(n)$ is tabulated in textbooks and available in the `qcc` package.

  Unbiased estimate of $\sigma$:
  $$\hat{\sigma}_1=\frac{\bar{s}}{c_4(n)}$$
  Furthermore $\bar{s}$ has estimated standard error
  $$se(\bar{s})=\bar{s}\frac{\sqrt{1-c_4(n)^2}}{c_4(n)}$$

## xbar chart
  
$$
\begin{aligned}
   \text{UCL:}\quad &\bar{x}+3\frac{\hat{\sigma}_1}{\sqrt{n}}\\
   \text{CL:}\quad &\bar{x}\\
   \text{LCL:}\quad &\bar{x}-3\frac{\hat{\sigma}_1}{\sqrt{n}}
\end{aligned}
$$
This corresponds to

* The probability of a measurement outside the control limits is 0.27%.

  If we want to change this probability, we need another z-score. E.g
  if we want to lower this probability to 0.1%, then $3$ should be
  substituted by $3.29$.
  
## Example

```{r eval=FALSE}
phaseI  <- matrix(pistonrings$diameter[1:125]  , nrow=25, byrow=TRUE)
phaseII <- matrix(pistonrings$diameter[126:200], nrow=25, byrow=TRUE)
h <- qcc(phaseI, type = "xbar", std.dev = "UWAVE-SD",
         newdata = phaseII, title = "xbar chart: pistonrings")
```

* `phaseI` is a matrix with $m=25$ rows, where each row is a sample
of size $n=5$.
* Similarly `phaseII` has 15 samples.

The function `qcc` calculates the necessary statistics and optionally makes a plot.

* `phaseI` and `type=` are the only arguments required.
* We want that the limits are based on the unweighted average of standard
  deviations - `UWAVE-SD`. This is not the default.
* We also want to evaluate the phase II data: `newdata=phaseII`.
* Optionally, we can specify the title on the plot.

## Example

```{r echo=FALSE}
phaseI  <- matrix(pistonrings$diameter[1:125]  , nrow = 25,byrow=TRUE)
phaseII <- matrix(pistonrings$diameter[126:200], nrow = 15,byrow=TRUE)
h <- qcc(phaseI, std.dev = "UWAVE-SD", type = "xbar",
         newdata = phaseII, title = "xbar chart: pistonrings")
```

Besides limits we are also told whether the process is above/below CL
for 7 or more consecutive samples (yellow dots).

`run.length=7` is default, but may be changed. If we e.g. want
this to happen with probability 0.2%, then we specify
`run.length=10`.

## S chart: Monitoring variability
  
In most situations, it is crucial to monitor the process mean.

But it may also be a problem if the variability in "quality" gets too high.

In that respect, it is relevant to monitor the standard deviation,
which is done by  the S-chart:

$$
\begin{aligned}
   \text{UCL:}\quad&\bar{s}+3se(\bar{s})\\
   \text{CL:}\quad &\bar{s}\\
   \text{LCL:}\quad &\bar{s}-3se(\bar{s})\\
    &se(\bar{s})=\bar{s}\frac{\sqrt{1-c_4(n)^2}}{c_4(n)}
\end{aligned}
$$

Where $3$ may be substituted by some other z-score depending on
the required confidence level.
```{r eval=FALSE}
h <- qcc(phaseI,type="S", newdata=phaseII, title="S chart: pistonrings")
```

## S chart example
  
```{r echo=FALSE}
h <- qcc(phaseI,type="S", newdata=phaseII, title="S chart: pistonrings")
```

Remark that the plot does not allow values below zero.

Quite sensible when we are talking about standard deviations.

## R chart: Range statistics

  If the sample size is relatively small ($n\leq 10$), it is custom
  to use the range $R$ instead of the standard deviation. The range of a
  sample is simply the difference between the largest and smallest
  observation.

  When the sample is normal, it can be shown that:

* $E(\bar{R})=d_2(n)\sigma$, where $\bar{R}$ is the average of the $m$
    sample ranges.
* $d_2(n)$ is tabulated in textbooks and available in the
    `qcc` package.

  Unbiased estimate of $\sigma$:
  $$\hat{\sigma}_2=\frac{\bar{R}}{d_2(n)}$$
  Furthermore $\bar{R}$ has estimated standard error
  $$se(\bar{R})=\bar{R}\frac{d_3(n)}{d_2(n)}$$
  $d_3(n)$ is tabulated in textbooks and available in the
    `qcc` package.

## Charts based on R
   xbar chart based on $\bar{R}$:
$$
\begin{aligned}
   \text{UCL:}\quad&\bar{x}+3\frac{\hat{\sigma}_2}{\sqrt{n}}\\
   \text{CL:}\quad &\bar{x}\\
   \text{LCL:}\quad &\bar{x}-3\frac{\hat{\sigma}_2}{\sqrt{n}}
\end{aligned}
$$
  This is actually the default in the `qcc` package.

  R chart to monitor variability:
$$
\begin{aligned}
   \text{UCL:}\quad & \bar{R}+3se(\bar{R})\\
   \text{CL:}\quad & \bar{R}\\
   \text{LCL:}\quad & \bar{R}-3se(\bar{R})\\
\end{aligned}
$$

## R chart example

```{r}
h <- qcc(phaseI, type="R", newdata=phaseII, title="R chart: pistonrings")
```


# Binomial process variable

## Binomial variation

Let us suppose that the production process operates in a stable
manner such that

* the probability that an item is defect is $p$.
* successive items produced are independent

  In a random sample of n items, the number D of defective items
  follows a binomial distribution with parameters $n$ and $p$.

  Unbiased estimate of $p$: $$\hat{p}=\frac{D}{n}$$
  which has standard error $$se(\hat{p})=\sqrt{\frac{p(1-p)}{n}}$$

## p chart
  Data from phase I:

* $m$ samples with estimated proportions $\hat{p}_i,\ \ i=1,\ldots,m$
* $\bar{p}$ is the average of the estimated proportions.

 p chart:
$$
\begin{aligned}
   \text{UCL:}\quad & \bar{p}+3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\\
   \text{CL:}\quad & \bar{p}\\
   \text{LCL:}\quad & \bar{p}-3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\\
\end{aligned}
$$

## Example

```{r message=FALSE}
data(orangejuice)
head(orangejuice, 3)
```

Production of orange juice cans.

* The data were collected as 30 samples of 50 cans.
* The number of defective cans `D` were observed.
* After the first 30 samples, a machine adjustment was made.
* Then further 24 samples were taken from the process.


```{r message=FALSE,eval=FALSE}
with(orangejuice,
     qcc(D[trial], sizes=size[trial], type="p",
         newdata=D[!trial], newsizes=size[!trial]))
```

## Example

```{r echo=FALSE}
h <- with(orangejuice,
          qcc(D[trial], sizes=size[trial], type="p",
              newdata=D[!trial], newsizes=size[!trial]))
```

The machine adjustment after sample 30 has had an obvious effect.

The chart should be recalibrated.

# Poisson process variable
## Poisson variation
  Let us suppose that the production process operates in a stable
  manner such that

* defective items are produced at a constant rate

  The number D of defective items over a time interval of some fixed length
  follows a poisson distribution with mean value $c$.

  Unbiased estimate of $c$: $$\hat{c}=D$$
  which has standard error $$se(\hat{c})=\sqrt{c}$$

## c chart

Data from phase I:

* $m$ sampling periods with mean estimates $\hat{c}_i,\ \ i=1,\ldots,m$
* $\bar{c}$ is the average of the estimated means.

c chart:
$$
\begin{aligned}
   \text{UCL:}\quad & \bar{c}+3\sqrt{\bar{c}}\\
   \text{CL:}\quad & \bar{c}\\
   \text{LCL:}\quad & \bar{c}-3\sqrt{\bar{c}}\\
\end{aligned}
$$