**Authors: Lauritsen J, Nyen B EpiData Association Denmark and Gruk/Kompetancecentret Norway.**

*v1.2 January 2009. First version v1.0 November 2008.*

This paper does NOT aim to give a general introduction to SPC charts (statistical process control charts); its purpose is to document the design choices made when implementing SPC charts in EpiData Analysis. Lists of example datasets and validation procedures are available on request and will be placed in an appendix at some point. For general information on the EpiData Software project, readers are referred to http://www.epidata.dk

An SPC chart can be defined as an x-y graph in which x represents the sequence of measurements and y the measured value. Each x value represents a subgroup. In addition to the actual measurements, shown as points connected by a line, the graph contains control limits, which represent high and low percentiles of a distribution. It is customary to use 3 standard deviations (or sigma) as the upper and lower control limits, but some authors (e.g. Hart et al.) suggest varying this with the total number of samples. For non-Gaussian data, approximations or factors resembling standard deviations are used for the same purpose: finding a limit that separates values within “random variation” from values that are “outside the expected range or of sufficiently low probability”. The limits are always based on the data at hand.

The implementation of SPC related methods comprises run charts, g-charts and traditional control charts such as icharts (also called XmR charts), pcharts, ccharts and ucharts. Pareto charts are also included; strictly speaking these are not SPC charts, but bar charts with categories sorted in descending order of magnitude. For options, command specification and further examples, readers are directed to the user manual for SPC.^{1)}
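The sorting behind a pareto chart can be illustrated with a minimal Python sketch. It is not EpiData's own code; the function name `pareto_table` and the example categories are hypothetical and only show how the percentage per category and the cumulative percentage follow from a descending sort.

```python
from collections import Counter


def pareto_table(categories):
    """Return (category, percent, cumulative percent) rows, largest category first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() sorts by descending count, which is the pareto ordering
    for category, count in counts.most_common():
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((category, percent, cumulative))
    return rows


# Hypothetical reasons for a delayed start of surgery
reasons = ["staff"] * 5 + ["equipment"] * 3 + ["patient"] * 2
for row in pareto_table(reasons):
    print(row)
```

The first element of each row gives the bar order, the second the bar height and the third the cumulative line.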

Definitions and naming of SPC (statistical process control) methods and charts are unfortunately not uniform, and a search of the research literature reveals a number of sometimes conflicting definitions. For roughly the first 50 years after the earliest development in the 1930s by Shewhart and others, the methods were used almost exclusively in industrial production, with the health sector catching up much later: initially in the laboratory branches and, since the mid 1980s, increasingly in a number of specialities, usually in relation to quality improvement. The main difference between SPC methods in health care and in industrial settings is the focus on varying denominators (sample size per subsample) in health care, whereas in industry sampling is usually planned in detail and based on random sampling of a fixed number of units. The varying exposure volume (e.g. patient bed days) for a given topic raises questions about standardisation and weighting, principles well known in epidemiology and in the evaluation of public health practice. Other medical terms, such as the predictive values of tests well known from disease screening, also appear in the medical literature on SPC; see references later.

Other issues have also been discussed, e.g. whether to designate deviations from random variation as assignable cause or special cause, and whether “Control” should be replaced by more positive phrases, e.g. enhancement or process optimisation. The term Statistical Process Control is maintained here in the sense that “Control” refers to the situation or process in question, not to control of the staff in charge of the process.

The theoretical basis for choosing a particular type of SPC chart is the underlying distribution of the empirical measure in question. Table 1 gives an overview as used in the current implementation (v2.1.1.158) of EpiData Analysis.

**Table 1: Types of charts and connection to theoretical distributions**

Chart type | Type of data and distribution | Plot on y-axis | Examples | Data collected for each subsample/point in time |
---|---|---|---|---|
RunChart | any | y: count | 1 | |
Xbar-S | Continuous, Gauss | y1: mean, y2: sd | waiting time to procedure | > 10 per sampling point |
Xbar-R | Continuous, Gauss | y1: mean, y2: range | waiting time to procedure | 2-10 per sampling point |
Pchart | Binary, Binomial | y: proportion | proportion of all patients seen by each doctor | Counts of characteristic and total for each subsample |
Uchart | Count, Poisson | y: rate per denominator | rate of patient falls per patient volume | Date of outcome; risk volume since last outcome |
Cchart | Count, Poisson | y: count | count of falls | Counts based on a Poisson process |
Ichart | Count, any | y1: count, y2: moving range | patient count of visits | Counts based on a binary process |
Ichart-R | | | | |
Gchart | Rare events, Geometric (right-skewed) | y: time or count units since last occurrence | Days since last infection; days since last call of acute team | Date or day number in period of outcome for each single occurrence |
pareto | Categorical | y1: percent of each category as a bar, y2: cumulative percent, x: categories as bars | reasons for delay in start of surgery | Classification of reasons in categories |

y1 + y2: Y-axis with double graphs, y1: top y2: bottom. x-axis is sequence unless stated otherwise
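To make the link between distribution and control limits concrete, the following Python sketch applies the usual textbook formulas for the binomial (Pchart) and Poisson (Uchart, Cchart) cases. It is an illustration under stated assumptions, not a copy of the EpiData implementation: the function names are hypothetical, sigma defaults to 3, and lower limits are truncated at zero.

```python
import math


def p_limits(counts, totals, sigma=3.0):
    """P chart: centerline is the pooled proportion; limits vary with each subgroup total."""
    p_bar = sum(counts) / sum(totals)
    limits = []
    for n in totals:
        half_width = sigma * math.sqrt(p_bar * (1.0 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


def u_limits(counts, volumes, sigma=3.0):
    """U chart: centerline is the pooled rate; limits vary with each subgroup volume."""
    u_bar = sum(counts) / sum(volumes)
    limits = []
    for n in volumes:
        half_width = sigma * math.sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits


def c_limits(counts, sigma=3.0):
    """C chart: constant denominator assumed, so one pair of limits for all points."""
    c_bar = sum(counts) / len(counts)
    half_width = sigma * math.sqrt(c_bar)
    return c_bar, (max(0.0, c_bar - half_width), c_bar + half_width)


# Hypothetical data: falls per week and patient volume per week
print(u_limits([2, 6], [100, 300]))
```

Because the denominator enters the square root, subgroups with a larger total or volume get narrower limits, which is why Pchart and Uchart limits vary from point to point while Cchart limits do not.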

The user group expected to use EpiData Software for quality improvement and documentation based on SPC graphs includes a large group of professionals who are very experienced in their own field, but quite inexperienced in research and project management based on quantitative statistical methods.

A simplification of the decision tree for SPC related methods, as shown in table 2, can therefore be an advantage:^{2)}

**Table 2: Simplified Overview of analysis commands for use in Quality improvement and documentation**

Type of Data | Recording of | Specification of observations per time unit | Chart name | Variables needed |
---|---|---|---|---|
Measurement data | Data measurements at each point (subgroup) | Varying subgroup, more than one observation in each subgroup | Xbar-S | measurement time |
Measurement data | Data measurements at each point (subgroup) | One observation per subgroup | Ichart | measurement [time]* |
Count (attribute) data | Non conformities (errors) are counted | Varying denominator | Uchart | count volume [time]* |
Count (attribute) data | Non conformities (errors) are counted | Constant denominator assumed | Cchart | count [time]* |
Count (attribute) data | Proportions | Counts and varying total | Pchart | count total [time]* |
Rare Occurrences | Each incident recorded | One incident | Gchart | (time at incident)+ |
Categorical listing of reasons | Survey of reasons recorded | All observations | Pareto chart | categories [time]* |

**+: For each occurrence of the rare event the date or a sequence number is recorded in the dataset.**

**\*: [time] is optional. If the data are simply presented as a sequence of samples, any difference in spacing between two adjacent data points is not relevant, and a sequence (1, 2, 3 … n) will be used for graphing the data.**

In addition to the commands already mentioned, Xbar-R has been developed and will not be removed, but the table above will form the basis for dialogs and menus that support beginners' use of SPC related methods. The main area of usage for EpiData Software is the health sector, where the charts mentioned above are found most relevant.
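The decision tree of table 2 can be written down as a small lookup. The Python sketch below is only an illustration of that table; the function `suggest_chart` and its argument values are hypothetical names chosen for this example and do not correspond to any EpiData command.

```python
def suggest_chart(data_type, several_per_subgroup=False, denominator=None):
    """Map the rows of table 2 to a chart name.

    data_type: "measurement", "count", "rare" or "categorical"
    denominator (count data only): "varying", "constant" or "total"
    """
    if data_type == "measurement":
        # More than one observation per subgroup: Xbar-S, otherwise Ichart
        return "Xbar-S" if several_per_subgroup else "Ichart"
    if data_type == "count":
        charts = {"varying": "Uchart", "constant": "Cchart", "total": "Pchart"}
        if denominator not in charts:
            raise ValueError("count data needs denominator: varying, constant or total")
        return charts[denominator]
    if data_type == "rare":
        return "Gchart"
    if data_type == "categorical":
        return "Pareto chart"
    raise ValueError(f"unknown data type: {data_type!r}")


print(suggest_chart("count", denominator="varying"))
```

A dialog for beginners could ask the same questions in the same order: type of data first, then subgroup size or type of denominator.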

Calculations for the chart types above are defined in the following texts:

**xbar-s:** Hart, Robertson, Hart & Lee. Application of Variables Control Charts to Risk-adjusted Time-ordered Healthcare Data. Q Manage Health Care. 2004: 99-119. Appendix 1. This results in a varying mean for the S chart and varying control limits for the X and the S chart.

**gchart:** Benneyan J. Number-Between g-Type Statistical Quality Control Charts for Monitoring Adverse Events. Health Care Management Science 2001; 4: 305-318. Also see: Walberg, Frøslie, Røislien. Local Hospital Perspective on a Nationwide Outbreak of Pseudomonas aeruginosa in Norway. Infection Control and Hospital Epidemiology 2008; 29: 635-41.

**uchart, cchart, ichart, pchart:** Several sources are available, e.g.:
NIST.gov statistical handbook. Wikipedia.org.
Oakland J. Statistical Process Control.
Winkel P. ……….. Carey and Lloyd. Measuring Quality Improvement in Healthcare. Appendix 1.
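As an orientation for the g-chart, the sketch below derives a centerline and an upper limit from the mean and variance of a geometric distribution. This is an assumption-laden illustration, not the formulas of the cited texts verbatim: it assumes the plotted value is the number of days between consecutive occurrences, that the geometric distribution is counted from zero, and that sigma is 3. The function names are hypothetical.

```python
import math


def gaps_between_events(event_days):
    """Turn sorted day numbers of each occurrence into days since the previous occurrence."""
    return [later - earlier for earlier, later in zip(event_days, event_days[1:])]


def g_limits(gaps, sigma=3.0):
    """Centerline, lower and upper limit, assuming geometrically distributed gaps."""
    g_bar = sum(gaps) / len(gaps)
    # A geometric distribution (counted from zero) with mean g_bar
    # has variance g_bar * (g_bar + 1)
    spread = math.sqrt(g_bar * (g_bar + 1.0))
    upper = g_bar + sigma * spread
    # The spread always exceeds the mean, so for sigma >= 1 the lower limit is zero
    lower = max(0.0, g_bar - sigma * spread)
    return g_bar, lower, upper


# Hypothetical day numbers on which an infection was recorded
gaps = gaps_between_events([1, 4, 9, 10])
print(gaps, g_limits(gaps))
```

Since the lower limit is always zero under these assumptions, only the upper limit can signal, and a point above it means an unusually long period without an occurrence.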

The distinction between special and standard variation is the essence of the analysis, and the software offers some cautionary guidance for the user by adding tests for what are termed special causes, assignable causes (see Oakland) or non-random variation (Hart et al.). The main issue here is the relation between type 1 error (designating a signal as meaningful when it is random variation), “true warning” and type 2 error (not finding a non-random variation or special cause). Hart et al. discuss this in particular in relation to health data, which are often characterised by variations in sampling, e.g. different patient groups or age groups.

Currently most implementations of control limits are based on 3 standard deviations (sigma), but sigma limits that vary with the total sample size, so-called t-limits, will be applied; see Hart et al.^{3)} and Walberg et al.

- Depending on the distribution of the data at hand, a rough guide is that 60% to 75% of the observations are within 1 sigma on either side of the central value.
- Depending on the distribution, 90% to 98% of the observations are within 2 sigma on either side of the central value.
- Depending on the distribution, 99% to 100% of the observations are within 3 sigma on either side of the central value.

These figures should be borne in mind when deciding whether a given distribution is within expectation or shows non-random variation. Criteria that are too restrictive result in type 2 errors, whereas criteria that are too loose lead to type 1 errors.

The following tests for special variation/assignable causes have been implemented in the software. Test 1 is universally accepted as the most important test, except for G-charts, where Walberg et al. (2008) have raised concerns about the relevance of the upper control limit. Tests are not implemented for pareto charts.
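For comparison with the rough guide above, a short Python sketch can compute the exact share within k sigma for a Gaussian distribution and the observed share in a data series. The function names are hypothetical; for Gaussian data the shares are about 68.3%, 95.4% and 99.7%, which fall inside the ranges given above.

```python
import math


def normal_coverage(k):
    """Share of a Gaussian distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def empirical_coverage(values, k):
    """Share of the observed values within k sample standard deviations of their mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sum(1 for v in values if abs(v - mean) <= k * sd) / n


for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 1))
```

Comparing `empirical_coverage` for a dataset with these reference values gives a quick impression of how far the data depart from the Gaussian case.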

Test | Description | Special comments |
---|---|---|
1 | Run Charts: Total number of runs exceeds expected number of runs. Points on the median (the centerline) are disregarded in runs. | Expected numbers are based on a standard table. A run is defined as a sequence of one or more points on the same side of the centerline. |
1 | Control Charts: An observation is outside the control limits, at 3 sigma from the centerline. | Control Charts: all other charts |
2 | K or more points in sequence on the same side of the centerline (shift in the process). Values on the centerline are excluded from the count. | One sequence of K or more points counts as one. E.g. K+2 in sequence: 1 occurrence. K default is 8. |
3 | K or more points decreasing or increasing in sequence (trend). Sequential values of the same size count as one. | One sequence of K or more points counts as one. E.g. K+2 in sequence: 1 occurrence. K default is 6. |
4 | K out of K+1 successive points more than 2 sigma away from the centerline. The one not fulfilling the rule cannot be the first one of the K+1. | One sequence of K+1 or more points counts as 1. E.g. K+2 in sequence: 1 occurrence. K default is 2. |
5 | | One sequence of K+1 or more points on the same side of the centerline counts as 1. E.g. K+2 in sequence: 1 occurrence. K default is 4. |
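The run-based tests can be sketched in a few lines of Python. This is one possible reading of the table, not the EpiData source code: the function names are hypothetical, points on the centerline are simply skipped (so they neither extend nor break a sequence), and K is counted in points.

```python
def sides(values, center):
    """+1 above, -1 below the centerline; points on the centerline are dropped."""
    return [1 if v > center else -1 for v in values if v != center]


def count_runs(values, center):
    """Number of runs: sequences of one or more points on the same side of the centerline."""
    s = sides(values, center)
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def count_shifts(values, center, k=8):
    """Test 2: sequences of k or more points on the same side; each sequence counts once."""
    found, length, previous = 0, 0, 0
    for side in sides(values, center):
        length = length + 1 if side == previous else 1
        previous = side
        if length == k:  # counted once, when the sequence first reaches k points
            found += 1
    return found


def count_trends(values, k=6):
    """Test 3: sequences of k or more rising or falling points; each sequence counts once."""
    # Sequential values of the same size count as one point
    distinct = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    found, length, previous = 0, 1, 0
    for a, b in zip(distinct, distinct[1:]):
        direction = 1 if b > a else -1
        length = length + 1 if direction == previous else 2
        previous = direction
        if length == k:
            found += 1
    return found


print(count_shifts([3] * 10, center=2))  # K+2 points in sequence count as one occurrence
```

For run charts the result of `count_runs` would then be compared with the expected number from the standard table, which is not reproduced here.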

**For all charts:** Sigma is 3 by default. When T-limits are used this will depend on the number of observations of that subseries.
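The sigma-based tests can be sketched in the same way. The Python code below is an illustration with hypothetical function names; it follows the wording of the table literally, so test 4 does not require the points to lie on the same side of the centerline, which some textbook versions of the rule do. The standard deviation `sd` is assumed to be known from the chart calculation.

```python
def outside_limits(values, center, sd, sigma=3.0):
    """Test 1 for control charts: indexes of points beyond center +/- sigma * sd."""
    return [i for i, v in enumerate(values) if abs(v - center) > sigma * sd]


def count_test4(values, center, sd, k=2, zone=2.0):
    """Test 4: k out of k+1 successive points more than `zone` sigma from the centerline.

    Overlapping windows that fulfil the rule are merged, so one long
    sequence counts as a single occurrence.
    """
    beyond = [abs(v - center) > zone * sd for v in values]
    found, in_sequence = 0, False
    for start in range(len(values)):
        window = beyond[start:start + k + 1]
        # The point that does not fulfil the rule may not be the first of the window
        hit = window[0] and sum(window) >= k
        if hit and not in_sequence:
            found += 1
        in_sequence = hit
    return found


# Hypothetical series with centerline 10 and standard deviation 1
print(outside_limits([10, 10, 14, 10], 10, 1))
print(count_test4([10, 12.5, 12.5, 10, 10], 10, 1))
```

When t-limits are used, the `sigma` argument would be replaced by the value that applies to the subseries in question.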

Gtest: see below, only for G-charts

For certain types of charts and/or combinations of options, not all tests are available, e.g. tests 2-5 when “/point” is added. This option indicates that the individual measurements (subgroups) represent different entities, so that a sequence makes no sense, e.g. when each point represents one hospital.

Based on the comments and clarifications above the following actual commands are available:

**Table 3: Specification and comments on the implementation of single commands**

Chart name | Variables needed | Example call | Special options, comments or sample test |
---|---|---|---|
Runchart | measurement | runchart lege | /t |
Xbar-S | measurement time | xbar bp visit | |
Xbar-S | measurement time | xbar beddays week | |
Xbar-R | measurement time /range | xbar bp visit /Range | Notice that the simplified implementation suggests replacing this with Xbar-S |
Ichart | measurement [time]* | ichart lege | moving range: /MR |
Ichart | measurement [time]* | ichart lege tid | |
Ichart | measurement [time]* | ichart lege dato | |
Uchart | count volume [time]* | uchart falls patients | Multiplier: /per= |
Uchart | count volume [time]* | uchart falls patients week | |
Cchart | count [time]* | cchart falls | |
Cchart | count [time]* | cchart falls month | |
Pchart | count total [time]* | pchart lege total | |
Pchart | count total [time]* | pchart lege total tid | |
Gchart | “time at incident” | gchart dato | If a weight is given, the time value is used the number of times given by the count (non-negative, but could be 0). /NoUL: no upper limit |
Gchart | “time at incident” [/w]* | gchart day /w=count | |
Pareto Chart | categories [/w]* | pareto cause | /w: weight variable for count of observations |

\*: Elements in [] are optional.

[time]: If [time] is not given, the sequence of the data is assumed to represent the sampling order.

A number of options have been implemented. Of these, /xlabel can be replaced by supplying the [time] variable as a string variable; if the [time] variable is numerical, its values define the x values.

Option | Description |
---|---|
/B (one or more) | Split the chart (insert a break) at the given observation(s). The value can be a date “dd/mm/yyyy” or “mm/dd/yyyy”, in the same format as the time field. |
/f=y | Freeze control limit and centerline calculations on the first y points of the sequence |
/t | Perform tests 1-3 and mark them in the graph |
/tx | Perform one or more of test 1 to test 5 (x = 1,2,3,4,5) |
/tx=y | Perform one or more of test 1 to test 5 (x = 1,2,3,4,5), but use the value y as the limit for that test. See the table of defaults. |
/tlimit | Use sigma limits depending on the number of subgroups (see Hart et al.). Show actual values in a footnote to the table. Default without this option is sigma = 3. |
/sl | Show standard deviation (sigma) lines (1 and 2 sigma) |
/tab | Add table of counts below the graph |
/nt | Hide documentation table (but mark tests in the graph if requested) |
/neglcl | Show negative as well as positive lower control limit values |
/xlabel=var | Variable containing the labels to show on the X-axis |
/point | Show points for observations and omit the connecting line. Only test 1 is performed if /t is indicated |
/exp=x1 | Exclude the observation with the indicated x value |
/exv=y | Exclude all points with a Y-value greater than or equal to the given value (here y) |
/exz | Exclude all observations with Y = zero. Notice: /exv and /exz do not work with G-charts |
/noinf | Omit the text below the X-axis, showing no central value information |
/NoUL | Do not show the upper control limit (only with g-chart) |
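The effect of freezing (/f) and breaking (/B) can be illustrated on an I chart. The Python sketch below uses the common moving-range estimate of sigma (mean moving range divided by 1.128); whether EpiData uses exactly this estimate is not stated here, and the function names as well as the convention that a break index marks the first point of a new segment are assumptions made for the example.

```python
def ichart_limits(values, sigma=3.0):
    """Centerline and limits of an I chart from the mean moving range (at least 2 values)."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 1.128 is the usual d2 constant for moving ranges of two points
    sd = mr_bar / 1.128
    return center, center - sigma * sd, center + sigma * sd


def frozen_limits(values, freeze):
    """Like /f=y: calculate on the first `freeze` points and apply to the whole series."""
    return ichart_limits(values[:freeze])


def limits_with_breaks(values, breaks):
    """Like /B: calculate separately per segment; `breaks` are start indexes of new segments."""
    edges = [0] + sorted(breaks) + [len(values)]
    return [ichart_limits(values[a:b]) for a, b in zip(edges, edges[1:])]


# Hypothetical series with a change in level after the fourth point
series = [10, 12, 10, 12, 50, 52, 50, 52]
print(frozen_limits(series, 4))
print(limits_with_breaks(series, [4]))
```

With frozen limits the later points are judged against the baseline period, whereas a break gives each period its own centerline and limits.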

The colours are defined by the set command: **GRAPH COLOUR SPC**

With the following default sequence: **GRAPH COLOUR SPC=1307946**

Sequence: datapoints *blue*, centerline *green*, sigma3 *red*, sigma2 *fuchsia*, sigma1 *aqua*, test4 line *yellow*, test5 *skyblue*
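Read position by position, the default value pairs one digit with each element of the sequence. The Python sketch below only illustrates that pairing; it assumes one digit per chart element, and the function name `colour_codes` is hypothetical.

```python
# Chart elements in the order given in the sequence above
ROLES = ["datapoints", "centerline", "sigma3", "sigma2", "sigma1", "test4 line", "test5"]


def colour_codes(setting="1307946"):
    """Pair each digit of a GRAPH COLOUR SPC value with the chart element it colours."""
    if len(setting) != len(ROLES) or not setting.isdigit():
        raise ValueError("expected one digit per chart element")
    return dict(zip(ROLES, setting))


print(colour_codes())
```

Under this assumption, changing the second digit would change the colour of the centerline only.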

Option | Description |
---|---|
/sz | Show standard deviation (sigma) zones |
/TGmv | For g-charts, apply a test of running sequences according to Walberg, Frøslie, Røislien. Local Hospital Perspective on a Nationwide Outbreak of Pseudomonas aeruginosa in Norway. Infection Control and Hospital Epidemiology 2008; 29: 635-41. |
/W: | Frequency weight by variable in G-charts, meaning replication of the measurement. (Not decided if this will be implemented) |