Specification of EpiData Analysis Implementation of SPC and related methods.

Authors: Lauritsen J, Nyen B EpiData Association Denmark and Gruk/Kompetancecentret Norway.
v1.2 January 2009. First version v1.0 November 2008.

The scope of this paper is NOT to give an overall introduction to the definition of SPC charts (statistical process control charts), but to document the design choices taken when implementing SPC charts in EpiData Analysis. Further lists of example datasets and validation procedures are available on request and will be placed in an appendix at some point. For general information on the EpiData Software project, readers are referred to http://www.epidata.dk

An SPC chart can be defined as an x-y graph, where the x value represents the sequence of measurements and the y value the measurement itself. Each x value represents a subgroup. In addition to the actual measurements, shown as points connected by a line, the graph contains control limits, which represent high and low percentiles of a distribution. It is customary to use 3 standard deviations (or sigma) as the upper and lower control limits, but some authors (e.g. Hart et al.) suggest varying this by the total number of samples. For non-Gaussian data, approximations or factors resembling standard deviations are used with the same purpose: finding a limit that separates values within “random variation” from values “outside the expected range or of sufficiently low probability”, always based on the data at hand.
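As a minimal illustration of the 3 sigma convention described above, the following Python sketch computes the centerline and control limits for a chart of individual measurements and lists the points outside the limits. The function names are hypothetical, and the sigma estimate (mean moving range divided by 1.128) is the usual textbook choice, assumed here rather than taken from the EpiData Analysis source code.

```python
from statistics import mean


def ichart_limits(values, sigma=3.0):
    """Return (center, lcl, ucl) for a chart of individual measurements.

    Assumption: sigma is estimated as the mean moving range divided by
    1.128 (the d2 constant for subgroups of size 2). This is the common
    textbook approach, not a statement about EpiData's internal code.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Absolute differences between adjacent observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    sigma_hat = mean(moving_ranges) / 1.128
    return center, center - sigma * sigma_hat, center + sigma * sigma_hat


def outside_limits(values, lcl, ucl):
    """Return the 0-based positions of observations outside the limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

With the series 10, 12, 11, 13, 12, 11, 30, 12, 11, 10 only the seventh observation (30) falls outside the limits computed this way.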

The implementation of SPC related methods comprises the following: run charts, g-charts and traditional control charts such as icharts (also called XmR charts), pcharts, ccharts and ucharts, and also pareto charts, which are strictly speaking not SPC charts but bar charts sorted by category in descending order of magnitude. For options, command specification and further examples, readers are directed to the user manual for SPC 1)
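The pareto principle mentioned above (categories sorted in descending order of magnitude, with a cumulative percentage) can be sketched in a few lines of Python. The function name and the tie-breaking rule (alphabetical order for equal counts) are assumptions made for this illustration only.

```python
from collections import Counter


def pareto_table(categories):
    """Return rows of (category, count, percent, cumulative percent).

    Rows are sorted by descending count. Ties are broken alphabetically,
    which is an assumption made here to keep the output deterministic.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        percent = 100.0 * n / total
        cumulative += percent
        rows.append((category, n, percent, cumulative))
    return rows
```

The percent column corresponds to the bars of the chart and the cumulative column to the cumulative percent curve.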

Definitions and naming of SPC (statistical process control) methods and charts are unfortunately not uniform, and a search of the research literature reveals a number of sometimes conflicting definitions. For roughly the first 50 years after the earliest development in the 1930s by Shewhart and others, the methods were basically only used in industrial production, with the health sector catching up much later: initially in the laboratory branches and, since the mid-1980s, increasingly in a number of specialities, usually related to quality improvement. The main difference between SPC methods in health care and in industrial settings is the focus on varying denominators (sample size per subsample) in health care, whereas sampling in industry is usually planned in detail and based on random sampling of a fixed number of units. The varying exposure volume (e.g. patient bed days) for a given topic raises issues of standardisation and weighting principles that are well known in epidemiology and in the evaluation of public health practice. Other terms from medical practice, such as predictive values for tests, well known from disease screening, also appear in the medical literature on SPC; see references later.

Other issues have also been discussed, e.g. whether to designate deviations from random variation as assignable cause or special cause, and whether to use “Control” or more positive phrases, e.g. enhancement or process optimisation. The term Statistical Process Control is maintained here in the sense that “Control” refers to the situation or process in question, not to control of the staff in charge of the process.

Relation to statistical distribution

The theoretical basis for choosing a particular type of SPC chart is the underlying distribution of the empirical measure in question. Table 1 gives an overview of this as used in the current implementation (v2.1.1.158) of EpiData Analysis.

Table 1: Types of charts and connection to theoretical distributions

Chart type | Type of data and distribution | Plot on y-axis | Examples | Data collected for each subsample/point in time
RunChart | any | y: count | | 1
Xbar-S | Continuous | y1: mean; y2: sd | waiting time to procedure | > 10 per sampling point
Xbar-R | Continuous | y1: mean; y2: range | | 2-10 per sampling point
Pchart | Binary | y: proportion | proportion of all patients seen by each doctor | Counts of characteristic and total for each subsample
Uchart | Count | y: rate per denominator | Rate of patient falls per patient volume | Date of outcome, risk volume since last outcome
Cchart | Count | y: count | count of falls | Counts based on a Poisson process
Ichart | Count | y1: count; y2: moving range | patient count of visits | Counts based on a binary process
Gchart | Rare events; Geometric (left skewed) | y: time or count units since last occurrence | Days since last infection. Days since last call of acute team. | date or day number in period of outcome for each single occurrence
Pareto | Categorical | y1: percent of each category as a bar; y2: cumulative percent; x: categories as bars | reasons for delay in start of surgery | classification of reasons in categories

y1 + y2: Y-axis with double graphs, y1: top y2: bottom. x-axis is sequence unless stated otherwise
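For the attribute charts in table 1 (Pchart, Uchart, Cchart), the standard textbook control limits can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, the formulas are the common binomial and Poisson based ones rather than code taken from EpiData Analysis, and negative lower limits are truncated at zero here.

```python
from math import sqrt


def pchart_limits(counts, totals, sigma=3.0):
    """Center and per-subgroup (lcl, ucl) for proportions with varying totals."""
    p_bar = sum(counts) / sum(totals)
    limits = []
    for n in totals:
        half = sigma * sqrt(p_bar * (1 - p_bar) / n)
        # Assumption: limits are truncated to the valid range 0..1
        limits.append((max(0.0, p_bar - half), min(1.0, p_bar + half)))
    return p_bar, limits


def uchart_limits(counts, volumes, sigma=3.0):
    """Center and per-subgroup (lcl, ucl) for rates with a varying denominator."""
    u_bar = sum(counts) / sum(volumes)
    limits = []
    for n in volumes:
        half = sigma * sqrt(u_bar / n)
        limits.append((max(0.0, u_bar - half), u_bar + half))
    return u_bar, limits


def cchart_limits(counts, sigma=3.0):
    """Return (center, lcl, ucl) for counts with a constant denominator."""
    c_bar = sum(counts) / len(counts)
    half = sigma * sqrt(c_bar)
    return c_bar, max(0.0, c_bar - half), c_bar + half
```

Because the limits of the Pchart and Uchart depend on the size of each subgroup, they vary from point to point, which reflects the varying denominators typical of health care data.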

The user group expected to use EpiData Software for quality improvement and documentation based on SPC graphs includes a large group of professionals who are very experienced in their own field, but quite inexperienced in research and project management based on quantitative statistical methods.

A simplification of the decision tree for SPC related methods, as shown in table 2, can therefore be an advantage: 2)

Table 2: Simplified Overview of analysis commands for use in Quality improvement and documentation

Type of data | Recording | Specification of observations per time unit | Chart name | Variables needed
Measurement data | Data measurements at each point (subgroup) | Varying subgroup, more than one observation in each subgroup | Xbar-S | measurement time
 | | One observation per subgroup | Ichart | measurement [time]*
Count (attribute) data | Non conformities (errors) are counted | Varying denominator | Uchart | count volume [time]*
 | | Constant denominator assumed | Cchart | count [time]*
 | Proportions | Counts and varying total | Pchart | count total [time]*
Rare occurrences | Each incident recorded | One incident | Gchart | (time at incident)+
Categorical listing of reasons | Survey of reasons recorded | All observations | Pareto chart | categories [time]*

+: For each occurrence of the rare event, the date or a sequence number is recorded in the dataset.
*: [time] is optional. If data are just presented as a sequence of samples, a possible difference in distance between two adjacent datapoints is not relevant. In such cases a sequence (1,2,3…n) will be used for the graphing of data.

In addition to the commands already mentioned, Xbar-R has been developed and will not be removed, but the overview above will form the basis for dialogs and menus that support the beginner's use of SPC related methods. The main area of usage for EpiData Software is the health sector, where the charts mentioned above are found most relevant.

Calculations for the chart types above are defined in the following texts:

xbar-s: Hart, Robertson, Hart & Lee. Application of Variables Control Charts to Risk-adjusted Time-ordered Healthcare Data. Q Manage Health Care. 2004: 99-119. Appendix 1. This results in a varying mean for the S chart and varying control limits for the X and the S chart.
gchart: Benneyan J. Number-Between g-Type Statistical Quality Control Charts for Monitoring Adverse Events. Health Care Management Science 2001; 4: 305-318. Also see: Walberg, Frøslie, Røislien. Local Hospital Perspective on a Nationwide Outbreak of Pseudomonas aeruginosa in Norway. Infection Control and Hospital Epidemiology 2008; 29: 635-41.
uchart, cchart, ichart, pchart: Several sources are available, e.g.: NIST.gov statistical handbook. Wikipedia.org. Oakland J. Statistical Process Control. Winkel P. ……….. Carey and Lloyd. Measuring Quality Improvement in Healthcare. Appendix 1.
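To make the g-chart calculation concrete, the following Python sketch derives the number of days between successive incidents from their dates and computes limits from the geometric distribution. It is an illustration only: the function names are hypothetical, the variance formula g(g+1) assumes a geometric distribution starting at zero, and the details of the g-chart reference above (and of the EpiData implementation) may differ.

```python
from datetime import date
from math import sqrt


def days_between_events(event_dates):
    """Return the number of days between successive incidents."""
    ordered = sorted(event_dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]


def gchart_limits(gaps, sigma=3.0):
    """Return (center, lcl, ucl) for a series of gaps between rare events.

    Assumption: for a geometric distribution starting at zero with mean g,
    the variance is g * (g + 1). The lower limit is truncated at zero.
    """
    g_bar = sum(gaps) / len(gaps)
    sd = sqrt(g_bar * (g_bar + 1))
    return g_bar, max(0.0, g_bar - sigma * sd), g_bar + sigma * sd
```

For example, incidents on 1, 11 and 16 January and 15 February 2008 give gaps of 10, 5 and 30 days, a centerline of 15 days and a lower limit of zero.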

Non-Random variation.

Distinguishing special from standard variation is the essence of the analysis. The software offers some cautionary guidance for the user by adding tests for what is termed special causes, assignable causes (see Oakland) or non-random variation (Hart et al.). The main issue here is the relation between type 1 error (designating a signal as meaningful when it is random variation), “true warning” and type 2 error (not finding a non-random variation or special cause). Hart et al. discuss this in particular in relation to health data, which are often characterised by variations in sampling, e.g. different patient groups or age groups.

Currently most implementations of control limits are based on 3 standard deviations (sigma), but sigma limits that vary with the total sample size, so-called t-limits, will be applied; see Hart et al. and Walberg et al.

  1. Depending on the distribution of the data at hand, a rough guide is that 60% to 75% of the observations are within 1 sigma on each side of the central value.
  2. Depending on the distribution, 90% to 98% of the observations are within 2 sigma on either side of the central value.
  3. Depending on the distribution, 99% to 100% of the observations are within 3 sigma on either side of the central value.

These figures should be borne in mind when deciding if a given distribution is within the expectation or shows non-random variation. The use of too restrictive criteria results in type 2 errors, whereas too loose criteria lead to type 1 errors.
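For comparison with the rough ranges listed above, the shares for an exactly Gaussian distribution can be computed from the error function. The helper name below is hypothetical; the relation itself is standard.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Share of a Gaussian distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))
```

For k = 1, 2 and 3 this gives approximately 68.27%, 95.45% and 99.73%, which lie inside the ranges quoted for 1, 2 and 3 sigma.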

The following tests for special variation/assignable causes have been implemented in the software. Test 1 is universally accepted as the most important test, except for G-charts, where concerns about the relevance of the upper control limit have been raised by Walberg et al., 2008. Tests are not implemented for pareto charts.

Test | Description | Special comments
1 | Run Charts: Total number of runs exceeds expected number of runs. Points on median ignored in runs. | Points on the centerline are disregarded. Expected numbers are based on a standard table. A run is defined as a sequence of one or more numbers on the same side of the centerline.
1 | Control Charts: An observation is outside the control limits, at 3 sigma from the centerline. | Control Charts: all other charts
2 | K or more points in sequence on the same side of centerline (shift in the process). | Values on center line are excluded from the count. One sequence of K or more points counts as one. E.g. K+2 in sequence: 1 occurrence. K default is 8
3 | K or more points decreasing or increasing in sequence (Trend). | Sequential values of same size count as one. One sequence of K or more points counts as one. E.g. K+2 in sequence: 1 occurrence. K default is 6
4 | K out of K+1 successive points more than 2 sigma away from the centerline. | The one not fulfilling the rule cannot be the first one of the K+1. One sequence of K+1 or more points counts as 1. E.g. K+2 in sequence: 1 occurrence. K default is 2
5 | | One sequence of K+1 or more points on same side of centerline counts as 1. E.g. K+2 in sequence: 1 occurrence. K default is 4
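The run definition used in test 1 for run charts can be illustrated with a short Python sketch that counts runs around the median and ignores points lying on it. The function name is hypothetical, and the comparison against the expected number of runs is left out because the standard table is not reproduced in this document.

```python
from statistics import median


def count_runs(values):
    """Return (number of runs, number of useful observations).

    A run is a sequence of one or more points on the same side of the
    median. Points exactly on the median are ignored.
    """
    center = median(values)
    # True for points above the median, False for points below it
    sides = [v > center for v in values if v != center]
    if not sides:
        return 0, 0
    runs = 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return runs, len(sides)
```

The second value returned (the number of observations not on the median) is the figure normally used to look up the expected number of runs.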

For all charts: Sigma is 3 by default. When T-limits are used this will depend on the number of observations of that subseries.
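Tests 2 and 3 can be sketched in Python as follows, using the default K values from the table. The function names are hypothetical, and two interpretations are assumed: K counts points (not steps between points), and a sequence is counted once, at the moment it reaches K points, however long it continues.

```python
def count_shifts(values, center, k=8):
    """Test 2: count sequences of k or more points on the same side of the centerline."""
    occurrences = 0
    run_length = 0
    run_side = None
    for v in values:
        if v == center:
            continue  # points on the centerline are excluded from the count
        side = v > center
        if side == run_side:
            run_length += 1
        else:
            run_side = side
            run_length = 1
        if run_length == k:  # count each sequence once, when it reaches k
            occurrences += 1
    return occurrences


def count_trends(values, k=6):
    """Test 3: count sequences of k or more steadily increasing or decreasing points."""
    # Sequential values of the same size count as one point
    distinct = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    occurrences = 0
    run_points = 1
    direction = 0
    for a, b in zip(distinct, distinct[1:]):
        step = 1 if b > a else -1
        if step == direction:
            run_points += 1
        else:
            direction = step
            run_points = 2  # a new trend starts with two points
        if run_points == k:
            occurrences += 1
    return occurrences
```

With these definitions a sequence of K+2 points gives one occurrence, in line with the comments in the table.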

Gtest: see below, only for G-charts

For certain types of charts and/or combinations of options, not all tests are available, e.g. tests 2-5 when “/point” is added. This option indicates that the individual measurements (subgroups) represent different entities, so that a sequence makes no sense, e.g. when each point represents one hospital.

Actual command formulation

Based on the comments and clarifications above the following actual commands are available:

Table 3: Specification and comments on the implementation of single commands

Chart name | Variables needed | Example call | Special options, comments or sample test
Runchart | measurement | runchart lege /t |
Xbar-S | measurement time | xbar bp visit; xbar beddays week |
Xbar-R | measurement time /range | xbar bp visit /Range | Notice that the simplified implementation suggests replacing this with Xbar-S
Ichart | measurement [time]* | ichart lege; ichart lege tid; ichart lege dato | moving range: /MR
Uchart | count volume [time]* | uchart falls patients; uchart falls patients week | Multiplier: /per=
Cchart | count [time]* | cchart falls; cchart falls month |
Pchart | count total [time]* | pchart lege total; pchart lege total tid |
Gchart | “time at incident” | gchart dato | If weight is given, the time value is used the number of times given by the count (non-negative, but could be 0). /NoUL: no upper limit
Gchart | “time at incident” [/w]* | gchart day /w=count |
Pareto Chart | categories [/w]* | pareto cause | /w: weight variable for count of observations

*: Elements in [] are optional.
[time]: If [time] is not given the sequence of data is supposed to represent the sampling order.


A number of options have been implemented. One of these, /xlabel, can be replaced by supplying the [time] variable as a string variable; if the [time] variable is numerical, its values define the x values.

Regular options

/B (one or more) Split the chart (insert a break) at the given observation(s). The value can be a date, “dd/mm/yyyy” or “mm/dd/yyyy”, in the same format as the time field.
/f=y Freeze control line and center line calculations on the first y points of the sequence
/t Perform tests 1-3 and mark in the graph
/tx Perform one or more of test1-test5 (x = 1,2,3,4,5)
/tx=y Perform one or more of test1-test5 (x = 1,2,3,4,5), but use the value y as the limit (K) for the test. See the K defaults listed with the tests above.
/tlimit Use sigma limits depending on the number of subgroups (see Hart et al.). Show actual values in a footnote to the table. Default without this option is sigma = 3.
/sl Show standard deviation (sigma) lines (1 and 2 sigma)
/tab Add table of counts below graph
/nt Hide documentation table (but mark tests in the graph if requested)
/neglcl Show negative as well as positive lower control limit values
/xlabel=var Variable containing the labels to show on the X-axis
/point Show a point marker for each observation and omit the connecting line. Only test 1 is performed if /t is indicated
/exp=x1 Exclude the observation with the indicated x value
/exv=y Exclude all points with a Y-value greater than or equal to the given value (here y)
/exz Exclude all observations with Y = zero. Notice: /exv and /exz do not work with Gcharts
/noinf Omit the text below the X-axis, so that no central value information is shown
/NoUL Do not show the upper control limit (only with g-chart)
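The idea behind the /f=y option (freezing the calculations on the first y points) can be illustrated with the following Python sketch for a chart of individual measurements. The function names and the moving range based sigma estimate are assumptions made for the illustration; they are not taken from the EpiData Analysis source code.

```python
from statistics import mean


def frozen_limits(values, freeze, sigma=3.0):
    """Return (center, lcl, ucl) computed from the first `freeze` points only."""
    baseline = values[:freeze]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    # Assumption: sigma estimated as mean moving range / 1.128
    half = sigma * mean(moving_ranges) / 1.128
    return center, center - half, center + half


def signals(values, lcl, ucl):
    """Return the 0-based positions of all observations outside the frozen limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]
```

Freezing on a stable baseline period means that later observations are judged against the baseline process only, so a later shift does not widen the limits it is tested against.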

Colouring of lines

The colours are defined by the set command: GRAPH COLOUR SPC
With the following default sequence : GRAPH COLOUR SPC=1307946
Sequence: datapoints blue, centerline green, sigma3 red, sigma2 fuchsia, sigma1 aqua, test4 line yellow, test5 skyblue

Not implemented yet (or not decided):

/sz show standard deviation (sigma) zones
/TGmv For g-charts, apply a test of running sequences according to Walberg, Frøslie, Røislien. Local Hospital Perspective on a Nationwide Outbreak of Pseudomonas aeruginosa in Norway. Infection Control and Hospital Epidemiology 2008; 29: 635-41.
/W: Frequency weight by variable in G-charts. Meaning replication of the measurement. (Not decided if this will be implemented)
1) Nyen B, Lauritsen J. Brukerveiledning for SPC-modulen i EpiData Analysis. Gruk & EpiData Association, latest ed. See www.gruk.no
2) Partially based on recommendations by Carey RG. Improving Healthcare with Control Charts. ASQ Quality Press, Milwaukee, Wisconsin, 2003. ISBN 0-87389-562-2. (page 20)