Zymes llc. Understanding Statistics in Science and Medicine




Understanding Statistics in Science and Medicine

Why do we use statistics?

Since it is not possible to study an entire population to compare the effects of one treatment to another, most clinical studies are conducted on small sample groups chosen to represent the population of interest (e.g., adults with diabetes). The assumption is that if the sample group truly represents the population, then the results from the sample group should reflect the results that would have been obtained had the entire population been studied.

The data generated from a study are generally described mathematically in 3 ways:

(1) Summary of the individual results

(2) Variability among the individual results, or how different one is from the others

(3) Significance, or whether one treatment is truly different from another

Summary Information

The intention is to find one number to summarize each set of results. Typical “summary statistics” are the mean, median, and the mode. Let’s use the following set of results to illustrate these definitions:

6, 3, 13, 12, 11, 20, 6, 8, 6, 12

(a) The mean is the average: the sum of the values divided by the number of measurements in the set, or 97 / 10 = 9.7. The mean is the statistic of choice when the data are normally distributed, like scores on an exam: few students get a high A, few get a low D, and most get a B or C near the average (mean).

(b) The median is the number in the middle of a series of numbers: half of the measurements are above it and half are below. To find it, the numbers from the set above must first be arranged in order:

3, 6, 6, 6, 8, 11, 12, 12, 13, 20

Because this set has an even number of values (10), there is no single middle number, so the median is the average of the two middle values, 8 and 11, which is 9.5 (5 numbers below and 5 numbers above).

The median is used when the data are not normally distributed but are lopsided (skewed) one way or the other. For example, the median is a good summary statistic for family income, since the few families with very high incomes would pull the average upward, suggesting that families make more money than most actually do.

(c) The mode is the number that occurs most frequently in a series of numbers. In the set above, 6 occurs three times, which is more frequently than any other number. The mode is most useful when the most common value is itself of interest, for example with data that fall into categories or that are strongly lopsided one way or the other.
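The three summary statistics can be checked with a few lines of Python. This is only a minimal sketch using the standard-library statistics module; the variable name results is an illustrative label for the ten values listed above and is not part of the original material.

```python
import statistics

# The ten results from the example set above.
results = [6, 3, 13, 12, 11, 20, 6, 8, 6, 12]

# Mean: sum of the values divided by how many values there are.
mean = statistics.mean(results)

# Median: middle of the sorted values. With an even number of values,
# the two middle values are averaged.
median = statistics.median(results)

# Mode: the value that occurs most often.
mode = statistics.mode(results)

print("sorted:", sorted(results))
print("mean:  ", mean)
print("median:", median)
print("mode:  ", mode)
```

Sorting the list first, as the printout does, makes it easy to confirm the median and mode by eye.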

Variability

After the data are summarized, it is useful to determine the degree of variability. Some measures of variability are range, standard deviation, and standard error. Let’s use these 2 new data sets as examples.

Set 1: 3, 3, 3, 4, 4, 4, 5, 5, 5, 6

Set 2: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20

(a) The range gives the lowest and highest numbers in the data set. In the examples above, the range for set 1 is 3 to 6 and the range for set 2 is 2 to 20.

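(b) Standard deviation (SD) and standard error (SE) are more involved calculations. The SD describes how widely the individual values are spread around the mean; the SE is the SD divided by the square root of the number of measurements and describes how precisely the mean itself has been estimated. What you need to know is that a large SD and SE indicate a lot of variability in the data, and a small SD and SE indicate that the data values are clustered closely around the mean.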

In set 1 of the example above, the mean is 4.2, the SD is 1.03, and the SE is 0.33. The numbers do not vary greatly around the mean, and the SD and SE are small. Hence, there is low variability. Conversely, in set 2 of the example, the values are not clustered closely around the mean of 11, and the SD (6.1) and SE (1.9) are larger. This set has greater variability. These two data sets are graphed in the bar chart below. The height of each colored bar represents the mean. The bars extending above and below the mean indicate the size of the SD and SE. When the data have little variability, the error bars are short, as in set 1; when variability is high, as in set 2, they are long.
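For readers who want to see where such numbers come from, the range, SD, and SE can be computed as in the sketch below. It assumes the sample SD (dividing by n - 1) and takes the SE as the SD divided by the square root of the number of values. The helper name describe is hypothetical, and set_2 is written as the ten even numbers from 2 to 20, which is the version of the set that matches the quoted mean of 11.

```python
import math
import statistics

set_1 = [3, 3, 3, 4, 4, 4, 5, 5, 5, 6]
# Assumption: set 2 is the ten even numbers from 2 to 20 (mean of 11).
set_2 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

def describe(values):
    """Return the mean, range limits, sample SD, and SE of a list of numbers."""
    n = len(values)
    sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    return {
        "mean": statistics.mean(values),
        "low": min(values),
        "high": max(values),
        "sd": sd,
        "se": se,
    }

for name, values in (("set 1", set_1), ("set 2", set_2)):
    d = describe(values)
    print(f"{name}: mean={d['mean']:.1f}  range={d['low']} to {d['high']}  "
          f"SD={d['sd']:.2f}  SE={d['se']:.2f}")
```

Because the SE divides the SD by the square root of the sample size, it shrinks as more measurements are collected, even when the spread of the individual values stays the same.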

Significance

When studies compare treatment effects, it is necessary to have a way to determine if there is a true difference between the effects. For example, tossing a coin 100 times and then repeating the 100 tosses might result in 49 heads in the first set and 51 heads in the second set. Although the measured results are different, they are not significantly different. On the other hand, if you get heads 49 times in the first set and only 5 times in the second set, there is a significant difference between the two sets, which may be attributable to a particular cause.

The statistic generally used to determine significance is the “p value”. Typically, the p value that is deemed significant is p<0.05 (read, p is less than point 05). If the p value is less than 0.05, the difference between the treatment groups is considered statistically significant. It means that if the treatments truly had the same effect, a difference at least as large as the one observed would arise by chance less than 5% of the time, so the difference is more plausibly attributable to a treatment effect than to chance alone. If the p value is greater than 0.05 (p>0.05), the difference is not considered statistically significant: chance cannot be ruled out as the explanation, although this does not prove that the treatments are the same.
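To make the coin-toss comparison concrete, the sketch below computes a p value for each pair of results. The text does not say which statistical test it has in mind, so the choice of a two-proportion z-test (a normal approximation with a pooled proportion) is an assumption, and the function name two_proportion_p_value is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(heads_a, heads_b, tosses=100):
    """Two-sided p value for the difference between two proportions of heads.

    Uses the normal approximation with a pooled proportion. Both sets are
    assumed to have the same number of tosses, and the pooled proportion
    must be strictly between 0 and 1.
    """
    p_a = heads_a / tosses
    p_b = heads_b / tosses
    pooled = (heads_a + heads_b) / (2 * tosses)
    se = math.sqrt(pooled * (1 - pooled) * (2 / tosses))
    z = (p_a - p_b) / se
    # Probability of a difference at least this large in either direction
    # if both sets of tosses had the same underlying chance of heads.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_similar = two_proportion_p_value(49, 51)    # 49 heads vs 51 heads
p_different = two_proportion_p_value(49, 5)   # 49 heads vs 5 heads

print(f"49 vs 51 heads: p = {p_similar:.3f}")
print(f"49 vs 5 heads:  p = {p_different:.2e}")
```

The first comparison gives a p value well above 0.05, and the second gives one far below it, which matches the informal reading of the two coin-toss results.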

Confidence intervals (CI) tell us the range within which the real effect is likely to lie. Just as 5% is the usual threshold for the p value, the confidence interval usually provided is the 95% confidence interval. For example, if a new cholesterol drug lowered cholesterol by a mean of 20 mg/dL with a 95% CI of 4 to 35, we can be 95% confident that the true effect of the drug lies within this range (strictly, 95% of intervals calculated this way would contain the true effect). For a difference such as this one, the value that means “no effect” is 0; for a ratio, such as a relative risk or odds ratio, it is 1.0. If the CI does not include the no-effect value, the result is said to be significant at p<0.05. In our example the CI (4 to 35) does not include 0, therefore the treatment effect is considered to be statistically significant.
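A 95% confidence interval for a mean is commonly calculated as the mean plus or minus about 1.96 standard errors. The sketch below shows that calculation under a normal approximation. The standard error of 8 mg/dL is a hypothetical value, chosen only so that the interval lands near the 4 to 35 quoted in the example; the source does not report it, and the function name confidence_interval is likewise illustrative.

```python
from statistics import NormalDist

def confidence_interval(mean, standard_error, level=0.95):
    """Normal-approximation confidence interval around a mean."""
    # For level=0.95 this multiplier is about 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return mean - margin, mean + margin

# Mean reduction of 20 mg/dL with a hypothetical standard error of 8 mg/dL.
low, high = confidence_interval(mean=20, standard_error=8)
print(f"95% CI: {low:.1f} to {high:.1f} mg/dL")
```

A smaller standard error, which usually comes from a larger study, gives a narrower interval around the same mean. For small samples, a multiplier from the t distribution is normally used instead of 1.96, which widens the interval somewhat.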

There are 2 main types of significance: statistical significance and clinical significance. Statistical significance means that the difference between two treatment groups is unlikely to have occurred due to chance. It is typically expressed with p values and/or confidence intervals. Clinical significance implies that the treatment effect is large enough to have a practical meaning to patients and/or health care providers. In some cases a result can be statistically significant but have no practical importance. For example, a study may show that drug A increased the life span of patients by 12 hours compared to drug B, and that this increase was statistically significant. However, 12 hours has no real clinical significance.

Power Analysis

The last concept is the “power analysis”. It is important to perform a power analysis as part of the study design because it helps determine how many subjects are needed in the study (called the sample size). If the sample size is too small, the study may fail to detect a real treatment effect and will not provide reliable answers. If the sample size is too large, time and money will be wasted studying more patients than are needed to get a reliable answer.
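As a rough illustration of what a power analysis calculates, the sketch below uses a common textbook approximation for comparing the means of two equal-sized groups: the number of subjects per group is 2 x ((z for the significance level + z for the power) x SD / difference) squared. The difference of 20 mg/dL, the SD of 40 mg/dL, the 5% significance level, and the 80% power are all hypothetical inputs chosen for illustration, and the function name sample_size_per_group is not from the original text.

```python
import math
from statistics import NormalDist

def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means.

    difference: smallest treatment difference worth detecting
    sd:         expected standard deviation of the measurement
    alpha:      two-sided significance level (0.05 corresponds to p<0.05)
    power:      desired chance of detecting the difference if it is real
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)   # round up to a whole number of subjects

# Hypothetical study: detect a 20 mg/dL difference when the SD is 40 mg/dL.
print(sample_size_per_group(difference=20, sd=40))

# Halving the difference to be detected roughly quadruples the sample size.
print(sample_size_per_group(difference=10, sd=40))
```

The second call shows why the choice of the smallest difference worth detecting matters so much at the design stage: small effects require much larger studies.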

Page modified: 2006-11-20 16:24:15.