Statistical Terms

In the posts ahead, I will delve into clinical studies. Those posts are full of statistical terms like sensitivity, specificity, positive predictive value, and more. I encourage you to bookmark this post, as I'll reference the terms in it frequently.

The Basics

Many medical tests give a simple positive or negative result. When that result is compared with whether the person truly has the disease, there are four possible outcomes. I'll use a concrete example throughout most of this post. Imagine a school of 10,000 students, 100 of whom have strep throat. The school nurse has a rapid strep test she gives to every student. For each student, one of four things happens:

A true positive (TP) means the student has strep and the test correctly says positive.
A false positive (FP) means the student does not have strep, but the test incorrectly says positive.
A true negative (TN) means the student does not have strep, and the test correctly says negative.
A false negative (FN) means the student has strep, but the test incorrectly says negative.
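The four outcomes depend on just two yes/no facts about each student: whether the student has strep and whether the test comes back positive. Here is a minimal Python sketch of that mapping. The function name `classify_outcome` is a hypothetical label I chose for illustration, not part of any real library.

```python
def classify_outcome(has_disease: bool, test_positive: bool) -> str:
    """Return "TP", "FP", "TN", or "FN" for one student's status and result."""
    if test_positive:
        # A positive result is "true" only if the student really has strep.
        return "TP" if has_disease else "FP"
    # A negative result is "true" only if the student really is strep-free.
    return "FN" if has_disease else "TN"


print(classify_outcome(has_disease=True, test_positive=True))    # TP
print(classify_outcome(has_disease=False, test_positive=True))   # FP
print(classify_outcome(has_disease=False, test_positive=False))  # TN
print(classify_outcome(has_disease=True, test_positive=False))   # FN
```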

Sensitivity (True Positive Rate)

Of all the people who actually have the disease, what percentage does the test correctly identify?

Sensitivity = True Positives ÷ (True Positives + False Negatives)

In the school example, let's say 100 students actually have strep, and the test correctly identifies 80 of them but misses 20.

Sensitivity = 80 ÷ (80 + 20) = 80%
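The sensitivity formula translates directly into code. This is a sketch using the school example's counts; the function name `sensitivity` is my own choice.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of people who have the disease that the test flags as positive."""
    return tp / (tp + fn)


# School example: 80 students with strep are caught, 20 are missed.
print(f"{sensitivity(tp=80, fn=20):.1%}")  # 80.0%
```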

Sensitivity tells you how good the test is at catching the disease when it's there. A test with high sensitivity misses very few cases, so a negative result is good at ruling the disease out. A test with low sensitivity lets a lot of cases slip through.

Specificity (True Negative Rate)

Of all the people who do not have the disease, what percentage does the test correctly identify as negative?

Specificity = True Negatives ÷ (True Negatives + False Positives)

In the school example, 9,900 students don't have strep. Let's say the test correctly identifies 9,850 of them as negative and incorrectly flags 50 as positive.

Specificity = 9,850 ÷ (9,850 + 50) = 99.5%
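Specificity can be computed the same way. This sketch uses the school example's counts; the function name `specificity` is my own choice. The exact value is 9,850 ÷ 9,900 ≈ 0.9949, which rounds to 99.5%.

```python
def specificity(tn: int, fp: int) -> float:
    """Share of disease-free people that the test correctly calls negative."""
    return tn / (tn + fp)


# School example: 9,850 healthy students cleared, 50 falsely flagged.
print(f"{specificity(tn=9_850, fp=50):.1%}")  # 99.5%
```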

Specificity tells you how good the test is at leaving healthy people alone. A test with high specificity produces very few false alarms, so a positive result is good at ruling the disease in. A test with low specificity sends a lot of healthy people down unnecessary pathways.

Positive Predictive Value (PPV)

Of all the people who test positive, what percentage actually have the disease?

PPV = True Positives ÷ (True Positives + False Positives)

In the school example, the test produces 80 true positives and 50 false positives (130 total positive results).

PPV = 80 ÷ 130 = 61.5%
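The PPV calculation can also be written in a few lines of code. This is a sketch with the school example's counts; the function name `positive_predictive_value` is my own choice.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of positive results that belong to people who truly have the disease."""
    return tp / (tp + fp)


# School example: 80 true positives and 50 false positives.
print(f"{positive_predictive_value(tp=80, fp=50):.1%}")  # 61.5%
```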

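Unlike sensitivity and specificity, PPV depends heavily on how common the disease is in the population being tested, a concept called prevalence. In the school, prevalence is 1% (100 of 10,000 students). Healthy students outnumber sick ones 99 to 1, so even a test with 99.5% specificity produces 50 false alarms alongside its 80 true positives. That imbalance drags the PPV down to 61.5%.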
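To see how prevalence affects PPV, you can calculate PPV from sensitivity, specificity, and prevalence using Bayes' theorem. The sketch below keeps the school test's sensitivity and specificity fixed and changes only prevalence. The 0.1% and 10% prevalences are hypothetical values chosen for illustration, and the function name `ppv_from_prevalence` is my own.

```python
def ppv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    # Fraction of everyone tested who are true positives.
    true_pos = sens * prevalence
    # Fraction of everyone tested who are false positives.
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


SENS = 80 / 100        # school example: 80 of 100 strep cases caught
SPEC = 9_850 / 9_900   # school example: 9,850 of 9,900 healthy students cleared

# 1% is the school's prevalence; 0.1% and 10% are hypothetical comparisons.
for prevalence in (0.001, 0.01, 0.10):
    ppv = ppv_from_prevalence(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

At 1% prevalence, this reproduces the 61.5% calculated from the raw counts. With the same test, PPV drops sharply when the disease is rarer and rises when it is more common.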

Negative Predictive Value (NPV)

Of all the people who test negative, what percentage are actually disease-free?

NPV = True Negatives ÷ (True Negatives + False Negatives)

In the school example, the test produces 9,850 true negatives and 20 false negatives (9,870 total negative results).

NPV = 9,850 ÷ 9,870 = 99.8%
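Here is the NPV formula as a short sketch, using the school example's counts. The function name `negative_predictive_value` is my own choice.

```python
def negative_predictive_value(tn: int, fn: int) -> float:
    """Share of negative results that belong to people who are truly disease-free."""
    return tn / (tn + fn)


# School example: 9,850 true negatives and 20 false negatives.
print(f"{negative_predictive_value(tn=9_850, fn=20):.1%}")  # 99.8%
```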

Like PPV, NPV depends on disease prevalence. For a given test, the rarer the disease, the higher the NPV.
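NPV can be calculated from sensitivity, specificity, and prevalence using Bayes' theorem, just like PPV. This sketch keeps the school test fixed and varies prevalence. The 10% and 50% prevalences are hypothetical comparison values, and `npv_from_prevalence` is a name I chose.

```python
def npv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    # Fraction of everyone tested who are true negatives.
    true_neg = spec * (1 - prevalence)
    # Fraction of everyone tested who are false negatives.
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


SENS = 80 / 100        # school example sensitivity
SPEC = 9_850 / 9_900   # school example specificity

# 1% is the school's prevalence; 10% and 50% are hypothetical comparisons.
for prevalence in (0.01, 0.10, 0.50):
    npv = npv_from_prevalence(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}")
```

At 1% prevalence, this matches the 99.8% from the raw counts. As the disease becomes more common, each negative result becomes less reassuring.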

Number Needed to Screen (NNS)

On average, how many people must be screened to find one case?

NNS = 1 ÷ (Prevalence × Sensitivity)

In the school example, with 1% strep prevalence and 80% sensitivity, you'd need to test 125 students, on average, to find one true strep case. The raw counts give the same result: 10,000 students tested ÷ 80 cases found = 125.

NNS = 1 ÷ (0.01 × 0.80) = 125

NNS tells you how efficiently a screening test uses resources. A lower NNS means the test finds cases efficiently, while a higher NNS means you're testing a lot of people for each case found.

Confidence Interval (CI)

A confidence interval is a range of values that likely contains the true value of a measurement, given the uncertainty inherent in studying a sample rather than the entire population.

In a different school example, let's say you survey 200 students about whether they like the cafeteria, and 60% say yes. If you surveyed all 10,000 students, the true percentage might differ. A 95% confidence interval might be 53% to 67%. Informally, this means you can be 95% confident the real answer falls somewhere in that range. More precisely, if you repeated the survey many times, about 95% of the intervals calculated this way would contain the true value.

A narrow confidence interval (such as 50% to 52%) means the study had a lot of data and the estimate is precise. A wide confidence interval (such as 20% to 80%) means the study was small or its data were highly variable, and the true value could differ substantially from the reported number.
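The survey example doesn't specify how the 53% to 67% interval was calculated. A common textbook method for a proportion is the normal-approximation (Wald) interval. Assuming that method and 120 "yes" answers (60% of 200), this sketch reproduces the range. Other methods, such as the Wilson interval, give slightly different numbers. The 2,000-student survey is hypothetical and shows how more data narrows the interval.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion (z=1.96 for ~95%)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


# 60% of 200 surveyed students = 120 "yes" answers.
low, high = wald_interval(successes=120, n=200)
print(f"{low:.1%} to {high:.1%}")  # 53.2% to 66.8%

# Hypothetical larger survey with the same 60% "yes" rate: the interval narrows.
low, high = wald_interval(successes=1_200, n=2_000)
print(f"{low:.1%} to {high:.1%}")  # 57.9% to 62.1%
```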

Lead Time Bias

Lead time bias is the illusion that screening improves survival when it has only moved the date of diagnosis earlier without changing the date of death.

In yet another school example, imagine a student is going to get the flu on March 1 and recover on March 11, regardless of any intervention. If a test detects the flu virus five days earlier, it looks like the student “survived” the flu for 15 days instead of 10, even though the illness itself never changed.

The same trap applies to screening. Survival time measured from the date of diagnosis is inherently misleading for screening tests. A test that detects cancer one year earlier adds a year to survival measured from diagnosis, even if the patient dies on exactly the same date. This is why the gold standard for proving a screening test works is a reduction in mortality and not an improvement in survival from diagnosis.
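The cancer case can be checked with simple date arithmetic. This is a minimal sketch using hypothetical dates for one patient whose date of death is fixed regardless of screening. The dates and variable names are illustrative only.

```python
from datetime import date

# Hypothetical dates for one patient; the date of death does not change.
death = date(2030, 6, 1)
diagnosed_from_symptoms = date(2027, 6, 1)
diagnosed_by_screening = date(2026, 6, 1)  # one year earlier

survival_without_screening = (death - diagnosed_from_symptoms).days
survival_with_screening = (death - diagnosed_by_screening).days
lead_time = (diagnosed_from_symptoms - diagnosed_by_screening).days

print(survival_without_screening, survival_with_screening, lead_time)  # 1096 1461 365

# The extra "survival" equals the lead time exactly; the death date never moved.
assert survival_with_screening - survival_without_screening == lead_time
```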

Connecting the Dots

In the posts ahead, starting with deep dives into key studies, I'll be using these terms frequently. If you find yourself needing a refresher on the statistical terms as you go through the newsletter, come back to this post.

Disclosures. Before Stage One is written by Michael LaPelusa, MD in his personal capacity. Views expressed are his own and do not represent the views of any institution. Content is provided for informational purposes only and should not be relied upon as medical, legal, business, investment, or tax advice. Nothing here is a recommendation to undergo, avoid, prescribe, or order any medical test or treatment, nor a recommendation to buy or sell any security. Readers should consult their own physicians and advisers regarding clinical, financial, and legal decisions. The author does not hold positions in any company discussed unless explicitly disclosed in the post. See full disclosures.