I’m a Physiotherapist, not a Statistician!

Do you remember those two terms we all learned back in physio school, sensitivity and specificity? You don't need to be a statistician to understand why they matter; in fact, even a basic understanding of these two concepts will improve a physiotherapist's (PT) clinical decisions and outcomes. So today we will discuss what properties make up a good physical exam test, why they matter, and how they look in clinical practice.

On day one, we start with a screen for red and yellow flags to ensure our patient belongs in our clinic and to determine whether they require co-treatment with another clinician. Once we decide the patient’s symptoms can be managed by a PT, we generate a hypothesis as to which movements and tissues have become symptomatic, and why. From here we go on to the physical exam, where we learn how this patient moves and responds to active or passive movements and stresses. The intention is to either corroborate or negate our hypothesis.

So what kinds of tests are PTs using in the physical exam these days? Traditionally, PTs use active and passive tests designed to uncover impairments and provoke symptoms, under the assumption that these tests stress specific tissues. A diagnosis or clinical impression is then formed from the combination of subjective findings and these physical tests. By now, however, most of us are aware of the substantial body of evidence suggesting that our special tests are not specific to one tissue. The majority of the clinical tests we use have a diagnostic accuracy that isn't any better than chance. Further, labeling our patients with a biomedical diagnosis has the potential to do more harm than good, especially when we can't validly say what the cause of their symptoms is. I'd like to raise an important question: do we always need a specific diagnosis? It's worth a think. It's also worth listening to this BJSM podcast found here. But I digress. Due to the recent scrutiny of special tests, we have seen a paradigm shift in our physical exams... Enter from stage left the Symptom Modification Procedure (SMP).

Figure 1: an example of SMP - the clinician assists scapular upward rotation in effort to reduce pain and increase range of motion during shoulder flexion.


The SMP works something like this: the clinician finds a symptomatic movement and then attempts to reduce symptoms and/or increase function using education, postural cues, manual techniques or voluntary muscle activation. Essentially, the examination becomes the treatment, and vice versa. So if the clinician finds something that reduces symptoms, they can use this as a teaching and/or treating point. This demonstrates to the patient that their symptoms are malleable, and it provides more targeted and meaningful exercises. Some camps suggest SMP will increase adherence to an exercise program and constructively challenge patients' beliefs that their pain is unmodifiable. Other folks believe it removes the patient's locus of control and reinforces thought viruses, which impact self-efficacy. For more on how this works, and specifically for shoulder pain, check out some work by shoulder expert Jeremy Lewis found here. For a more critical discussion on SMP, check out Adam Meakins' recent blog found here.

So with the emergence of the SMP, should we completely abandon special tests? I would argue no. Even though special tests do not tell us which tissue has the issue, they are a common language within and across professions; additionally, many of these tests can also be used to prognosticate. I'd also be remiss if I didn't admit that there are some special tests that are actually pretty darn good and powerful. And a good test, one with strong psychometric properties, helps increase the probability that the symptoms are coming from a particular body region. From this, clinicians gain more objective data to inform a clinical impression and the treatment soon to follow.

Now we all want to use high-yield tests, right? So how are we to determine if a test is worth using? Arguably, we can start by considering two terms: sensitivity and specificity. There are other test properties to consider (e.g. intra- and inter-rater reliability, positive and negative predictive values, and likelihood ratios), but sensitivity and specificity are simple and can provide any clinician with some insight into the diagnostic accuracy of a test. And when I say diagnostic accuracy, I mean the test lets us know, with some degree of certainty, whether a condition or symptomatic tissue is absent or present in the body region being tested. So let’s start with a quick refresher on sensitivity and specificity and dive into why this matters in clinical practice.

In the world of physical therapy, the process of hypothesis generation begins with the history. From the patient's story and responses to targeted questions, we form a subjective estimate of how probable it is that the patient in front of us has a suspected condition. This is known as our pre-test probability. Next we conduct the physical exam. Perhaps we start with screening tests to rule out the more serious pathologies and the areas we believe are not part of the problem. From there we get to the good stuff: we start the process of ruling in the suspected condition with a few good tests. Once testing is complete, we re-evaluate how far the probability of the suspected condition being present has shifted. This is our post-test probability. So what tests do you use to give you confidence that the symptoms are indeed coming from a particular body region? What’s their diagnostic accuracy? Do your exams attempt to rule out and rule in a condition? Let’s begin to discuss special test sensitivity using an example.

The Ottawa Ankle Rule is a simple and highly effective algorithm for identifying patients who require radiography after an ankle injury. This algorithm has a sensitivity and specificity of about 100% and 40% respectively (1). A highly sensitive test is most helpful to the clinician when the result is negative. Why? Sensitivity is the proportion of patients with the condition who test positive, so a highly sensitive test misses very few true cases. If a test is close to 100% sensitive like the Ottawa Ankle Rule, almost every patient with an ankle fracture will be flagged for radiography. With a negative test result, it is therefore very unlikely the condition is present, and the patient does not need radiography. In this case, if a patient has no palpable bony tenderness in the foot or ankle and is walking into your treatment room, it's highly unlikely they need an x-ray. Other very good screening tests include the Ottawa Knee Rule, the Canadian Cervical Spine Rule, and the Elbow Extension Test. Again, the clinician is interested in that negative result (i.e. the test does not reproduce the patient’s symptoms) when using highly sensitive tests, because it makes us less likely to wrongly clear a body region that is in fact problematic. Specificity is the mirror image: it is the proportion of patients without the condition who test negative, so a positive result on a highly specific test is unlikely to be a false alarm. In general a test is considered highly sensitive or specific when the value is greater than 90% (2).
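For anyone who likes to see the arithmetic, here is a minimal sketch of how sensitivity and specificity fall out of a simple 2x2 table. The patient counts are hypothetical, chosen only so the result lands near the figures quoted for the Ottawa Ankle Rule; they are not taken from any study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients WITH the condition who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of patients WITHOUT the condition who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for 100 patients with an ankle injury (illustration only)
true_pos, false_neg = 15, 0    # 15 fractures, none missed by the rule
true_neg, false_pos = 34, 51   # 85 without fracture, 51 still flagged for x-ray

sn = sensitivity(true_pos, false_neg)
sp = specificity(true_neg, false_pos)

print(f"Sensitivity: {sn:.0%}")
print(f"Specificity: {sp:.0%}")
```

Notice that with zero false negatives, a negative result never belongs to a patient with a fracture in this made-up sample, which is the whole idea behind using a sensitive test to rule a condition out.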

In quick summary, here is a familiar mnemonic:

Sensitive test and Negative rules OUT a condition = SNOUT

Specific test and Positive test rules IN a condition = SPIN

Often there is a trade-off between a test's sensitivity and specificity, but a basic working knowledge of a test’s diagnostic accuracy gives some understanding of whether a condition in a certain body region is absent or present. How much the probability shifts depends on likelihood ratios: the index of suspicion is increased by a positive likelihood ratio and decreased by a negative likelihood ratio. A positive likelihood ratio is useful when a test is positive, and a negative likelihood ratio is useful when a test is negative. See below:

Figure 2: adapted from Cleland 2005 (3). If a test or intervention yields a positive LR of 5 or greater or a negative LR of 0.2 or less - this is something that might be worth incorporating into clinical practice.
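To make the shift in probability concrete, here is a rough sketch of the standard odds calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The function name and the 50% pre-test probability are assumptions for illustration; the LR values of 5 and 0.2 are simply the thresholds from Figure 2.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


pre_test = 0.50  # hypothetical estimate formed from the history

after_positive = post_test_probability(pre_test, 5.0)  # positive test, LR+ = 5
after_negative = post_test_probability(pre_test, 0.2)  # negative test, LR- = 0.2

print(f"After a positive test: {after_positive:.0%}")  # prints 83%
print(f"After a negative test: {after_negative:.0%}")  # prints 17%
```

Starting from a coin flip, a test at either threshold moves us to roughly 83% or 17%, which is a shift most clinicians would consider worth the few seconds the test takes.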


Now that we have revisited sensitivity and specificity and briefly touched on likelihood ratios, how does this look in clinical practice? First, just reflecting on your practice, what tests would you use to help rule out or rule in a condition in a body region? When conducting these tests, how much do they shift the probability that the condition you are looking for actually exists?

Let us assume we are assessing a 30-year-old female patient with no red or yellow flags (since you used excellent screening questions). Her primary complaint is intermittent, right-sided posterior thigh pain of moderate to high irritability, aggravated by getting in and out of a car and relieved by sitting with the right leg outstretched and externally rotated. Considering the aggravating and easing factors, what body area do you suspect is symptomatic? If we start with screening tests, what would you use for the lumbar spine or the sacro-iliac joint (SIJ)? How would you screen or rule in a hip condition? Do these tests shift your post-test probability? Let us try diagnosis by exclusion and see what we uncover with the current case.

Drawing on the basic psychometric properties of a few good lower quadrant tests, we choose the Slump test to screen the lumbar spine for potential referred pain to the hamstring. The Slump test has been shown to possess moderate to high sensitivity (Sn=0.84) and specificity (Sp=0.83) (4). This means only a low percentage of patients are either misdiagnosed with or wrongfully cleared of lumbar spine involvement. In this case, the Slump test was performed and found to be negative, meaning it failed to reproduce the patient’s symptoms. Because this test possesses high sensitivity, we might suspect the symptoms are not coming from the lumbar spine. If the Slump test were not appropriate for the patient in front of you, another good test, the Straight Leg Raise (SLR; Sn=0.52, Sp=0.89), might be useful as well. However, the SLR does not appear to be as strong at screening and may be more useful for ruling in a back condition.
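One way to compare these two tests is to turn their sensitivity and specificity into likelihood ratios. The sketch below uses the standard formulas, LR+ = Sn / (1 - Sp) and LR- = (1 - Sn) / Sp, with the Slump and SLR values quoted above; the function and variable names are just illustrative.

```python
def likelihood_ratios(sn, sp):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    positive_lr = sn / (1 - sp)   # how much a positive result raises the odds
    negative_lr = (1 - sn) / sp   # how much a negative result lowers the odds
    return positive_lr, negative_lr


# Sensitivity and specificity as quoted in the text above
tests = {
    "Slump": (0.84, 0.83),
    "SLR": (0.52, 0.89),
}

results = {name: likelihood_ratios(sn, sp) for name, (sn, sp) in tests.items()}

for name, (positive_lr, negative_lr) in results.items():
    print(f"{name}: LR+ = {positive_lr:.2f}, LR- = {negative_lr:.2f}")
```

The two tests come out with similar positive likelihood ratios (a little under 5), but the Slump test's negative likelihood ratio is much closer to zero than the SLR's, which fits the idea that a negative Slump does more to clear the lumbar spine than a negative SLR.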

We can never stand on one test alone, so after seeing normal lumbar range of motion and finding a negative Slump test, we move away from the lumbar spine and decide to screen the SIJ. True SIJ pain is far less common than low back pain, but we may choose to rule it out anyway with a Thigh Thrust test, which has also been shown to be highly sensitive (Sn = 0.88) (5). In this case, the Thigh Thrust provoked some familiar symptoms, but not exactly as the patient described. Next we continue to stress the SIJ using Distraction, a moderately specific test (Sp = 0.81), but there is no symptom reproduction. Laslett et al. also describe a cluster of SIJ provocation tests for identifying the SIJ as the pain generator: with three or more of six tests positive, sensitivity and specificity were 94% and 78% respectively, with a positive LR of 4.3 (5).
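As a quick sanity check, the reported positive LR for the SIJ cluster can be reproduced from its sensitivity and specificity, and then applied to a pre-test probability. This is only a sketch: the 20% pre-test probability is a made-up number for illustration, not a prevalence figure from the literature, and the helper name is hypothetical.

```python
# Cluster of three or more positive SIJ provocation tests, values from the text
sn, sp = 0.94, 0.78

positive_lr = sn / (1 - sp)   # should land close to the reported 4.3
negative_lr = (1 - sn) / sp


def apply_lr(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back to probability."""
    odds = pre_test_probability / (1 - pre_test_probability)
    odds *= likelihood_ratio
    return odds / (1 + odds)


pre_test = 0.20  # hypothetical pre-test probability of SIJ pain

cluster_positive = apply_lr(pre_test, positive_lr)
cluster_negative = apply_lr(pre_test, negative_lr)

print(f"LR+ = {positive_lr:.1f}, LR- = {negative_lr:.2f}")
print(f"Cluster positive: {cluster_positive:.0%}")
print(f"Cluster negative: {cluster_negative:.0%}")
```

Under these assumptions a positive cluster lifts the probability to around one in two, while a negative cluster drops it to a couple of percent, so the cluster looks more convincing as a way to clear the SIJ than to confirm it.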

Figure 3: Your classic Slump Test - will help screen the lumbar spine and a few other regions but we are looking for reproduction of symptoms or obvious side-to-side asymmetries in range of motion.


We have already screened the lumbar spine and SIJ using the Slump test and the Thigh Thrust respectively. This took only a few minutes, and hopefully with a few more good tests we can track those symptoms down to the hip. First we start with the FADDIR test, which demonstrates excellent value as a screening tool (Sn=0.94-0.99) (6). This test reproduces the patient’s symptoms, so we cannot rule out a hip condition. We might have obtained similar results from another good hip screen, the FABER test. To help rule in a hip condition, we turn to the Thomas test, which is highly specific and useful in diagnosing intra-articular hip conditions (7); it also reproduces the concordant symptoms, the posterior thigh pain. We could have also used an Active Straight Leg Raise test. We will likely want to gather hip active and passive range of motion and manual muscle test data as further markers to monitor change, but by now, having used a selection of good tests, we might feel there is enough information to conclude that something is happening at the hip joint.

Obviously many orthopedic conditions do not solely affect one body region, and in this case it would be reasonable to expect to find impairments in the lumbar region too. But the point is that from the history we were able to come up with a hypothesis, and through diagnosis by exclusion we attempted to confirm our clinical impression. With deductive reasoning we can efficiently and effectively rule out possible symptom generators from neighbouring body regions, which helps localize the symptoms to the suspected body region. And if we are using tests with good diagnostic accuracy, we can meaningfully shift the probability that the patient has the condition of interest. Using high-yield tests as described above also affords the important opportunity to screen for anything else that may sensitize the system and affect prognosis, such as psychosocial issues, fear of movement, job satisfaction, recent life changes, and lifestyle habits. Growing your knowledge of good tests and using an effective, efficient exam will allow you to better appreciate the whole person in front of you.

AMP’d Clinical Bottom Line

  1. Reflect on the physical examination tests you commonly use and look into their diagnostic accuracy.

  2. Use special tests that help screen or rule in conditions to better inform your clinical decision making, keeping in mind that SMP may be more salient for treatment.

  3. These tests are not specific to any one tissue. They aid in diagnosis and should not be used to determine a “medical label” for a patient.


1) Bachmann et al. Accuracy of Ottawa ankle rules to exclude fractures of the ankle and mid-foot: systematic review. BMJ. 2003.

2) Obuchowski et al. Ten Criteria for Effective Screening. Am J Roent. 2001;176:1357-62

3) Cleland, J. Orthopaedic clinical examination: An evidence-based approach for physical therapists. Icon Learning Systems, Carlstadt, NJ. 2005.

4) Majlesi et al. The Sensitivity and Specificity of the Slump and the SLR tests in Patients with Lumbar Disc Herniation. Journal of Clinical Rheumatology. April 2008.

5) Laslett, M. Diagnosis of Sacroiliac Joint Pain: Validity of individual provocation tests and composites of tests. Manual Therapy. January 2005.

6) Reiman et al. Diagnostic accuracy of clinical tests for the diagnosis of hip femoroacetabular impingement/labral tear: a systematic review with meta-analysis. BJSM. 2014.

7) Reiman et al. Diagnostic accuracy of clinical tests of the hip: a systematic review with meta-analysis. BJSM. 2012.