1Professor, Department of Physical Therapy and Athletic Training, Northern Arizona University, Flagstaff, Arizona 86004, USA
2Staff Physical Therapist, Healthsouth Rehabilitation Hospital, Mesa, Arizona 85206, USA
3Staff Physical Therapist, Therapeutic Associates Inc. Ballard Physical Therapy, Seattle, Washington 98107, USA
4Doctor of Physical Therapy Student, Department of Physical Therapy and Athletic Training, Northern Arizona University, Flagstaff, Arizona 86004, USA
*Address for Correspondence: Dr. Mark Westover Cornwall, Professor, Department of Physical Therapy and Athletic Training, Northern Arizona University, Flagstaff, Arizona 86004, USA, Email: firstname.lastname@example.org
Dates: Submitted: 07 June 2017; Approved: 21 June 2017; Published: 23 June 2017
How to cite this article: Cornwall MW, Lane C, Norwood J, Patterson S, Strauss D. Reliability and validity of the Sit-To-Stand Test to assess Global Foot Mobility. J Sports Med Ther. 2017; 2: 066-073.
Copyright: © 2017 Cornwall MW, et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Sit-to-Stand test (STST) compares a person’s non-weight-bearing and weight-bearing foot posture to quickly classify overall foot mobility. Despite the simplicity of the test, its reliability and validity have not been established. The purpose of this study was to determine the intra-rater and inter-rater reliability of the STST as well as its validity. Ninety-seven subjects with a mean age of 25 years (±3.7) participated in the study. Each subject’s change in foot posture from non-weight-bearing to weight-bearing was evaluated by two different raters. Each rater classified each subject’s change in foot posture as “Hypomobile”, “Normal” or “Hypermobile”. The same procedure was repeated approximately one week later, without the raters being able to review their original classification for that subject. The subjects also had their foot mobility quantified by measuring the height and width of their dorsal arch in both non-weight-bearing and weight-bearing. These quantitative measures of foot mobility were then classified as “Hypomobile”, “Normal”, or “Hypermobile” using quartiles. A series of Cohen’s Kappa coefficients was used to assess the agreement between the visual classifications by each rater as well as the agreement between the visual and quantitative classifications. The between-day Kappa coefficients ranged from 0.613 to 0.719 and the inter-rater Kappa coefficients ranged from 0.473 to 0.531. The Kappa coefficients between the visual and quantitative classifications ranged from 0.281 to 0.436. The STST should therefore be used with caution because of its moderate between-rater reliability and its fair-to-moderate validity.
Either limited or excessive foot mobility, particularly that of the medial longitudinal arch, has been shown to influence lower extremity kinematics. Williams et al. reported that the runners in their study who had mobile arches demonstrated decreased internal rotation excursion of their tibia, a greater eversion-to-tibial internal rotation ratio, decreased second peak vertical ground reaction force, and decreased vertical loading rates compared to those with normal or limited mobility. In an earlier study, Williams reported that decreased mobility of the arch in runners was related to an increased need for compliance at other lower extremity joints, such as the knee, and theorized that arch mobility could therefore be related to running-related injuries. In 2016, Wyndow and associates reported that foot mobility was significantly related to the frontal plane projection angle of the lower extremity during a single leg squat activity in healthy individuals. Specifically, they reported that individuals with higher midfoot mobility had a greater frontal plane projection angle and recommended that the amount of foot mobility be considered in the clinical management of knee-related disorders.
In 2010, Barton and colleagues reported that individuals with patellofemoral pain had a more pronated foot posture as well as increased foot mobility compared to a control group. These findings were further supported in a study by McPoil et al. the following year. In that study, increased foot mobility, as measured by the change in arch height between weight-bearing and non-weight-bearing, was four times more likely to be seen in individuals with patellofemoral pain than in a control group. Furthermore, Milles and associates reported that individuals with anterior knee pain who had increased midfoot mobility were more likely to experience a reduction in their symptoms when treated with pre-fabricated orthoses. Foot hypermobility has also been associated with an increased risk of other sports injuries, particularly those of the lower extremity. Investigators have reported a relationship between foot mobility and such conditions as plantar fasciitis, lower extremity osteoarthritis, medial tibial stress syndrome [10,11] and anterior cruciate ligament injuries in females.
In the literature, foot mobility has been assessed with a variety of methods. One such method is the navicular drop test, first described by Brody in 1982. It consists of measuring the vertical change in the height of the navicular tuberosity between the subtalar joint neutral position while standing and relaxed standing. The test is therefore considered a measure of sagittal plane mobility of the midfoot. Because inter-rater reliability of the navicular drop test has been reported to range from poor to moderate [14-16], other methods of assessing foot mobility have been proposed. McPoil et al. described an alternative method of measuring vertical change of the arch by assessing the change in the height of the dorsum of the arch, rather than the navicular tuberosity, between weight-bearing and non-weight-bearing. Using this method, they demonstrated good to excellent levels of intra-rater and inter-rater reliability. Although reliability measures were not reported, assessment of foot mobility has also been described using the change between weight-bearing and non-weight-bearing of sagittal radiographic measures such as the calcaneal inclination angle and the calcaneal-first metatarsal angle.
In 2009, McPoil and colleagues described a method of assessing medial-lateral and vertical movement of the midfoot in both weight-bearing and non-weight-bearing that did not require palpation of the navicular tuberosity. Their study included 345 healthy individuals and they reported very high intra-rater and inter-rater reliability values for all of their measurements. In the same paper, they also described a measurement called the “Foot Mobility Magnitude”, which represented the composite value for both the difference in dorsal arch height (vertical change in arch mobility) and the difference in midfoot width (change in medial-lateral midfoot mobility).
Based on the relationship between foot mobility and lower extremity kinematics and injury, assessment of foot mobility should be included as part of a comprehensive physical examination for those individuals with foot-related injuries or disorders. Not only may such an assessment help clinicians to evaluate the person’s overall foot function, but it may also assist in determining the appropriate footwear or foot orthoses prescription. While a variety of methods exist to assess foot mobility, they may not be suitable because of marginal test reliability, the need for a radiographic image, limited time or a lack of equipment. In his 1976 book, Hoppenfeld described what he termed a “test for rigid or supple flat feet”, based on observing the foot in sitting and then in standing. The purpose of the test was to allow clinicians to quickly and easily determine the degree of foot mobility of an individual. In order for such a test to be clinically useful, however, it must demonstrate acceptable levels of within-rater and between-rater reliability and also have adequate validity. Although the test proposed by Hoppenfeld, sometimes referred to as the “Sit-to-Stand Test” (STST), is simple and quick, no data exist regarding its reliability or its validity. Therefore, the purpose of this study was to assess the within-rater and between-rater reliability of the STST as well as to determine whether it is consistent with quantitative measures of overall foot mobility.
A total of ninety-seven individuals (25 male, 72 female) between the ages of 20 and 46 years participated in the study. Table 1 shows the demographic characteristics of these individuals, including height, weight, body mass index (BMI) and foot posture, measured using the Foot Posture Index (FPI).
Reliability: Sixty-one individuals (14 male, 47 female) between the ages of 22 and 46 years participated in the reliability phase of the study. Table 1 also contains the mean demographic information for the subjects used in this phase of the study. Each subject was instructed to sit on the edge of a table with their knees at 90 degrees and their feet dangling off the edge of the table and not touching the floor. While in this position, the overall shape and posture of their foot were observed by two different raters. The subject then stood with their feet comfortably apart and with a self-selected amount of lower extremity “toeing out”. The subject’s foot shape and posture were again observed. Rater 1 was an entry-level physical therapy student with minimal experience in evaluating or treating foot-related conditions. Rater 2 was a licensed physical therapist with over 20 years of experience evaluating and treating foot-related conditions. Based on the perceived change in foot posture from non-weight-bearing to weight-bearing, each subject’s global foot mobility for each extremity was rated as “Hypomobile” (<25% change), “Normal” (25-75% change), or “Hypermobile” (>75% change) by both raters without knowledge of the other rater’s classification. The same procedure was repeated approximately one week later, without the raters being able to review their original rating for that subject. In order to avoid biasing the raters, the FPI was performed after each rater had made their classification.
Although both feet were measured, only the right extremity was used for statistical analysis. In addition to descriptive statistics, a series of Cohen’s Kappa coefficients, adjusted for both “prevalence” and “bias”, were calculated to determine the magnitude of intra-rater and inter-rater agreement for the STST.
As shown in table 2, the overall between-day agreement for Rater 1 and Rater 2 was 74.2% and 81.3%, respectively. The prevalence and bias adjusted between-day Kappa coefficients were 0.613 for Rater 1 and 0.719 for Rater 2. Using the classification proposed by Landis et al., such values would indicate “substantial” between-day reliability for each rater. Table 2 also shows the overall agreement and Kappa coefficients indicating the between-rater reliability of assessments on day 1 and day 2. Overall agreement between Rater 1 and Rater 2 was 65.6% on day 1 and 68.8% on day 2, with Kappa coefficients of 0.484 and 0.531, respectively. The classification proposed by Landis et al. would characterize such values as “moderate”. Table 3 shows the 3x3 tables used to determine the between-rater agreement.
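As a rough check on the figures above, a prevalence and bias adjusted Kappa can be sketched from the overall proportion of agreement alone. The snippet below assumes the commonly used k-category form, PABAK = (k·p_o − 1)/(k − 1), with k = 3 categories; the text does not state which formula the authors applied, and the function names are illustrative only.

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of paired ratings that are identical."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def pabak(p_observed, n_categories=3):
    """Prevalence and bias adjusted Kappa (assumed k-category form).

    With k equally likely categories, chance agreement is 1/k, which
    gives (k * p_o - 1) / (k - 1).
    """
    return (n_categories * p_observed - 1) / (n_categories - 1)


# Overall agreement values reported in the text, expressed as proportions.
reported = [
    ("Rater 1, between-day", 0.742),
    ("Rater 2, between-day", 0.813),
    ("Between-rater, first value", 0.656),
    ("Between-rater, second value", 0.688),
]
for label, p_o in reported:
    print(f"{label}: PABAK ~ {pabak(p_o):.3f}")
```

Under this assumption, the reported percentages yield values within rounding of the published coefficients (0.613, 0.719, 0.484 and 0.531).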
Validity: All 97 subjects were included for the validity phase of the study. Each subject’s dorsal arch height (DAH) and midfoot width (MFW) were measured at 50% of their overall foot length, first in non-weight-bearing and then again in weight-bearing, using a digital caliper or linear gauge and the methodology described by McPoil et al. The vertical change (DiffDAH) and horizontal change (DiffMFW) of the foot between the non-weight-bearing and weight-bearing measurements were then calculated for each foot. A global foot mobility measure, called the mobility magnitude (MM), was then calculated for each subject using the following formula: $MM = \sqrt{(\mathrm{DiffDAH})^2 + (\mathrm{DiffMFW})^2}$. Finally, each of the above measures was standardized to the subject’s overall foot length. The resulting normalized values, expressed as a percentage of foot length, were then classified as “Hypermobile”, “Normal” or “Hypomobile” based on quartiles. The first or lowest quartile was designated as “Hypomobile”, while the second and third quartiles were designated as “Normal”. The fourth or highest quartile was designated as “Hypermobile”. Again, although both feet were measured, only the right extremity was used for statistical analysis. In addition to descriptive statistics, a series of Cohen’s Kappa coefficients, adjusted for bias and prevalence, were used to assess the amount of agreement between the visual classification of foot posture change from non-weight-bearing to weight-bearing that was assigned by Rater 2 and the classification based upon quartiles from the quantitative measures of foot mobility.
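The quantitative classification described above can be sketched in a few lines. This is a minimal illustration and not the authors’ analysis code: the function names are hypothetical, the handling of values that fall exactly on a quartile boundary is an assumption, and `statistics.quantiles` uses its default interpolation method, which may differ from the software used in the study.

```python
import math
import statistics


def mobility_magnitude(diff_dah, diff_mfw):
    """MM = sqrt(DiffDAH^2 + DiffMFW^2), both inputs in the same unit."""
    return math.hypot(diff_dah, diff_mfw)


def normalize_to_foot_length(value, foot_length):
    """Express a measure as a percentage of total foot length."""
    return 100.0 * value / foot_length


def classify_by_quartiles(values):
    """Label each value by its quartile within the sample.

    Lowest quartile -> Hypomobile, middle two -> Normal,
    highest quartile -> Hypermobile. Boundary handling is an assumption.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    labels = []
    for v in values:
        if v <= q1:
            labels.append("Hypomobile")
        elif v > q3:
            labels.append("Hypermobile")
        else:
            labels.append("Normal")
    return labels


# Made-up example: one foot with a 12 mm arch drop and 9 mm widening,
# and a foot length of 250 mm (values are illustrative, not study data).
mm = mobility_magnitude(12.0, 9.0)
print(normalize_to_foot_length(mm, 250.0))
```

Because the cut-offs are derived from the sample itself, roughly a quarter of the subjects fall into each extreme category by construction, regardless of how mobile the sample is in absolute terms.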
The mean normalized values for DiffDAH, DiffMFW and MM for each visual classification by Rater 2 are shown in table 4. The resulting 3x3 tables for DiffDAH, DiffMFW and MM, together with the percent agreement and the Kappa coefficients, adjusted for bias and prevalence, between Rater 2 and the classification based on the quartiles of the quantitative foot mobility measurements are shown in table 5. As can be seen, the amount of agreement between the visual classification of foot mobility by Rater 2 and the quantitative classification using quartiles varied depending on which quantitative measure was used. The agreement between the visual classification and the classification based on DiffDAH had the lowest Kappa value (0.281), while the agreement between the visual classification and the classification based on MM had the highest Kappa value (0.436). Based upon the classification proposed by Landis, the Kappa values between the visual classification and the classification based on either DiffDAH or DiffMFW would be considered “fair”. The Kappa value between the visual classification and the classification based on MM would be considered “moderate”.
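The verbal labels attached to the Kappa values can be reproduced with a simple lookup. The sketch below assumes the widely cited Landis and Koch bands (below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect); the function name is illustrative.

```python
def landis_koch_label(kappa):
    """Map a Kappa coefficient to its Landis and Koch descriptor."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# Kappa values reported in the text.
for k in (0.281, 0.436, 0.484, 0.531, 0.613, 0.719):
    print(k, landis_koch_label(k))
```

Note that these bands are conventions rather than statistical thresholds, so a value such as 0.613 sits only just inside the “substantial” range.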
With respect to between-day reliability, rater experience does not appear to have a large effect. Values for Rater 2 were only slightly greater than those of Rater 1, and the categorical rating of each rater was “substantial”. The larger between-day agreement values compared to between-rater agreement values seen in the current study are consistent with many other clinical tests used as part of a foot and ankle examination. These include ankle dorsiflexion, subtalar joint neutral palpation [23,15], and navicular drop [15,16].
An analysis of the 3x3 tables used to calculate the percent agreement and Kappa coefficients between the raters shows that the two raters appear to agree with each other more often for feet characterized as “Hypermobile” than for those characterized as “Hypomobile”. This might be because, in “Hypermobile” feet, the flattening of the medial longitudinal arch and the concurrent widening of the foot are more pronounced, and therefore easier to see, than in feet with limited motion. In 2011, Cornwall et al. showed that the change in midfoot width had a stronger relationship to overall foot posture, as measured by the FPI, than did arch height. The dominant role of midfoot width in overall foot posture may therefore have contributed to the higher agreement by the raters.
Based on the findings of this study, the authors feel that the within-rater and between-rater agreement of the STST is sufficient for the test to be used as part of a comprehensive physical examination of the foot. Because the within-rater agreement was only “substantial” and the between-rater agreement only “moderate”, the test should be used with caution, especially when comparing the rating of one clinician to that of another.
The validity of the STST was measured by comparing the classification of foot mobility by Rater 2 to that based on quartiles of the normalized quantitative values of foot mobility. This comparison yielded either “fair” or “moderate” agreement between the two classifications (Table 5). It is important to note that both DiffDAH and DiffMFW showed “fair” agreement between visual and quantitative classification, while MM demonstrated “moderate” agreement. MM is a calculated value based on both the vertical and horizontal change in foot posture and therefore provides a more comprehensive representation of the foot’s change from non-weight-bearing to weight-bearing. The greater agreement between the visual classification and the quantitative classification using MM suggests that Rater 2 did not focus on either vertical or horizontal change in foot posture alone when determining the classification, but rather on the global change in foot posture that was observed.
Based on the resulting 3x3 tables used to calculate Kappa values, Rater 2 was better at accurately identifying “Hypermobility”. Rater 2 agreed 57.1% and 63.6% of the time with the classification of “Hypermobility” using DiffDAH and DiffMFW respectively, but only 6.9% and 3.4% of the time with the classification of “Hypomobility”. Using MM, Rater 2 agreed with the quantitative classification of “Hypermobility” 78.9% of the time, but only 7.7% of the time with the quantitative classification of “Hypomobility”. As can be seen in table 5, a large number of feet that were classified as “Hypomobile” by the quantitative measure were classified as “Normal” by Rater 2. This would indicate that a change in foot posture of less than 25% is more difficult to discriminate visually than a change greater than 75%. It is possible that the sample of subjects used in the study did not have an adequate representation of those with limited foot mobility, in which case the quartile-based classification would wrongly label individuals with normal mobility as having limited mobility. In consideration of such a possibility, the distribution of FPI values for the subjects in the study was analyzed. The mean FPI for the subjects in the current study was +3.2 with a standard deviation of 3.2 (Table 1). These scores were normally distributed and 20% of the subjects had an FPI of zero or less, which is considered to be significantly supinated. Since there is a statistically significant positive relationship between a person’s FPI and the amount of foot mobility, it is reasonable to assume that the current study had an adequate representation of individuals with limited foot mobility.
A limitation of the current study is the small number of males measured compared to females. Normalizing the quantification of foot mobility relative to each subject’s foot length, however, increased measurement reliability and reduced the possible bias introduced by such a large proportion of females in the study. In addition, because there is no expectation that a rater would visually classify a male’s foot mobility differently than a female’s, the authors feel that the lack of more males in the study had minimal impact on the results. A replication of the study with more males would be able to confirm this.
The finding of “fair” to “moderate” validity of the STST does not preclude clinicians from using the test to quickly and easily classify a person’s overall foot mobility, but its use does warrant caution, especially with regard to its interpretation. On the other hand, the STST has value beyond that of classifying foot mobility. For example, the test would provide valuable information regarding a person’s willingness and ability to fully load their foot, especially if more extensive gait analysis is not warranted, not possible or is contraindicated. As such, clinicians may choose to use the STST without classifying a person’s overall foot mobility.
The results of this study indicate that the STST has “substantial” between-day reliability and “moderate” between-rater reliability and that neither is influenced significantly by the experience of the rater. Further, “fair” to “moderate” validity of the STST was found when compared to quantitative measures of foot mobility. The authors believe that these findings are sufficient for clinicians to use the test as part of a comprehensive physical examination, but that it should be used with some caution. Such caution stems from the fact that the overall agreement between a visual classification of foot mobility and a classification based on quantitative measures was between 35.1% and 46.4%. Such low overall agreement casts considerable doubt on the interpretation of the test’s result. In addition, identification of those with limited foot mobility was poor. As such, the authors feel that clinicians who are interested in classifying individuals as “Hypermobile”, “Normal” or “Hypomobile” would likely be better served by using more quantitative methods.