Assessing the Reliability of Patient, Nurse, and Family Caregiver Symptom Ratings in Hospitalized Advanced Cancer Patients

Cheryl L. Nekolaichuk; Thomas O. Maguire; Maria Suarez-Almazor; W. Todd Rogers; Eduardo Bruera

© 1999 by American Society of Clinical Oncology


From the Division of Palliative Care Medicine, Department of Oncology, Department of Educational Psychology, and Department of Public Health Sciences, University of Alberta, Edmonton, Alberta, Canada.

Address reprint requests to Cheryl L. Nekolaichuk, PhD, Department of Oncology, c/o Gray Nuns Community Hospital and Health Centre, Room 4011, 1100 Youville Dr West, Edmonton, Alberta, Canada, T6L 5X8; email cln1@ualberta.ca


Abstract

PURPOSE: The purpose of this study was to examine the reliability of symptom assessments in advanced cancer patients under various conditions, including multiple raters (patients, nurses, and family caregivers), occasions, and symptoms.

PATIENTS AND METHODS: The study sample consisted of 32 advanced cancer patients admitted to a tertiary palliative care unit. Symptom assessments were completed for each patient on two separate occasions, approximately 24 hours apart. On each occasion, the patient, the primary care nurse, and a primary family caregiver independently completed an assessment using the Edmonton Symptom Assessment System (ESAS). The ESAS is a nine-item visual analogue scale for assessing symptoms in palliative patients. The reliability of the assessments (r) was examined using generalizability theory.

RESULTS: Three important findings emerged from this analysis. First, the analysis of individual symptom ratings provided a more meaningful representation of the symptom experience than total symptom distress ratings. Second, patients, nurses, and caregivers varied in their ratings across different patients, as well as in their ratings of shortness of breath, which may have been a result of individual rater variability. Finally, reliability estimates (r), based on a single rater and one occasion, were less than .70 for all symptoms, except appetite. These estimates improved substantially (r ≥ .70) for all symptoms except anxiety and shortness of breath, using three raters on a single occasion or two raters across two occasions.

CONCLUSION: The findings from this study reinforce the need for the development of an integrated symptom assessment approach that combines patient and proxy assessments. Further research is needed to explore individual differences among raters.

APPROXIMATELY ONE in three people will develop cancer at some point in their lifetime.¹ Half of these individuals will eventually die from progressive disease.² Despite the high frequencies of debilitating symptoms associated with advanced cancer,^3-6 few patients will undergo any form of systematic symptom assessment before they die. Although the reasons are complex, the lack of simple, reliable bedside assessment tools is a major barrier to the routine use of symptom assessments in the terminally ill.^7-9

Underlying this paucity of psychometrically sound measures, a second barrier rests with this question: Whom do we select as the “gold standard” for conducting symptom assessments? Although researchers and clinicians have traditionally adopted patient ratings as the gold standard,^10,11 these subjective responses need to be interpreted with caution. Patients experience symptoms at three levels: production (the physiologic response that produces the symptom), perception (the process by which the symptom reaches the brain cortex), and expression (the patient's outward expression of a symptom). The symptom experience, however, can only be measured at the level of expression.¹² Although two patients may express the same intensity of a symptom, such as pain (eg, both give a score of 8 out of 10), they may experience different components of production (nociception), existential suffering, and/or suffering expressed through the pain.

In the development of assessment instruments, there has been a tendency to use either the patient's subjective response (predominantly) or a third-party rating as the gold standard. For example, some tools, such as the Edmonton Symptom Assessment System (ESAS)¹³ and the Symptom Distress Scale (SDS),¹⁴ were designed to obtain subjective patient ratings, whereas others, such as the Support Team Assessment Schedule (STAS)¹⁵ and the Karnofsky Performance Status (KPS) scale,¹⁶ were developed for third-party assessments.

Despite this reliance on a single gold standard, medical decisions are influenced by a number of factors, including the patient's symptom experience, the perceptions of professional and family caregivers, and related behavioral signs and cues. The use of an integrated symptom assessment approach, involving patient and proxy ratings, would be of great value across the spectrum of advanced cancer patients. For cognitively intact patients, a consensus of patient and proxy assessments would be an ideal outcome. Even with discordant assessments, the use of such an integrated approach could help address apparent differences in symptom assessment among patients and proxy raters and potentially provide educational opportunities for training individuals to complete symptom assessments for the terminally ill. For cognitively impaired patients, the selection of individuals who may best understand and represent the patient's symptom experience would be an ideal outcome. If such an integrated approach for symptom assessments were to be used, then it would be important to determine the degree of consistency of ratings across patients and caregivers.

The purpose of this study was to examine the reliability of patient and proxy (nurse and family caregiver) symptom ratings in advanced cancer patients, across a number of conditions of interest. An initial objective was to determine the extent to which multiple raters, occasions, and symptoms contribute to the variability in symptom assessment scores. A second objective was to determine the number of raters that would be needed to obtain a reasonable estimate of reliability (ie, r = .80) for the assessment of individual symptoms.

A number of studies that compare patient and proxy symptom ratings have been conducted (for comprehensive review, see Sprangers and Aaronson¹⁷). Many of these studies have focused on a single symptom, such as pain,^18-21 anxiety,²² or depression,²³ or on quality of life,^24-30 using physicians, nurses, and/or primary family caregivers as proxy raters. Most proxy rater studies have been conducted within the context of chronic illness or cancer. More recently, a number of studies have focused on the role of proxy assessments in palliative care.^31-36

Although the findings remain inconsistent across studies, Sprangers and Aaronson¹⁷ identified some notable trends in proxy assessments of patients with chronic disease, including cancer. Both health care providers and significant caregivers tend to underestimate quality of life and performance status of patients, while overestimating psychological symptoms, such as depression or anxiety. Within the realm of quality of life, family caregivers may be more consistent in assessing patients' psychological health, whereas health care providers may be better able to assess patients' physical symptoms. In addition, caregivers who live in close proximity to patients may be more accurate in their assessments than those who live further away. In terms of pain assessment, health care providers tend to consistently underassess pain, a pattern that has been confirmed by later studies.^20,36

In contrast, other studies have shown no differences across rater groups. For example, results from a study comparing patient and proxy assessments, using the STAS, revealed that professional assessments were comparable to patient and family assessments.¹⁵ Similarly, Sneeuw et al^28,29 concluded that proxy ratings of quality of life were reasonably consistent with patient ratings, although there were greater discrepancies in patient-proxy scores in physically and cognitively impaired patients.²⁸

Despite these inconsistent findings, proxy raters play an important role in symptom assessment in advanced cancer patients. There are a number of reasons why proxy assessments may be helpful or necessary: (1) to provide an additional perspective when patients overreport³⁷ or underreport their symptoms^38,39 or when patients are mildly to moderately confused⁴⁰; (2) to minimize the impact of inaccurate assessments by health care providers on the management of care^20,41; (3) to increase the reliability of symptom assessment measures³⁵ (Nekolaichuk et al, manuscript submitted for publication); and (4) to reduce missing data in longitudinal studies.³⁰ Further research comparing symptom assessments by patient and proxy raters, including the use of family members, is supported in the literature.^17,34,41

In our study, reliability was assessed using generalizability theory, which provides a comprehensive, practical approach for assessing reliability.⁴² It offers a distinct advantage over the traditional classical test theory approach.⁴³ The classical approach does not provide an overall picture of multiple sources of error, nor does it consider the interactions among these error sources. In contrast, using generalizability theory, it is possible to concurrently consider multiple sources or conditions of interest that may be contributing to the variability in test scores (systematic and random error). Through the identification and control of systematic error sources, the accuracy of scores can be improved.

To identify different sources of error, studies may intentionally be designed to examine the reliability of scores under various factors or facets. A facet is a set of conditions of measurement that, because of differences among the conditions, may contribute to the observed variability among the scores. For example, in the assessment of symptoms, the person doing the assessment (raters facet), the types of symptoms experienced (symptoms facet), and the fluctuating course of the disease process (occasions facet) could potentially contribute to the variability of symptom assessments. Ideally, we would like to be able to identify the variability in the symptom experience among patients, with minimal influence from these confounding factors. Using the principles of generalizability theory, it is possible to design a number of studies to assess the extent to which each of these facets (ie, raters, symptoms, and occasions) contributes to the variability (and inconsistency) in scores. By controlling for some of these sources of error, it would be possible to obtain more consistent assessments.
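As an illustration of how these facets enter the analysis, the standard generalizability-theory model for a fully crossed persons (p) by raters (r) by occasions (o) design decomposes each observed rating into a grand mean plus seven effects, whose variances sum to the total observed-score variance. The notation below follows common G-theory texts (eg, Shavelson and Webb⁴²) and is our sketch rather than an equation reported in this study. With only one rating per cell, the three-way interaction cannot be separated from random error, so it is written as pro,e.

```latex
X_{pro} = \mu + \nu_p + \nu_r + \nu_o + \nu_{pr} + \nu_{po} + \nu_{ro} + \nu_{pro,e}

\sigma^2(X_{pro}) = \sigma^2_p + \sigma^2_r + \sigma^2_o + \sigma^2_{pr} + \sigma^2_{po} + \sigma^2_{ro} + \sigma^2_{pro,e}

\text{percent of total variance for effect } \alpha = 100 \times \frac{\sigma^2_\alpha}{\sigma^2(X_{pro})}
```

Here σ²_p is the desired variability among patients; the remaining six components are the potential sources of error introduced by raters, occasions, and their interactions.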

We have provided a more detailed overview of generalizability theory elsewhere (Nekolaichuk et al, manuscript submitted for publication). For a comprehensive review of this approach, also see Shavelson and Webb,⁴² Brennan,⁴⁴ and Cronbach et al.⁴⁵


PATIENTS AND METHODS

Design

Using the principles of generalizability theory,⁴² this study was designed to assess the reliability of symptom ratings under three specific conditions of interest (facets): (1) raters (patients, nurses, and family members), (2) occasions (two assessments, 24 hours apart), and (3) symptoms (pain, tiredness, nausea, depression, anxiety, drowsiness, appetite, well-being, and shortness of breath). The assessments occurred within the hospital environment of a tertiary palliative care unit, which was staffed by an interdisciplinary team consisting of three full-time academic palliative care physicians, nurses providing 24-hour care, a social worker, a pastoral care worker, a physiotherapist, an occupational therapist, a pharmacist, and volunteers. A standard clinical practice of this unit was to have patients complete twice-daily symptom assessments either by themselves or with the assistance of a nurse and/or family member. As an extension of this practice, this study was designed to obtain three independent symptom assessments: the patient's self-rating and two proxy ratings by the patient's primary care nurse and by a primary family caregiver. To assess the reliability of symptom ratings over time, these assessments were conducted on two separate occasions 1 day apart (for a total of six assessments per patient). Patients continued to receive their regular medical treatments as part of their inpatient care. For this study design, only those patients who could complete the assessments independently were considered.

Sample

The sample included 32 hospitalized patients admitted to a tertiary palliative care unit. The following criteria were used for participant selection: (a) ability to speak English; (b) cognitive ability to understand and complete the assessment tool independently; and (c) willingness to participate. Potential participants were asked to participate in this study once their condition had been stabilized on the unit. The study design was approved by a research ethics board, and written informed consent was obtained from the participants before their participation in the study. All participants had advanced cancer, which was either locally recurrent or metastatic in nature.

Two proxy raters were selected for each patient: the primary care nurse (the nurse caring for the patient that day) and a primary family caregiver. The primary family caregiver was identified by the patient as the person who provided the most consistent care for the patient in the home setting. A total of 15 nurses and 32 family members participated in the study. For a given patient, the same nurse and family member completed both assessments.

Measures

Symptom assessments were completed using the ESAS,¹³ a clinically oriented tool for assessing symptoms in terminally ill patients. The ESAS consists of nine 100-mm visual analogue scales for assessing the intensity of the following symptoms: pain, tiredness, nausea, depression, anxiety, drowsiness, appetite, well-being, and shortness of breath. Scores range from 0 (no or best possible symptom) to 100 (worst possible symptom), with larger values representing greater symptom distress.
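For readers handling ESAS data electronically, one rater's nine ratings can be held as a small validated record. The sketch below is hypothetical: the class and method names are ours, not part of the ESAS, and it assumes that a total symptom distress score is simply the sum of the nine item scores (0 to 900 mm), a definition this paper does not state. The two example ratings, also invented, share the same total while differing sharply in their individual profiles.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EsasRating:
    """One rater's ESAS assessment (hypothetical record type).

    Each field is a 0-100 mm visual analogue score: 0 = no (or best possible)
    symptom, 100 = worst possible symptom.
    """
    pain: float
    tiredness: float
    nausea: float
    depression: float
    anxiety: float
    drowsiness: float
    appetite: float
    well_being: float
    shortness_of_breath: float

    def __post_init__(self):
        # Reject scores that fall outside the 100-mm scale
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 100:
                raise ValueError(f"{f.name} must be 0-100 mm, got {value}")

    def profile(self):
        """Individual symptom profile as {symptom: score}."""
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def total_distress(self):
        """ASSUMED total symptom distress: sum of the nine scores (0-900)."""
        return sum(self.profile().values())


# Two hypothetical ratings with the same total but different profiles
patient_a = EsasRating(80, 40, 0, 10, 10, 20, 30, 40, 0)
patient_b = EsasRating(10, 40, 30, 30, 30, 20, 30, 40, 0)
print(patient_a.total_distress(), patient_b.total_distress())
```

Both records sum to the same total even though one patient rates pain at 80 mm and the other at 10 mm, which is the kind of difference a single total score cannot reveal.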

The Mini-Mental State Examination (MMSE)⁴⁶ is a commonly used measure for assessing five distinct components of cognitive functioning: orientation, memory, attention and calculation, recall, and language. Scores range from 0 (severely cognitively impaired) to 30 (cognitively intact). A cutoff score of 24 has traditionally been used to differentiate between cognitively intact (MMSE ≥ 24) and cognitively impaired (MMSE < 24) individuals.

The Edmonton Functional Assessment Tool⁴⁷ is a 10-item tool designed to assess functional status of palliative care patients. Scores range from 0 (functional) to 30 (severe dysfunction). Low scores (< 10) indicate functional independence, whereas high scores (> 20) indicate functional dependence.

Data Collection Procedures

The patient and two proxy raters completed an assessment of the patient's symptoms on two separate occasions, approximately 24 hours apart. On each occasion, symptoms were independently assessed at the same time by the patient, the primary care nurse, and the designated primary family caregiver. Working independently, each rater prospectively rated the intensity of each of the nine symptoms. The same nurse and family member completed the assessments on both occasions. In addition to the symptom assessments, the patient's cognitive and functional status were assessed by a research nurse using the MMSE and the Edmonton Functional Assessment Tool, respectively. These assessments were completed on both occasions, before the symptom assessments by the three raters. All assessments were completed in the hospital.

Statistical Analysis

Using a previous generalizability design as a framework (Nekolaichuk et al, manuscript submitted for publication),³⁵ the reliability of symptom ratings was assessed under various conditions, including multiple raters, occasions, and symptoms. Two series of studies, a series of generalizability (G) studies and a series of decision (D) studies, were undertaken. The purpose of a G study is to gather information about the different sources of measurement variability. The ultimate goal is to map out as many potential sources of variability as possible. Using the information gleaned from a G study, a D study provides a framework for evaluating the effectiveness of different designs that incorporate a variety of conditions within an applied setting. The primary goal of a D study is to identify an efficient and clinically relevant design that maximizes reliability while minimizing error.⁴²

First, a series of planning or G studies was conducted, using a fully crossed design, to assess the extent to which raters, occasions, and symptoms were contributing to the variability in observed symptom scores. It could be argued that raters were nested within patients, and that the analyses should reflect this. In this study, however, the motive was to generalize along the rater facet at three fixed levels: self-rating, rating by primary care nurse, and rating by family caregiver. Viewed in this fashion, the design was treated analytically as a fully crossed design.
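A minimal sketch of a G study for a single symptom follows. It assumes complete data in a fully crossed p × r × o random design with one rating per cell and uses the ANOVA (expected-mean-squares) estimators described in standard G-theory texts. The function names are hypothetical, the study does not report which software or estimation details it used, and truncating negative estimates at zero is a common convention rather than a documented choice of this study. At least two persons, two raters, and two occasions are required.

```python
from itertools import product
from statistics import fmean


def estimate_variance_components(scores):
    """ANOVA (expected-mean-squares) estimates of the seven variance
    components of a fully crossed p x r x o random design.

    scores[p][r][o] is the rating of person p by rater r on occasion o.
    Negative estimates are truncated to zero.
    """
    n_p, n_r, n_o = len(scores), len(scores[0]), len(scores[0][0])
    P, R, O = range(n_p), range(n_r), range(n_o)
    x = scores

    # Grand mean, main-effect means, and two-way cell means
    g = fmean(x[p][r][o] for p, r, o in product(P, R, O))
    m_p = [fmean(x[p][r][o] for r, o in product(R, O)) for p in P]
    m_r = [fmean(x[p][r][o] for p, o in product(P, O)) for r in R]
    m_o = [fmean(x[p][r][o] for p, r in product(P, R)) for o in O]
    m_pr = [[fmean(x[p][r][o] for o in O) for r in R] for p in P]
    m_po = [[fmean(x[p][r][o] for r in R) for o in O] for p in P]
    m_ro = [[fmean(x[p][r][o] for p in P) for o in O] for r in R]

    # Sums of squares for each effect
    ss = {
        "p": n_r * n_o * sum((m_p[p] - g) ** 2 for p in P),
        "r": n_p * n_o * sum((m_r[r] - g) ** 2 for r in R),
        "o": n_p * n_r * sum((m_o[o] - g) ** 2 for o in O),
        "pr": n_o * sum((m_pr[p][r] - m_p[p] - m_r[r] + g) ** 2
                        for p, r in product(P, R)),
        "po": n_r * sum((m_po[p][o] - m_p[p] - m_o[o] + g) ** 2
                        for p, o in product(P, O)),
        "ro": n_p * sum((m_ro[r][o] - m_r[r] - m_o[o] + g) ** 2
                        for r, o in product(R, O)),
        "pro,e": sum((x[p][r][o] - m_pr[p][r] - m_po[p][o] - m_ro[r][o]
                      + m_p[p] + m_r[r] + m_o[o] - g) ** 2
                     for p, r, o in product(P, R, O)),
    }
    df = {
        "p": n_p - 1, "r": n_r - 1, "o": n_o - 1,
        "pr": (n_p - 1) * (n_r - 1), "po": (n_p - 1) * (n_o - 1),
        "ro": (n_r - 1) * (n_o - 1),
        "pro,e": (n_p - 1) * (n_r - 1) * (n_o - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations of the random model
    vc = {
        "p": (ms["p"] - ms["pr"] - ms["po"] + ms["pro,e"]) / (n_r * n_o),
        "r": (ms["r"] - ms["pr"] - ms["ro"] + ms["pro,e"]) / (n_p * n_o),
        "o": (ms["o"] - ms["po"] - ms["ro"] + ms["pro,e"]) / (n_p * n_r),
        "pr": (ms["pr"] - ms["pro,e"]) / n_o,
        "po": (ms["po"] - ms["pro,e"]) / n_r,
        "ro": (ms["ro"] - ms["pro,e"]) / n_p,
        "pro,e": ms["pro,e"],
    }
    return {k: max(0.0, v) for k, v in vc.items()}


def percent_of_total(vc):
    """Share of the summed variance components attributable to each effect."""
    total = sum(vc.values())
    return {k: 100.0 * v / total for k, v in vc.items()}
```

Applied to a 32 × 3 × 2 array of ratings for one symptom, `percent_of_total` yields the kind of percentages reported for each symptom in the G study tables; it cannot reproduce the study's values without the study's data.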

Based on the G study findings, a series of D studies was conducted to determine the number of raters and occasions required to obtain a reasonable estimate of reliability of symptom ratings. Reliability estimates and standard errors of measurement were calculated for each of the nine symptoms across both occasions, using varying numbers or conditions of raters and occasions.
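For a D study with a chosen number of raters (n'_r) and occasions (n'_o), written p × R × O, the standard G-theory formulas below show how averaging over raters and occasions shrinks each error component. They are offered as an illustration under the assumption that relative error (the rank-ordering of patients) is of interest; the paper does not state whether its standard errors of measurement are based on relative or absolute error.

```latex
\sigma^2(\delta) = \frac{\sigma^2_{pr}}{n'_r} + \frac{\sigma^2_{po}}{n'_o} + \frac{\sigma^2_{pro,e}}{n'_r\, n'_o}

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2(\delta)}, \qquad \mathrm{SEM} = \sqrt{\sigma^2(\delta)}
```

Because σ²_pr is divided only by the number of raters, a large person by rater component is reduced most efficiently by adding raters rather than occasions.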

These two series of G and D studies were designed to answer the following questions:

1. How consistent are the ratings by different raters over time when averaged across the nine symptoms?

Design 1: Persons by raters by occasions by symptoms mixed design, with symptoms fixed.

2. How consistent are the ratings by different raters across time for a specific symptom?

Design 2: Persons by raters by occasions random design for each symptom (total of nine G studies).

3. How many raters and occasions would be required to obtain a reasonable estimate of reliability for a specific symptom?

Design 3: D studies for ratings of patients' symptoms (persons by raters by occasions design; total of nine D studies).


RESULTS

A summary of patient characteristics is listed in Table 1. As listed in this table, the participants were older (mean age, 57.9 years), with an equal representation of women and men; diagnosed with advanced cancer; cognitively intact; and functionally independent.


Table 1.

Patient Characteristics

The results for the G study and D study designs are listed in Tables 2 to 6. These findings are discussed within the framework of the questions that each of these designs addressed.

How Consistent are the Ratings by Different Raters Over Time When Averaged Across Symptoms?

The results for Design 1 (three-facet persons by raters by occasions by symptoms G study mixed design) are listed in Table 2. Seven sources of variability are reported in this table (ie, variance components for persons, raters, and occasions, plus their interactions). These variance components have been averaged across the conditions of the symptom facet, which is fixed; thus, the symptom facet itself does not appear in the table. The person effect was the largest variance component, accounting for 75.8% of the total variance, which indicates that there were individual differences across patients, a desirable outcome. The second largest variance component, that for persons by raters, accounted for 12.0% of the total variance, suggesting that raters were rating some patients differently than other patients. This finding is not entirely surprising because the raters varied across patients (ie, although the same nurse rated the same patient on both occasions, different nurses rated different patients). The third largest variance component (ie, for persons by raters by occasions, the residual) was relatively low, accounting for 10.9% of the total variance. Thus, a large proportion of the variability was accounted for by the other components of the model rather than by unexplained error. The remaining effects, including the rater effect, did not contribute appreciably to the variability of assessment scores. This suggests that the raters were generally quite consistent in their assessments when averaged across symptoms and both occasions, apart from the differences noted for the person by rater effect.
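One standard way to average over a fixed facet, described in G-theory texts such as Shavelson and Webb,⁴² is to fold each interaction with the fixed facet into the corresponding component for the random facets, dividing by the number of fixed conditions (here n_s = 9 symptoms). The expressions below are our illustration of that approach and yield the seven components reported in Table 2; the paper does not spell out its exact computation.

```latex
\sigma^2_{p|S} = \sigma^2_p + \frac{\sigma^2_{ps}}{n_s}, \quad
\sigma^2_{r|S} = \sigma^2_r + \frac{\sigma^2_{rs}}{n_s}, \quad
\sigma^2_{o|S} = \sigma^2_o + \frac{\sigma^2_{os}}{n_s}

\sigma^2_{pr|S} = \sigma^2_{pr} + \frac{\sigma^2_{prs}}{n_s}, \quad
\sigma^2_{po|S} = \sigma^2_{po} + \frac{\sigma^2_{pos}}{n_s}, \quad
\sigma^2_{ro|S} = \sigma^2_{ro} + \frac{\sigma^2_{ros}}{n_s}

\sigma^2_{pro|S} = \sigma^2_{pro} + \frac{\sigma^2_{pros,e}}{n_s}, \qquad n_s = 9
```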


Table 2.

Design 1: Analysis of a Three-Facet Person by Rater by Occasion by Symptom [p × r × o × s] G Study Mixed Design with s Fixed

Table 2 provides an overview of the effects at the aggregate level (when averaged across the nine symptoms). It does not, however, provide any indication of the degree of variability in symptom ratings at the individual symptom level. To address this issue, a second series of G studies was designed for individual symptoms (Design 2), the results of which follow.

How Consistent Are the Ratings by Different Raters Across Time for a Specific Symptom?

The results of Design 2 (two-facet persons by raters by occasions random G study design) are listed in Tables 3 and 4. The variance components for these two-facet designs (Table 3) and percentage of total variance attributable to these facets and their interactions (Table 4) are reported for each of the nine symptoms.


Table 3.

Design 2: Variance Components Based on a Two-Facet p × r × o Random G Study Design


Table 4.

Design 2: Percent of Total Variance Based on a Two-Facet p × r × o Random G Study Design

As listed in Table 4, the percent of total variance attributable to the residual (persons by raters by occasions) effect ranged from 15.6% (tiredness) to 41.1% (anxiety). Ideally, this percentage should be relatively low. For five of the nine symptoms (ie, anxiety, depression, nausea, well-being, and drowsiness), the residual accounted for more than 20% of the total variance, suggesting that a substantial portion of the variance remained unexplained by the design. The percent of total variance for the person by rater effect ranged from 9.4% (appetite) to 33.4% (shortness of breath), suggesting that some raters assessed some patients differently than other patients. This is consistent with the findings from the first design (Table 2), in which the person by rater effect exceeded 10% (ie, 12.0%). The rater effect accounted for less than 10% of the total variance for all symptoms. Most notably, however, the rater effect for shortness of breath (8.8% of total variance) was considerably higher than that for the remaining symptoms. Thus, although the raters were reasonably consistent in their ratings when averaged across the two occasions, there were notable differences among the raters across different patients (ie, person by rater effect) and, for shortness of breath, among the raters themselves (ie, rater effect). Figure 1 illustrates these differences between the person by rater and rater effects for each of the nine symptoms.


Fig 1.

Percent of total variance for rater (r) effect and person by rater (pr) effect for each symptom (person by rater by occasion G study random design).

As also listed in Table 4, the occasion, rater by occasion, and person by occasion effects each accounted for less than 10% of the total variance for all symptoms, apart from the person by occasion effect for pain (12.1% of total variance). These findings indicate that most symptom ratings remained reasonably consistent over the 24-hour assessment period. The higher percent of total variance attributable to the person by occasion effect for pain suggests that pain ratings varied from one occasion to the next for individual patients. For example, patients with high pain ratings on occasion 1 may not necessarily have had high pain ratings on occasion 2. A possible explanation for this difference is that patients were continuing to be treated for fluctuating pain levels that had not stabilized to the same extent as the other symptoms.

How Many Raters and Occasions Would be Required to Obtain a Reasonable Estimate of Reliability for a Specific Symptom?

The results of the D study analyses (Design 3) are listed in Table 5. Reliability estimates (generalizability coefficients; r) are reported for each symptom, based on varying numbers of raters and occasions. The estimates for a single rater on a single occasion ranged from .35 (anxiety) to .72 (appetite), which are quite low. These estimates (r) improved substantially with three raters on a single occasion, ranging from .62 (anxiety) to .87 (appetite). For the symptoms of drowsiness, well-being, tiredness, and appetite, these estimates were above the .80 cutoff set for this study, whereas the estimates for nausea and depression approached this value. For two of the symptoms (anxiety and shortness of breath), however, reliability estimates (r) remained below .70, even with three raters. The reliability estimates for two raters on two occasions were similar to the estimates obtained with three raters on a single occasion. The highest reliability estimates (r) were obtained with three raters and two occasions, ranging from .69 (shortness of breath) to .91 (appetite). However, the relative gain in reliability from adding a second occasion with three raters was smaller than the gain from increasing the number of raters from one to three on a single occasion.
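The pattern described above, in which moving from one to three raters helps more than adding a second occasion, follows from how each error component is divided in a D study. The sketch below computes the generalizability coefficient and a relative-error SEM for a p × R × O design using the standard G-theory formulas. The function name and the variance components are hypothetical illustrations (chosen so that person by rater variance exceeds person by occasion variance), not values from Tables 3 to 6, and the use of relative error is an assumption.

```python
from math import sqrt


def d_study(vc, n_raters, n_occasions):
    """Generalizability coefficient and relative-error SEM for a p x R x O
    D study, given variance components keyed 'p', 'pr', 'po', 'pro,e'."""
    # Relative error: each interaction with persons is averaged over the
    # facets it involves
    rel_error = (vc["pr"] / n_raters
                 + vc["po"] / n_occasions
                 + vc["pro,e"] / (n_raters * n_occasions))
    coef = vc["p"] / (vc["p"] + rel_error)
    return coef, sqrt(rel_error)


# HYPOTHETICAL variance components (mm^2), not taken from this study:
# person-by-rater variance is large relative to person-by-occasion variance.
hypothetical_vc = {"p": 600.0, "pr": 150.0, "po": 50.0, "pro,e": 200.0}

for n_r, n_o in [(1, 1), (3, 1), (2, 2), (3, 2)]:
    coef, sem = d_study(hypothetical_vc, n_r, n_o)
    print(f"{n_r} rater(s), {n_o} occasion(s): "
          f"coefficient = {coef:.2f}, SEM = {sem:.1f} mm")
```

With these invented inputs, three raters on one occasion and two raters on two occasions give similar coefficients, and the jump from one to three raters exceeds the gain from then adding a second occasion; with the study's own variance components the numbers would differ.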


Table 5.

Design 3: D Studies for Ratings of Patients' Symptoms (two-facet p × R × O design)

When interpreting reliability estimates, it is important to also consider the standard error of measurement. Ideally, we would like to have large reliability estimates (ie, r close to 1.0) with low standard errors of measurement, which would indicate that the ratings are consistent, with minimal variability across raters. A summary of the relationship between reliability estimates and standard errors of measurement, based on different numbers of raters for a single occasion, appears in Table 6. For each symptom, as the number of raters increased from one to three, the reliability estimates increased while the standard errors of measurement decreased. For three raters, the standard errors of measurement were less than or equal to 10 mm for all symptoms except pain. Figure 2 provides a graphical representation of these differences in standard errors of measurement based on the number of raters. Similar results were obtained for reliability estimates and standard errors of measurement based on varying numbers of raters for two occasions.


Table 6.

Reliability Estimates and Standard Errors of Measurement Based on Decision Study for Ratings of Patients' Symptoms (p × R × O design)


Fig 2.

Standard errors of measurement (mm) for each symptom based on number of raters on a single occasion (person by rater by occasion G study random design).


DISCUSSION

The major purpose of this study was to assess the reliability of symptom ratings under a number of different conditions, including multiple raters, occasions, and symptoms. Using the principles of generalizability theory,⁴² we identified three substantial findings from our research.

First, patients, nurses, and family caregivers were reasonably consistent in their symptom ratings when averaged across all symptoms (total symptom distress) and both occasions. The use of a total symptom distress score, however, does have its limitations because differences at the individual symptom level remain undetected. Similar to the findings in a previous study (Nekolaichuk et al, manuscript submitted for publication), these findings suggest that the representation of symptoms as individual symptom profiles would be more meaningful than the use of a total symptom distress score.

Second, at the individual symptom level, patients, nurses, and family caregivers were reasonably consistent in their ratings when averaged across the two occasions. There were notable differences, however, in that some raters rated some patients differently than others. As mentioned previously, some of this variation may have arisen because the raters varied across patients (ie, the same nurse rated the same patient across both occasions, but different nurses may have rated different patients). Another possible explanation is that some symptoms may be more difficult to assess than others because of the limited visual cues accompanying these symptoms. For example, some symptoms, such as shortness of breath (ie, 8.8% and 33.4% of total variance attributable to the rater effect and person by rater effect, respectively) and anxiety (21.8% of total variance attributable to the person by rater effect), may have a substantial psychological component that is not directly observable. Thus, proxy raters may have more difficulty assessing these symptoms. These results concur with other findings in the literature that suggest that proxy raters are better able to assess physical, rather than psychological, symptoms,³⁰ and that they tend to underestimate symptoms such as depression and anxiety, in comparison with patients.¹⁷

Third, for most symptoms, a minimum of three raters on a single occasion or two raters across two occasions would be required to obtain reliability estimates of at least .70. These findings are consistent with a previous study focusing on patient and proxy symptom assessments,³⁶ in which reliability estimates substantially improved with three raters as opposed to one. In this previous study, reliability estimates of symptom ratings of advanced cancer patients were also assessed across multiple raters (patients, physicians, and nurses), occasions (two assessments within approximately 10 days of hospital admission), and symptoms. In contrast, this study was designed to assess symptom ratings of relatively stable patients by patients, nurses, and family members, across a shorter time period of 24 hours. The same nine symptoms were assessed in both studies. The reliability estimates based on the three raters of patients, nurses, and family members (three raters, single occasion, in this study) were higher than the estimates based on patients, nurses, and physicians (three raters, occasion 1, from the previous study) for all symptoms except nausea and shortness of breath. These findings suggest that patients, nurses, and family caregivers may be more consistent in their ratings than patients, nurses, and physicians. One limitation in comparing these two studies, however, is that the previous study involved patients who were in acute symptom distress, whereas this study focused on patients who were stabilized. Further research is needed to explore differences among different proxy raters, including nurses, physicians, and family caregivers, as well as in the assessment of patients with different acuity levels of symptom distress.

These findings provide a framework for using regular symptom assessments within the clinical setting. Although these findings are based on a sample of advanced cancer patients, the framework may also be applicable throughout the different stages of the cancer trajectory, from initial diagnosis to advanced disease. For example, the use of a symptom profile provides a more meaningful representation of a patient's symptom experience than a total symptom distress score. This approach may help clinicians screen for patients at risk of developing debilitating symptoms, as well as target patient-specific interventions for symptom relief. Regular symptom assessments and documentation of symptom profiles in the patient's chart provide an effective means for monitoring the patient's symptom experience and evaluating the effectiveness of interventions over time. The use of multiple (patient and proxy) raters provides a more complete picture of the patient's symptom experience. For some symptoms, it may be possible to use proxy ratings in place of patient ratings, particularly in situations where patients are unable to complete their own assessments. For those symptoms with a substantial psychological component, such as shortness of breath, the use of multiple raters can be helpful to identify discrepancies among the raters' perceptions. This information would be useful for developing patient-specific interventions that potentially incorporate medical treatments and complementary approaches, such as relaxation training, visualization, or cognitive-behavioral techniques.

The increased reliability obtained with multiple raters lends further support to an integrated approach to symptom assessment in cancer patients. Rather than relying on either patient or proxy assessments alone, an integrated approach involving multiple raters would yield more consistent scores. Such an approach could enhance patient care in several ways: identifying and screening patients at risk; targeting interventions; monitoring the stability of symptoms over time; identifying discrepancies between patient and caregiver assessments that may inform further therapeutic interventions; identifying symptoms that may be more difficult to assess than others, particularly for proxy raters; developing educational programs to train formal and informal caregivers in symptom assessment; and assessing family members' perceptions of the symptom distress experienced by the patient.
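One simple way to operationalize an integrated, multiple-rater approach, assumed here for illustration rather than taken from the study, is to average the available ratings for each symptom and to flag symptoms on which raters disagree widely. Averaging corresponds to using the mean over several raters as the score, which is the quantity that multiple-rater reliability estimates describe; the range across raters is just one possible discrepancy measure. The rater labels, ratings, and the 30 mm disagreement cutoff below are hypothetical.

```python
"""Integrated multi-rater symptom scores with a discrepancy flag (hypothetical data)."""
from statistics import mean
from typing import Dict, List

# rater -> symptom -> rating (0-100 mm). A rater may omit a symptom,
# eg a patient unable to self-report; the proxies then carry the score.
Ratings = Dict[str, Dict[str, float]]

example_ratings: Ratings = {
    "patient": {"pain": 60, "anxiety": 70, "shortness_of_breath": 80},
    "nurse": {"pain": 55, "anxiety": 40, "shortness_of_breath": 30},
    "family": {"pain": 65, "anxiety": 60, "shortness_of_breath": 75},
}

DISCREPANCY_CUTOFF = 30.0  # hypothetical, in mm


def _values(ratings: Ratings, symptom: str) -> List[float]:
    """All available ratings of one symptom, across raters."""
    return [by_symptom[symptom] for by_symptom in ratings.values() if symptom in by_symptom]


def integrated_score(ratings: Ratings, symptom: str) -> float:
    """Mean of the available raters' ratings for one symptom."""
    return mean(_values(ratings, symptom))


def discrepancy(ratings: Ratings, symptom: str) -> float:
    """Spread (max - min) across raters; 0 when only one rater is available."""
    values = _values(ratings, symptom)
    return max(values) - min(values)


def symptoms_to_review(ratings: Ratings, cutoff: float = DISCREPANCY_CUTOFF) -> List[str]:
    """Symptoms (alphabetical) whose spread across raters meets or exceeds the cutoff."""
    symptoms = sorted({s for by_symptom in ratings.values() for s in by_symptom})
    return [s for s in symptoms if discrepancy(ratings, s) >= cutoff]


if __name__ == "__main__":
    for symptom in sorted(example_ratings["patient"]):
        print(symptom, round(integrated_score(example_ratings, symptom), 1),
              discrepancy(example_ratings, symptom))
    print("review:", symptoms_to_review(example_ratings))
```

In this invented example the raters largely agree on pain but diverge on anxiety and shortness of breath, so those two symptoms would be flagged for a closer look; removing the patient's entry leaves a proxy-only mean for each symptom.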

Involving family members in symptom assessment can help identify potential interventions for symptom relief across the illness trajectory. Enlisting family members early after the cancer diagnosis provides opportunities to educate them about the signs and symptoms associated with advancing disease. When the patient is no longer able to complete the assessments, the family member's ratings could be used as an approximation of the patient's experience. In the final stages of life, when patients may become unresponsive, involving family members in symptom assessment can alleviate some of their concerns about the patient's comfort by educating them about the physiologic changes that patients may experience (eg, death rattle). Further support for involving family members in symptom assessment throughout the illness trajectory comes from Kristjanson et al.³⁴

Despite the differences in findings across studies, proxy raters will continue to play an important role in symptom assessment in advanced cancer patients. Specific areas in need of further research include the following: (a) the development of an integrated assessment tool that includes both patient (when available) and proxy ratings; (b) a closer examination of the underlying reasons for discrepancies between patient and proxy assessments, particularly in complex symptom experiences; (c) the development of appropriate assessment techniques for patients with cognitive failure, including the use of proxy raters; and (d) a longitudinal comparison of patient and proxy assessments in both acutely distressed and stabilized patients.

Appropriate interpretation of the patient's symptom expression by health care providers is essential for developing effective therapeutic interventions. Overestimating the production (physiologic) component of a symptom such as pain may result in excessive opioid doses and toxicity, whereas underestimating this component may increase the patient's suffering and reduce quality of life. A reliable, integrated symptom assessment approach that incorporates multiple raters and multidimensional profiles can support the development of effective interventions that ultimately enhance patients' comfort and quality of life.


Footnotes

The findings reported in this manuscript were part of a larger study funded by a Project on Death in America research grant.

Received March 31, 1999.
Accepted June 30, 1999.


References

[1] National Cancer Institute of Canada: Canadian Cancer Statistics 1998. Toronto, ON, National Cancer Institute of Canada, 1998

[2] World Health Organization Expert Committee: Report on cancer pain relief and palliative care: Technical Series 804. Geneva, Switzerland, World Health Organization, 1990

[3] Bruera E: Research in symptoms other than pain, in Doyle D, Hanks G, MacDonald N (eds): Textbook of Palliative Medicine. London, England, Oxford, 1993, pp 87-92

[4] Breitbart W, Bruera E, Chochinov H, et al: Neuropsychiatric syndromes and psychological symptoms in patients with advanced cancer. J Pain Symptom Manage 10:131-141, 1995

[5] Vainio A, Auvinen A: Prevalence of symptoms among patients with advanced cancer: An international collaborative study. J Pain Symptom Manage 12:3-10, 1996

[6] Beeney LJ, Butow PN, Dunn SM: “Normal” adjustment to cancer: Characteristics and assessment, in Portenoy RK, Bruera E (eds): Topics in Palliative Care (vol 1). New York, NY, Oxford University Press, 1997, pp 213-244

[7] Higginson I: Audit methods: A community schedule, in Higginson I (ed): Clinical Audit in Palliative Care. Oxford, England, Radcliffe Medical Press, 1993, pp 34-47

[8] Higginson I: Audit methods: Validation and inpatient use, in Higginson I (ed): Clinical Audit in Palliative Care. Oxford, England, Radcliffe Medical Press, 1993, pp 48-54

[9] Mortimer JE, Bartlett NL: Assessment of knowledge about cancer pain management by physicians in training. J Pain Symptom Manage 14:21-28, 1997

[10] Portenoy RK: Cancer pain: General design issues, in Max MB, Portenoy RK, Laska EM (eds): Advances in Pain Research and Therapy (vol 18): The Design of Analgesic Clinical Trials. New York, NY, Raven Press, 1991, pp 233-266

[11] Bruera E, Watanabe S: New developments in the assessment of pain in cancer patients. Support Care Cancer 2:312-318, 1994

[12] Bruera E: Patient assessment in palliative cancer care. Cancer Treat Rev 22:3-12, 1996 (suppl A)

[13] Bruera E, Kuehn N, Miller MJ, et al: The Edmonton Symptom Assessment System (ESAS): A simple method for the assessment of palliative care patients. J Palliat Care 7:6-9, 1991

[14] McCorkle R, Young K: Development of a symptom distress scale. Cancer Nurs 1:373-378, 1978

[15] Higginson IJ, McCarthy M: Validity of the support team assessment schedule: Do staffs' ratings reflect those made by patients or their families? Palliat Med 7:219-228, 1993

[16] Yates JW, Chalmer B, McKegney FP: Evaluation of patients with advanced cancer using the Karnofsky performance status. Cancer 45:2220-2224, 1980

[17] Sprangers MAG, Aaronson NK: The role of health care providers and significant others in evaluating the quality of life of patients with chronic disease: A review. J Clin Epidemiol 45:743-760, 1992

[18] Peteet J, Tay V, Cohen G, et al: Pain characteristics and treatment in an outpatient cancer population. Cancer 57:1259-1265, 1986

[19] Grossman SA, Sheidler VR, Swedeen K, et al: Correlation of patient and caregiver ratings of cancer pain. J Pain Symptom Manage 6:53-57, 1991

[20] Cleeland CS, Gonin R, Hatfield AK, et al: Pain and its treatment in outpatients with metastatic cancer. N Engl J Med 330:592-596, 1994

[21] Yeager KA, Miaskowski C, Dibble SL, et al: Differences in pain knowledge and perception of the pain experience between outpatients with cancer and their family caregivers. Oncol Nurs Forum 22:1235-1241, 1995

[22] Lampic C, Nordin K, Sjoden P: Agreement between cancer patients and their physicians in the assessment of patient anxiety at follow-up visits. Psycho-Oncol 4:301-310, 1995

[23] Passik SD, Dugan W, McDonald MV, et al: Oncologists' recognition of depression in their patients with cancer. J Clin Oncol 16:1594-1600, 1998

[24] Slevin ML, Plant H, Lynch D, et al: Who should measure quality of life, the doctor or the patient? Br J Cancer 57:109-112, 1988

[25] Blazeby JM, Williams MH, Alderson D, et al: Observer variation in assessment of quality of life in patients with oesophageal cancer. Br J Surg 82:1200-1203, 1995

[26] Kosmidis P: Quality of life as a new end point. Chest 109:110S-112S, 1996 (suppl 5)

[27] Sigurdardottir V, Brandberg Y, Sullivan M: Criterion-based validation of the EORTC QLQ-C36 in advanced melanoma: The CIPS questionnaire and proxy raters. Qual Life Res 5:375-386, 1996

[28] Sneeuw KCA, Aaronson NK, Osoba D, et al: The use of significant others as proxy raters for the quality of life of patients with brain cancer. Med Care 35:490-506, 1997

[29] Sneeuw KCA, Aaronson NK, Sprangers MAG, et al: Value of caregiver ratings in evaluating the quality of life of patients with cancer. J Clin Oncol 15:1206-1217, 1997

[30] Brunelli C, Costantini M, Di Giulio P, et al: Quality-of-life evaluation: When do terminal cancer patients and health-care providers agree? J Pain Symptom Manage 15:151-158, 1998

[31] Butters E, Higginson I, George R, et al: Palliative care for people with HIV/AIDS: Views of patients, carers and providers. AIDS Care 5:105-116, 1993

[32] Higginson I, Priest P, McCarthy M: Are bereaved family members a valid proxy for a patient's assessment of dying? Soc Sci Med 38:553-557, 1994

[33] Hinton J: How reliable are relatives' retrospective reports of terminal illness? Patients' and relatives' accounts compared. Soc Sci Med 43:1229-1236, 1996

[34] Kristjanson L, Nikoletti S, Porock D, et al: Congruence between patients' and family caregivers' perceptions of symptom distress in patients with terminal illness. J Palliat Care 14:24-32, 1998

[35] Nekolaichuk CL, Bruera E, Maguire T: An examination of the reliability of the Edmonton Symptom Assessment System (ESAS) in a palliative care setting. J Palliat Care 14:125-126, 1998 (abstr 51)

[36] Nekolaichuk C, Bruera E, Spachynski K, et al: A comparison of patient and proxy symptom assessments in advanced cancer patients. Palliat Med 13:311-323, 1999

[37] Bruera E, Schoeller T, Wenk R, et al: A prospective multicenter assessment of the Edmonton Staging System for cancer pain. J Pain Symptom Manage 10:348-355, 1995

[38] Cleeland CS: The impact of pain on the patient with cancer. Cancer 54:2635-2641, 1984

[39] Zenz M: Morphine myths: Sedation, tolerance, addiction. Postgrad Med 67:S100-S102, 1991 (suppl 5)

[40] Pereira J, Hanson J, Bruera E: The frequency and clinical course of cognitive impairment in patients with terminal cancer. Cancer 79:835-842, 1997

[41] Higginson IJ: Can professionals improve their assessments? J Pain Symptom Manage 15:149-150, 1998 (commentary)

[42] Shavelson RJ, Webb NM: Generalizability Theory: A Primer. Newbury Park, CA, Sage, 1991

[43] Crocker L, Algina J: Introduction to Classical and Modern Test Theory. Fort Worth, TX, Harcourt Brace, 1986

[44] Brennan RL: Elements of Generalizability Theory (revised ed). Iowa City, IA, ACT Publications, 1992

[45] Cronbach LJ, Gleser GC, Nanda H, et al: The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York, NY, John Wiley, 1972

[46] Folstein MF, Folstein S, McHugh PR: “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12:189-198, 1975

[47] Kaasa T, Loomis J, Gillis K, et al: The Edmonton Functional Assessment Tool: Preliminary development and evaluation for use in palliative care. J Pain Symptom Manage 13:10-19, 1997

Journal of Clinical Oncology
