Evaluation of Cognitive Function Associated With Chemotherapy: A Review of Published Studies and Recommendations for Future Research

Ian F. Tannock
From the Princess Margaret Hospital; and St Michael's Hospital, Toronto, Ontario, Canada
Address reprint requests to Ian Tannock, MD, PhD, FRCPC, Princess Margaret Hospital, Department of Medical Oncology, 610 University Ave, Toronto, ON M5G 2M9, Canada; e-mail: Ian.Tannock@uhn.on.ca

Abstract

Purpose There is evidence that some cancer survivors suffer cognitive impairment after chemotherapy. Determining if a patient has cognitive impairment is challenging, especially because impairment is usually subtle.

Patients and Methods We assessed the design of studies evaluating cognitive function during or after chemotherapy in adult patients with solid tumors. We also reviewed methods used to evaluate cognitive function in subjects with other diseases and made recommendations for future studies.

Results We identified 22 studies that met our criteria: 82% included women with breast cancer. Eight studies were longitudinal, 12 were cross-sectional, and two were follow-ups of cross-sectional studies. Sixteen studies used a battery of neuropsychological (NP) tests to assess subjects, and 13 included a control group. Ten studies (45%) had no explicit definition of cognitive impairment; most others used z scores or T scores and defined impairment based on standard deviations below the mean, but there was no consistency in the cutoff point used or the number of tests required.

Conclusion There is no consistency in defining cognitive impairment, in the NP batteries used, or in statistical methods in studies of cognitive function of cancer patients. We suggest guidelines to define criteria for cognitive impairment. Use of summary scores and control groups is recommended. Practice effect should be adjusted for in longitudinal studies. A balance is needed between comprehensive batteries and briefer tests, which still need to be sensitive to mild impairment.

INTRODUCTION

Evidence is emerging that some cancer survivors suffer cognitive impairment as a result of chemotherapy.1-8 Although many patients report a decline in cognitive function after chemotherapy, the impairment is usually subtle and may occur intermittently, so it can be difficult to obtain objective evidence of cognitive impairment and to determine which domains are affected. Herein we review articles that have evaluated cognitive function in adults who have received chemotherapy, focusing on methodologic aspects rather than on the incidence and extent of cognitive impairment, which have been described in other review articles.9-18 Based on our review, we make recommendations for the design of future cognitive studies.

PATIENTS AND METHODS

A literature search was performed using MEDLINE, PsycINFO, PubMed, and the Cochrane Database of Clinical Trials using the following keywords: cognition or cognitive disorders or cognitive function or neuropsychological or neurocognitive and chemotherapy or antineoplastic agents or cancer. The search strategy was restricted to subjects older than 18 years and to publications in English from 1966 to May 2006. The review was limited to original studies in patients with solid malignancies (excluding those with brain tumors, metastases, or brain irradiation), who had received or were receiving chemotherapy. Studies were excluded if they used brief screening tests, such as Folstein's Mini Mental Status Examination,19-22 subjective measures, or electrophysiological tests as the only test of cognitive function, or if they had fewer than 15 patients. The reference list of each of the relevant studies was also searched to identify any further studies.

Summary statistics were used to describe the number and type of studies and the frequency of methods that were used.

OVERVIEW OF PUBLISHED STUDIES

The search strategy identified 1,182 articles: 135 were pertinent to assessing cognitive function in cancer patients, but only 22 met our criteria. Two studies published electronically ahead of print23,24 and three studies combining patients with solid tumors and hematologic malignancies were included.2,25,26

Table 1 summarizes the 22 studies that evaluated cognitive function during and/or after chemotherapy for patients with solid tumors.1-6,8,23-38 Sixteen studies (73%) were exclusive to women with breast cancer, and only six studies (27%) included treatment groups with more than 50 patients.6,26-30 Eight studies were longitudinal, six (27%) of which had baseline assessments before chemotherapy;24,29-33 12 (55%) were cross-sectional, with cognitive assessment on only one occasion; and two were follow-ups of earlier studies.27,34

Table 1.

Summary of Cognitive Articles

Seventeen studies (77%) evaluated cognitive function with a battery of seven to 16 neuropsychological (NP) tests, with most taking 2 to 3 hours to administer. Four studies (18%) used the High Sensitivity Cognitive Screen (HSCS)1,6,23,27 and three (14%) included a computerized NP assessment.8,23,35

Thirteen studies (59%) included a control group: seven (32%) compared groups with different types of treatment or stages of disease (ie, patients who received chemotherapy v patients who received local treatment only)2-4,24,28,34,35 and six (27%) compared patients receiving chemotherapy with healthy controls.1,6,8,27,30,32 Three of eight longitudinal studies included a longitudinal control group.24,30,32

Ten studies (45%) based their definition of cognitive impairment on the number of standard deviations (SD) below control scores (or normative data) on one or more cognitive tests, using standardized Z or T scores, with the majority using more than 2 SD as their cutoff point. Ahles et al2 defined overall impairment as being in the lowest quartile on four or more of nine domains. The studies using the HSCS classified subjects as cognitively impaired if their summary score fell in the moderate or severe category.1,6,23,27 Seven studies (32%) did not define cognitive impairment,8,24-26,29,36,37 and two reported a cutoff for impairment on individual tests but no definition of overall cognitive impairment.28,38

Four studies used a reliable change index (RCI), regarding a cognitive decline outside the 90% to 95% CI as significant change.23,30-32 Overall, 12 studies (55%) computed a summary or impairment score for evaluating cognitive function.1-6,8,23,27,28,34,35

LESSONS FROM THE PUBLISHED STUDIES AND FROM EVALUATION OF COGNITIVE FUNCTION IN SUBJECTS WITH OTHER DISEASES

NP Tests

NP tests should be valid and reliable and should have good sensitivity and specificity. Validity refers to whether an instrument measures what it purports to measure39 and implies that the test(s) provide an appropriate evaluation of cognitive function. They should evaluate all of the important cognitive domains, and there should be an established relationship between test scores and a reference standard.40

Reliability is a measure of error that is inherent in an instrument.41 Reliability implies internal consistency, so that there is correlation between test items that measure similar attributes, and reproducibility, which requires stability over time on repeated administration (test-retest or intrarater reliability) and the same results when administered by different raters (inter-rater reliability).39,41 A useful instrument must discriminate between subjects with normal and abnormal cognitive function. Sensitivity is a measure of the ability of the test(s) to correctly identify subjects who have cognitive impairment, while specificity is the ability to correctly identify those who do not have impairment.42 Tests that are administered to the same subjects more than once need to be responsive, that is, able to detect change when it occurs.43 Responsiveness is confounded by practice effect, whereby subjects perform better on subsequent tests, and this needs to be accounted for in longitudinal studies.
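The definitions of sensitivity and specificity can be made concrete with a 2x2 table that compares a test's classification with a reference standard (for example, clinical ratings). The following is a minimal Python sketch; the class and function names are illustrative, and the counts are hypothetical rather than taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticCounts:
    """2x2 table comparing an NP test classification with a reference standard."""
    true_pos: int   # impaired by reference standard, flagged impaired by the test
    false_pos: int  # unimpaired by reference standard, flagged impaired by the test
    false_neg: int  # impaired by reference standard, missed by the test
    true_neg: int   # unimpaired by reference standard, correctly cleared by the test


def diagnostic_metrics(c: DiagnosticCounts) -> dict[str, float]:
    sensitivity = c.true_pos / (c.true_pos + c.false_neg)
    specificity = c.true_neg / (c.true_neg + c.false_pos)
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    # Positive likelihood ratio: how much a positive result raises the odds of impairment.
    lr_positive = float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
        "accuracy": (c.true_pos + c.true_neg) / total,  # proportion correctly classified
        "lr_positive": lr_positive,
    }


# Hypothetical counts for illustration only.
counts = DiagnosticCounts(true_pos=35, false_pos=5, false_neg=15, true_neg=95)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common impairment is in the sample tested, so they may not transfer between populations.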

Most neuropsychologists recommend a comprehensive battery of NP tests as the gold standard for assessing cognitive function, but opinions vary as to which tests, and how many, should be incorporated into a battery. A test battery should include assessment of the full range of psychological functions, both general and specific, allow comparison to a control group or normative data, and include validated tests with good sensitivity and specificity.44 Extensive batteries take several hours,45,46 and require a trained psychometrist to administer them and a neuropsychologist to interpret them.47 An 8-hour battery has been recommended for subjects with HIV,45 but is not feasible for large trials or for patients about to receive chemotherapy, many of whom have had a recent diagnosis of cancer and major surgery.

Testing of cognitive function is a compromise between extensive batteries, which are likely to be more sensitive and detect more subtle impairment, and shorter tests that are more practical for serial assessment of patients receiving chemotherapy. The latter seek to determine whether impairment is present and to indicate whether in-depth testing is warranted. Heaton and colleagues have deleted tests that measure similar attributes, thereby reducing the 8-hour HIV battery to 3- to 4-hour and 1.5-hour options that focus on the functions most likely to be impaired, while retaining a wide enough range of tests to detect less common impairments.48-50 They compared the sensitivity and specificity of NP batteries of different lengths and reported minimal loss in sensitivity with the shorter battery.42 Focused NP batteries can have high specificity (up to 0.98) and good sensitivity (> 0.70) for detecting impairment, correctly classifying 87% of subjects in one sample.51

Most of the studies reviewed here used comprehensive NP batteries. The very short Mini Mental Status Examination is not a useful screening test for subtle cognitive impairment.52 The HSCS, used in four studies,1,6,23,27 can be administered in 25 minutes, and is reported to be reliable and valid,53,54 but its sensitivity to detect subtle cognitive impairment is unknown, and it is subject to practice effect.23 There is insufficient information to support the use of other brief summary tests.

Computerized tests have been developed to replicate traditional tests and to provide briefer screens. Three studies used computer tests as part of their NP battery.8,23,35 Potential advantages and disadvantages of computer tests47,55-58 are outlined in Table 2. They are useful, but despite considerable overlap, they do not necessarily provide the same information as traditional tests.55,56

Table 2.

Computerized NP Tests

TYPES OF IMPAIRMENT

The reviewed studies are inconsistent in describing types of cognitive dysfunction, although commonly affected domains include complex attention/concentration, verbal and visual memory, and processing speed.1-8 Similar types of cognitive impairment occur in patients with early HIV infection45,51 and multiple sclerosis,59 where cognitive dysfunction is spotty, subtle, and variable,51,60 and consistent with subcortical abnormality.45,51,61-64 Testing in these conditions has focused on attention, processing speed, memory, learning, retrieval, language, visuo-perception, constructional abilities, motor skills, and executive function,45,50,51,65 and it seems appropriate to focus on these areas for cancer patients as well.

ANALYSIS OF TEST RESULTS

Defining Cognitive Impairment

Most NP tests report their results as Z (mean = 0; SD = 1) or T (mean = 50; SD = 10) scores, thus indicating how many SDs an observation lies from the mean of a normal population. A definition of cognitive impairment is usually based on cutoff scores for the various tests. Receiver operating characteristic curves can evaluate sensitivity and specificity for various threshold scores to determine the cutoff point that gives the most appropriate balance between type I and type II errors,66 but this is rarely done. Subtle cognitive impairment is most commonly defined as a score more than 1 SD below the mean, while more severe impairment is defined by a cutoff at 1.5 to 2 SD below the mean.61,67 The definition of cognitive impairment is complex because NP batteries include several tests, which are scored individually. If an abnormal result on any single test is used to define cognitive impairment, the probability of falsely classifying a subject as impaired increases with the number of NP tests: with 20 tests and a significance level of 5%, the probability of at least one significant result when there is no true difference is 0.64.68 Most studies require impairment on more than one measure to classify a subject's performance as abnormal, but false positives and negatives will still occur.
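The arithmetic in this paragraph can be sketched in a few lines of Python: standardizing a raw score, converting Z to T, applying an SD cutoff, and computing the chance that an unimpaired subject is misclassified when many tests are scored. The function names are hypothetical. The false-positive calculation assumes the tests are independent, which NP tests are not; under that assumption it reproduces the 0.64 quoted above for 20 tests at a 5% level.

```python
from math import comb


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against normative or control data (higher = better)."""
    return (raw - norm_mean) / norm_sd


def z_to_t(z: float) -> float:
    """T scores have mean 50 and SD 10, so T = 50 + 10 * Z."""
    return 50.0 + 10.0 * z


def below_cutoff(z: float, sd_cutoff: float = 1.0) -> bool:
    """True if the score lies more than `sd_cutoff` SDs below the mean."""
    return z < -sd_cutoff


def prob_false_impairment(n_tests: int, alpha: float, min_abnormal: int = 1) -> float:
    """P(at least `min_abnormal` abnormal tests) in a subject with no true impairment.

    Assumes each test independently has probability `alpha` of an abnormal result.
    """
    p_fewer = sum(
        comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
        for k in range(min_abnormal)
    )
    return 1.0 - p_fewer


z = z_score(raw=38, norm_mean=50, norm_sd=8)   # 1.5 SD below the mean
print(z_to_t(z))                                # 35.0
print(below_cutoff(z), below_cutoff(z, 2.0))    # True False
print(round(prob_false_impairment(20, 0.05), 2))                  # 0.64
print(round(prob_false_impairment(20, 0.05, min_abnormal=2), 2))  # 0.26
```

Under the same independence assumption, requiring two abnormal results lowers the false-positive probability from about 0.64 to about 0.26 but does not eliminate it.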

Different criteria that have been used to indicate cognitive impairment in the published studies are summarized in Table 1. Lack of consistency in the definition of cognitive impairment makes comparisons between studies difficult.

Control Groups

Classification of impairment depends on the comparator population: the distribution of test results for normal subjects depends on age, education, and other factors,40,44,69 and the comparator population should be as similar as possible to the experimental group. For example, women with breast cancer are often highly educated: they may test as normal compared with the age-adjusted general population, but this may be quite abnormal for them. A concurrent control group is preferred to using standard population-based scores, but there are inherent problems in obtaining valid control groups for patients with cancer. It is unethical to perform a randomized trial of chemotherapy versus observation where chemotherapy is known to improve survival. Thus, control groups of patients who also have cancer will generally include those with more favorable stages of disease, in which chemotherapy is not recommended. The alternative is to obtain a healthy control group matched for age and socioeconomic status, but evaluation in the patient group is confounded by the stress associated with a recent diagnosis of cancer and surgery.

The aforementioned problems may have influenced interpretation of a study highlighting the presence of cognitive dysfunction before chemotherapy31: comparison was with standardized scores rather than a control group, the study combined subjects who completed different numbers of NP tests, and subjects were classified as impaired based on one poor test score.

Group Means

Eleven of 22 studies included a comparison of mean scores on NP tests between subjects and a control group, although it was the primary method of analysis in only seven of them. Reporting only group means can obscure subtle cognitive impairment because scores of high-achieving individuals may counterbalance those with moderate impairment.48,62,70 For longitudinal data, comparison of group means is confounded by floor and ceiling effects because subjects with a low baseline may not decline by a further SD below the population mean, and patients with initial high scores require a proportionately larger drop from baseline.67,71,72 When group means are used, they should be accompanied by the proportion of subjects classified as cognitively impaired based on a predefined criterion.48
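A toy example shows how a group mean can hide impairment that a proportion-impaired summary reveals. This is a minimal Python sketch with hypothetical T scores; it assumes a cutoff of T < 40 (more than 1 SD below the mean) as the predefined criterion, and `summarize_group` is an illustrative name.

```python
from statistics import fmean


def summarize_group(t_scores: list[float], cutoff: float = 40.0) -> tuple[float, float]:
    """Return (mean T score, proportion of subjects with T below the cutoff)."""
    n_impaired = sum(1 for t in t_scores if t < cutoff)
    return fmean(t_scores), n_impaired / len(t_scores)


# Hypothetical T scores: four high achievers offset four mildly impaired patients.
controls = [48, 50, 52, 49, 51, 50, 47, 53]
patients = [62, 64, 61, 63, 36, 35, 38, 37]
print(summarize_group(controls))  # (50.0, 0.0)
print(summarize_group(patients))  # (49.5, 0.5)
```

The group means differ by only half a T point, yet half of the patients meet the impairment criterion and none of the controls do.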

Demographically Corrected T Scores

Heaton and his colleagues recommend converting individual raw test scores to standardized scores and then calculating demographically corrected T scores using normative data corrected for age, education, sex, and, where possible, ethnicity.60-62,73 This approach enables comparison of actual and expected scores for each individual. Using standard cutoffs for defining impairment gives consistent diagnostic specificity for test measures, with a known rate of false positives. Lowering the cutoff reduces the false positive rate but at the expense of increasing false negatives.74 Demographically corrected T scores also allow comparison of scores across tests and between groups, and facilitate comparison of strengths and weaknesses in specific domains of cognitive function.
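The idea of comparing actual with demographically expected scores can be sketched with ordinary least squares. This is a simplified illustration, not the published normative procedure: it assumes a linear model of raw score on age, years of education, and sex in a simulated normative sample, and defines the corrected T score as 50 plus 10 times the residual divided by the residual SD. All names and data are hypothetical.

```python
import numpy as np


def fit_demographic_norms(raw: np.ndarray, demographics: np.ndarray) -> tuple[np.ndarray, float]:
    """Regress raw scores on demographics in a normative sample (ordinary least squares).

    `demographics` is an (n, p) array, e.g. columns for age, years of education,
    and sex coded 0/1. Returns the coefficients (intercept first) and residual SD.
    """
    design = np.column_stack([np.ones(len(raw)), demographics])
    coef, *_ = np.linalg.lstsq(design, raw, rcond=None)
    residuals = raw - design @ coef
    resid_sd = float(residuals.std(ddof=design.shape[1]))
    return coef, resid_sd


def corrected_t(raw: float, demog: np.ndarray, coef: np.ndarray, resid_sd: float) -> float:
    """T = 50 + 10 * (observed - demographically expected) / residual SD."""
    expected = coef[0] + demog @ coef[1:]
    return float(50.0 + 10.0 * (raw - expected) / resid_sd)


# Hypothetical normative sample (simulated, for illustration only).
rng = np.random.default_rng(seed=0)
n = 300
age = rng.uniform(30, 70, n)
education = rng.uniform(8, 20, n)
sex = rng.integers(0, 2, n)
demographics = np.column_stack([age, education, sex])
raw_scores = 60 - 0.3 * age + 1.2 * education + 2.0 * sex + rng.normal(0, 5, n)

coef, resid_sd = fit_demographic_norms(raw_scores, demographics)
norm_design = np.column_stack([np.ones(n), demographics])
norm_t = 50.0 + 10.0 * (raw_scores - norm_design @ coef) / resid_sd

# Same raw score, different education: the more educated patient falls further
# below expectation and so receives the lower corrected T score.
t_12_years = corrected_t(50.0, np.array([55, 12, 0]), coef, resid_sd)
t_18_years = corrected_t(50.0, np.array([55, 18, 0]), coef, resid_sd)
print(round(t_12_years, 1), round(t_18_years, 1))
```

This mirrors the breast cancer example earlier: a score that is average for the general population can be well below expectation for a highly educated patient.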

Deficit Scores

One way to mitigate the problem of multiple tests is to compute a summary NP impairment score to reflect overall performance.48 Deficit scores can be calculated for each domain and then an average computed: they are weighted by both the number and severity of deficits and have been used widely to define cognitive impairment.55,60,61 A method used to convert individual T scores to the deficit rating is presented in Table 3, with scores ranging from 0 (no impairment) to 5 (severe impairment). Impairment ratings for each test are then averaged to create a global deficit score (GDS) for each subject.48,51,55,60,62

Table 3.

Conversion for Transforming T Scores Into Deficit Scores62

A GDS of ≥ 0.5, roughly equivalent to averaging mild impairment on 50% of the tests, has been proposed as the optimal cutoff point for detecting cognitive impairment.55,61,62 Using the clinical ratings as the gold standard, the GDS, with a cognitive impairment cutoff of 0.5 in HIV-positive patients, showed sensitivity of 0.77, specificity of 0.92 to 0.96, positive predictive value of 0.88, negative predictive value of 0.83, and a likelihood ratio (binary) of 9.38.60-62 An optimal cutoff for cancer patients has not been determined. The GDS is more sensitive in detecting mild NP impairment than group means,60,61 and therefore has a lower type II (false negative) error rate. Four of the 22 reviewed studies calculated a GDS.
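The deficit-score procedure can be sketched in Python. Table 3 is not reproduced here, so the band boundaries below are an assumption based on the commonly cited Heaton-style conversion (T ≥ 40 rated 0, falling by one rating per 5 T points to T < 20 rated 5) and should be checked against Table 3. The 0.5 cutoff is the HIV-derived value discussed above; function names and subject data are hypothetical.

```python
# Assumed bands (verify against Table 3):
# T >= 40 -> 0, 35-39 -> 1, 30-34 -> 2, 25-29 -> 3, 20-24 -> 4, T < 20 -> 5.
DEFICIT_BANDS = ((40, 0), (35, 1), (30, 2), (25, 3), (20, 4))


def deficit_score(t: float) -> int:
    """Map one demographically corrected T score to a 0 (none) to 5 (severe) rating."""
    for lower_bound, rating in DEFICIT_BANDS:
        if t >= lower_bound:
            return rating
    return 5


def global_deficit_score(t_scores: list[float]) -> float:
    """Average the per-test deficit ratings."""
    return sum(deficit_score(t) for t in t_scores) / len(t_scores)


def gds_impaired(t_scores: list[float], cutoff: float = 0.5) -> bool:
    """Classify as impaired when GDS >= cutoff (0.5 was derived in HIV samples)."""
    return global_deficit_score(t_scores) >= cutoff


# Hypothetical T scores on a 10-test battery for two subjects.
subject_a = [45, 52, 38, 36, 41, 39, 50, 47, 37, 44]
subject_b = [45, 33, 38, 36, 41, 39, 50, 28, 37, 44]
print(global_deficit_score(subject_a), gds_impaired(subject_a))  # 0.4 False
print(global_deficit_score(subject_b), gds_impaired(subject_b))  # 0.9 True
```

Because the ratings are averaged, the GDS reflects both the number of abnormal tests and how far below the cutoff each falls, which is what distinguishes subject B from subject A.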

Neuropsychological Clinical Ratings

Clinical rating of NP tests to assess cognitive impairment has been recommended by the National Institute of Mental Health workgroup in HIV45,61 and has been used in other disease states.62,75,76 Neuropsychologists are blinded to patient details, other than test data and demographics, and assign the clinical ratings using standardized guidelines. A nine-point scale (from 1 = above average to 9 = severely impaired) can be used for each functional domain, and a separate rating derived for global NP function based on the domain ratings. A score of 5 or higher indicates significant impairment in a given domain. Diagnosis of global impairment requires impairment in at least two of eight functional domains.60,61,77 This method gives more weight to mild deficits suggestive of an acquired brain disorder, and less weight to patterns indicative of developmental disabilities or low education.60 It is particularly suitable for detecting subtle and spotty cognitive impairment,45,60,77 but requires an experienced neuropsychologist and is time consuming. A high correlation between GDS and clinical ratings has been demonstrated in the HIV setting.60,62 Neuropsychological clinical ratings were not used in any of the reviewed studies.
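The domain ratings themselves are clinical judgments made by a blinded neuropsychologist, so they cannot be computed. Only the final decision rule described above (a rating of 5 or higher marks a domain as impaired, and global impairment requires at least two such domains) is mechanical. The Python sketch below encodes just that rule; the eight domain labels and the ratings are hypothetical.

```python
# Hypothetical domain labels; the eight domains used in practice may differ.
DOMAINS = (
    "attention", "processing_speed", "verbal_memory", "visual_memory",
    "language", "visuo_perception", "motor", "executive",
)


def impaired_domains(ratings: dict[str, int], threshold: int = 5) -> list[str]:
    """Domains rated at or above `threshold` on the 1-9 clinical scale."""
    for domain, rating in ratings.items():
        if not 1 <= rating <= 9:
            raise ValueError(f"{domain}: rating {rating} is outside the 1-9 scale")
    return [d for d, r in ratings.items() if r >= threshold]


def globally_impaired(ratings: dict[str, int], min_domains: int = 2) -> bool:
    """Global impairment requires impairment in at least `min_domains` domains."""
    return len(impaired_domains(ratings)) >= min_domains


ratings = dict(zip(DOMAINS, (5, 6, 4, 3, 2, 3, 4, 2)))
print(impaired_domains(ratings))   # ['attention', 'processing_speed']
print(globally_impaired(ratings))  # True
```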

REPEATED MEASURES AND LONGITUDINAL CHANGES

An individual's cognitive function is likely to decline initially during chemotherapy and then either stabilize or improve after ceasing treatment. Unless the NP battery is responsive, and testing is performed at appropriate time intervals, the decline may be missed. Longitudinal studies must differentiate between true change in performance, practice effect, and chance variation.

Extreme test scores in individuals at baseline tend to regress to the mean of the sample on subsequent assessments.58,70,78 Tests with poor reliability show greater regression to the mean because of increased measurement error.58,78
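The link between reliability and regression to the mean can be illustrated with the classical test theory estimate of an expected retest score. This is a minimal Python sketch that assumes equal variances at both assessments, a linear relationship between them, no true change, and no practice effect; `expected_retest` is an illustrative name.

```python
def expected_retest(baseline: float, sample_mean: float, reliability: float) -> float:
    """Expected retest score with no true change and no practice effect.

    Classical test theory: the deviation from the sample mean shrinks by the
    test-retest reliability, so extreme baselines drift back toward the mean.
    """
    return sample_mean + reliability * (baseline - sample_mean)


# A subject 2 SD below the mean (T = 30) on a reliable and a less reliable test.
for r in (0.9, 0.6):
    print(r, round(expected_retest(baseline=30, sample_mean=50, reliability=r), 1))
# 0.9 32.0
# 0.6 38.0
```

On the less reliable test the same subject is expected to "improve" by 8 T points with no change in cognitive function.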

Practice Effect

Practice effect results in improved performance on repeat testing and is influenced by age, education level, the time interval between tests, and the number of reassessments.48,70,79-82 The use of alternative forms of a test reduces, but does not eliminate, practice effect,78,79,83 and alternative versions must be equally sensitive and valid. Practice effect is most pronounced between the first and second assessments, with higher test-retest reliability between subsequent tests.70,84 Thus, one way to minimize practice effect is to give subjects practice trials before recording a baseline.78,85 Ideally, a control group should be tested at the same times to determine and correct for practice effect.

Of the 22 studies reviewed, only eight were longitudinal in design, and all of these were published recently (2004 to 2006).23-25,29-32,37 The HSCS was particularly susceptible to practice effect.23

Different methods used to analyze longitudinal changes in cognitive function are reviewed in the following sections.

Mean Change

Reporting mean change in individual or group scores is simple, but does not consider variability within each assessment and may overestimate changes in cognitive function.78,86,87 Group mean change from baseline was reported in the study of O'Shaughnessy et al.37 As acknowledged by the authors, it may not reflect individual changes and patients might improve their score due to practice effect rather than improved cognitive function.

Percentage Change

A simple definition of change used in some longitudinal studies is a percentage change (most commonly 20%) in scores from baseline.67,72,88,89 This does not account for practice effect but does allow results to be compared across studies.
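A minimal Python sketch of the percentage-change criterion, assuming the commonly used 20% threshold. It applies to positive, ratio-scale raw scores (for example, words recalled, or completion time for timed tests where a higher value is worse), not to z scores, which can be zero or negative. As the text notes, it makes no correction for practice effect. Function names and example values are hypothetical.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline; use positive, ratio-scale raw scores."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a percentage change")
    return 100.0 * (follow_up - baseline) / baseline


def declined(baseline: float, follow_up: float, threshold_pct: float = 20.0,
             higher_is_better: bool = True) -> bool:
    """Flag a decline of at least `threshold_pct` percent; no practice-effect correction."""
    change = percent_change(baseline, follow_up)
    return change <= -threshold_pct if higher_is_better else change >= threshold_pct


print(percent_change(40, 31))                      # -22.5 (e.g., words recalled)
print(declined(40, 31))                            # True
print(declined(60, 70, higher_is_better=False))    # False (e.g., seconds to complete)
```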

Cohen's d

Cohen's d is a statistical parameter that measures the magnitude of a treatment effect: it is the difference between the group means divided by the pooled SD.90,91 Cohen's d can be used to compare practice effects between groups or tests.70,90
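Cohen's d as defined above (difference in group means divided by the pooled SD) can be computed with the Python standard library. In this sketch it compares hypothetical practice effects (retest minus baseline gains) between two groups; the data and names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """(mean_a - mean_b) / pooled SD, using sample variances (n - 1 denominators)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical retest-minus-baseline gains (practice effects) on one test.
control_gain = [3, 4, 5, 4, 4]
patient_gain = [1, 2, 2, 1, 4]
print(cohens_d(control_gain, patient_gain))  # 2.0
```

Here the controls' mean practice gain exceeds the patients' by two pooled SDs. Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large) are only rough guides.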

RCI

The RCI compares the change in individual test scores with changes in a control population to determine whether the change is greater than would be expected from measurement error alone. It is calculated by subtracting the score for the first assessment (X1) from that for the second assessment (X2) and dividing by the SE of the difference in the control population: RCI = (X2 - X1)/SEdiff. An RCI score can be interpreted as a z score, with changes beyond a designated cutoff point (generally a z score of ±1.96, the bounds of a two-tailed 95% CI) being unlikely to be due to chance, and therefore regarded as true change.78,92 An adaptation of the RCI method includes an adjustment for practice effect based on a normative sample.78 A subject's follow-up score can be predicted by adding the mean practice effect of a control group to the subject's baseline test result,93 although practice effect may differ between experimental and control groups.92

The RCI includes an estimate of reliability and provides a cutoff above which changes can be classified as important. It requires a control group and only partially accounts for regression toward the mean.58 The method has been used extensively in other patient populations.87,93,94 Four of the reviewed studies estimated an RCI with a 90% to 95% CI,23,30-32 although two of them used normative data in place of a control group.
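A minimal Python sketch of the RCI with an optional practice adjustment. It assumes one common way of estimating SEdiff from control data (the Jacobson-Truax form, SEdiff = sqrt(2 × SEM²) with SEM = SD × sqrt(1 - r), where r is the test-retest reliability); other variants estimate SEdiff directly from the SD of control change scores. The practice adjustment subtracts the control group's mean gain, as described above. All names and numbers are hypothetical.

```python
from math import sqrt


def se_difference(control_sd: float, retest_r: float) -> float:
    """SEdiff = sqrt(2 * SEM^2), with SEM = SD * sqrt(1 - r) (Jacobson-Truax form)."""
    sem = control_sd * sqrt(1.0 - retest_r)
    return sqrt(2.0 * sem**2)


def rci(x1: float, x2: float, se_diff: float, practice_effect: float = 0.0) -> float:
    """(X2 - X1 - mean control practice effect) / SEdiff, read as a z score."""
    return (x2 - x1 - practice_effect) / se_diff


def classify_change(score: float, z_crit: float = 1.96) -> str:
    if score <= -z_crit:
        return "reliable decline"
    if score >= z_crit:
        return "reliable improvement"
    return "no reliable change"


# Hypothetical control data: baseline SD 10, test-retest r = 0.8, mean gain +3.
se_diff = se_difference(control_sd=10.0, retest_r=0.8)   # about 6.32
patient_x1, patient_x2 = 50.0, 40.0

unadjusted = rci(patient_x1, patient_x2, se_diff)
adjusted = rci(patient_x1, patient_x2, se_diff, practice_effect=3.0)
print(round(unadjusted, 2), classify_change(unadjusted))  # -1.58 no reliable change
print(round(adjusted, 2), classify_change(adjusted))      # -2.06 reliable decline
```

In this example, ignoring the controls' practice gain would miss a decline that the practice-adjusted RCI detects.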

Regression

Linear regression models use control data to predict follow-up scores from the baseline score; they require a control group with serial measures. When the observed follow-up score differs significantly from the predicted score, it is likely that a true change in cognitive function has occurred.92 Multiple regression can account for factors that may influence repeat measures, such as test-retest interval and demographic factors.78,92 A comparative study in neurologically stable subjects demonstrated that regression models predicted subsequent assessments better than the RCI and produced narrower CIs.92,95,96 Regression techniques have been used to assess change in patients with traumatic brain injuries,97 patients with HIV,93 and patients after cardiac bypass surgery.87 Three of the reviewed studies used a regression model.2,29,30
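A minimal Python sketch of simple regression-based change scores. It assumes a single predictor (baseline score), fits follow-up on baseline in controls with `numpy.polyfit`, and expresses a patient's change as (observed - predicted)/SEE, where SEE is the standard error of estimate. The control data are hypothetical and far smaller than a real control group; names are illustrative.

```python
import numpy as np


def fit_retest_model(control_x1: np.ndarray, control_x2: np.ndarray) -> tuple[float, float, float]:
    """Regress follow-up on baseline scores in controls; return slope, intercept, SEE."""
    slope, intercept = np.polyfit(control_x1, control_x2, deg=1)
    residuals = control_x2 - (intercept + slope * control_x1)
    see = np.sqrt(np.sum(residuals**2) / (len(control_x1) - 2))  # standard error of estimate
    return float(slope), float(intercept), float(see)


def regression_change_z(x1: float, x2: float, slope: float, intercept: float, see: float) -> float:
    """z = (observed follow-up - predicted follow-up) / SEE."""
    return (x2 - (intercept + slope * x1)) / see


# Hypothetical control data (a real control group would be much larger).
control_x1 = np.array([40.0, 45.0, 50.0, 55.0, 60.0])
control_x2 = np.array([44.0, 48.0, 55.0, 58.0, 65.0])
slope, intercept, see = fit_retest_model(control_x1, control_x2)
print(round(slope, 2), round(intercept, 2), round(see, 2))  # 1.04 2.0 1.1

z = regression_change_z(x1=50.0, x2=49.0, slope=slope, intercept=intercept, see=see)
print(round(z, 2))  # -4.56
```

A multiple-regression version would add columns such as test-retest interval and demographics. A fuller treatment would also widen the SE to reflect uncertainty in predicting a new individual's score.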

RECOMMENDATIONS

We recommend NP batteries that strike a compromise: brief enough to be incorporated into clinical trials for cancer patients, yet comprehensive enough to detect an underlying problem if it exists. The selected measures need to be sensitive to a wide range of impairments, specific, suitable for serial repetition in a longitudinal study, and have minimal practice effect. The ideal test should have established responsiveness in cancer patients and have demographically corrected normative data available.51,66,98 One example of a suitable battery for cancer patients, and the domains assessed, is given in Table 4. Similar tests could be substituted or the battery extended for a more comprehensive assessment.

Table 4.

Suitable Battery for Use in Future Studies of Cognitive Dysfunction Associated With Chemotherapy

Statistical analysis will depend on the study design and the available data (ie, whether there is a control group and whether there will be sequential testing). We recommend using a GDS derived from demographically corrected T scores to classify cognitive impairment, thus minimizing problems associated with multiple tests.

Longitudinal data with testing before chemotherapy give important information not obtained in a cross-sectional study. Whenever possible, control groups should be used; the ideal is to have one group of controls with malignancy who do not require chemotherapy and a second, healthy control group. Analysis of longitudinal data requires tests with good test-retest reliability and must account for practice effect. Use of an RCI adjusted for practice effect, or of a regression model, is recommended.

Investigation of cognitive impairment associated with chemotherapy is an important area of research that presents methodologic challenges. Those conducting such research should learn from the experience of diagnosing cognitive impairment in other populations, and adopt those methods most suitable for cancer patients.

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The author(s) indicated no potential conflicts of interest.

AUTHOR CONTRIBUTIONS

Conception and design: Janette Vardy, Ian F. Tannock

Financial support: Ian F. Tannock

Collection and assembly of data: Janette Vardy

Data analysis and interpretation: Janette Vardy, Sean Rourke, Ian F. Tannock

Manuscript writing: Janette Vardy, Ian F. Tannock

Final approval of manuscript: Janette Vardy, Sean Rourke, Ian F. Tannock

Acknowledgments

We thank Jolie Ringash, MD, MSc, FRCPC, and Greg Pond for reviewing the manuscript.

Footnotes

  • Published online ahead of print at www.jco.org on May 7, 2007.

  • Supported by grants from the National Cancer Institute of Canada and the Susan G. Komen Foundation, and by an ASCO Young Investigator Award (J.V.).

  • Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.

  • Received July 13, 2006.
  • Accepted March 1, 2007.

REFERENCES
