- © 2008 by American Society of Clinical Oncology
Developing Clinical Recommendations for Breast, Colorectal, and Lung Cancer Adjuvant Treatments Using the GRADE System: A Study From the Programma Ricerca e Innovazione Emilia Romagna Oncology Research Group
- Rossana De Palma,
- Alessandro Liberati,
- Giovannino Ciccone,
- Elena Bandieri,
- Maurizio Belfiglio,
- Manuela Ceccarelli,
- Maurizio Leoni,
- Giuseppe Longo,
- Nicola Magrini,
- Maurizio Marangolo and
- Fausto Roila
- From the Agenzia Sanitaria Regionale Regione Emilia Romagna, Bologna; Università degli Studi di Modena e Reggio Emilia; Centro Valutazione Efficacia Assistenza Sanitaria, AUSL Modena; Centro Prevenzione Oncologica, Azienda Ospedaliera Molinette, Torino; Consorzio Mario Negri Sud, Santa Maria Imbaro, Chieti, Italy; Azienda Sanitaria Locale Ravenna; and the Azienda Ospedaliero Policlinico, Perugia, Italy
- Corresponding author: Rossana De Palma, MD, PRI-ER Oncology Research Group, Agenzia Sanitaria Regionale, Viale Aldo Moro 21, Bologna, Italy; e-mail: rdepalma{at}regione.emilia-romagna.it
Abstract
Purpose In the area of anticancer drugs, the legitimate search for effective interventions can be jeopardized by the strong pressure for accelerated approval, which may hinder the full assessment of their benefit-risk profile. We aimed to produce drug-specific recommendations using an explicit approach that separates the judgments on quality of evidence from the judgment about strength of recommendations.
Materials and Methods We used the GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) system to develop recommendations for the use of specific anticancer drugs/regimens; 12 clinical questions relevant to adjuvant treatment of breast (three), colorectal (four) and lung (five) cancer were assessed by multidisciplinary panels supported by a group of methodologists.
Results For nine of 12 questions, recommendations were produced (one strong and six weak in favor and one weak and one strong against the index treatment); for the remaining three questions no specific course of action could be recommended. The perceived benefit-risk balance of the treatment was the most important and statistically significant (P < .01) predictor of panels’ recommendations and of their strength, whereas panelists’ personal (age, sex) and professional (specialty) characteristics were not statistically associated with them.
Conclusion Because the GRADE system sets out an explicit process going from evaluation of the quality of evidence and benefit-risk profile to the judgment of the strength of recommendations, in this experience it proved very useful for combining methodologic rigor with the interdisciplinary participation that is important in the definition of evidence-based clinical policies.
INTRODUCTION
Health care systems are not well equipped to deal systematically with innovation. Traditional health technology assessment (HTA) is based on a posteriori evaluation of interventions that have already entered clinical practice, and it is, therefore, difficult to carry out the kind of evaluation that would be necessary.1-3
The Emilia Romagna Health Care Agency launched a special program—PRI-ER, or Programma Ricerca e Innovazione dell’Emilia Romagna—aimed at systematically introducing evaluative methods within its health care system, targeting promising innovations or interventions whose benefit-risk profile appears still uncertain.4
The area of anticancer treatments is an obvious candidate for HTA activities. Anticancer drugs exemplify the changes occurring in the field of drug development and registration, where new compounds are often registered with a still largely immature benefit-risk profile.5-8 New and often expensive molecules, in fact, enter clinical practice with limited evidence of effectiveness and safety and ill-defined indication(s), leaving a potential for inappropriate use.
In this project, we focused on tumors that are frequent, impose a substantial health care burden, are commonly managed with medical treatments, and for which evidence-based practice guidelines already exist. Within these tumors (breast, colorectal, and lung) we set out to produce drug-specific recommendations, similar to Cancer Care Ontario's Program in Evidence-Based Care,9-11 targeting specific open clinical questions for adjuvant treatment. We also aimed at identifying open research questions where the program could promote confirmatory trials.
We used the GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) system, recently proposed to overcome shortcomings of previous approaches,12-14 which is based on a sequential assessment of the quality of evidence followed by an analysis of the benefit-risk balance and subsequent judgment about the strength of recommendations.
This article reports on our experience and discusses the relationship(s) among panel members’ characteristics, quality of scientific evidence, benefit-risk balance of any given drug regimen, and the direction and strength of the recommendation(s).
MATERIALS AND METHODS
This project was based on the steps described in the following sections and summarized in Table 1.
Definition of the Project's Objectives
A kick-off workshop was convened in February 2005 to discuss experiences in cancer guidelines programs in the United Kingdom, Canada, and France.15 This helped focus our aims on the production of drug-specific recommendations involving multidisciplinary panels.
Identification of the Coordinating Group
A 10-person coordinating group (CG)—five members with expertise in oncology and five in critical appraisal and research synthesis—oversaw the process. The CG was appointed by the regional agency with the tasks of (a) undertaking the initial literature review; (b) preparing the training material and the “summary of findings tables” needed for the panel to formulate recommendations; and (c) chairing panel meetings and drafting the initial versions of the recommendations. Details on the literature review are available from the authors.
The Multidisciplinary Panels, Disclosure of Conflicts of Interest, and Clinical Questions
We convened three multidisciplinary panels on adjuvant treatment for breast, colorectal, and lung cancers. Panel members were chosen to include representatives of hospitals from around the region and to represent all relevant specialties/expertise (medical oncologists, radiotherapists, surgeons, pathologists, internists, pneumonologists, pharmacists, and patient representatives). All but five of those invited agreed to participate (Table 2). Each panel member was asked to disclose any tie he/she had had in the last 5 years with pharmaceutical companies manufacturing the drugs considered in the recommendations. Of 57 panel members (16 medical oncologists and 41 others), none was a regular consultant, two (3%) had received a research grant of more than €30,000, and 15 (26%) had participated in trials using the index drugs (12 were medical oncologists).
The identification of clinical questions was guided by consideration of (a) the relative importance of the treatment; (b) the lack of conclusive recommendations in existing guidelines; and (c) the interest of the local oncology community. This way, 12 clinical questions (three for breast, four for colorectal, and five for lung cancer) were identified (Table 3).
The GRADE System and Its Application in Our Study
To develop our recommendations, we used GRADE because it provides an explicit assessment of the quality of evidence, the balance between benefits and risks, and the strength of recommendations. Separation of the judgments on quality of evidence and strength of recommendations is a critical and defining feature of GRADE. Five limitations (related to study quality, consistency, directness, precision, and reporting bias) may lead to downgrading the quality of evidence. Large effects and a dose-response gradient can lead to upgrading it.12 Given the type of clinical questions we addressed (relatively new drugs with only a few trials available) and the type of studies eligible (only randomized controlled trials), we could downgrade the evidence only if one of the aforementioned drawbacks occurred. The main criteria used were presence of serious limitations in study conduct, duration of follow-up, and type (relevance) of end points used. We downgraded quality from “good” to “fair” whenever one or both of the following occurred: (a) follow-up less than 5 years or (b) disease-free instead of overall survival as the main outcome.
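The downgrading rule described above can be sketched as a small function. This is only an illustration, assuming the two triggers named in the text; the function name, parameter names, and string labels are hypothetical and are not part of GRADE or of the study materials. Serious limitations in study conduct, which the panels also considered, are not modeled here.

```python
def rate_quality(follow_up_years, main_outcome):
    """Return "good" or "fair" using the two downgrading triggers in the text.

    Hypothetical helper for illustration only.
    """
    # Trigger (a): follow-up shorter than 5 years
    short_follow_up = follow_up_years < 5
    # Trigger (b): disease-free survival used instead of overall survival
    surrogate_main_outcome = main_outcome == "disease-free survival"
    # One or both triggers downgrade the rating from "good" to "fair"
    return "fair" if (short_follow_up or surrogate_main_outcome) else "good"
```

For instance, a trial with 2 years of follow-up would be rated "fair" under this sketch regardless of its main outcome.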
Making recommendations then involves tradeoffs between benefits and harms and therefore four elements should be considered: tradeoffs, quality of evidence, translatability of evidence into a specific setting and uncertainty about the baseline risk for the population.
Using these evaluations, recommendations can be classified into four mutually exclusive categories: Do it, probably do it, probably don’t do it, don’t do it. In this study, we also allowed panels to abstain from making a recommendation—adding specific suggestions for new studies to be undertaken—when evidence was too sparse. All steps in the process are shown in Table 1.
Panel Activities
After identification of panel members, a first meeting was held to introduce the GRADE system and its key features in comparison with other existing approaches. Overall, the panels had seven meetings as they completed the following tasks.
During the first meeting, panel members refined the clinical questions and chose the outcomes of interest relevant to deciding whether a given adjuvant treatment is worth recommending. Then, they individually voted using a scale of 1 to 9 on whether each outcome should or should not be considered in the assessment.
Between the second and third meeting, the CG identified relevant studies and prepared for each relevant outcome “evidence tables,” with short comments on all the predefined dimensions of quality (ie, study design, study quality, consistency, and directness); quantitative summaries of effect for each outcome were also provided (copies of this material are available, in Italian, from the authors). The CG was also in charge of producing “summaries of findings tables” providing data on absolute and relative risk reduction on the outcomes previously identified as critical for the decision. At the third meeting, the material was presented and discussed. Between this and the subsequent meeting, panel members were asked to individually rate the quality of evidence (for each item separately, and then across all items), the balance between benefits and risks, and the draft recommendation. Provisional results were presented in draft form to panel members, highlighting agreement and disagreement. Final adjudication of the recommendation(s) was made after extensive discussion and, if unanimity could not be reached, by majority rule.
External Review of the Recommendations Before Dissemination
The CG prepared a draft of the final document. This was fed back to panel members and subsequently presented to external reviewers (n = 18) asked to comment on the best format but not on the content of the recommendations. The material was prepared in hard-copy and electronic format.16
Data Analysis
Data were collected through hard-copy forms during the panels’ face-to-face meetings, and via e-mail between meetings. Information about panel members’ personal characteristics (age, sex, specialty) and their judgments of the quality of evidence, the benefit-risk profile, and the recommendations was collected individually.
To evaluate the potential influence of individual panel members’ characteristics and of the main variables considered by the GRADE system on the final strength and direction of the recommendations, we analyzed all 189 valid ratings obtained by three panels (comprising a total of 57 members) who assessed the evidence and voted for 12 recommendations. Forty-five votes (of the total of 234 that should have been expressed) were not available because individual panel members did not vote or were absent at a meeting.
Logit regression models were used to analyze predictors of three dependent variables: (a) quality of evidence (high or low); (b) balance between benefits and risk (positive, uncertain, negative); (c) strength and direction of recommendations (strong positive, weak positive, uncertain, weak negative, strong negative). Adjusted odds ratios and 95% CIs were estimated with binary or ordered (for dependent variables with more than two ordinal outcomes) logit models, with a robust variance estimator. Individual panelists were used as clusters.
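To make the ordered logit model concrete, the sketch below computes category probabilities under a proportional-odds formulation. It is an illustration only: the cut points and the coefficient are made-up values, not estimates from this study, and the function names are hypothetical. It does not reproduce the cluster-robust variance estimation used in the analysis.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logit_probs(linear_predictor, cutpoints):
    """Category probabilities when P(Y <= k) = logistic(cutpoint_k - linear_predictor).

    cutpoints must be increasing; K cut points define K + 1 ordered categories.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs

# Five ordered categories, from strong negative to strong positive.
categories = ["strong negative", "weak negative", "uncertain",
              "weak positive", "strong positive"]
cutpoints = [-2.0, -1.0, 0.5, 2.5]   # hypothetical cut points
beta = math.log(3.0)                 # hypothetical log odds ratio for one predictor

baseline = ordered_logit_probs(0.0, cutpoints)   # predictor absent
shifted = ordered_logit_probs(beta, cutpoints)   # predictor present

for name, p0, p1 in zip(categories, baseline, shifted):
    print(f"{name:16s} {p0:.3f} -> {p1:.3f}")
```

Under this formulation a single coefficient shifts the whole distribution toward the higher categories, and the odds of exceeding any given category are multiplied by the same factor, exp(beta), which is what an adjusted odds ratio from such a model expresses.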
RESULTS
Panels completed their activities through seven face-to-face meetings (including two initial training sessions). Panel members went through the steps illustrated in the Materials and Methods section (and in Table 1), and each member undertook an individual appraisal of the summaries of findings tables provided by the CG.
Analysis of Panels’ Performance of the Steps of the GRADE System
The distribution of panel judgments on the quality of evidence, balance of benefits and risk, and strength of recommendation varied across clinical questions (Fig 1). Overall, there was always variation in the assessment of quality and evaluation of the balance between benefits and risks. This, in turn, led to differences in the strength of recommendations, suggesting that panel members used different criteria in assessing the quality of available information and the influence of the supporting evidence (or lack thereof).
Table 4 illustrates the distribution of the raw data for the 189 ratings expressed by panel members relative to the 12 clinical scenarios. Overall, quality of evidence was rated high/intermediate in 79% of cases. The benefit-risk balance was rated “uncertain” in approximately half of the cases (48%), with little variation across disease sites (range, 43% to 53%). By contrast, wide variation emerged in the proportions indicating a positive benefit-risk balance for breast and colorectal (53% and 52% respectively) versus lung (20%) cancer treatments.
A “weak positive” recommendation was the most frequent option chosen by panelists (51%; range, 65% for breast to 45% for lung). The “uncertain” category accounted for 47 votes (25%).
None of the panelists’ characteristics (specialty, age, sex) was significantly associated with any of the judgments analyzed. Oncologists tended to rate quality of evidence higher than other panelists, but the difference was not statistically significant; no differences by specialty were detected in the judgments of the benefit-risk balance and strength of recommendations.
Data reported in Table 4 help explore the internal consistency of the method. When quality of evidence was “high-intermediate” (n = 150), the benefit-risk profile was more often “favorable” (74% to 49%) compared with when quality was “low” (2% to 5%). Better quality of evidence was associated with positive recommendations: 61% weak and 13% strong as opposed to 13% weak and 3% strong when quality was low.
Table 5 reports on the analysis of the relationships between panel members’ characteristics and the judgments on quality of evidence, benefit-risk profile, and direction and strength of recommendations. Overall, no statistically significant association emerged. Further exploration of data indicates that the last three variables were, as expected, significantly associated. Specifically, a high quality of evidence is associated with a positive benefit-risk balance (the odds of a higher rating on the benefit-risk balance are 4.89 times higher if quality of evidence is high) and with a stronger positive recommendation (odds ratio = 3.52; 95% CI, 1.78 to 6.98). The benefit-risk balance was the most important predictor of the direction and strength of the recommendation.
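Because odds ratios from logit models are estimated on the log scale, a reported 95% CI should be roughly symmetric around the log odds ratio. The sketch below back-calculates the implied coefficient and standard error from the published figures (odds ratio 3.52; 95% CI, 1.78 to 6.98) and rebuilds the interval. It assumes a normal-approximation (Wald) interval with z = 1.96, which the article does not state explicitly; the variable names are illustrative.

```python
import math

odds_ratio, ci_low, ci_high = 3.52, 1.78, 6.98   # figures reported in the text
z = 1.96                                          # assumed normal quantile for a 95% CI

# Coefficient and standard error implied on the log-odds scale
beta = math.log(odds_ratio)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)

# Rebuild the interval from beta and se; it should land close to the reported limits
rebuilt_low = math.exp(beta - z * se)
rebuilt_high = math.exp(beta + z * se)

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"rebuilt 95% CI: {rebuilt_low:.2f} to {rebuilt_high:.2f}")
```

The rebuilt limits agree with the reported ones to within rounding, which is what one would expect if the interval was computed on the log scale.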
Content and Presentation of the Recommendations
The assessments described herein led to the recommendations reported in Table 3. Overall, there were two strong and seven weak recommendations, and three instances where panels concluded that no recommendation could be formulated.
The final template of the recommendations includes: (a) clinical question and its target population; (b) recommendation including its strength; (c) main reason(s) for grading; (d) distribution of panel members’ votes on quality of evidence, benefit-risk profile, and strength of recommendation; and (e) summaries of findings tables. Moreover, the full text of the recommendation included a section labeled “evidence in context” in which panels described the target population for the treatment and the information to be given to patients to facilitate their choices.
DISCUSSION
In setting up this project, we sought to assess whether a participatory mechanism in the production of clinical recommendations would work and whether GRADE was suitable for this purpose, adding scientific rigor to the process.
Our data provide a positive answer to both questions. Producing evidence-based recommendations for the appropriate use of anticancer treatments in everyday practice is a challenge that needs an innovative, scientifically sound, and participatory process. High patient expectations, commercial pressures, clinical and organizational constraints, and availability of resources need to be considered together if one bets on the survival and sustainability of universal health care systems.
Anticancer drugs are a hot topic in the discussion about the adequacy of current standards for approval of new drugs. While the current US Food and Drug Administration and European Medicines Agency (EMEA) legislation requires as a prerequisite that a drug be found effective in well-conducted clinical trials before approval, the reality is that a new drug is often approved only on the basis of its effects on surrogate outcomes, with limited follow-up and sometimes using data obtained from phase II rather than phase III studies.6 Although strong concerns have been raised about the need for more coherent standards from regulatory bodies,5-7 health care systems should be equipped to deal with the introduction of still-experimental interventions. This could be done by creating and supporting a framework so that the information that is missing is produced through pragmatic studies while managing the introduction in clinical practice through guidelines. This is the approach taken by Regione Emilia-Romagna,17 supporting clinical trials18 and other HTA activities19-21 in oncology. Almost simultaneously, a highly innovative funding scheme called National Research Program for Independent Research has been implemented by the Italian Drug Agency (AIFA).22 Two large randomized controlled trials that emerged from questions identified by PRI-ER's panels have been approved and funded within the 2006 call.
Our experience suggests that GRADE is feasible and facilitates a multidisciplinary interaction: first, because it fostered a team atmosphere among health professionals of the regional oncology network; and second, because it allowed reconciliation of the traditional separation between clinicians’, methodologists’, and administrators’ points of view as well as allowing for patients to play a more active role.23
That said, the variation(s) found in the way panel members appraised the quality of evidence (Fig 1) could be seen as a drawback of the system and a fundamental limitation to its viability. On the contrary, we believe that the in-depth assessment of the evidence that underlies an intervention is one of GRADE's distinctive features. To avoid losing this richness, the recommendations should therefore not be presented as a Yes/No conclusion, but the results of the assessment process should be presented transparently in all its determinants. This led us to choose the template for the presentation of our recommendations described in the Results section.
Given its in-depth assessment, GRADE seems likely to produce “more conservative and justified” recommendations. Notably, at the same time as ours, other guidelines organizations (National Institute for Health and Clinical Excellence [NICE] and Cancer Care Ontario) issued recommendations for the use of trastuzumab and aromatase inhibitors for breast and oxaliplatin for colon cancer. Compared with those produced by others,24-28 our recommendations offer an explicit rationale and justifications as well as a full account of the amount of existing uncertainty and disagreement among panel members.
Less clear is how to make sense of the different interpretations on the benefit-risk profile as a function of the judgment of the quality of evidence and determinant of the strength and direction of the recommendations. Although it would be hard to expect that the judgment of the benefit-risk profile would be homogeneous among panel members, our results suggest that there is coherence between the different steps required by GRADE (Table 1).
In general, differences in ratings within and among panels are not surprising because the type of available literature and the results of the studies were different. In lung cancer, the panel reviewed a topic (adjuvant radiotherapy in completely resected stage I and II NSCLC) with studies showing, with good confidence, a detrimental effect of treatment; this led to a strong negative recommendation. In breast cancer, we dealt with clinical questions fraught with uncertainty, and this may explain the internal spread of judgments (Fig 1) as well as the strength of the recommendations. Moreover, it must be borne in mind that we purposely chose controversial questions resulting either from conflicting results among primary studies or from lack of relevant evidence, with recent drugs still under confirmatory investigations.
Panel composition (by age, sex, and specialty) did not predict the strength of recommendations, although we may not have had sufficient statistical power to identify important differences. By contrast, the quality of evidence and, even more, the balance between benefit and risk were the only predictors of the final strength and direction of recommendations. This is reassuring because it suggests that no major bias occurred as a result of the composition of panels or of potential conflicts of interest of panelists, phenomena that have both already been documented in previous research.29-34
AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The author(s) indicated no potential conflicts of interest.
AUTHOR CONTRIBUTIONS
Conception and design: Rossana De Palma, Alessandro Liberati, Giuseppe Longo, Nicola Magrini, Maurizio Marangolo, Fausto Roila
Collection and assembly of data: Rossana De Palma, Alessandro Liberati, Giovannino Ciccone, Elena Bandieri, Maurizio Belfiglio, Manuela Ceccarelli, Maurizio Leoni
Data analysis and interpretation: Rossana De Palma, Alessandro Liberati, Giovannino Ciccone, Maurizio Belfiglio, Maurizio Leoni, Giuseppe Longo
Manuscript writing: Rossana De Palma, Alessandro Liberati, Giovannino Ciccone
Final approval of manuscript: Rossana De Palma, Alessandro Liberati, Giovannino Ciccone, Elena Bandieri, Maurizio Belfiglio, Manuela Ceccarelli, Maurizio Leoni, Giuseppe Longo, Nicola Magrini, Maurizio Marangolo, Fausto Roila
Appendix
Multidisciplinary Panel Members
The following individuals (clinicians, public health doctors, pharmacists and consumer representatives) were members of the multidisciplinary panels: Amadori D, Ardizzoni A, Armaroli L, Balduzzi A, Bazzoli F, Beccati D, Berridge L, Bertoni F, Biasco G, Boaron M, Boni C, Bretti S, Briganti L, Busutti L, Calandri C, Calzoni P, Cartei F, Caserta C, Casetti T, Conte PF, Cuzzoni Q, Donadio M, Durante E, Eusebi V, Falcone F, Faggiuolo R, Francioni F, Frezza G, Fumagalli M, Garcea D, Gianessi W, Giulianini G, Grappa M, Lanza G, Lelli G, Lorenzo G, Luppi G, Malpighi M, Manghi I, Marangolo M, Martoni A, Mazzetti GP, Minguzzi M, Mori CA, Nanni O, Natalini G, Palmieri M, Pasquini E, Petocchi B, Petropoulacos L, Poletti V, Polico R, Rossi G, Spagnoli G, Taffurelli M, and Vanzo C.
Project Coordinating Group
The following individuals were part of the project coordinating group: Rossana De Palma, Elena Bandieri, Maurizio Belfiglio, Giovannino Ciccone, Manuela Ceccarelli, Maurizio Leoni, Giuseppe Longo, Nicola Magrini, Maurizio Marangolo, Fausto Roila and Alessandro Liberati.
Example of the Template of a Recommendation: Use of Trastuzumab in the Adjuvant Therapy in Breast Cancer
Question and Target Population
In women with HER-2 positive breast cancer (HER-2 3+ in immunohistochemistry or FISH test positive) without cardiac impairment, is trastuzumab recommended as adjuvant therapy?
Recommendation
In HER-2–positive early breast cancer patients without cardiac impairment, trastuzumab could be used as adjuvant therapy.
RECOMMENDATION: WEAK POSITIVE
Main Reason for Grading
In available studies, trastuzumab significantly reduces the incidence of recurrences, but with clear cardiotoxic effects. The short follow-up does not allow an evaluation of either long-term side effects or the maintenance of therapeutic benefits. The benefits of trastuzumab could be evaluated in light of predictive and prognostic factors.
Studies
Three phase III trials (two of which provided preliminary results from 6,738 patients) support the use of trastuzumab in the adjuvant treatment of breast cancer. The median follow-up is short (2 years and 1 year in the American and European trials, respectively).
Panel Judgments
a. Quality of evidence. The evidence on the outcomes of efficacy and safety is of “moderate” quality. The distribution of the panelists’ judgments is as follows: the quality of evidence is considered moderate by the majority (10 of 18) of the panel, while three members rated it “high” and five, “low.”
b. Balance risk/benefit. Slightly more than half of the members (10 of 18) deemed that the treatment's benefits outweigh its risks or downsides. However, a sizeable minority among the others considered this balance “uncertain” (seven members) or “negative” (one member).
c. Strength of recommendation. The majority of members (12 of 18) opted for a “weak positive” recommendation, while one member preferred “strong positive,” four were “uncertain,” and one felt the treatment should be discouraged (“weak negative”).
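The vote counts reported above can be tallied with a short script. This is a simplified sketch of the majority rule mentioned in the Methods (final adjudication actually followed extensive panel discussion); the function name and the "more than half" threshold are assumptions made for illustration.

```python
from collections import Counter

# Strength-of-recommendation votes reported for the trastuzumab question
strength_votes = Counter({
    "weak positive": 12,
    "uncertain": 4,
    "strong positive": 1,
    "weak negative": 1,
})

def adjudicate(votes):
    """Return (winning option or None, share of votes for the top option).

    Hypothetical helper: an option wins only with more than half of the votes cast.
    """
    total = sum(votes.values())
    option, count = votes.most_common(1)[0]
    winner = option if count > total / 2 else None
    return winner, count / total

winner, share = adjudicate(strength_votes)
print(winner, f"{share:.0%}")
```

With 12 of 18 votes, "weak positive" clears the assumed threshold, matching the recommendation stated in the template.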
Evidence in Context: Making Explicit Issues That Should Be Considered in the Implementation of the Recommendation
- Patients toward whom this recommendation is directed are those having characteristics similar to the inclusion criteria defined in published studies: patients with HER-2–positive breast cancer (HER-2 3+ in immunohistochemistry or FISH test positive), with no demonstrated cardiac impairments either at the beginning or at the end of the adjuvant therapy. Most women under investigation (83% to 84%) were younger than 60 years and predominantly had positive axillary nodes (NSABP trial, node-positive patients only; NCCTG N9831 trial, < 13% negative nodes; HERA trial, 33% negative nodes).
- Due to its non-negligible cardiotoxicity, trastuzumab should not be administered to female patients with congestive heart impairments, coronary disease, myocardial infarction, uncontrolled hypertension, cardiomyopathy, left ventricular dysfunction, valvular disease, and clinically significant arrhythmia.
- Trastuzumab administration is dependent on the existence of an adequate left ventricular functionality (LVEF ≥ 55% evaluated both at the beginning and at the end of adjuvant chemotherapy, and complementary radiotherapy, if performed, with no absolute decrease in LVEF higher than 15% with respect to the initial evaluation).
- Trastuzumab administration together with chemotherapy increases cardiotoxicity risk.
Information for Patients
- If administered in addition to chemotherapy, trastuzumab can cause significant cardiotoxicity. Its administration should therefore be taken into consideration after careful evaluation of the risk-benefit balance for the individual patient.
- Currently, due to the limited follow-up period, long-term effects are unknown. This goes both for the benefits in terms of overall and disease-free survival and for adverse effects, particularly cardiotoxicity.
Further Considerations
During panel activities, a trial by the Finland Herceptin group was published. The trial assessed the effectiveness of trastuzumab, given over a 9-week period in addition to docetaxel or vinorelbine adjuvant therapy, in a group of 232 HER-2–positive patients.
The 3-year disease-free survival was higher for the trastuzumab-treated group (89%) than for the group not receiving trastuzumab (78%); the related hazard ratio was 0.42. The results suggested the opportunity to evaluate the effectiveness of different treatment periods in future investigations.
Presentation of RCT Results and Limitations
Footnotes
- Supported by Contratto No. 249, Bando Nazionale Ricerca Finalizzata 2005, Ministero della Salute, Italia. Supported in part by the Fondo per l’Innovazione, an unrestricted grant provided by the following companies: AstraZeneca, Chiesi, Cordis, GlaxoSmithKline, Lilly, Pfizer, Novartis, Siemens, and Takeda.
Authors’ disclosures of potential conflicts of interest and author contributions are found at the end of this article.
- Received April 18, 2007.
- Accepted October 23, 2007.