- © 2004 by American Society of Clinical Oncology
In Reply:
- Marc L. Citron,
- Donald A. Berry,
- Constance Cirrincione,
- Clifford Hudis,
- Larry Norton,
- Eric P. Winer,
- William J. Gradishar,
- Nancy E. Davidson,
- Silvana Martino,
- Robert Livingston,
- James N. Ingle,
- John Carpenter,
- David Hurd,
- James F. Holland,
- Barbara Smith,
- Carolyn I. Sartor,
- Eleanor H. Leung,
- Jeffrey Abrams,
- Richard L. Schilsky, and
- Hyman B. Muss
- ProHEALTH Care Associates LLP, Lake Success, NY
- The University of Texas M.D. Anderson Cancer Center, Houston, TX
- Cancer and Leukemia Group B Statistical Office, Durham, NC
- Memorial Sloan-Kettering Cancer Center, New York, NY
- Dana-Farber Cancer Institute and Brigham and Women’s Hospital, Boston, MA
- Northwestern University, Chicago, IL
- Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD
- The John Wayne Cancer Institute, Santa Monica, CA
- University of Washington, Seattle, WA
- Mayo Clinic, Rochester, MN
- University of Alabama at Birmingham, Birmingham, AL
- Wake Forest University Comprehensive Cancer Center, Winston-Salem, NC
- Ohio State University, Columbus, OH
- Massachusetts General Hospital, Boston, MA
- University of North Carolina School of Medicine, Chapel Hill, NC
- Cancer and Leukemia Group B Data Office, Durham, NC
- National Cancer Institute, Bethesda, MD
- Cancer and Leukemia Group B Central Office, Chicago, IL
- University of Vermont, Burlington, VT
Keith et al properly cite several immune-modulating effects of filgrastim that were not known at the time the clinical trial C9741 was designed and conducted. However, published data from an earlier Intergroup clinical trial in the adjuvant treatment of breast cancer address their point and do not support their hypothesis. Cancer and Leukemia Group B (CALGB) 9344, which applied doxorubicin plus concurrent cyclophosphamide followed or not by paclitaxel, tested three different dose levels of doxorubicin. Only the highest dose, 90 mg/m², required filgrastim support in each of four cycles [1]. There was no suggestion of a benefit for that highest-dose chemotherapy over the other arms, which used lower dose levels of doxorubicin plus cyclophosphamide without routine filgrastim. Although not statistically different, the highest-dose arm was the worst-performing arm in the trial.
While it is noted that the patients receiving dose-dense chemotherapy on C9741 received eight to 12 cycles of filgrastim, rather than four in CALGB 9344, this is unlikely to be a complicating factor for two reasons. The first is that there is no indication of superiority of the filgrastim-containing arm in CALGB 9344. The second is that in C9741, the sequential dose-dense treatment, delivering 12 cycles of filgrastim, was not superior to the concurrent dose-dense treatment, which delivered eight. There is no suggestion of a cumulative or total dose beneficial effect of filgrastim. Hence, we believe that the benefits seen in C9741 resulted from the more frequent administration of chemotherapy, and that use of filgrastim did not by itself augment the efficacy of dose-dense treatment.
Dr Atkins raises provocative issues about the statistical analysis of the results of Intergroup C9741. He indicates that we assumed “similarity of the treatment in each individual category.” This is not correct. A factorial analysis allows for comparing individual arms through modeling the main effects of dose-density and treatment sequence and also their interaction. For example, the sequential, every-3-week arm could be found inferior to the other regimens if both sequential and every-3-week were found inferior as main effects and with no interaction. It could also be found inferior if one or neither of the main effects had been significant but the interaction was significant. In the actual trial, the every-3-week effect was found inferior but sequential was not, and the interaction was not significant. The log-rank P value in comparing the two every-3-week arms, concurrent versus sequential, is .85. There is not even a hint of a difference between the two every-3-week arms. Therefore, our conclusion is that the sequential, every-3-week arm is inferior to both every-2-week arms but not to the concurrent, every-3-week arm.
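The structure of such a factorial model can be sketched in a few lines. This is only an illustration, assuming a proportional hazards form in which the log hazard of an arm is the sum of a dose-density term, a sequence term, and their interaction. The function name, the choice of the concurrent, every-3-week arm as reference, and the coefficient values are hypothetical and are not estimates from C9741.

```python
import math

def arm_hazard_ratios(b_density, b_sequence, b_interaction):
    """Hazard ratio of each arm relative to the concurrent, every-3-week arm.

    Each arm is coded by two indicators: d = 1 for every-2-week dosing,
    s = 1 for sequential treatment. The log hazard ratio is
    b_density*d + b_sequence*s + b_interaction*d*s.
    """
    arms = {
        "concurrent q3w": (0, 0),
        "sequential q3w": (0, 1),
        "concurrent q2w": (1, 0),
        "sequential q2w": (1, 1),
    }
    return {
        name: math.exp(b_density * d + b_sequence * s + b_interaction * d * s)
        for name, (d, s) in arms.items()
    }

# Hypothetical coefficients (NOT trial estimates): a dose-density effect,
# no sequence effect, and no interaction.
hr = arm_hazard_ratios(b_density=math.log(0.75), b_sequence=0.0, b_interaction=0.0)
for name, value in hr.items():
    print(f"{name}: {value:.2f}")
```

With these made-up inputs the two every-3-week arms share one hazard and the two every-2-week arms share another, which is the pattern described above. A nonzero interaction coefficient would let any single arm depart from that pattern.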
As a technical matter, we want to correct Dr Atkins' statement that “failure to show a statistically significant interaction between dose density and sequence of treatment means that we cannot be 95% sure that there is no interaction.” A correct implication of the first clause is that the data are not in the extreme 5% tails of the distribution of the coefficient of the interaction term under the assumption that there is no interaction. Dr Atkins and other readers who do not like or understand this correct but obscure interpretation are referred to the Bayesian approach described by Berry [2,3]. Indeed, the wording of Dr Atkins' version is a type of Bayesian conclusion, one whose correctness would depend on which prior distribution is assumed.
Regarding our “specious” statement, it is a fact that the study was not designed to make comparisons among the arms. The trial used a factorial design because we wanted a single trial to answer two important scientific questions: the effects of dose density and treatment sequence. We could have powered the study for comparisons of the various arms by ensuring high power to assess interactions between these two main effects. But that would have required more than twice the sample size.
Drawing conclusions beyond those set out in a trial protocol is dangerous because of the biases involved. However, in Figure 4 of our article, we presented by-arm survival distributions [4]. One reason was to show the data associated with our statistical conclusion that the interaction terms were not significant. Another was to address the obvious clinical need to treat patients using regimens and not “main effects and interactions.” But we did not give statistical measures such as P values comparing pairs of arms because they have no inferential, scientific, or medical meaning.
The crux of Dr Atkins' letter seems to be the comparison of the two sequential arms, every-2-week and every-3-week. If these two arms are compared in isolation, then the conclusion is that they are not statistically different; the log-rank P values for both disease-free and overall survival are greater than .05 (although not much greater). But these two arms are basic components of a bigger design and are not isolated. Their comparison addresses the same question as the corresponding comparison of the two concurrent arms. The power and beauty of factorial analysis is that it allows for borrowing strength about dose-density across the sequential and concurrent comparisons. This borrowing actually occurred in C9741 because the results are consistent with the same effect of dose-density in both the sequential and concurrent cases. Indeed, in proportional hazards models, the numerical risk reduction as a result of dose-density is essentially identical in the two subsets, concurrent and sequential. There is not the slightest hint of an interaction. Our conclusion in this regard could not be cleaner.
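The idea of borrowing strength can be illustrated with a simple calculation. The sketch below is not the analysis used in the trial. It assumes that the dose-density effect has been estimated separately in the sequential and concurrent subsets as a log hazard ratio with a standard error, and it combines the two by inverse-variance (fixed-effect) pooling, which is appropriate only when there is no interaction. The function names and all numbers are hypothetical.

```python
import math

def pool_log_hazard_ratios(estimates):
    """Inverse-variance (fixed-effect) pooling of (log_hr, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * lhr for w, (lhr, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def two_sided_p(log_hr, se):
    """Two-sided normal-approximation P value for the hypothesis log_hr = 0."""
    z = abs(log_hr) / se
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical subset estimates (NOT trial data): the same risk reduction
# in each subset, each with the same standard error.
sequential = (math.log(0.75), 0.16)
concurrent = (math.log(0.75), 0.16)

p_sequential_alone = two_sided_p(*sequential)
pooled_log_hr, pooled_se = pool_log_hazard_ratios([sequential, concurrent])
p_pooled = two_sided_p(pooled_log_hr, pooled_se)

print(f"sequential subset alone: P = {p_sequential_alone:.3f}")
print(f"pooled across subsets:   P = {p_pooled:.3f}")
```

With these made-up inputs, the subset taken in isolation gives a P value somewhat above .05, whereas the pooled estimate, which has a smaller standard error, gives a P value below .05. The point estimate of the risk reduction is unchanged; only its precision improves.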
Dr Atkins suggests that the reader can “appropriately use his or her own estimates of expected probabilities to apply Bayes' theorem to subgroup analyses and thereby determine the posterior probability that the results can be trusted.” We agree that the Bayesian approach can be helpful in interpreting empirical results. However, we doubt that many readers will be able to follow Dr Atkins' suggestion, so we will help. The relevant data are shown in Figure 4 of our article [4]. Taking a Bayesian approach to subgroup analysis [3], the same sort of “borrowing strength” occurs as mentioned above, and for the same reason. The reader who uses reasonably open-minded prior distributions [2,3] will conclude that there is a high posterior probability of a difference between the two sequential every-2-week and every-3-week arms.
Returning to a point in the first paragraph of Dr Atkins' letter, no such reader will conclude that the sequential every-3-week arm is inferior to all three of the other arms. The only readers who will draw this conclusion are those with a high prior probability (that is, separate from the present study) that this is the case. Quite generally, a Bayesian approach would lead to the same conclusions that we drew, except for those Bayesians who were convinced a priori about the relative merits of the treatment regimens used in our trial. We know of no evidence to support such an assumption and so do not reach that conclusion.
Hryniuk, Ragaz, and Peters raise a provocative interpretation of CALGB 9741 by considering the results solely in terms of dose-rate or dose-intensity. The latter, as defined in a landmark paper by Hryniuk and Levine [5], is an important concept in the development of chemotherapy for breast cancer and other malignancies.
Dose-density, while indeed an aspect of dose-intensification, is not the same as either dose-rate or dose-intensity. Dose-density is not described with an equation. It is a relative term that compares drugs and regimens given at fixed and constant dose sizes and numbers. CALGB 9741 addresses the same drugs and same dosages with a shorter treatment interval.
If one introduces other factors, such as different drugs or dosages, it becomes impossible to attribute changes in outcome to dose-density. For example, if drug A is given at a dose of 100 mg/m² once every 4 weeks or at a dose of 50 mg/m² once every other week, the dose-intensity would appear to be the same: 25 mg/m² per week. However, if 50 mg/m² is a less effective dose than 100 mg/m², then these two regimens would be different and the question would be whether or not two lesser doses equal one better dose. Dose-rate and dose-intensity calculations avoid this complication by assuming that the total drug exposure over time is all that matters. We see no evidence that this is correct.
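The arithmetic in this example can be written out explicitly. The sketch below assumes the usual definition of dose-intensity as dose per unit body surface area divided by the treatment interval in weeks; the function name is ours and is used only for illustration.

```python
def dose_intensity(dose_mg_per_m2, interval_weeks):
    """Average dose delivered per week, in mg/m2 per week."""
    return dose_mg_per_m2 / interval_weeks

# The two schedules for the hypothetical drug A described in the text.
every_4_weeks = dose_intensity(100, 4)  # 100 mg/m2 once every 4 weeks
every_2_weeks = dose_intensity(50, 2)   # 50 mg/m2 once every other week

print(every_4_weeks, every_2_weeks)
```

Both schedules give the same value, even though the dose given at each administration differs by a factor of two. The calculation therefore cannot distinguish between them.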
Hryniuk and his colleagues suggest a three-way, retrospective subset analysis among age, nodal status, and dose density. Subset analysis is dangerous and subject to statistical problems of multiplicity. However, since this particular subset was defined by the correspondents on the basis of data separate from the present trial, we complied with their request by considering patients who were younger than 50 years and who had at least four positive lymph nodes. Only 357 patients qualified. The observed disease-free survival benefit of every-2-week dosing in this subset was not statistically different from that in the overall trial cohort.
Hryniuk et al call dose-density old news in chemotherapy, citing a 1980 study in Hodgkin's disease [6]. However, that study makes no mention of compressing the interval between doses.
The correspondents also theorize that cyclophosphamide contributed less than doxorubicin and paclitaxel. Our trial did not address this question, and we know of no empirical evidence to support their theory.
Authors' Disclosures of Potential Conflicts of Interest
The following authors or their immediate family members have indicated a financial interest. No conflict exists for drugs or devices used in a study if they are not being evaluated as part of the investigation. Owns stock (not including shares held through a public mutual fund): Hyman B. Muss, Amgen, Enzon. Acted as a consultant within the last 2 years: Marc L. Citron, Amgen; Clifford Hudis, Amgen; Robert Livingston, Amgen; James N. Ingle, Pfizer, Novartis; John Carpenter, AstraZeneca; Hyman B. Muss, Wyeth. Performed contract work within the last 2 years: James N. Ingle, Pfizer, Novartis. Served as an officer or member of the board of a company: Hyman B. Muss, American Board of Internal Medicine. Received more than $2,000 a year from a company for either of the last 2 years: Marc L. Citron, Amgen; Eric P. Winer, BMJ Japan; Robert Livingston, Amgen; Edith A. Perez, Genentech, Bristol Myers, Aventis; John Carpenter, Amgen; Hyman B. Muss, Network for Medical Communication.