- © 2010 by American Society of Clinical Oncology
Immunohistochemical Methods for Predicting Cell of Origin and Survival in Patients With Diffuse Large B-Cell Lymphoma Treated With Rituximab
- Paul N. Meyer,
- Kai Fu,
- Timothy C. Greiner,
- Lynette M. Smith,
- Jan Delabie,
- Randy D. Gascoyne,
- German Ott,
- Andreas Rosenwald,
- Rita M. Braziel,
- Elias Campo,
- Julie M. Vose,
- Georg Lenz,
- Louis M. Staudt,
- Wing C. Chan and
- Dennis D. Weisenburger
- From the University of Nebraska Medical Center, Omaha, NE; Rikshospitalet-Radiumhospitalet Medical Center, University of Oslo, Oslo, Norway; British Columbia Cancer Agency, Vancouver, British Columbia, Canada; Robert-Bosch-Krankenhaus, Stuttgart; University of Würzburg, Würzburg; Charité–Universitätsmedizin Berlin, Berlin, Germany; Oregon Health and Science University, Portland, OR; University of Barcelona, Barcelona, Spain; and Division of Cancer Treatment and Diagnosis, Center for Cancer Research, National Cancer Institute, Bethesda, MD.
- Corresponding author: Paul N. Meyer, MD, PhD, University of Nebraska Medical Center, 983135 Nebraska Medical Center, Omaha, NE 68198-3135; e-mail: pnm105{at}hotmail.com.
Presented in part at the 99th Annual Meeting of the United States and Canadian Academy of Pathology, March 20-26, 2010, Washington, DC.
Abstract
Purpose Patients with diffuse large B-cell lymphoma (DLBCL) can be divided into prognostic groups based on the cell of origin of the tumor as determined by microarray analysis. Various immunohistochemical algorithms have been developed to replicate these microarray results and/or stratify patients according to survival. This study compares some of those algorithms and also proposes some modifications.
Patients and Methods Two hundred sixty-two cases of de novo DLBCL treated with rituximab and cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP) or CHOP-like therapy were examined.
Results The Choi algorithm and Hans algorithm had high concordance with the microarray results. Modifications of the Choi and Hans algorithms for ease of use still retained high concordance with the microarray results. Although the Nyman and Muris algorithms had high concordance with the microarray results, each had a low value for either sensitivity or specificity. The use of LMO2 alone showed the lowest concordance with the microarray results. A new algorithm (Tally) using a combination of antibodies, but without regard to the order of examination, showed the greatest concordance with microarray results. All of the algorithms divided patients into groups with significantly different overall and event-free survivals, but with different hazard ratios. With the exception of the Nyman algorithm, this survival prediction was independent of the International Prognostic Index. Although the Muris algorithm had prognostic significance, it misclassified a large number of cases with activated B-cell type DLBCL.
Conclusion The Tally algorithm showed the best concordance with the microarray data while maintaining prognostic significance and ease of use.
INTRODUCTION
Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous group of B-cell lymphomas with wide variation in patient survival. Microarray analysis has shown that patients with DLBCL expressing a gene expression profile (GEP) of germinal center B cells (GCBs) have a longer survival than those with a GEP of activated B cells (ABCs).1,2 Because it is currently impractical to perform microarray analysis on every patient with DLBCL, various immunohistochemical algorithms have been developed to predict the cell of origin and/or survival. These algorithms use different combinations of antibodies to germinal center or activated B-cell–related proteins to obtain a desired result. The results of the algorithms developed by Hans et al and Choi et al have correlated well with the corresponding GEP results and have also demonstrated clear survival differences between the GCB and non-GCB DLBCL groups.3,4 The results of algorithms developed by other authors have not been compared with the corresponding GEP results and rely predominantly on survival differences between the immunophenotypic groups.5–7 Because some of these algorithms were published before rituximab was commonly used in the treatment of DLBCL, the usefulness of these algorithms for prognostication has been called into question.6,8,9 Our goal was to compare these algorithms in a well-characterized group of patients with DLBCL treated with standard chemotherapy including rituximab.3–7 During this study, we also evaluated some new methods to predict the cell of origin and survival in DLBCL.
PATIENTS AND METHODS
A total of 262 cases of de novo DLBCL treated with rituximab and cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP) or CHOP-like therapies were obtained from the Nebraska Lymphoma Study Group registry (61 cases), British Columbia Cancer Center (51 cases), Norwegian Radium Hospital (47 cases), University of Barcelona (44 cases), Cleveland Clinic Foundation (21 cases), University of Würzburg (20 cases), and Oregon Health Sciences Center (18 cases). Patients ranged in age from 13.5 to 92 years, with a median age of 62.3 years. One hundred twenty-five patients (48%) were younger than 60 years and 137 patients (52%) were older than 60 years. Clinical and follow-up data were available for 256 cases. The International Prognostic Index (IPI) was available for 174 patients: 73 patients (42%) had low (0 or 1), 73 patients (42%) had intermediate (2 or 3), and 28 patients (16%) had high (4 or 5) IPI scores.
Hematoxylin and eosin–stained sections from a representative formalin-fixed, paraffin-embedded tissue block for each tumor were used to define diagnostic areas. One to three representative 0.6-mm to 1-mm cores were obtained from each case and inserted into a recipient paraffin block in a grid pattern using a tissue arrayer (Beecher Instruments, Silver Spring, MD).
Paraffin-embedded sections 5-μm thick were subjected to antigen retrieval and antibody staining, as shown in Table 1. The immunoperoxidase stains were performed on either a Benchmark XT (Ventana, Tucson, AZ) using cell conditioning solution for antigen unmasking (CC1) and Ultraview universal diaminobenzidine detection kits (Ventana) or an Autostainer Plus (Dako, Carpinteria, CA) using the Envision Flex High pH visualization system (Dako). GCET1 and FOXP1 are now commercially available from Santa Cruz Biotechnology (Santa Cruz, CA). For GCET1 and FOXP1, a 1-mmol/L EDTA solution (pH 8.0) replaced CC1 for antigen retrieval. The cutoff for tumor positivity was set at 30% of tumor cells staining for each antibody, except where noted in particular algorithms. The tissue core with the highest percentage of tumor cell staining was used for analysis. Antibody staining was scored visually in 10% increments by one author (P.N.M.), who was blinded to the GEP and algorithm results, and the scores were recorded in a spreadsheet. Interobserver reproducibility was previously addressed by our group for the Choi algorithm; in that analysis, most discrepancies differed by only a single 10% increment.4 A computer perturbation model showed that random introduction of 10% variability into the antibody scoring yielded approximately 90% concordance with the original algorithm result.4
The algorithms we evaluated are shown in Figure 1.3–7 The cell of origin predicted by the immunohistochemical (IHC) algorithms is referred to as “immunophenotype” in this article to distinguish the IHC result from the cell of origin determined by molecular analysis. Some cases used in this study (34 cases from the Nebraska Lymphoma Study Group registry, 20 cases from the Norwegian Radium Hospital, and 18 cases from the Oregon Health Sciences Center) were also used to develop the Choi algorithm.4 The Choi algorithm yielded the same overall survival (OS) and event-free survival (EFS) results when these 72 common cases were excluded (data not shown), thus abrogating the possibility of over-fitting of the data.
All of the published algorithms examined look at antibody staining in a particular order. This technique allows for the exclusion of certain antibody results in particular cases. GEP, which these algorithms are trying to emulate, does not examine results in a particular order or exclude results. To correct this discrepancy, a “Tally” algorithm in which antibody results are not examined in a particular order was developed. Antibody results already determined for the Choi algorithm, excluding BCL6 for reasons discussed later, were used. This method includes an equal number of GCB (GCET1 and CD10) and ABC (FOXP1 and MUM1) antibodies. Classification is determined by the immunophenotype pair with more positive antigens. Because two antibodies are used for each type, another antibody was necessary in the case of an equal number of positive results for each type (ie, a tie). LMO2 results, already obtained for the Natkunam algorithm, were chosen for the tie-breaker. If an equal number of GCB and ABC antigens are positive, then LMO2 determines the immunophenotype (ie, LMO2 ≥ 30% yields GCB).
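The Tally rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the text, not the code used in the study: the names `Stains`, `is_positive`, and `tally` are hypothetical, and treating the 30% cutoff as inclusive (≥ 30%) for all five antibodies is an assumption extrapolated from the LMO2 tie-breaker rule.

```python
from dataclasses import dataclass

# Positivity cutoff from the text: 30% of tumor cells staining.
# Assumption: the cutoff is inclusive (>= 30) for every antibody,
# as the text states explicitly only for LMO2.
CUTOFF = 30


@dataclass(frozen=True)
class Stains:
    """Percentage of tumor cells staining for each antibody (hypothetical container)."""
    gcet1: int
    cd10: int
    foxp1: int
    mum1: int
    lmo2: int


def is_positive(percent: int) -> bool:
    return percent >= CUTOFF


def tally(s: Stains) -> str:
    """Return 'GCB' or 'ABC' without regard to the order of examination."""
    gcb_score = is_positive(s.gcet1) + is_positive(s.cd10)   # GCB markers
    abc_score = is_positive(s.foxp1) + is_positive(s.mum1)   # ABC markers
    if gcb_score > abc_score:
        return "GCB"
    if abc_score > gcb_score:
        return "ABC"
    # Tie: LMO2 decides (positive yields GCB).
    return "GCB" if is_positive(s.lmo2) else "ABC"


if __name__ == "__main__":
    # Made-up staining values, for illustration only.
    print(tally(Stains(gcet1=80, cd10=0, foxp1=0, mum1=0, lmo2=0)))     # GCB
    print(tally(Stains(gcet1=50, cd10=0, foxp1=50, mum1=0, lmo2=10)))   # ABC (tie, LMO2 negative)
```

Because the two marker counts are compared directly, the result does not depend on which antibody is read first, which is the property that distinguishes this approach from the sequential algorithms.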
The GeneChip Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA) and Bayesian algorithm were used for determining the cell of origin.10 GEP results (GCB or ABC) were available on 192 of the cases. An additional 26 cases were either unclassifiable, mostly because of nucleotide degradation (n = 24), or classified by GEP as primary mediastinal (n = 1) or Burkitt's lymphoma (n = 1); these were excluded from comparisons involving the GEP result. The sensitivity, specificity, positive predictive value, and negative predictive value for each algorithm compared with the GEP results were calculated using standard 2 × 2 tables. Because of the nature of the calculations, the sensitivity of an algorithm for the ABC type always equals its specificity for the GCB type, and vice versa. Likewise, the positive predictive value of an algorithm for the ABC type always equals its negative predictive value for the GCB type, and vice versa. For simplicity, only values for the GCB type of DLBCL will be discussed in this article. Concordance with the GEP results was determined for each algorithm as the number of matching immunophenotype and GEP results divided by the total number of GEP results.
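The 2 × 2 table calculations and the relationship between the GCB and ABC values can be illustrated with a short Python sketch. The function name `two_by_two` and the counts below are hypothetical and chosen only for illustration; they are not data from this study.

```python
def two_by_two(pairs, positive):
    """Compute performance measures from (immunophenotype, GEP) label pairs.

    `positive` names the class treated as the positive result ('GCB' or 'ABC').
    Assumes each class occurs at least once in both columns, so that no
    denominator is zero.
    """
    tp = fp = fn = tn = 0
    for ihc, gep in pairs:
        if gep == positive:
            if ihc == positive:
                tp += 1
            else:
                fn += 1
        else:
            if ihc == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Matching results divided by the total number of GEP results.
        "concordance": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Made-up counts: (IHC result, GEP result).
    pairs = (
        [("GCB", "GCB")] * 8 + [("ABC", "GCB")] * 2
        + [("GCB", "ABC")] * 1 + [("ABC", "ABC")] * 9
    )
    gcb = two_by_two(pairs, "GCB")
    abc = two_by_two(pairs, "ABC")
    print(gcb)
    print(abc)
```

With these toy counts, sensitivity for GCB (8/10) equals specificity for ABC, and the positive predictive value for GCB (8/9) equals the negative predictive value for ABC, while concordance (17/20) is the same from either viewpoint.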
The OS and EFS distributions for each algorithm were estimated by the Kaplan-Meier method, with differences evaluated by the log-rank test. OS is defined as the time from initial diagnosis to death or last follow-up, with those alive at last follow-up treated as censored. EFS is defined as the time from initial diagnosis to relapse, death, or last follow-up, whichever came first. Patients who were alive and relapse-free at last follow-up were treated as censored. Multivariate Cox regression analysis was performed to compare the ability of each algorithm to detect differences in OS and EFS after adjusting for the IPI. For each algorithm, a hazard ratio of 1 was assigned to the GCB (or equivalent) group. For all tests, a probability of .05 was used to determine statistical significance. SAS software version 9.2 (SAS Institute, Cary, NC) was used for the data analysis. This study was approved by the institutional review boards of the respective institutions, and all patients gave written informed consent.
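For readers unfamiliar with the Kaplan-Meier (product-limit) method, the following Python sketch shows how censoring enters the survival estimate under the OS definition given above. It is a minimal illustration with made-up follow-up times, not the SAS procedure used for this study; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (from diagnosis).
    events: True if the event (eg, death for OS) was observed,
            False if the patient was censored at last follow-up.
    Returns a list of (time, estimated survival probability) at each
    distinct time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Made-up follow-up times in years; False marks a censored patient.
    times = [1, 2, 2, 3, 4]
    events = [True, True, False, True, False]
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up but never reduce the survival estimate themselves, which is why patients alive at last follow-up are handled separately from deaths.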
RESULTS
The published algorithms examined in this study are those of Hans et al,3 Choi et al,4 Muris et al,5 Nyman et al,7 and the use of a single marker (LMO2) by Natkunam et al6 and are illustrated in Figure 1. The immunophenotypes from each algorithm were compared with the GEP results as determined by microarray analysis. The performance characteristics of each algorithm are presented in Table 2. To demonstrate the degree of overlap between algorithms in assigning a tumor to the GCB or ABC category, cases with a discrepancy between an algorithm result and the GEP result (81 cases) are shown in Table 3 and Appendix Table A1 (online only). The remaining cases with a GEP result showed agreement with the algorithm results. For cases without GEP results, the algorithm results were used for OS and EFS analysis, but are not shown in Appendix Table A1 because of the lack of a gold standard for comparison.
Of these algorithms, the Choi algorithm had the greatest concordance with the GEP results (87%). The Hans algorithm also had high concordance with the GEP results (86%). The algorithms of Nyman and Muris, although having relatively high concordance with the GEP results, had a low value for either sensitivity or specificity. The algorithm of Nyman had a sensitivity of only 67%, whereas the algorithm of Muris had a specificity of only 54%. The use of LMO2 alone had the lowest concordance at 74%.
The algorithms of Nyman and Muris are interesting in that neither one uses an antibody to BCL6. The article by Muris also examines an additional algorithm using only CD10 and MUM1, basically the Hans algorithm without BCL6. The use of BCL6 immunohistochemistry can be problematic, with poor reproducibility between laboratories.11 Considering the difficulty with BCL6, we removed this antibody from the Hans and Choi algorithms and reanalyzed the results. Results from the modified Hans algorithm without BCL6 (Hans*) were similar to those of the original Hans algorithm. Removing BCL6 from the Choi algorithm yielded far lower values in comparison with the GEP results (data not shown). Rearrangement of the antibody order and use of a 30% cutoff for each antibody after removal of BCL6 yielded a new algorithm (Choi*) with values identical to those of the original Choi algorithm.
Of the algorithms examined, most (with the exception of Natkunam) use a combination of antibodies to proteins expressed predominantly by either GCBs or ABCs and examined in a certain order. Because of reliance on the order of examination, the result of an antibody early in the algorithm can make the results of antibodies used later in the algorithm irrelevant. Therefore, a Tally method of positive GCB markers (GCET1 and CD10) versus positive ABC markers (MUM1 and FOXP1) was developed, with the greater number of positive results determining the immunophenotype. The result of a fifth antibody (LMO2) was used to determine the immunophenotype (positivity indicates GCB) if both groups had the same number of positive results (ie, a tie-breaker). Using this Tally algorithm, all antibody results were considered relevant for predicting the cell of origin, regardless of the order of examination. The Tally algorithm had the highest concordance with GEP results of all the algorithms examined (93%).
All of the algorithms divide patients with DLBCL into two groups with significantly different OS (Fig 2) and EFS (Fig 3). All of the algorithms except the Nyman algorithm (P = .13) also predict OS independent of the IPI. All of the algorithms predict EFS independent of the IPI. Hazard ratios (HRs) for OS and EFS are given in Table 2. For comparative purposes, GEP results have an HR of 3.3 for OS and 3.7 for EFS. Harrell's c-indices for our algorithms ranged from 0.73 to 0.77 for OS, all with overlapping CIs. The c-indices for EFS ranged from 0.74 to 0.81, again with overlapping confidence intervals. The Nyman algorithm had the highest c-index for OS and EFS; however, the CIs indicate that no particular algorithm has more discriminatory power than another.
DISCUSSION
DLBCL is considered an aggressive lymphoma, but predicting an individual patient's prognosis is difficult. This difficulty stems from the fact that DLBCL is a heterogeneous group of lymphomas with no clear histologic criteria for subdivision.12 Although new developments in chemotherapy, especially the anti-CD20 antibody rituximab, have improved the survival of patients with DLBCL, predicting patient prognosis is still difficult.13,14 Although GEP classification was developed before the routine use of rituximab, a number of publications have demonstrated that GEP results predict survival in rituximab-treated patients as well.2,4,9 Targeted therapies that require cell of origin distinctions to be performed in real time, such as bortezomib or dose-modified etoposide, doxorubicin, vincristine, prednisone, cyclophosphamide, and rituximab, have been suggested.15,16 For example, a clinical trial using Genasense with rituximab plus CHOP in ABC-type DLBCL determined by the Hans algorithm is underway at the University of Nebraska Medical Center (J.M.V.).
Currently, DNA microarray technology is not practical for the analysis of routine patient samples. This fact has led to efforts to approximate the information gained from GEP using simpler and more universally available techniques. With the revolution in antigen retrieval techniques, commercialization of antibodies, and automation of staining, immunohistochemistry is an obvious alternate technique to determine tumor cell of origin. A number of immunohistochemical algorithms have been published, most of which use a combination of antibodies against GCB- and ABC-specific antigens.
Our results demonstrate that the published algorithms of Hans and Choi were the best in predicting cell of origin as defined by GEP. Although the algorithm of Muris appears to be better in predicting OS than the Choi or Hans algorithms, the Muris algorithm is too selective, being very specific for ABC-type DLBCL at the cost of mislabeling a large number of ABC cases as GCB type. Although producing a large survival difference, this algorithm has no biologic basis and will not be helpful when trying to predict individual patient survival. Also, because it does not accurately predict the cell of origin, it cannot be used with therapies designed around the cell of origin.
Of these published algorithms, the Choi algorithm is the most predictive of GEP results and survival but is the least user-friendly. It requires the use of five antibodies, two of which are not commonly performed by most immunohistochemistry laboratories (GCET1 and FOXP1). Immunostains for BCL6 are technically difficult to perform and result in difficulties in interpretation.11 An example of appropriate BCL6 immunostaining with strong nuclear positivity is shown in Appendix Figure A1 (online only). Besides these problems, the Choi algorithm uses various cutoffs for determining antibody positivity and requires sequential interpretation of the results.
An attempt to address some of these issues led to the examination of two additional algorithms. Because of the problems with BCL6, it was removed from the Hans algorithm, leading to a new algorithm (Hans*) that had also been examined by Muris et al.5 The overall ability of the Hans* algorithm to predict the cell of origin and survival was similar to that of the original Hans algorithm. This minor difference in prognostic ability is offset by the ease of use of the modified Hans algorithm.
Removal of BCL6 from the Choi algorithm, rearrangement of the order of antibody examination, and standardization of positivity (30% of the tumor cells) led to a new algorithm (Choi*) that was easier to use than the original and had a similar ability to predict cell of origin and EFS. Prediction of OS, although statistically significant, was slightly decreased compared with the original algorithm; however, this decrease was offset by the ease of use.
All of the algorithms examined (except Natkunam) have a similar feature: certain antibodies have precedence over others because of the order of examination. By removing the order of examination and scoring the results of four selected antibodies, a Tally algorithm was created. This Tally algorithm includes the GCB-specific antigens CD10 and GCET1 and the ABC-specific antigens MUM1 and FOXP1. The GCB-specific antigen LMO2 is only used as a tie-breaker to classify the tumor. The Tally algorithm shows a better ability to predict the cell of origin than any algorithm examined in this study. The Tally algorithm also divides DLBCL patients into two groups with significantly different OS and EFS.
Of the 13 cases in which the Tally algorithm and GEP results disagreed, one was an ABC DLBCL and the other 12 were GCB DLBCLs (Table 3). Seven of the eight algorithms determined an incorrect immunophenotype for the ABC DLBCL (case 1), with only Natkunam yielding the correct ABC immunophenotype. Of the 12 GCB DLBCLs, one case was incorrectly labeled by all eight algorithms, whereas another case was incorrectly labeled by seven algorithms. Of the 12 GCB DLBCLs incorrectly labeled by the Tally algorithm, six cases had poor patient survivals more consistent with ABC DLBCL.
Although the Tally algorithm is the best for predicting the cell of origin, it too has drawbacks. First, it uses three antibodies that are commercially available but not commonly performed by many immunohistochemistry laboratories: GCET1, FOXP1, and LMO2. Second, the interpretation of GCET1, FOXP1, and LMO2 can be problematic, as some tumors show high background or nonspecific staining. Examples of appropriate and inappropriate results for LMO2 are shown in Appendix Figure A1. Third, the ability of the Tally algorithm to predict OS is slightly less than that of the Choi algorithm.
In conclusion, no single antibody has been useful in subdividing DLBCL or predicting prognosis. For this reason, combinations of antibodies, or algorithms, have been developed based on subdivision of DLBCL by microarray analysis. The Hans and Choi algorithms are useful to determine the cell of origin for a given DLBCL and can separate patients with DLBCL into prognostic groups, with or without the use of BCL6. A new algorithm that tallies antibody results without order precedence also has an excellent ability to predict the cell of origin and separate DLBCL patients into prognostic groups. Although we recommend the Tally and Choi algorithms for determining cell of origin and prognosis, the results of this article should allow laboratories with limited antibodies and expertise to choose the most appropriate algorithm for their practice. The immunohistochemical algorithms presented here are sufficiently robust to allow cell of origin determination for future therapies based on cell of origin.
AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST
The author(s) indicated no potential conflicts of interest.
AUTHOR CONTRIBUTIONS
Conception and design: Kai Fu, Timothy C. Greiner, Wing C. Chan, Dennis D. Weisenburger
Financial support: Wing C. Chan, Dennis D. Weisenburger
Administrative support: Lynette M. Smith
Provision of study materials or patients: Kai Fu, Timothy C. Greiner, Jan Delabie, Randy D. Gascoyne, German Ott, Andreas Rosenwald, Rita M. Braziel, Elias Campo, Julie M. Vose, Georg Lenz, Louis M. Staudt, Wing C. Chan, Dennis D. Weisenburger
Collection and assembly of data: Paul N. Meyer, Timothy C. Greiner
Data analysis and interpretation: Paul N. Meyer, Lynette M. Smith
Manuscript writing: All authors
Final approval of manuscript: All authors
Footnotes
Supported by National Cancer Institute Grants No. U01-CA114778-01 and CA36727.
Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
- Received April 22, 2010.
- Accepted October 15, 2010.