Gene Panel Model Predictive of Outcome in Men at High-Risk of Systemic Progression and Death From Prostate Cancer After Radical Retropubic Prostatectomy

  1. George Vasmatzis
  1. From the Departments of Laboratory Medicine and Pathology and Urology, Division of Biostatistics; and the Departments of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
  1. Corresponding author: George Vasmatzis, PhD, and John C. Cheville, MD, Department of Laboratory Medicine and Pathology, Mayo Clinic, 200 First St SW, Rochester, MN 55905; e-mail: vasmatzis.george{at}mayo.edu

Abstract

Purpose In men who are at high-risk of prostate cancer, progression and death from cancer after radical retropubic prostatectomy (RRP), limited prognostic information is provided by established prognostic features. The objective of this study was to develop a model predictive of outcome in this group of patients.

Methods Candidate genes were identified from microarray expression data from 102 laser capture microdissected prostate tissue samples. Candidates were overexpressed in tumor compared with normal prostate and more frequently in Gleason patterns 4 and 5 than in 3. A case control study of 157 high-risk patients, matched on Gleason score and stage with systemic progression or death of prostate cancer as the end point, was used to evaluate the expression of candidate genes and build a multivariate model. Tumor was collected from the highest Gleason score in paraffin-embedded blocks and the gene expression was quantified by real-time reverse transcription polymerase chain reaction. Validation of the final model was performed on a separate case-control study of 57 high-risk patients who underwent RRP.

Results A model incorporating gene expression of topoisomerase-2a, cadherin-10, the fusion status based on ERG, ETV1, and ETV4 expression, and the aneuploidy status resulted in a 0.81 area under the curve (AUC) in receiver operating characteristic statistical analysis for the identification of men with systemic progression and death from high grade prostate cancer. The AUC was 0.79 in the independent validation study.

Conclusion The model can identify men with high-risk prostate cancer who may benefit from more intensive postoperative follow-up and adjuvant therapies.

INTRODUCTION

Current prognostic markers predictive of outcome in men with prostate cancer after radical retropubic prostatectomy (RRP) consist of Gleason score, TNM stage, surgical margin status, and preoperative serum prostate-specific antigen (PSA).1-4 These clinicopathologic findings are used to identify patients who are at high-risk of systemic progression after surgery and may benefit from postoperative adjuvant therapy, such as androgen deprivation or radiation therapy. In men with high grade tumors after RRP, limited prognostic information is provided by established clinical and pathologic parameters.

Beyond the current clinicopathologic parameters, there have been other markers proposed to predict poor outcome of patients with prostate cancer. These markers include ploidy, tumor proliferative activity, alpha-methylacyl-CoA-racemase expression, and gene panel expression profiles.1,3-8 Recently, gene expression array technology applied to prostate cancer has resulted in the identification of a number of genes that have been associated with outcome. Many biomarkers reported to provide prognostic information show an association with outcome only in a univariate model and have limited to no prognostic value in a multivariate setting that includes Gleason score, stage, margin status, and preoperative serum PSA. In addition, many studies use serum PSA progression or biochemical recurrence after RRP as the outcome end point rather than development of metastatic disease or death from prostate cancer. Depending on the definition of PSA progression, this end point may include patients that do not have life-threatening disease.

Recently, Tomlins et al9 reported that a majority of prostate cancers exhibit fusions between the control region of an androgen regulated gene, TMPRSS2, and the coding region of the ETS family of transcription factors, most frequently ERG and much less frequently ETV1 and ETV4. The result of this fusion is the overexpression of the potential oncogenes, ERG, ETV1, and ETV4. The fusions are associated with an increased risk of cancer progression in patients treated surgically and those observed clinically without treatment.10,11 It is likely that the presence or absence of this fusion will be important in the clinical treatment of men with prostate cancer.

In this study, we sought to design a predictive model for men with surgically treated high-risk prostate cancer. An initial set of candidate genes were identified using microarray expression data from 102 samples collected by laser capture microdissection12; genes that could distinguish lower grade (Gleason score 6) from higher grade (Gleason score 7+) were selected using the methods given in Kosari et al.13 Variables for ETS-related fusion and DNA ploidy were also included in the modeling. The ETS fusion variable was added due its recent discovery, and a number of studies showing a significant association of fusion status with patient outcome. The variable for ploidy was added because studies have shown an association of ploidy with outcome, and it is routinely collected as part of the clinical practice on all patients with prostate cancer treated surgically at the Mayo Clinic.14 The candidate genes along with ETS fusion and ploidy were evaluated in a multivariate analysis using 157 tumor samples in a matched case-control study design. This case-control design was used to develop the model. The case group consisted of 76 men who developed systemic progression or died of prostate cancer within 5 years of surgery while the control group consisted of 81 men who did not develop systemic progression within 7 years after RRP. To maximize the practical utility of the candidate markers evaluated in this study, cases and controls were matched on all currently accepted clinical and pathologic prognostic features and were balanced for adjuvant therapy. The final model was then validated using an independent set of 57 patient samples in a nested case-control design. Thus, discovery, model building, and validation used independent groups of patients. Herein, we report a four-variable model predictive of cancer-specific outcome in men with high-grade prostate cancer treated by RRP. All human investigations were performed after approval by the Mayo Clinic institutional review board and in accordance with an assurance filed with and approved by the Department of Health and Human Services.

METHODS

The methods for the generation of the microarray data used in this study are described elsewhere.12,13 The methods that describe patient information and patient material for the cohort study and the case control-studies presented here are described in the supplemental material.

RESULTS

Cohort Study for Current High-Risk Prostate Cancer Marker Status

Using patients treated from 1990 to 2004 in the Mayo Radical Prostatectomy database, the Cox proportional hazards model was used to identify significant clinical predictors of time to systemic progression. The strongest predictors were RRP Gleason score, pathologic stage components (extracapsular cancer, seminal vesicle involvement, lymph node metastases), and the use of adjuvant (within 90 days after RRP) hormone therapy (Table 1). The model had a receiver operating characteristic (ROC) area under the curve (as measured by the concordance statistic) of 0.82. Gleason score was the strongest predictor with 10-year rates of systemic progression for Gleason score 2 to 6, 7, and 8 to 10 of 1.9%, 10.6%, and 23.6%, respectively. When the patient cohort was restricted to Gleason score 7 to 10, the strongest significant clinical predictors of time to systemic progression were RRP Gleason score, seminal vesicle involvement, and lymph node involvement (Table 1), and the concordance statistic dropped to 0.69. Guided by these results, we concluded that a high-risk prostate cancer RRP population, with Gleason score ≥ 7, would most strongly benefit from molecular marker panels to improve prognostic predictions and designed this study to address that need.

Candidate Marker Selection

A set of 38 genes was selected from a microarray expression profiling experiment of normal and tumor cells from patients with prostate cancer.12 Cells were collected by laser capture microdissection, from 29 non-neoplastic prostatic epithelia, five prostate intraepithelial neoplasias (PINs), 61 tumors with various degrees of differentiation (Gleason patterns 3, 4, and 5 prostate cancer), and seven node metastasis samples. Patient outcome was not known for these samples. We used a methodology described in Kosari et al13 to identify a list of variably overexpressed genes by comparing the expression of each gene in cancer samples with its expression in normal samples. Of those variably overexpressed genes, 38 were chosen as a candidate because they were overexpressed more frequently in high-grade cancers (Gleason patterns 4 or 5 and metastasis) than in intermediate cancers (Gleason pattern 3). We also defined a binary variable (pred_fusion) to indicate the ETS-fusion status of a patient's tumor and another binary variable to indicate aneuploidy (Appendix, online only). To evaluate the 40 variables and build a multivariable model, a case-control study was designed by using Gleason score ≥ 7 patients with and without systemic progression.

Model Building Case-Control Study

Cases were defined as men that developed systemic progression or died of prostate cancer within 5 years of RRP, and were matched for Gleason score, TNM stage, margin status, and preoperative serum PSA with controls that did not develop systemic progression or die of prostate cancer for at least 7 years. The cases and controls are very similar in their clinical and pathologic features with the exception of DNA ploidy (Appendix). The gene expression between the two groups was evaluated using univariate analysis (Table 2) for all 38 normalized genes, the predicted fusion variable, and the aneuploidy variable (see Methods in Appendix). As part of the Mayo Clinic standard practice, DNA ploidy14 is performed on all RRP cases. Based on this analysis, the best univariate predictors are BIRC5, RRM2, and TOP2A. CDH10, also a significant predictor, has a negative coefficient suggesting a protective role. The other genes with a significant (P ≤ .05) prognostic univariate effect were COL2A1, NRP1, and SSTR1. Because aneuploidy status has been previously indicated3,4 as potentially prognostic, it was evaluated and found to be significantly associated with case/control status. When hierarchical clustering (Appendix) was performed for the normalized genes based on absolute correlation (online-only Figs A1 and A2), the genes BIRC5, RRM2, and TOP2A, which are the most important univariate genes, form a tight cluster and only one of them is retained in a multivariate model.

Multivariate analysis.

We used a forward stepwise selection process to define an optimal model for prediction of prostate cancer outcome, yielding a four-variable model containing TOP2A, CDH10, and the fusion and aneuploidy status variables. At each iteration, all remaining variables were tested for their additional predictive power given the variables selected thus far (Appendix Tables A1, A2, and A3, online only). The first variable was TOP2A. With TOP2A in the model, BIRC5 and RRM2 were not significant predictors. The predictive performance of the model did not change significantly if either BIRC5 or RRM2 were selected as the first variable of the model. When any one of TOP2A, BIRC5, and RRM2 was selected as the first variable in the model, the other two genes lost significance, demonstrating these genes are interchangeable and suggesting they are related. The next most significant variable added to the model was CDH10. After adding CDH10 and TOP2A to the model, the significance of the fusion status increased and became the most significant remaining parameter (see Appendix for additional information). Adding fusion status to TOP2A and CHD10 rendered the remaining genes insignificant. Each successive step of this modeling improved the corresponding area under the curve (AUC) in the ROC plot by at least 0.02 (Fig 1). The AUC with TOP2A alone was 0.71— it increased to 0.74 with CDH10, and increased to 0.79 with the predicted fusion parameter (Fig 1). Aneuploidy remained a significant variable in this analysis, so it was added to the final model (Table 3) consisting of predicted fusion, TOP2A, CHD10, and aneuploidy which resulted to an AUC of 0.81 (95% CI, 0.74 to 0.88). The change in AUCs at each step was also tested for significance using the integrated discrimination improvement (IDI) method proposed by Pensina (Appendix) which agreed with the stepwise result. Further confirmation for the significance of the final model is provided by permutation tests and by the penalized fit to the original data, which also showed only minor changes in the coefficients. The score derived from the regression model is displayed in Figure 1B, showing the overall separation of the cases and the controls.

Independent validation using frozen tumor tissue from a separate cohort of 57 high-risk patients showed a slight decrease in the AUC to 0.79 (95% CI, 0.66 to 0.92; Fig 2A). The score derived from the regression model in this independent validation is displayed in Figure 2B.

It is important to note that patients were not matched for adjuvant postoperative therapy (hormone and radiation); however, nearly equal numbers of cases and controls received adjuvant therapy. In addition, stratified analysis indicated no association of adjuvant therapy with the final model, the treatment modalities were not significant predictors of outcome and the coefficients and P values of the other variables in the model changed only slightly.

DISCUSSION

Several candidate prognostic cancer models based on multiple gene panels or signatures have already been proposed.5,6,8,15-17 It is now important to construct models based on gene expression patterns that provide prognostic information in high-risk patients beyond that provided by established clinical and pathologic prognostic features. This need was demonstrated through a Cox model analysis of the Mayo Clinic Prostate Cancer Database cohort of more than 10,000 men with prostate cancer treated by RRP. In this study, we evaluated the prognostic utility of genes previously identified from gene expression profiling to be associated with high-grade (Gleason patterns 4 and 5) and metastatic prostate cancer.13 The expression of these genes was analyzed using real-time reverse transcription polymerase chain reaction in a prostate cancer tissue samples from patients treated by RRP. We compared expression in patients who developed systemic progression or died of prostate cancer with those who did not, where samples were matched for the established prognostic features of prostate cancer. This design allowed for the identification of prognostic biomarkers in men at high-risk of prostate cancer death, for whom established prognostic markers provide minimal or no additional prognostic information.

This study generated a model to distinguish men treated by RRP who develop systemic progression from men treated by RRP who do not develop systemic progression. The three-variable model, which includes TOP2A expression, CDH10 expression, as well as predicted TMPRSS2 (ERG, ETV1, or ETV4) fusion status, and aneuploidy had an AUC of 0.81 in training and 0.79 in validation. The slight reduction to the AUC could be attributed to the fact that the validation case control study was based in a risk-set study design (see Methods). Specific sensitivity and specificity are not reported as an optimal threshold would be arbitrary and not necessarily relevant based on the clinical situation.

The model was developed in a case-control study of men matched on Gleason score, TNM stage, margin status, and preoperative serum PSA who did and did not develop systemic progression or die of prostate cancer. Survival curves by score group could not be generated due to the case-control design, wherein the event rate is predetermined to be 50%. If the logistic score were applied prospectively, an increase in score of 0.7 (which is quite plausible) would imply a doubling in risk. For Gleason score 7, where the cumulative systemic progression rate at 10 years was 11%, this change in score could imply an overall increase in risk to around 20%.

It is important to note that the candidate genes were identified through association with Gleason pattern thus many of these genes were correlated with Gleason pattern or the degree of tumor differentiation but not necessarily patient outcome. Therefore, the majority of the candidate genes did not enter the model. Gleason pattern was used as a surrogate marker of tumor aggressiveness as there were inadequate numbers of frozen tissue samples from patients who had developed systemic progression or died of prostate cancer in the Mayo Clinic tissue bank. Indeed, among the 11,000 patients treated by RRP at the Mayo Clinic between 1990 and 2004, only 4% developed systemic progression or died of prostate cancer confirming the difficult nature of obtaining adequate patient samples to use systemic progression and prostate cancer death as an end point in analyses. Once Gleason grade was controlled for in the case-control design, many genes on the list no longer showed an association with outcome. Moreover, many of the variables in the initial list have an association with outcome; however, these genes frequently are strongly correlated with other genes on the list and thus dropout in subsequent steps in model building after the most significant gene in the cluster enters into the model (as demonstrated here with the proliferation markers TOP2A, RRM2, and BIRC5). Therefore, we believe that this methodology, and case-control study design resulted in an optimal predictive value that was validated in an independent set of samples.

TOP2A is an enzyme involved in segregation of newly replicated chromosome pairs, and has additional functions in the condensation and helix formation of chromosomal DNA and is part of the “cell proliferation cluster”.18 Proliferation-related genes are the most differentially expressed genes when tumor and matched normal tissues are compared.19 Cell proliferation cluster markers have been shown to be associated with poor outcome in numerous human malignancies including prostate.1,3,4,20 Recently, Hughes et al21 identified that TOP2A was associated with Gleason grade and, TOP2A expression has been associated with hormone insensitivity. In regards to therapy, there are chemotherapeutic agents that target TOP2A including etoposide, doxorubicin, and mitoxanthrone.22-26 It is possible patients whose tumors express elevated levels of TOP2A may benefit from these chemotherapeutic agents alone or in combination with other therapies, and we believe further studies are warranted. TOP2A overexpression may not only be prognostic but provide a target for chemotherapy.

CDH10 belongs to the large family of cell to cell adhesion proteins, classic cadherins type II.27 The role of CDH10 in the progression of human malignancies is unknown. However, it is known to play a role in cell to cell adhesion through stabilization of beta-catenin similar to type I cadherins such as E-cadherin.28 In prostate cancer, decreased expression of E-cadherin is associated with aggressive tumor behavior.29-32 Decreased E-cadherin expression in prostate cancer is associated with high Gleason score, advanced tumor stage, and cancer relapse.29-31 Another study found that aberrant E-cadherin expression was associated with salvage radiation failure for PSA recurrence after prostatectomy lending support to our findings that the high-risk patients have a feature of systemic disease rather than local progression.33 The loss of expression of adhesion molecules appears critical for tumor cells to develop an invasive phenotype that allows extension through prostatic stroma and access to vascular and lymphatic spaces for the systemic spread of tumor. In a manner analogous to E-cadherin, we demonstrate that decreased expression of another adhesion molecule, CDH10, is inversely correlated with prostate cancer progression suggesting its role in maintaining cell to cell adhesion and preventing tumor cell dissemination.

Recent studies support a role for the TMPRSS2 to ETS gene fusions as a prognostic marker for men with prostate cancer. In a study of 106 tumors, the TMPRSS2:ERG was found to be associated with moderate to poorly differentiated tumors.34 The fusion was not identified in any of five cases of benign prostatic hyperplasia, and in only one of 15 Gleason pattern 2 tumors. In contrast, approximately 40% of Gleason patterns 3, 4, or 5 tumors were found to contain fusions.34 In a clinical watchful waiting cohort, there was a statistically significant association between the most common fusion (TMPRSS2:ERG) and prostate cancer–specific death. These findings suggest that the TMPRSS2: ERG cancers have a more aggressive phenotype, possibly associated with increased ERG expression.10 Furthermore, TMPRSS2 is an androgen-responsive gene, and may play a important role in the behavior of androgen-dependent cancers, while the requirement for androgen-regulated expression might be bypassed in late stage or androgen-independent tumors.35 In this study, we show the prognostic significance of the TMPRSS2 to ETS gene fusions in a multivariate but not univariate analysis in the high-risk prostate cancer population. Our findings suggest the need to determine the presence of these fusions in men who have the pathologic features that put them at high risk for systemic progression.

AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The author(s) indicated no potential conflicts of interest.

AUTHOR CONTRIBUTIONS

Conception and design: John C. Cheville, George Vasmatzis

Financial support: John C. Cheville, George Vasmatzis

Administrative support: George Vasmatzis

Provision of study materials or patients: John C. Cheville, Jeffrey R. Karnes, Laureano J. Rangel, Eric Bergstralh, Cemile D. Savci-Heijink

Collection and assembly of data: John C. Cheville, Farhad Kosari, Jan-Marie Munz, Lori Tillmans, Eati Basal, Laureano J. Rangel, Cemile D. Savci-Heijink, Eric W. Klee, George Vasmatzis

Data analysis and interpretation: John C. Cheville, Jeffrey R. Karnes, Terry Therneau, Farhad Kosari, Jan-Marie Munz, Lori Tillmans, Eati Basal, Eric Bergstralh, Irina Kovtun, Cemile D. Savci-Heijink, Eric W. Klee, George Vasmatzis

Manuscript writing: John C. Cheville, Jeffrey R. Karnes, Terry Therneau, Farhad Kosari, Jan-Marie Munz, Lori Tillmans, Eati Basal, Eric Bergstralh, Irina Kovtun, Cemile D. Savci-Heijink, Eric W. Klee, George Vasmatzis

Final approval of manuscript: John C. Cheville, Jeffrey R. Karnes, Terry Therneau, Farhad Kosari, Jan-Marie Munz, Lori Tillmans, Eati Basal, Laureano J. Rangel, Eric Bergstralh, Irina Kovtun, Cemile D. Savci-Heijink, Eric W. Klee, George Vasmatzis

Appendix

Cohort Study for Endorsing High-Risk Population Marker Development

To confirm the need for novel biomarkers to predict outcome in high risk prostate cancer patients, an analysis of a large cohort of prostate cancer patients treated by radical retropubic prostatectomy (RRP) at the Mayo Clinic was undertaken. This cohort analysis was performed independent of the case-control study used to evaluate the candidate biomarkers. To examine the utility and sufficiency of clinical variables, a Cox proportional hazards model was fit to the entire cohort. We focused on subjects enrolled from 1990 to 2004 in order to assure sufficient follow-up. In this cohort of 10,623 men, follow-up was complete to either systemic progression or death in 1,569. For the remaining 9,040, the last year of follow-up was ≥ 2005 in 95.7% and in 2004 for an additional 2%. Only 1.2% were officially lost, in that we did not know a current address. The rates of follow-up are nearly identical for the Gleason score (GS) ≥7 subset. There were 441 observed systemic progressions (including prostate cancer deaths) among the 10,623 subjects. Using the clinical and pathological information in the database, models were fit for both the total and high-risk subgroups (GS ≥ 7).

Model building case-control design.

To assess the potential utility of selected biomarkers and to facilitate the construction of a multivariable model on the prediction of systemic progression, a case-control study design was used. Men with systemic progression within 5 years of RRP were identified, and matched with controls known to be free of systemic progression at 7 years. Subjects were also matched on GS, TNM stage, margin status, and preoperative serum prostate-specific antigen (PSA). Systemic progression was defined as the development of metastatic disease as determined by clinical, radiologic (bone or computed tomography scan), or pathologic (biopsy) evaluation. PSA progression alone was not sufficient for declaration of a systemic progression. Patients undergoing RRP at the Mayo Clinic are evaluated postoperatively at least quarterly for 1 year, semiannually for 1 year, and annually thereafter. Digital rectal examination and serum PSA are evaluated at each visit. If patients had an abnormal elevation in serum PSA postoperatively, a radioactive bone scan, plain radiograph in the presence of an abnormal bone scan, and/or computed tomography were performed. Patients that did not return to the Mayo Clinic were mailed kits for blood submission and serum PSA testing, and additional medical information was obtained from the local physicians as needed (Blute ML, et al: J Urol 165:119-125, 2001).

Based on this data, an initial set of 200 samples, comprising matched cases and controls, was defined. Tissues were acquired from the tissue bank and reviewed by two pathologists blinded to the case-control status. Of the 200 samples, 76 cases and 81 controls passed the pathology review and had sufficient tissue of the highest Gleason pattern cancer available for experimental analysis. The clinical and pathologic features of the selected subjects are listed in Table A2. A number of cases and controls were not included due to an absence of tumor block or insufficient tumor for analysis. In addition, some controls were excluded because of inaccurate Gleason grading; often the block containing the highest grade of the tumor was exhausted by previous studies. This occurred for both controls and cases. The controls were selected based on the surgical pathology report, and all materials were reviewed by a urologic pathologist to ensure appropriate Gleason grading when matching cases and controls. All GS6 samples were excluded from this study, hence cases and controls were GS ≥7. Due to the exclusion of several subjects following review by the pathologists, exact case-control matching was not preserved. However, the retained cases and controls remained balanced on age, preoperative serum PSA, TNM stage, GS, margin status, and adjuvant (< 90 days after surgery) therapy, and an unmatched analysis was used. Based on the final numbers the study had approximately 80% power to detect (α = .05, two sided) a mean difference in gene expression between cases and controls equivalent to 0.45 standard deviations. For present/absent gene expression, a 22-percentage point difference (eg, 40% v 62%) between cases and controls could be detected.

Validation case-control design.

An independent nested case-control design was conducted to assess the model identified in the model building component of the analysis. Twenty-one systemic progression cases were identified from 2000 to 2005 inclusive who had both fresh frozen and paraffin-embedded tissue. (Fresh frozen specimens have been archived since 1999.) For each case, two potential controls were identified from those who were at risk but who had not yet progressed at the time of the case's event (time of systemic progression or death) and matched on GS, pathologic stage, age, and date of surgery. The clinical and pathologic features of the selected subjects are listed in Table A3. The total number of controls with sufficient RNA was 36. Because of the risk-set sampling design the average follow-up of controls in this set was much less than the first case-control set of patients. Nevertheless, the risk-set sampling design was necessary as there is not yet sufficient follow-up in this cohort for a 7 year rule. We would expect that the consequences of this design are that it would still validate the model from stage 1, but that the model would discriminate slightly less well.

Experimental Methods

Processing of FFPE samples.

In all experiments, the processing of samples was randomized to prevent processing biases. Each case was reviewed by a pathologist and tumor was identified on H-E stained sections. Subsequent sections (10 μm) from each case were prepared under RNase free condition and de-paraffinized with Xylene. Identified tumor areas were scraped into 2 mL tubes containing digestion buffer (RecoverAll kit; Ambion, Austin, TX). Total RNA was isolated according to the RecoverAll RNA isolation procedure. The isolated RNA was treated with DNase using Ambion Turbo DNA free kit, according to manufacturer's instructions (Ambion). The amount of RNA in each case was measured by the Quant-iT RiboGreen kit (Invitrogen Carlsbad, CA). Reverse transcription (RT) was performed using Superscript III First Strand Synthesis system (Invitrogen) and 500 ng of RNA from each case in a 40 μL reaction volume.

Processing of frozen samples for validation.

Tissue was cut by our tissue-processing core facility and was kept frozen. Sections were stored on slides at −80 for less than 1 week. Slides were placed directly from −80% to 75% ethanol (ETOH) for 30 seconds × 2 to remove OTC, then placed in 95% ETOH for 30 seconds and 100% ETOH for 30 seconds and air dried. Tissue was scraped directly into lysis solution from the RNeasy mini or midi kit (Qiagen, Valencia, CA) and processed immediately according to manufacturer's instructions. Reverse transcription and quantitative polymerase chain reaction (qPCR) were performed as described for FFPE tissue except 200 ng of RNA was used in the RT reaction and 1 μL cDNA was used in subsequent qPCR (5 ng RNA equivalent).

qPCR was performed on each sample by adding 12.5-ng total RNA equivalent of cDNA to a 20 μL reaction volume for each gene of interest using SYBR green PCR Master Mix (Applied Biosystems ABI, Foster City, CA) on the ABI 7900HT real-time PCR machine using the manufacturer's default cycling conditions. Primers for qPCR were designed using Primer Express software (ABI) to amplify a 70 to 85 base pair fragment from the Affymetrix target sequence for the gene of interest. The primer pair concentrations (0.15 or 0.2 μmol/L final) were optimized by generating standard curves using a pool of prostate cDNA from normal and tumor tissue. To check for genomic DNA, No RT samples were run in a qPCR reaction and those with cycle threshold (Ct) less than 35 for GapDH were considered contaminated with DNA and were reprocessed. In the analysis of data, undetermined values for Cts were replaced with a Ct of 40, which was the maximum cycle number in the qPCR experiments.

All samples were analyzed in duplicate and all studies were carried out under Mayo institutional review board–approved protocols. Primers used for the quantitative RT-PCR expression analysis of the genes in the final model are: Forward reverse: GAPDH, CATGGCCTCCAAGGAGTAAGAC TCTCTTCCTCTTGTGCTCTTGCT; RPS 28, GCTGCTCGCTGGGTCTTG GGAGCAGATTGTGACAGACCATT; ERG, GCTGCCACAATCAGAAATCA TCGCGACTCAAAGGAAAACT; TOP2A, TGGCTGCCTCTGAGTCTGAA AGTCTTCTGCAATCCAGTCCTCTT; CDH10, GAACAGGATAGTTCTCCCTTAAGCA CAAGGGCAGGACATGTACCTAAC; ETV1, TGTTTTTGCTTTGCATTTGG TCCCCATTTACTCATGGTTTTT; ETV4, GCAGATCCCCACTGTCCTAC CCACTTTTCCTTCCCAATGA.

Data analysis.

All quantitative PCR measurements were normalized by subtracting the number of cycles measured for a candidate gene from the average number of cycles measured for GAPDH and RPS28 in the same tissue sample. This normalization method inverts the amplification values, such that higher values correspond to higher expression. The full set of un-normalized data was used to assess the variance of the assay using a Bland-Altman plot of the average of a pair of replicates versus their difference (not shown). Assay variability was close to constant for values below 33, from which point it increased linearly. The inverse of the estimated variance was used to create a weighted average for each replicate pair. In samples where one replicate measure returned no value, the other replicate value was used. In samples where both replicate measures returned no value, the measurement was treated as missing.

Ploidy analysis.

Flow cytometry was performed as previously described (Zanetta G, et al: Am J Obstet Gynecol 175:1217-1225, 1996). The nuclear content of 10,000 nuclei was measured with a FACScan (Becton Dickinson, Sunnyvale, CA) flow cytometer. Cell cycle evaluation of the DNA histogram was performed with a Modfit 5.2 (Verity Software, Topsham, ME) computerized software program. Tumors with only one identifiable gap0-gap1 peak were classified as DNA diploid (2n). Tumor samples that contained a significant increase in the 4n peak (> 9% of nuclei) and an identifiable 8n population were categorized as DNA tetraploid. Tumor DNA content was classified as DNA aneuploid if a separate, identifiable gap0-gap1 peak was present. All DNA histograms were analyzed and classified without knowledge of the clinicopathologic features or patient outcome.

Role of ploidy.

In this case-control study, cases and controls were not matched for aneuploidy as this is not a standard pathologic assessment of prostate cancer in RRP specimens, and its association with prostate cancer outcome is debated. Ploidy generally correlates with other prognostic factors, such as GS, tumor stage, and tumor volume: low-stage tumors are usually diploid, and high-stage tumors are usually nondiploid, and therefore some investigators have shown that there is no association of tumor ploidy with outcome in a multivariate model (Humphrey PA, et al: Am J Surg Pathol 15:1165-1170, 1991). However, in a Mayo Clinic study, when patients with tumor that had spread beyond the prostate were examined, tumor ploidy was found to be significantly associated with progression (Robertson CN, et al: Acta Oncol 30:205-207, 1991; Winkler HZ, et al: Mayo Clin Proc 63:103-112, 1988). In the Mayo Clinic practice, cancer ploidy is assessed on every RRP specimen, and therefore we investigated the significance of ploidy in the molecular model. The ploidy status parameter is often defined as 0 for diploid tumors, 1 for tetraploid tumors, and 2 for anauploid. However, we decided that it would be difficult to separate tetraploid from proliferation so we defined a binary variable only for aneuploidy (1 for aneuploid and 0 for everything else). This variable is still a significant predictor in univariate and multivariate analysis and entered into the model after the fusion-status variable. As measurement of aneuploidy is not a standard of care for prostate cancer we decided to report the model with and without the aneuploidy in this manuscript (Fig 1 in the manuscript).

Fusion status analysis.

The presence of fusions between TMPRSS2 and ERG, ETV1, or ETV4 was assigned based the expression values of ERG, ETV1, and ETV4 genes. The thresholds for status assignments were determined for each gene by analyzing the normalized Ct expression values of these genes. The fusion-status parameter was set to 1 when one of the ETS factors (ERG, ETV1, or ETV4) was above their specified threshold and to −1 when all of them were below the thresholds. If the RT-PCR failed for one of these three genes or the values were to close to call that fusion-status was set to 0. The thresholds were set by plotting histograms of the expression values for all cases for each of these genes and finding two cutoff points (Fig A2). Samples where the expression was greater than the upper threshold in any of the three genes (ERG > −4.8; ETV1 > −5.5; ETV4 > −6) were designated fusion positive. Samples where the expression was less than the lower threshold for all three genes (ERG < −5.4; ETV1 < −6.8; ETV4 < −5.9) were assigned fusion negative. The same procedure was implemented for the frozen samples and the thresholds were −5, −3, and −7, respectively.

Statistical Analysis

Methods.

In the cohort study, the Cox proportional hazards model was used to evaluate predictors of time to systemic progression. The concordance statistic (C), which has an interpretation similar to receiver operator characteristic curve area was used to compare models. In the case-control study the association of gene expression with systemic progression was assessed using univariate and multivariate logistic regression with the dependent variable (0 = control, 1 = case) and the normalized genes as the independent predictors. Computations were done using the S-Plus statistical package (Insightful Corp, Seattle WA) and the open-source R statistical package. The clustering analysis was performed using clustering methods in R.

Multivariate analysis.

The P values for the univariate and the stepwise logistic regression analysis can be followed in Table A1. In iteration 1 the model was adjusted for TOP2A (model 1). In iteration 2 the model was adjusted for TOP2A and CDH10 (model 2). In iteration 3 the model was adjusted for TOP2A, CDH10, and pred_fusion (model 3). The final model included TOP2A, CDH10, and pred_fusion and aneuploidy (model 4).

The change in area under the curve from model 1 (more than 2), model 2 (more than 3), and model 3 (more than 4), was tested for significance using the integrated discrimination improvement statistic proposed by Pensina et al (Pensina MJ, et al: Statist Med 27:157-172, 2008). The P values were .006, .001, and .025 respectively for each step. We also applied bias reduced regression on the final model using the plr function in R. The penalized fit showed only minor changes in the coefficients (shrinkage was chosen to give 3.5 effective df).

Importance of fusions in the multivariate analysis.

The ETS-fusion status became a strong predictor of outcome after CDH10 entered in the model (Table A1). Logistic regression showed a strong association of CDH10 expression with the fusion status in all three data sets. As we examined the association of the ETS fusion with CDH10, we observed that CDH10 expression was variably overexpressed in much higher levels in ETS fusion–positive samples than in ETS-fusion–negative samples. The pred_fusion status variable entered the model to compensate for this association. At the time we do not know the biologic explanation for the association of CDH10 and ETS transcription factors. We are in the process of performing studies to better understand this association. Of more clinical significance we observed that if we apply our model in a fusion dependent manner, we see an improvement in the model's predictive value in fusion-positive cases to 0.82 area under the curve. Our data suggests that the presence or absence of the fusion is associated with unique gene expression profiles of prognostic markers, and further studies are underway. For example, COL2A1 and KHDRBS3 genes, that have a strong association with the fusion, also became strong predictors when the CDH10 entered into the model presumably imposing a similar adjustment.

Independent validation.

The concordance index (area under the curve) for the validation set is 0.79 (0.66,0.92) when using the coefficients derived from the training set. We also attempted to fit the four variables in this set as well. The concordance index (area under the curve) of the panel fit for the validation set was 0.82 (95% CI, 0.7 to 0.94). The model parameters are as follows: variable original model coefficient (95% CI) refit coefficient: TOP2A, 0.71 (95% CI, 0.38 to 1.04) 0.81; CDH10, 0.62 (95% CI, 0.93 to 0.31) 0.78; pred_fusion, 0.91 (95% CI, 0.36 to 1.46) 1.07; aneuploidy 1.34 (95% CI, 0.14 to 2.54).

As shown earlier, the model coefficients for the individual variables were also consistent in the validation set with the exception of aneuploidy. In the validation data set only three cases were aneuploid, hence the coefficient is not estimatable. If there was serious overfitting in the training set we would expect that the coefficients in the validation set to be shrunk toward 0. Further confirmation is provided by the penalized fit to the original data, which also showed only minor changes in the coefficients (shrinkage was chosen to give 3.5 effective df).

Clinical Information: Important Considerations

The results of this study, while showing strong promise for prognostication of high-risk prostate cancer patients, should be interpreted with the following limitations in mind. The samples evaluated in this study primarily consisted of patients of white descent. To obtain a more accurate assessment of prognostic performance in the population at large, a cohort study that includes proper minority population content is needed. Also, in the absence of an established assay to determine fusion status of the tumors, expression levels of the ETS gene family were used. This surrogate measure of fusion status has been applied in other studies (Demichelis F, et al: Oncogene 26:4596-4599, 2007) and we believe predicts the fusion status accurately.

Fig A1.

Genes were clustered using functions from the R package according to their expression profiles in all samples. Expression vectors were first centered according to the mean expression for each gene. Then the distance matrix was computed using Pearson correlation as the distance measure. The clustering was performed using “hclust” with default parameters. Most of the genes clustered into two major clades. The three proliferation related genes, BIRC5, TOP2A, and RRM2 are the closest subcluster.

Fig A2.

Histograms of ERG, ETV1, and ETV4 normalized expression in FFPE and frozen preserved tissues. (A) Formalin-fixed paraffin-embedded tissues. (B) Fresh frozen tissue.

Table A1.

P Values of All Variables in the Univariate Model Analysis and the Subsequent Iterations in the Stepwise Logistic Regression Process

Table A2.

Clinical and Pathologic Features of the Patients in the Model Building Case-Control Study Using Formalin-Fixed, Paraffin-Embedded Tissues

Table A3.

Clinical and Pathologic Features of the Patients in the Validation Case-Control Study Using Fresh-Frozen Tissues

Fig 1.

(A) Receiver operating characteristic (ROC) analysis based on logistic regression models for predicting systemic progression. Predictors in each model are: 1 = TOP2a; 2 = TOP2A, CDH10; 3 = TOP2A, CDH10, predicted fusion; and 4 = TOP2A, CDH10, predicted fusion, aneuploidy. Models 1, 2, 3, and 4, ROC curves are shown as blue, yellow, red, and brown lines, respectively. The text shows the coefficients and the area under the curve for each model. Note that the coefficient of CDH10 is negative. (B) Scatter plot of the final model score for cases (red points) and controls (yellow points) of the training set. The yellow bullets are patients with prostate-specific antigen progression but no sign of systemic progression.

Fig 2.

(A) Receiver operating characteristic (ROC) analysis of model 4 (0.71 × TOP2A − 0.62 × CDH10 + 0.91 × pred_fusion + 1.34 × aneuploidy) for the independent validation set. (B) Scatter plot of the score for cases (red points) and controls (yellow points) for the independent validation set.

Table 1.

Multivariate (Cox model) Analysis of Clinical and Pathologic Predictors of Systemic Progression After Radical Retropubic Prostatectomy From the Mayo Clinic Radical Retropubic Prostatectomy Database of Men (N = 10,626)

Table 2.

Genes Examined by Quantitative Reverse Transcription Polymerase Chain Reaction in the Case/Control Study

Table 3.

Final Logistic Regression Model Predicting Systemic Progression

Acknowledgments

We thank the MN Department of Employment and Economic Development for the funds for the MN Partnership for Biotechnology and Medical Genomics that provided the Microarray data. Mayo Clinic has filed a patent application entitled “Predicting Cancer Outcome,” involving data included in this publication.

Footnotes

  • Supported by a generous gift from The Richard M. Schulze Family Foundation; the Mayo Clinic Comprehensive Cancer Center; the Department of Laboratory Medicine and Pathology and the SPORE in Prostate Cancer Grant No. CA91956 from the National Cancer Institute, US National Institutes of Health.

    Authors’ disclosures of potential conflicts of interest and author contributions are found at the end of this article.

  • Received December 9, 2007.
  • Accepted April 15, 2008.

REFERENCES

| Table of Contents
  • Advertisement
  • Advertisement
  • Advertisement