More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites.
Dartmouth Digital Commons Citation
Gorlov, Ivan P.; Logothetis, Christopher J.; Fang, Shenying; Gorlova, Olga Y.; and Amos, Christopher, "Building a Statistical Model for Predicting Cancer Genes" (2012). Dartmouth Scholarship. 2572.