Dartmouth Scholarship

Cross-Platform Normalization of Microarray and RNA-Seq Data for Machine Learning Applications

Document Type

Article

Publication Date

1-21-2016

Publication Title

PeerJ

Department

Geisel School of Medicine

Abstract

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exists in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.

DOI

10.7717/peerj.1621

Dartmouth Digital Commons Citation

Thompson, Jeffrey A.; Tan, Jie; and Greene, Casey S., "Cross-Platform Normalization of Microarray and RNA-Seq Data for Machine Learning Applications" (2016). Dartmouth Scholarship. 2606.
https://digitalcommons.dartmouth.edu/facoa/2606

Included in

Genetics and Genomics Commons, Medicine and Health Sciences Commons
