Comparative genomics based on sequenced referenced genomes is essential to hypothesis generation and testing within population genetics. However, selection of candidate regions for further study on the basis of elevated or depressed divergence between species leads to a divergence-based ascertainment bias in the site frequency spectrum within selected candidate loci. Here, a method to correct this problem is developed that obtains maximum-likelihood estimates of the unascertained allele frequency distribution using numerical optimization. I show how divergence-based ascertainment may mimic the effects of natural selection and offer correction formulae for performing proper estimation into the strength of selection in candidate regions in a maximum-likelihood setting.
Kern, Andrew D., "Correcting the Site Frequency Spectrum for Divergence-Based Ascertainment" (2009). Open Dartmouth: Faculty Open Access Articles. 2618.