Statistically overrepresented sequences in the upstream regions of coregulated genes should, in theory, reveal potential cis-regulatory elements. In practice, however, many cis-regulatory elements are highly degenerate, which precludes identifying them with an exhaustive word-counting strategy. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that this approach is limited both by the independence assumptions inherent in the model and by the inability to reach a global optimum.
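As a rough illustration of why degeneracy defeats exact word counting, the following Python sketch counts every 6-mer in four toy upstream regions. The sequences, the consensus `TGACTC`, and the helper names `count_words` and `hamming` are invented for this example and are not taken from the paper; allowing one mismatch is only a stand-in for degeneracy, not the bounded search the authors describe.

```python
from collections import Counter

def count_words(seqs, k):
    """Count every overlapping length-k word across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

# Toy data (hypothetical): each region carries one variant of the site,
# either the consensus itself or a copy with a single substitution.
upstream = [
    "AAATGACTCAAA",  # TGACTC (exact consensus)
    "CCCTGAGTCCCC",  # TGAGTC (one mismatch)
    "GGGTTACTCGGG",  # TTACTC (one mismatch)
    "TTTTGACTATTT",  # TGACTA (one mismatch)
]
consensus = "TGACTC"

counts = count_words(upstream, len(consensus))

# Exact word counting sees the consensus only once.
exact = counts[consensus]

# Pooling all words within one mismatch recovers every site.
degenerate = sum(n for w, n in counts.items() if hamming(w, consensus) <= 1)

print(exact, degenerate)
```

In this toy set no 6-mer occurs more than once, so exact counting gives no word an edge over background, while the one-mismatch neighborhood of the consensus collects all four sites. The cost is that the number of candidate degenerate patterns grows far faster than the number of exact words, which is what makes an exhaustive search impractical.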
Carlson, Jonathan M.; Chakravarty, Arijit; Khetani, Radhika S.; and Gross, Robert H., "Bounded Search for de Novo Identification of Degenerate Cis-Regulatory Elements" (2006). Open Dartmouth: Faculty Open Access Articles. 572.