Quantitative sequence and open reading frame analysis based on codon bias
Susan Rainey, Joe Repka
The frequencies with which the sixty-four codons occur in human coding DNA are known. If we assume that the codons occur randomly, subject only to these probabilities, then it is possible to predict trinucleotide frequencies in each of the five other reading frames. A model is developed for evaluating the extent to which a given sequence has trinucleotide frequencies compatible with coding DNA. This model is tested using known samples of coding DNA taken at random from GenBank, and good agreement is found. Practical and theoretical applications are discussed, including determination of coding open reading frames, evaluation of sequence data for frameshift mutations and examination of hypothetical genes.
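The frame-shift prediction described in the abstract can be illustrated with a minimal sketch. It assumes, as the abstract does, that successive codons are drawn independently from a fixed codon distribution, so a trinucleotide read one or two bases out of frame spans two independent codons. The function names, the frame labels ("+0" to "-2"), and the single-codon toy table are hypothetical choices made for illustration. They are not the authors' implementation and do not use real human codon frequencies.

```python
from itertools import product

BASES = "ACGT"
CODONS = ["".join(t) for t in product(BASES, repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(tri):
    """Return the reverse complement of a trinucleotide."""
    return tri.translate(COMPLEMENT)[::-1]


def shifted_frame_frequencies(codon_freq, shift):
    """Predict trinucleotide frequencies 1 or 2 bases out of frame.

    Assumes adjacent codons are independent, so the probability of a
    trinucleotide spanning two codons is the product of two marginals.
    """
    if shift not in (1, 2):
        raise ValueError("shift must be 1 or 2")
    predicted = {}
    for tri in CODONS:
        if shift == 1:
            # Bases 1-2 end one codon; base 3 starts the next codon.
            left = sum(codon_freq[a + tri[:2]] for a in BASES)
            right = sum(codon_freq[tri[2] + b + c]
                        for b in BASES for c in BASES)
        else:
            # Base 1 ends one codon; bases 2-3 start the next codon.
            left = sum(codon_freq[a + b + tri[0]]
                       for a in BASES for b in BASES)
            right = sum(codon_freq[tri[1:] + c] for c in BASES)
        predicted[tri] = left * right
    return predicted


def all_frame_frequencies(codon_freq):
    """Return predicted frequencies for the coding frame and the five others.

    Frame labels are hypothetical: "+0" is the coding frame, "+1" and "+2"
    are the shifted forward frames, and "-k" is the reverse-strand reading
    of forward frame "+k".
    """
    forward = {
        0: dict(codon_freq),
        1: shifted_frame_frequencies(codon_freq, 1),
        2: shifted_frame_frequencies(codon_freq, 2),
    }
    frames = {f"+{k}": freqs for k, freqs in forward.items()}
    for k, freqs in forward.items():
        # A trinucleotide read on the reverse strand is the reverse
        # complement of a forward-strand trinucleotide.
        frames[f"-{k}"] = {tri: freqs[reverse_complement(tri)]
                           for tri in CODONS}
    return frames


# Toy input (not real data): every codon is ATG, i.e. ...ATGATGATG...
toy = {codon: 0.0 for codon in CODONS}
toy["ATG"] = 1.0
toy_frames = all_frame_frequencies(toy)
```

With this toy table the sequence is a pure ATG repeat, so the sketch assigns all probability to TGA in frame "+1", GAT in frame "+2", and CAT, TCA and ATC in the three reverse-strand frames. A real analysis would substitute a measured codon usage table for `toy`.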