Random Shuffling Permutations of Nucleotides
Shiquan Wu, Xun Gu
In this paper, we discuss a sequence shuffling problem: given a DNA sequence, generate a random sequence that preserves the frequencies of all mononucleotides, dinucleotides, trinucleotides, or some higher-order base compositions of the given sequence. We present two quadratic-time algorithms for solving the problem, the Frequency-Counting algorithm and the Decomposition-and-Reassemble algorithm. The first counts the frequencies of all mononucleotides, dinucleotides, trinucleotides, and any higher-order base compositions in the given sequence. The second generates a random DNA sequence that preserves the mononucleotide, dinucleotide, trinucleotide, or some higher-order base composition. Both algorithms are implemented in a C program, ShuffleSeq, which is available at http://www.cs.iastate.edu/~sqwu/ShuffleSeq.html.
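To make the preserved quantities concrete, the following is a minimal sketch in C of counting overlapping k-mers (k = 1 for mononucleotides, k = 2 for dinucleotides, k = 3 for trinucleotides) and of checking whether two sequences share the same composition up to a given order. It is not the Frequency-Counting algorithm of this paper and is not taken from ShuffleSeq; it uses a simple rolling 2-bit encoding, and the names base_code, count_kmers, and same_composition are hypothetical, chosen only for illustration. It also assumes that windows containing a character other than A, C, G, or T are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a nucleotide to a 2-bit code; return -1 for any other character. */
int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/*
 * Count every overlapping k-mer of seq into counts[], which must hold
 * 4^k entries. A k-mer is indexed by reading its bases as base-4 digits
 * (A=0, C=1, G=2, T=3), so for k = 2 the index of "AC" is 0*4 + 1 = 1.
 * Windows that contain a non-ACGT character are skipped (an assumption
 * of this sketch). Returns the number of windows counted.
 */
size_t count_kmers(const char *seq, int k, unsigned long *counts)
{
    assert(k >= 1 && k <= 12);

    size_t n = strlen(seq);
    size_t table = (size_t)1 << (2 * k);   /* 4^k table entries */
    size_t mask = table - 1;               /* keeps the last k bases */
    size_t code = 0, valid = 0, total = 0;

    memset(counts, 0, table * sizeof *counts);
    for (size_t i = 0; i < n; i++) {
        int b = base_code(seq[i]);
        if (b < 0) {                       /* restart after a bad base */
            valid = 0;
            code = 0;
            continue;
        }
        code = ((code << 2) | (size_t)b) & mask;
        if (++valid >= (size_t)k) {        /* a full window is available */
            counts[code]++;
            total++;
        }
    }
    return total;
}

/*
 * Return 1 if a and b have identical k-mer counts for every k from 1
 * to max_k (max_k <= 3 in this sketch), and 0 otherwise.
 */
int same_composition(const char *a, const char *b, int max_k)
{
    unsigned long ca[64], cb[64];          /* 4^3 = 64 entries suffice */

    assert(max_k >= 1 && max_k <= 3);
    for (int k = 1; k <= max_k; k++) {
        size_t table = (size_t)1 << (2 * k);
        count_kmers(a, k, ca);
        count_kmers(b, k, cb);
        if (memcmp(ca, cb, table * sizeof *ca) != 0)
            return 0;
    }
    return 1;
}
```

As a small illustration, the sequences ACAGA and AGACA contain the same mononucleotides (three A, one C, one G) and the same dinucleotides (AC, CA, AG, GA, once each), so same_composition("ACAGA", "AGACA", 2) returns 1. Their trinucleotides differ (CAG versus GAC), so the same call with max_k = 3 returns 0. A shuffled sequence that preserves composition up to a chosen order should pass this check against its source at that order.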