Abstract
In this paper, unique reconstruction refers to the property that a sequence can be uniquely reconstructed from all of its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction with respect to the tuple size K in random sequences (iid model). In Monte Carlo experiments, artificial proteins generated from the iid model exhibit a phase transition in which P(K) jumps abruptly from a low-value phase (e.g. < 0.1) to a high-value phase (e.g. > 0.9). Generalizing to an arbitrary alphabet, we prove that for a random sequence of length L, when L is large enough, P(K) undergoes a sharp phase transition when p ≤ 0.1015, where p = P(two random letters match). In addition, formulas are derived to estimate the transition points, which may be of practical use in sequencing DNA by hybridization. Our study suggests that most proteins do not deviate greatly from random sequences with respect to unique reconstruction, whereas some "stubborn" proteins become uniquely reconstructable only at a very large K and probably have biological implications.
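The kind of Monte Carlo experiment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on assumptions that the abstract does not state: the K-tuples are treated as a multiset (multiplicities known), uniqueness is checked by brute-force backtracking that is only practical for short sequences, and all function names (`match_probability`, `is_uniquely_reconstructable`, `estimate_p`) are hypothetical. The helper `match_probability` uses the standard identity that two independent letters drawn from the same distribution match with probability equal to the sum of the squared letter frequencies.

```python
import random
from collections import Counter


def match_probability(freqs):
    """p = P(two independent random letters match) = sum of squared frequencies."""
    return sum(q * q for q in freqs.values())


def is_uniquely_reconstructable(seq, k):
    """Return True if seq is the only sequence with its multiset of k-tuples.

    Assumption: tuple multiplicities are known. Brute-force backtracking,
    intended for short sequences only.
    """
    n_tuples = len(seq) - k + 1
    if k < 1 or n_tuples < 1:
        raise ValueError("need 1 <= k <= len(seq)")
    remaining = Counter(seq[i:i + k] for i in range(n_tuples))
    letters = sorted(set(seq))
    found = 0  # number of distinct reconstructions found so far (capped at 2)

    def extend(current, used):
        nonlocal found
        if used == n_tuples:
            found += 1
            return
        # The next tuple must overlap the last k-1 letters of the current string.
        suffix = current[len(current) - (k - 1):]
        for c in letters:
            t = suffix + c
            if remaining[t] > 0:
                remaining[t] -= 1
                extend(current + c, used + 1)
                remaining[t] += 1
                if found >= 2:
                    return

    for start in list(remaining):
        remaining[start] -= 1
        extend(start, 1)
        remaining[start] += 1
        if found >= 2:
            break
    return found == 1


def estimate_p(length, k, alphabet, trials, seed=0):
    """Monte Carlo estimate of P(K) for iid uniform sequences over alphabet."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice(alphabet) for _ in range(length))
        hits += is_uniquely_reconstructable(seq, k)
    return hits / trials


if __name__ == "__main__":
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"  # uniform frequencies are an assumption
    for k in range(2, 6):
        print(k, estimate_p(length=30, k=k, alphabet=amino_acids, trials=200))
```

With a uniform 20-letter alphabet, p = 0.05, which satisfies the p ≤ 0.1015 condition, whereas a uniform 4-letter DNA alphabet gives p = 0.25. The short length and small trial count above are chosen only so that the brute-force check stays fast.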
Original language | English (US) |
---|---|
Pages (from-to) | 18-29 |
Number of pages | 12 |
Journal | Journal of Systems Science and Complexity |
Volume | 20 |
Issue number | 1 |
State | Published - Mar 2007 |
Externally published | Yes |
Keywords
- Phase transition
- Probability
- Protein sequence
- SBH
- Unique reconstruction
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Information Systems