JISE




Journal of Information Science and Engineering, Vol. 27 No. 3, pp. 855-868


Similarity Analysis between Transcription Factor Binding Sites by Bayesian Hypothesis Test


QIAN LIU+, SAN-YANG LIU AND LI-FANG LIU
+Department of Mathematics 
School of Computer Science and Technology 
Xidian University 
Xi'an, 710071 P.R. China


    Transcription factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFMs). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices. We propose to identify and group similar profiles using a Bayesian hypothesis test between PFMs. We describe a column-by-column method that quantifies PFM similarity by the Bayes factor and the posterior probability of the null model, under which aligned columns are independent and identically distributed observations from the same multinomial distribution. Using cluster analysis based on the Bayes factors and posterior probabilities of similar PFMs, we group TFBS frequency matrices from the less redundant JASPAR set into matrix families and identify clusters of highly similar matrices. We further compare the performance of this method with that of the Pearson χ² test on simulated data. The proposed method is simple and easy to implement. Taking the Pearson product-moment correlation coefficient as an objective performance criterion, the results indicate that the Bayesian test performs better on average than the classical method in our tests.


Keywords: transcription factor binding site, position frequency matrices, similarity, Bayes factor, posterior probability, cluster analysis
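
The column-by-column comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the prior or the exact form of the test statistic, so the following are assumptions made for illustration only: a symmetric Dirichlet prior with pseudocount `alpha` on the nucleotide probabilities, an alternative model in which the two aligned columns come from separate multinomial distributions, and a whole-matrix score formed by summing the log Bayes factors of aligned columns. The function names `column_log_bf`, `pfm_log_bf` and `posterior_null` are hypothetical.

```python
import math


def log_beta(a):
    """Log of the multivariate Beta function, B(a) = prod Gamma(a_i) / Gamma(sum a_i)."""
    return sum(math.lgamma(v) for v in a) - math.lgamma(sum(a))


def column_log_bf(x, y, alpha=1.0):
    """Log Bayes factor (null over alternative) for two aligned count columns.

    Null: x and y are drawn from the same multinomial distribution.
    Alternative (assumed here): x and y come from two independent multinomials.
    Both models use a symmetric Dirichlet(alpha) prior, which is an assumption.
    The multinomial coefficients cancel in the ratio, leaving
    BF = B(a + x + y) * B(a) / (B(a + x) * B(a + y)).
    """
    a = [alpha] * len(x)
    ax = [ai + xi for ai, xi in zip(a, x)]
    ay = [ai + yi for ai, yi in zip(a, y)]
    axy = [ai + xi + yi for ai, xi, yi in zip(a, x, y)]
    return log_beta(axy) + log_beta(a) - log_beta(ax) - log_beta(ay)


def pfm_log_bf(pfm1, pfm2, alpha=1.0):
    """Sum of column log Bayes factors for two PFMs of equal width.

    Each PFM is a list of columns; each column holds counts for A, C, G, T.
    Summing assumes the columns are independent and already aligned.
    """
    if len(pfm1) != len(pfm2):
        raise ValueError("PFMs must have the same number of aligned columns")
    return sum(column_log_bf(c1, c2, alpha) for c1, c2 in zip(pfm1, pfm2))


def posterior_null(log_bf, prior_null=0.5):
    """Posterior probability of the null model given a log Bayes factor."""
    log_odds = log_bf + math.log(prior_null) - math.log1p(-prior_null)
    # Numerically stable logistic function of the posterior log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    motif_a = [[10, 0, 0, 0], [0, 9, 1, 0], [2, 2, 3, 3]]
    motif_b = [[9, 1, 0, 0], [0, 10, 0, 0], [3, 2, 2, 3]]
    score = pfm_log_bf(motif_a, motif_b)
    print(score, posterior_null(score))
```

Under these assumptions, a positive log Bayes factor favors the null model that the two columns share one distribution, and a negative value favors distinct distributions. A pairwise matrix of such scores could then serve as input to a cluster analysis of the kind the abstract mentions.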
