HSMotifDiscover: Identification of motifs in sequences composed of non-single-letter elements

Vinod Kumar Singh, Rohan Misra, Steven C. Almo, Ulrich G. Steidl, Hannes E. Bülow, Deyou Zheng

Research output: Contribution to journal › Article › peer-review


Summary: The functional substrings of a biopolymer sequence, often referred to as motifs, define the specificity of its interactions with other biomolecules. Computational algorithms and software have been developed extensively for finding such motifs in sequences whose individual elements are single characters, such as DNA and protein sequences. However, in more complex scenarios the motifs occur in non-single-letter contexts, e.g. preferred patterns of chemical modifications on proteins, DNA, RNA or polysaccharides. To search for such motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position-specific scoring matrix representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data from microarrays with HS oligosaccharides or other biological polymers.
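
The two steps named in the summary (recoding multi-character elements as single letters, then Gibbs sampling a position-specific scoring matrix) can be illustrated with a minimal Python sketch. This is not the HSMotifDiscover R package or its modified algorithm: it is a plain textbook Gibbs site sampler, and the function names (`encode`, `build_pssm`, `gibbs_motif`), the uniform background, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
import random
import string

def encode(sequences):
    """Map each multi-character element to one letter (at most 26 distinct elements)."""
    codebook = {}
    encoded = []
    for seq in sequences:
        letters = []
        for element in seq:
            if element not in codebook:
                codebook[element] = string.ascii_uppercase[len(codebook)]
            letters.append(codebook[element])
        encoded.append("".join(letters))
    return encoded, codebook

def build_pssm(encoded, starts, width, alphabet, skip=None, pseudo=1.0):
    """Per-column letter probabilities from the current motif windows."""
    counts = [{a: pseudo for a in alphabet} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(encoded, starts)):
        if i == skip:  # leave out the sequence currently being resampled
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1
    pssm = []
    for col in counts:
        total = sum(col.values())
        pssm.append({a: c / total for a, c in col.items()})
    return pssm

def gibbs_motif(encoded, width, iterations=200, seed=0):
    """Textbook Gibbs site sampler: one motif occurrence per sequence."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(encoded)))
    background = 1.0 / len(alphabet)  # assumption: uniform background frequencies
    starts = [rng.randrange(len(s) - width + 1) for s in encoded]
    for _ in range(iterations):
        for i, seq in enumerate(encoded):
            pssm = build_pssm(encoded, starts, width, alphabet, skip=i)
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= pssm[j][seq[start + j]] / background
                weights.append(w)
            # Sample a new start position in proportion to its likelihood ratio.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_pssm(encoded, starts, width, alphabet)

# Toy input (invented for illustration): each element is a multi-character residue name.
toy = [
    ["GlcA", "GlcNAc", "IdoA2S", "GlcNS6S", "GlcA"],
    ["GlcNAc", "IdoA2S", "GlcNS6S", "GlcA", "GlcNAc"],
    ["GlcA", "GlcA", "GlcNAc", "IdoA2S", "GlcNS6S"],
]
encoded, codebook = encode(toy)
starts, pssm = gibbs_motif(encoded, width=2)
```

Keeping the codebook alongside the encoded strings lets the resulting matrix columns be translated back from single letters to the original element names when the motif is reported.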

Original language: English (US)
Pages (from-to): 4036-4038
Number of pages: 3
Issue number: 16
State: Published - Aug 15 2022

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

