HSMotifDiscover: Identification of motifs in sequences composed of non-single-letter elements

Vinod Kumar Singh; Rohan Misra; Steven C. Almo; Ulrich G. Steidl; Hannes E. Bülow; Deyou Zheng

doi:10.1093/bioinformatics/btac437

Research output: Contribution to journal › Article › peer-review

Abstract

Summary: The functional sub-string(s) of a biopolymer sequence defines the specificity of its interaction with other biomolecules and is often referred to as motifs. Computational algorithms and software have been broadly developed for finding such motifs in sequences in which the individual elements are single characters, such as those in DNA and protein sequences. However, there are more complex scenarios where the motifs exist in non-single-letter contexts, e.g. preferred patterns of chemical modifications on proteins, DNAs, RNAs or polysaccharides. To search for those motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position specific scoring matrix representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data using microarrays with HS oligosaccharides or other biological polymers.
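The two steps named in the abstract (recoding multi-character elements as single letters, then sampling motif positions to build a position-specific scoring matrix) can be illustrated with a minimal Python sketch. It is not the authors' R package or their modified algorithm: it uses a textbook Gibbs site sampler with one motif occurrence per sequence and a uniform background, and the function names (`encode`, `build_pssm`, `gibbs_motif`) and the residue labels in the toy input are assumptions made for illustration only.

```python
import math
import random


def encode(sequences):
    """Map each multi-character element to one letter.

    Returns the encoded strings and the element-to-letter table.
    Hypothetical helper; supports at most 26 distinct elements.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    table = {}
    encoded = []
    for seq in sequences:
        chars = []
        for element in seq:
            if element not in table:
                table[element] = letters[len(table)]
            chars.append(table[element])
        encoded.append("".join(chars))
    return encoded, table


def build_pssm(sites, alphabet, pseudocount=0.5):
    """Column-wise log2-odds scores against a uniform background."""
    width = len(sites[0])
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(width):
        counts = {a: pseudocount for a in alphabet}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({a: math.log2(counts[a] / total / background) for a in alphabet})
    return pssm


def gibbs_motif(encoded, width, iterations=200, seed=0):
    """Plain Gibbs site sampler: one motif start per sequence (needs >= 2 sequences)."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(encoded)))
    starts = [rng.randrange(len(s) - width + 1) for s in encoded]
    for _ in range(iterations):
        for i, seq in enumerate(encoded):
            # Build the matrix from every sequence except the held-out one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(encoded, starts)) if j != i]
            pssm = build_pssm(others, alphabet)
            # Weight each candidate start by its likelihood ratio, then sample one.
            weights = []
            for p in range(len(seq) - width + 1):
                score = sum(pssm[k][seq[p + k]] for k in range(width))
                weights.append(2.0 ** score)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [s[p:p + width] for s, p in zip(encoded, starts)]
    return starts, build_pssm(sites, alphabet)


# Toy input: residue labels are illustrative, not the package's input format.
sequences = [
    ["GlcA", "GlcNAc", "IdoA2S", "GlcNS6S", "GlcA"],
    ["IdoA2S", "GlcNS6S", "GlcA", "GlcNAc", "GlcA"],
    ["GlcA", "GlcNS", "GlcA", "IdoA2S", "GlcNS6S"],
]
encoded, table = encode(sequences)
starts, pssm = gibbs_motif(encoded, width=2)
print(encoded, table, starts)
```

Because the sampler is stochastic, the reported starts depend on the seed and the number of iterations; a real analysis would typically repeat the run from several seeds and compare the resulting matrices.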

Original language	English (US)
Pages (from-to)	4036-4038
Number of pages	3
Journal	Bioinformatics
Volume	38
Issue number	16
DOIs	https://doi.org/10.1093/bioinformatics/btac437
State	Published - Aug 15 2022

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Access to Document

10.1093/bioinformatics/btac437

Cite this

@article{82b6b7af60324a56b82a46b842d39294,
title = "HSMotifDiscover: Identification of motifs in sequences composed of non-single-letter elements",
abstract = "Summary: The functional sub-string(s) of a biopolymer sequence defines the specificity of its interaction with other biomolecules and is often referred to as motifs. Computational algorithms and software have been broadly developed for finding such motifs in sequences in which the individual elements are single characters, such as those in DNA and protein sequences. However, there are more complex scenarios where the motifs exist in non-single-letter contexts, e.g. preferred patterns of chemical modifications on proteins, DNAs, RNAs or polysaccharides. To search for those motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position specific scoring matrix representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data using microarrays with HS oligosaccharides or other biological polymers.",
author = "Singh, {Vinod Kumar} and Misra, Rohan and Almo, {Steven C.} and Steidl, {Ulrich G.} and B{\"u}low, {Hannes E.} and Zheng, Deyou",
year = "2022",
month = aug,
day = "15",
doi = "10.1093/bioinformatics/btac437",
language = "English (US)",
volume = "38",
pages = "4036--4038",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "16",
}

TY - JOUR
T1 - HSMotifDiscover
T2 - Identification of motifs in sequences composed of non-single-letter elements
AU - Singh, Vinod Kumar
AU - Misra, Rohan
AU - Almo, Steven C.
AU - Steidl, Ulrich G.
AU - Bülow, Hannes E.
AU - Zheng, Deyou
PY - 2022/8/15
Y1 - 2022/8/15
N2 - Summary: The functional sub-string(s) of a biopolymer sequence defines the specificity of its interaction with other biomolecules and is often referred to as motifs. Computational algorithms and software have been broadly developed for finding such motifs in sequences in which the individual elements are single characters, such as those in DNA and protein sequences. However, there are more complex scenarios where the motifs exist in non-single-letter contexts, e.g. preferred patterns of chemical modifications on proteins, DNAs, RNAs or polysaccharides. To search for those motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position specific scoring matrix representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data using microarrays with HS oligosaccharides or other biological polymers.
AB - Summary: The functional sub-string(s) of a biopolymer sequence defines the specificity of its interaction with other biomolecules and is often referred to as motifs. Computational algorithms and software have been broadly developed for finding such motifs in sequences in which the individual elements are single characters, such as those in DNA and protein sequences. However, there are more complex scenarios where the motifs exist in non-single-letter contexts, e.g. preferred patterns of chemical modifications on proteins, DNAs, RNAs or polysaccharides. To search for those motifs, we describe a new method that converts the modified sequence elements to representative single-letter codes and then uses a modified Gibbs-sampling algorithm to define the position specific scoring matrix representing the motif(s). As a proof of principle, we describe the implementation and application of an R package for discovering heparan sulfate (HS) motifs in glycan sequences, which are important in regulating protein-protein interactions. This software can be valuable for analyzing high-throughput glycoprotein binding data using microarrays with HS oligosaccharides or other biological polymers.
UR - http://www.scopus.com/inward/record.url?scp=85136575217&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136575217&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac437
DO - 10.1093/bioinformatics/btac437
M3 - Article
C2 - 35771633
AN - SCOPUS:85136575217
SN - 1367-4803
VL - 38
SP - 4036
EP - 4038
JO - Bioinformatics
JF - Bioinformatics
IS - 16
ER -
