Evidence-ranked motif identification

Stoyan Georgiev¹, Alan P Boyle, Karthik Jayasurya, Xuan Ding, Sayan Mukherjee, Uwe Ohler

PMID: 20156354
PMCID: PMC2872879
DOI: 10.1186/gb-2010-11-2-r19

Stoyan Georgiev et al. Genome Biol. 2010;11(2):R19. doi: 10.1186/gb-2010-11-2-r19. Epub 2010 Feb 15.

Affiliation

¹ Program for Computational Biology and Bioinformatics, Duke University, 102 North Building, Durham, NC 27708, USA.

Abstract

cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence. Instead of pre-selecting promising candidate sequences, it uses information across all sequence regions to search for high-scoring motifs. We apply cERMIT to a range of direct binding and overexpression datasets; it substantially outperforms state-of-the-art approaches on curated ChIP-chip datasets and easily scales to current mammalian ChIP-seq experiments with data on thousands of non-coding regions.
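
To make "using information across all sequence regions" concrete, the sketch below ranks every sequence by its evidence and scores a k-mer by how strongly the sequences containing it concentrate at the top of that list, taking the smallest hypergeometric tail probability over all prefix cutoffs (the minimum-hypergeometric, or mHG, statistic). This rank-enrichment score is a swapped-in, commonly used choice made for illustration only; it is an assumption and not necessarily the score cERMIT optimizes. The names `rank_hits` and `min_hypergeom_score` are hypothetical.

```python
from math import comb

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def rank_hits(sequences, evidence, kmer):
    """Order sequences by decreasing evidence and flag those containing
    the k-mer on either strand (hypothetical helper)."""
    order = sorted(range(len(sequences)), key=lambda i: -evidence[i])
    rc = revcomp(kmer)
    return [kmer in sequences[i] or rc in sequences[i] for i in order]

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N items, K hits, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def min_hypergeom_score(hits):
    """Smallest tail p-value over all prefixes of the ranked list, and the
    prefix length (cutoff) where it occurs. Assumed score, not cERMIT's."""
    N, K = len(hits), sum(hits)
    best_p, best_n, k = 1.0, N, 0
    for n in range(1, N + 1):
        k += hits[n - 1]  # hits among the n top-ranked sequences
        p = hypergeom_tail(N, K, n, k)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n
```

Because the cutoff is optimized rather than fixed, no evidence threshold has to be chosen in advance, which mirrors the idea of not pre-selecting candidate sequences. For example, a hit vector `[True, True, True, False, False, False]` gives its best p-value (1/20) at a cutoff of 3.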

Figures

**Figure 1**
**cERMIT motif discovery algorithm**. cERMIT starts with all possible 5-mer seeds and proceeds by independently 'evolving' each seed to increase the enrichment of target sequences at the top of the evidence-ranked list.
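
The seed-then-evolve control flow of Figure 1 can be sketched as follows, under assumptions that are not taken from the paper: seeds are all 4^k exact k-mers (k = 5 in the caption gives 1,024 seeds), 'evolving' is modeled as greedy hill climbing in which any single position may be replaced by an IUPAC degenerate code, and the score is a hypothetical stand-in (hit rate among the top-ranked sequences minus the overall hit rate, forward strand only). All helper names are illustrative.

```python
import itertools
import re

# IUPAC nucleotide codes mapped to regex classes (assumed move set for "evolving")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

def to_regex(motif):
    """Translate an IUPAC motif string into a regular expression."""
    return "".join(IUPAC[c] for c in motif)

def top_enrichment(motif, ranked_seqs, top=0.1):
    """Hypothetical score: hit rate in the top fraction of the evidence-ranked
    list minus the hit rate over all sequences (forward strand only)."""
    pattern = re.compile(to_regex(motif))
    hits = [pattern.search(s) is not None for s in ranked_seqs]
    k = max(1, int(len(hits) * top))
    return sum(hits[:k]) / k - sum(hits) / len(hits)

def evolve(seed, ranked_seqs, score=top_enrichment):
    """Greedy hill climbing: accept any single-position IUPAC substitution that
    raises the score; stop when a full pass yields no improvement."""
    best, best_score = seed, score(seed, ranked_seqs)
    improved = True
    while improved:
        improved = False
        for i, code in itertools.product(range(len(best)), IUPAC):
            cand = best[:i] + code + best[i + 1:]
            cand_score = score(cand, ranked_seqs)
            if cand_score > best_score:
                best, best_score, improved = cand, cand_score, True
    return best, best_score

def discover(ranked_seqs, k=5, n_best=3):
    """Evolve every exact k-mer seed independently and keep the best results."""
    seeds = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    results = {evolve(seed, ranked_seqs) for seed in seeds}
    return sorted(results, key=lambda r: -r[1])[:n_best]
```

Here `ranked_seqs` must already be sorted by decreasing evidence. Because each seed is evolved independently, the loop over seeds could be distributed across workers without coordination.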

**Figure 2**
**Motif discovery pipeline**. Pipeline for motif discovery based on genome-wide evidence of regulation. Sequence reads are aligned to the reference genome, and peak calling is performed to produce a set of putative regulatory regions (for example, DNaseI peaks) and the corresponding evidence of regulation (for example, ChIP-seq peaks). As the final step in the pipeline, cERMIT is run on the preprocessed data to produce the motif predictions best supported by the observed experimental evidence E.
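
The step that pairs each putative regulatory region with its evidence E can be illustrated with a short sketch. The caption does not specify this mapping, so the following are assumptions: intervals are half-open `(chrom, start, end)` tuples, a region's evidence is the highest score among the ChIP-seq peaks overlapping it, and regions without an overlapping peak receive 0. The function name `assign_evidence` is hypothetical, and its per-chromosome linear scan would need an interval index at genome scale.

```python
from collections import defaultdict

def assign_evidence(regions, peaks):
    """regions: (chrom, start, end); peaks: (chrom, start, end, score).
    Returns (region, evidence) pairs sorted from strongest to weakest."""
    by_chrom = defaultdict(list)
    for chrom, start, end, score in peaks:
        by_chrom[chrom].append((start, end, score))
    scored = []
    for chrom, start, end in regions:
        # Half-open intervals overlap when each starts before the other ends
        ev = max((score for ps, pe, score in by_chrom[chrom]
                  if ps < end and start < pe), default=0.0)
        scored.append(((chrom, start, end), ev))
    scored.sort(key=lambda item: -item[1])  # stable: ties keep input order
    return scored
```

For instance, with regions `("chr1", 100, 200)` and `("chr1", 300, 400)` and a single peak `("chr1", 150, 160, 7.5)`, the first region receives evidence 7.5 and the second 0.0, so the first is ranked first.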

**Figure 3**
**Human ChIP-seq motif predictions**. Motif predictions of cERMIT on six human ChIP-seq datasets: STAT1 [35], the insulator binding protein CTCF [36], SRF, GABP [37], FoxA1 [38], and NRSF [39]. The 'ensemble' column shows results obtained using the ensemble of all six datasets to define the space of regulatory regions (see text). The 'DNaseI' column shows cERMIT predictions when open chromatin regions, as defined by DNaseI peaks, are used as the set of putative regulatory regions. Literature position-specific scoring matrices (PSSMs) were extracted from TRANSFAC 2009.1. Asterisks indicate the optimal alignment of each motif prediction to the literature motif. Because of its ubiquitous binding, CTCF was recovered using the top 25,000 DNaseI peaks as input to cERMIT. For all other datasets, the top 5,000 peaks of each factor were used in both scenarios.
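
The 'ensemble' region space can be sketched as taking the top-scoring peaks of each factor (5,000 per factor in the caption) and merging them into one set of non-overlapping intervals. The merge rule (overlapping or touching half-open intervals are fused) and the peak layout `(chrom, start, end, score)` are assumptions for illustration; `top_peaks` and `ensemble_regions` are hypothetical names.

```python
def top_peaks(peaks, n):
    """The n highest-scoring peaks; each peak is (chrom, start, end, score)."""
    return sorted(peaks, key=lambda p: -p[3])[:n]

def ensemble_regions(peak_sets, n=5000):
    """Union of each factor's top-n peaks, merged into non-overlapping
    half-open intervals (overlapping or touching intervals are fused)."""
    intervals = sorted((c, s, e) for peaks in peak_sets
                       for c, s, e, _ in top_peaks(peaks, n))
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the open interval
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]
```

Passing one peak list per factor yields a single shared region space onto which each factor's evidence can then be mapped; sorting by `(chrom, start)` first is what lets a single left-to-right pass perform the merge.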

**Figure 4**
**Mouse ChIP-seq motif predictions**. Motif predictions of cERMIT on mouse ChIP-seq data from [40]. The predictions of cERMIT use the 'ensemble' approach to define the set of putative regulatory regions (see text for details). Literature position-specific scoring matrices (PSSMs) were extracted from TRANSFAC 2009.1, except for CTCF [45], Klf4 [57], and Zfx (unknown). Asterisks indicate the optimal alignment of each motif prediction to the literature motif. Each factor contributes its 5,000 top-scoring peaks to the ensemble set of putative regulatory regions.
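
The captions do not state how the optimal alignment between a prediction and a literature PSSM was determined. One common approach, shown here only as an assumed illustration, slides the predicted matrix along the literature matrix on both strands and keeps the offset with the highest mean column-wise Pearson correlation, subject to a minimum overlap. Matrices are lists of `[A, C, G, T]` probability columns; all function names are hypothetical.

```python
from math import sqrt

def onehot(seq):
    """Turn an ACGT string into a PSSM with one [A, C, G, T] column per base."""
    return [[1.0 if b == base else 0.0 for b in "ACGT"] for base in seq]

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def revcomp_pssm(pssm):
    """Reverse the columns and swap A<->T, C<->G (reversing [A, C, G, T] does both)."""
    return [col[::-1] for col in pssm[::-1]]

def best_alignment(pred, lit, min_overlap=4):
    """Return (mean correlation, offset, strand); pred column i faces lit column i + offset."""
    best = (float("-inf"), 0, "+")
    for strand, p in (("+", pred), ("-", revcomp_pssm(pred))):
        for off in range(min_overlap - len(p), len(lit) - min_overlap + 1):
            pairs = [(p[i], lit[i + off]) for i in range(len(p)) if 0 <= i + off < len(lit)]
            if len(pairs) < min_overlap:
                continue
            score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
            if score > best[0]:
                best = (score, off, strand)
    return best
```

For example, aligning `onehot("GTCAA")` to `onehot("TTGACA")` finds a perfect match on the reverse strand at offset 0, since the reverse complement of GTCAA is TTGAC.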

References

1. Ohler U, Niemann H. Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet. 2001;17:56–60.
2. Tompa M, Li N, Bailey T, Church G, De Moor B, Eskin E, Favorov A, Frith M, Fu Y, Kent, Makeev V, Mironov A, Noble W, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005;23:137–144.
3. Bussemaker H, Foat B, Ward L. Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct. 2007;36:329–347.
4. Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262:208–214.
5. Roth P, Hughes J, Estep P, Church G. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol. 1998;16:939–945.

Grants and funding

P50GM081883/GM/NIGMS NIH HHS/United States
