Comparative Study
Genome Res. 2007 Dec;17(12):1919-31.
doi: 10.1101/gr.7090407. Epub 2007 Nov 7.

Reliable prediction of regulator targets using 12 Drosophila genomes

Pouya Kheradpour et al. Genome Res. 2007 Dec.

Abstract

Gene expression is regulated pre- and post-transcriptionally via cis-regulatory DNA and RNA motifs. Identification of individual functional instances of such motifs in genome sequences is a major goal for inferring regulatory networks yet has been hampered due to the motifs' short lengths that lead to many chance matches and poor signal-to-noise ratios. In this paper, we develop a general methodology for the comparative identification of functional motif instances across many related species, using a phylogenetic framework that accounts for the evolutionary relationships between species, allows for motif movements, and is robust against missing data due to artifacts in sequencing, assembly, or alignment. We also provide a robust statistical framework for evaluating motif confidence, which enables us to translate evolutionary conservation into a confidence measure for each motif instance, correcting for varying motif length, composition, and background conservation of the target regions. We predict targets of fly transcription factors and miRNAs in alignments of 12 recently sequenced Drosophila species. When compared to extensive genome-wide experimental data, predicted targets are of high quality, matching and surpassing ChIP-chip microarrays and recovering miRNA targets with high sensitivity. The resulting regulatory network suggests significant redundancy between pre- and post-transcriptional regulation of gene expression.
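To make the "many chance matches" problem concrete, the following minimal Python sketch estimates how often a short consensus motif is expected to match by chance. It assumes uniform, independent base frequencies, which real genomes do not have, and the function names are hypothetical rather than taken from the paper. The motifs used are the Snail (CAGGTG) and Mef-2 (YTAWWWWTAR) consensus sequences quoted in the figure legends.

```python
import re

# Plain bases (A, C, G, T) that each IUPAC nucleotide symbol stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(motif):
    """Probability that a random position matches, assuming uniform bases."""
    p = 1.0
    for symbol in motif:
        p *= len(IUPAC[symbol]) / 4
    return p


def expected_chance_matches(motif, sequence_length):
    """Expected number of chance matches on one strand of a random sequence."""
    positions = max(sequence_length - len(motif) + 1, 0)
    return positions * match_probability(motif)


def find_instances(motif, sequence):
    """Start positions of all (possibly overlapping) matches on one strand."""
    pattern = "".join(f"[{IUPAC[s]}]" for s in motif)
    # A lookahead keeps the regex engine from skipping overlapping matches.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]
```

Under these simplifying assumptions, a 6-mer such as CAGGTG is expected to match roughly once every 4,096 positions per strand, so a 2-kb promoter region contains about half a chance match per strand before any conservation filter is applied.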

Figures

Figure 1.
BLS measure (Branch Length Score) for assessing motif conservation in many genomes. (A) Conservation level and corresponding BLS scores for two Mef-2 motif instances. The BLS measure scores the total branch length of the subtree connecting the species with motif instances, as a fraction of the total branch length of all twelve species. As shown in these examples (Mef-2 motif: YTAWWWWTAR), BLS accounts for local alignment inaccuracies, gaps, motif movement, and motif loss. Species abbreviations as follows: Drosophila melanogaster (D. mel.), D. simulans (D. sim.), D. sechellia (D. sec.), D. yakuba (D. yak.), D. erecta (D. ere.), D. ananassae (D. ana.), D. pseudoobscura (D. pse.), D. persimilis (D. per.), D. willistoni (D. will.), D. mojavensis (D. moj.), D. virilis (D. vir.), and D. grimshawi (D. gri.). (B) BLS scores for different instance conservation scenarios. Given the pattern of presence (black) and absence (white) within a phylogenetic tree, BLS evaluates the total branch length of the subtree connecting the species that contain the motif: When all species are present, BLS is 100% (column A); different sets of species lead to different BLS scores based on their evolutionary distances: distantly related species lead to higher scores as they span larger evolutionary distances (columns B,C); species that are very closely related to each other lead to only small incremental contributions, due to their phylogenetic redundancy (columns D,E); sequencing, assembly, and alignment artifacts are not penalized, such as those stemming from lower-coverage genomes, as redundancy of branches between close species complements BLS (column F). Information about sequence coverage is from Drosophila 12 Genomes Consortium (2007) and Richards et al. (2005).
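The BLS definition in this legend can be sketched in a few lines of Python. This is an illustration only: the five-species tree and its branch lengths below are invented for the example (they are not the 12-genome tree used in the paper), and the sketch takes the presence/absence pattern as given rather than handling alignment, motif movement, or missing data.

```python
# Toy rooted tree as child -> (parent, branch_length).
# Topology and branch lengths are made up for illustration.
TOY_TREE = {
    "mel": ("n1", 0.05),
    "sim": ("n1", 0.05),
    "n1": ("n2", 0.10),
    "yak": ("n2", 0.15),
    "n2": ("n3", 0.20),
    "pse": ("n3", 0.50),
    "n3": ("root", 0.30),
    "vir": ("root", 0.70),
}


def leaves_below(tree):
    """Map each non-root node to the set of leaf names in its subtree."""
    children = {}
    for child, (parent, _) in tree.items():
        children.setdefault(parent, []).append(child)

    def collect(node):
        if node not in children:  # a node with no children is a leaf
            return {node}
        found = set()
        for c in children[node]:
            found |= collect(c)
        return found

    return {node: collect(node) for node in tree}


def branch_length_score(tree, present):
    """Fraction of total branch length in the subtree connecting `present`."""
    present = set(present)
    below = leaves_below(tree)
    total = sum(length for _, length in tree.values())
    spanned = 0.0
    for node, (_, length) in tree.items():
        # A branch lies on the connecting subtree only if species carrying
        # the motif sit on both sides of it.
        if present & below[node] and present - below[node]:
            spanned += length
    return spanned / total
```

With this toy tree, a motif found only in the close pair `mel` and `sim` scores far lower than one found in the distant pair `mel` and `vir`, which mirrors the behavior described for columns B through E of panel B.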
Figure 2.
High-confidence recovery of individual motif instances. (A) Mapping BLS scores to confidence values. Recovery of conserved motif instances for the transcriptional repressor Snail (CAGGTG) in promoter regions (2-kb regions upstream of transcription start sites), and the K-box miRNA (CTGTGAT) in 3′ UTRs, at different BLS cutoffs (X-axis). Instances of shuffled control motifs (gray area) decrease much more rapidly than instances of real motifs (height of black curve), leading to a large fraction of motif instances conserved above background (black area). The motif-confidence score (red line) is calculated as the fraction of conserved instances above background. Random motifs are selected to have equal frequency as real motifs at BLS = 0. (B,C) Increasing confidence values select functional motif instances. With increasing confidence cutoffs (X-axis), transcription factor (TF) motif instances fall increasingly in promoter regions (light blue), 5′ UTRs (red), and introns (green), at the exclusion of 3′ UTRs (dark blue) and coding regions (yellow). In contrast, miRNA motif instances fall increasingly into 3′ UTRs to the exclusion of promoters and other regions. Relative size of regions is normalized at BLS = 0. (D) miRNA motif instances at increasing confidence cutoffs are increasingly on the transcribed strand of 3′ UTRs (black curve), while no such trend is seen for TF motifs (gray). Curves are truncated when <10 instances reach the respective confidence.
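One plausible reading of the motif-confidence score in panel A, the fraction of conserved instances above background, is sketched below in Python. The function name and the rescaling step are assumptions based on this legend alone; the paper's actual procedure for choosing and pooling control motifs may differ.

```python
def motif_confidence(real_bls, control_bls, cutoff):
    """Fraction of real instances at or above `cutoff` not explained by controls.

    `real_bls` and `control_bls` are BLS values (0 to 1), one per instance.
    Control counts are rescaled so both sets have equal size at BLS = 0,
    following the normalization stated in the legend.
    Returns None when no real instance reaches the cutoff.
    """
    n_real = sum(1 for score in real_bls if score >= cutoff)
    if n_real == 0 or not control_bls:
        return None
    scale = len(real_bls) / len(control_bls)
    n_control = scale * sum(1 for score in control_bls if score >= cutoff)
    # Clamp at zero: controls can outnumber real instances by chance.
    return max(0.0, 1.0 - n_control / n_real)
```

At a cutoff of 0 every instance passes in both sets, so the confidence is 0, matching the legend's note that X = 0% corresponds to using no comparative information. As the cutoff rises, control instances drop out faster than real ones and the confidence increases.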
Figure 3.
Discovery power for motif instance prediction. (A) Effect of tolerated motif movement. Number of recovered motif instances at 60% confidence for TF and miRNA motifs. (Left panel) For both TF motifs (gray: bicoid motif, VVVBTAATCC) and miRNA motifs (black: miR-iab-4 motif, GTATACG), instance recovery increases until an optimal window size (500 and 400 nucleotides, respectively) and then decreases for larger movements, suggesting that tolerating motif movements increases overall discovery power. (Right panel) Performance across all TF motifs (black) and all miRNA motifs (gray) shows improved recovery until windows of 300–500 nucleotides (for 60%–80% of motifs) but reduced performance for larger window sizes. Performance for individual examples (left panel) shows a sharper peak than the overall performance across all motifs (right panel), as different window sizes are optimal for different motifs. (B) BLS measure leads to increased sensitivity. Number of motif instances recovered (Y-axis) at each confidence value (X-axis) for transcription factor (TF) motifs (left panel) and miRNA motifs (right panel). The BLS measure applied to the 12 fly genomes (blue) recovers more motif instances at each confidence, as compared to approaches requiring motif presence in all compared species (“full” conservation), applied to the five melanogaster species (red), the pairwise comparison of D. melanogaster and D. pseudoobscura (yellow), or the nine Sophophora species (green). (C) Additional species lead to increased specificity. Two measures of discovery power for the BLS measure applied to the five melanogaster group species (green), a pairwise comparison of D. melanogaster and D. pseudoobscura (gray), the nine Sophophora species (black), and all 12 Drosophila species (red). (Left panel) More TF and miRNA motifs reach 60% confidence for increasing number of genomes at larger evolutionary distances. 
(Right panel) Increasing numbers of genomes at larger evolutionary distances also lead to increased signal-to-noise ratio, measured as the conservation level of real motifs vs. control motifs at the most stringent BLS cutoff.
Figure 4.
Conserved motif instances identify functional in vivo targets. Functional in vivo targets were determined for Mef-2, Twist, and Snail using ChIP-chip (Sandmann et al. 2006, 2007; Zeitlinger et al. 2007), and direct transcriptional targets were determined for CrebA using various assays (Abrams and Andrew 2005). (A) Increasing confidence values show increased enrichment for in vivo sites. Fold enrichment in functional in vivo sites (Y-axis) for conserved motif instances at varying confidence values (X-axis). Hypergeometric P-values for max fold enrichments are 4 × 10^-11 for Mef-2, 2 × 10^-6 for Twist, 3 × 10^-10 for Snail, and 1 × 10^-7 for CrebA. Increasing confidence levels selected functional in vivo sites with increased enrichment for all four regulators, showing that high conservation selects for functional motif instances (X = 0% shows the enrichment in the absence of comparative information, i.e., without requiring conservation). Curves are truncated when motifs do not reach the respective confidence levels. (B–D) High-sensitivity recovery of in vivo targets for TF and miRNA regulators. Fraction of motifs in bound regions recovered at 60% confidence (black bars), compared to the fraction expected given the overall conservation of the respective regions, as assessed by control motifs using the same BLS cutoff (gray; suggesting preferential conservation of the corresponding TF motif instances). (B) Recovery of ChIP-bound motifs, across all ChIP-bound regions (labeled “C”), and only those instances overlapping known enhancers (labeled “E”). Recovery rates show high sensitivity for TF motif instances, especially when these overlap enhancer regions. (C) Recovery of experimentally validated direct CrebA targets shows even higher sensitivity, likely due to the multiple lines of experimental evidence establishing them as direct targets. (D) miRNA recovery at 80% confidence is very high. (E) Nonconserved ChIP sites show reduced functional enrichments. Enrichment in promoter regions of muscle genes for motif instances of activators Twist and Mef-2, and depletion for motif instances of repressor Snail are reduced for ChIP-bound regions for which motif instances are not conserved, suggesting they may contain a higher fraction of nonfunctional sites. The enrichment/depletion is even weaker for ChIP-bound regions without motif instances (all enrichments are significant with P-values between 1.1 × 10^-4 and 5.1 × 10^-13 except those for Snail). (F) Conservation-inferred targets and ChIP-inferred targets show comparable functional enrichments. Conservation-inferred motif targets at 60% confidence (red; all P < 10^-4) show higher muscle-gene enrichment/depletion than ChIP-inferred targets (black). Even outside ChIP-bound regions, conserved motifs show comparable enrichment and depletion (blue; all P < 5 × 10^-3).
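The fold enrichments and hypergeometric P-values quoted in this legend follow a standard calculation, sketched here in Python. The function and argument names are hypothetical, and how the paper defines the population (for example, all motif instances versus all genes) is not stated in this legend, so the sketch illustrates the statistic rather than reproducing the analysis.

```python
from math import comb


def fold_enrichment(population, successes, draws, hits):
    """Observed hit rate among the draws divided by the background rate.

    population: all candidate instances
    successes:  instances that fall in experimentally bound regions
    draws:      instances passing the confidence cutoff
    hits:       instances passing the cutoff that are also bound
    """
    return (hits / draws) / (successes / population)


def hypergeom_tail(population, successes, draws, hits):
    """P(at least `hits` successes) when drawing without replacement."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, draws)
```

For instance, if 100 of 1,000 instances lie in bound regions and 20 of the 50 high-confidence instances are bound, `fold_enrichment(1000, 100, 50, 20)` gives a fourfold enrichment, and `hypergeom_tail` with the same arguments gives the probability of seeing an overlap at least that large by chance.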
Figure 5.
An initial regulatory network in Drosophila. Regulatory network with 46,525 connections between 83 TF and 57 miRNA motifs (circles) and their target genes (squares) at 60% confidence. If the regulator and its target are co-expressed in at least one tissue according to ImaGO (Tomancak et al. 2002), the corresponding edges (lines) and nodes (circles or squares) are colored red; otherwise they are gray. The high fraction of red edges (46%, P = 2 × 10^-3) highlights the quality of the network. Nodes with gene names and connected by bold edges indicate examples of regulatory connections with evidence in the literature (see Supplemental Table S4).

References

    1. Abrams E.W., Andrew D.J. CrebA regulates secretory activity in the Drosophila salivary gland and epidermis. Development. 2005;132:2743–2758.
    2. Adryan B., Teichmann S.A. FlyTF: A systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster. Bioinformatics. 2006;22:1532–1533.
    3. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29.
    4. Bailey A.M., Posakony J.W. Suppressor of hairless directly activates transcription of enhancer of split complex genes in response to Notch receptor activity. Genes & Dev. 1995;9:2609–2622.
    5. Bergman C.M., Carlson J.W., Celniker S.E. Drosophila DNase I footprint database: A systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics. 2005;21:1747–1749.
