PLoS Genet. 2010 Sep 23;6(9):e1001134.
doi: 10.1371/journal.pgen.1001134.

Orphan CpG islands identify numerous conserved promoters in the mammalian genome

Robert S Illingworth et al. PLoS Genet. 2010.

Abstract

CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Querying their apparent importance, the number of CGIs is reported to vary widely in different species and many do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are "orphans" that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Typical CAP-seq profiles for human and mouse tissues.
(A, B) CAP-seq read density profiles (blue) for sperm, blood and cerebellum of human chr17: 43,061,000–43,596,500 (A) and mouse chr1: 136,095,000–136,630,500 (B). Genes (Refseq) are annotated below the CAP-seq profiles with those mapped to the positive and negative strand displayed above and below the chromosome (grey line) respectively. Non-promoter CGIs are denoted by asterisks. See also Figure S1.
Figure 2. Similar numbers of CGIs in humans and mice, but differing CpG densities.
(A) Numbers of human (pink) and mouse (black) CGIs identified by CAP-seq (CAP) and sequence-based prediction (Predicted). Total numbers are noted above each bar. (B) Histogram depicting the CpG observed/expected (o/e) values for all human (pink) and mouse (black) CGIs. Statistical significance (**) was determined using a Welch two-sample t-test. The human genome average CpG o/e value of 0.21 (broken red line) and the standard CGI prediction threshold minimum of 0.6 (broken black line) are indicated. (C) CAP-seq (blue) and CpG o/e (black; 400 bp window with a 10 bp slide) profiles for syntenic regions of human (chr16: 16,933,500–17,785,000) and mouse (chr3: 88,960,000–88,995,000) genomes. CGIs missed by standard sequence prediction parameters in mouse are indicated (red arrows). Sequence profiles are displayed as for Figure 1. (D) Categorisation of CGIs with respect to annotated genes (Refseq) in human and mouse. Categories indicated are human and mouse annotated transcription start site associated (h/m-ATSS), human and mouse intragenic (h/m-Intra) and human and mouse intergenic (h/m-Inter). (E) Box plots representing the relative CpG o/e values of CGIs at different genomic locations with respect to genes in human (pink) and mouse (black). Genome average and CGI prediction threshold CpG o/e values are indicated as in (B). CGI distribution was categorised as either annotated transcription start site (ATSS), intragenic (Intra) or intergenic (Inter). Box plots represent the central 50% of the data (filled box), the median value (central bisecting line) and the whiskers (1.5× the inter-quartile range).
Figure 3. Trimethylated H3K4 is the signature chromatin mark at CGIs and is grossly proportional to CpG density.
(A, B) Sequence read profiles for H3K4me3 in ES cells (green) and CAP-seq in sperm (blue) are depicted for human chr17: 7,054,500–7,203,500 (A) and mouse chr10: 80,726,000–80,874,000 (B). (C) Bar plot indicating the percentage (displayed within each bar) of H3K4me3 positive CGIs in human (pink) and mouse (black). Categories of CGI position relative to genes are represented as for Figure 2E. (D) Box plots of H3K4me3 reads per base (averaged across 500 bp with a 100 bp slide) spanning 5 kb surrounding all mouse CGIs at the following CpG densities (CpGs per 100 bp): <5, 5–6, 6–7, 7–8, 8–9 and >9, in ascending order from the top. Box plots represent the distribution of the central 50% of the data (filled box) and the median (black bisecting line). The number of islands in each category (n) is noted in parentheses. Figure S3 shows equivalent data for human CGIs. (E) Summary plot relating the CpG density of each bin to the mean H3K4me3 read value for the central 2 kb of regions displayed in (D). Plots illustrate the relationship for all CGIs (green) and orphan CGIs (blue).
Figure 4. Many orphan CGIs demarcate sites of transcriptional initiation with tissue-restricted activity.
(A) A heat map indicating the association of human orphan CGIs (n = 13,217) with predicted gene TSSs (Pred'; grey; data from UCSC and Ensembl), RNAPII peaks in human ES cells (orange), nuclear run-on transcripts in human lung fibroblasts (NRO; grey), transcripts detected by multiple tissue Cap Analysis of Gene Expression (CAGE; purple bars; [33]) and H3K4me3 peaks in human ES cells (green). The percentage of overlap is noted within the plot and the complete set of orphan CGIs which overlap a TSS by at least one of the above criteria is indicated (All; black). (B) Examples of orphan CGIs which co-localise with signatures of transcriptional initiation. Mapped sequence reads for Sperm CAP (blue), hES H3K4me3, hES RNAPII, NRO and CAGE are displayed for human chr19: 7,020,000–7,071,000 (left panel) and chr22: 41,721,500–41,819,500 (right panel). Sequence profiles are colour coded as in (A). (C) Heat map depicting the association of RNAPII with orphan CGIs (n = 11,089) in mouse ES cells and brain. Orphan CGIs associated with RNAPII only in ES cells or only in brain are indicated (red and blue lines respectively) and the total percentage expressed is indicated (dashed arrow). (D) Profiles for sperm CAP (blue) and ES cell and brain RNAPII (orange) are depicted for mouse chr10: 62,302,000–62,435,000 showing two orphan CGIs (asterisks) which are differentially associated with RNAPII in mouse ES cells and brain.
Figure 5. Reciprocal screening identifies inter-species conservation of CGI methylation even at sites distal to annotated promoters.
(A) CAP- (blue) and MAP-seq (red) profile for human chr19: 19,218,000–19,264,000. (B) Bar plot representing the percentage of methylated CGIs at different genomic locations for human (pink) and mouse (black). Categories are displayed as in Figure 2E and individual percentages for each are noted within the plot. (C) Preferential methylation at CGIs whose location is evolutionarily conserved between humans and mice. Bar plot depicting the percentage of mouse CGIs which are somatically methylated (All) compared with the percentage of CGIs with identifiable human orthologues (conserved). The percentage of methylation (indicated within the plot) is displayed for CGIs associated with annotated transcriptional start sites (ATSS; black) and orphan CGIs associated with gene bodies (Intragenic; grey). (D, E) Example of conserved orphan CGI methylation in the HOXA locus. CAP- (blue) and MAP-seq (red) profiles spanning the first three genes in the HOXA locus in human (D) and mouse (E). Regions displayed are human chr7: 27,098,000–27,128,000 and mouse chr6: 52,104,000–52,130,000.
Figure 6. Somatic methylation is incompatible with H3K4me3 and RNAPII occupancy at orphan CGIs.
(A) Composite box plots showing sequence read density for brain RNAPII (orange), brain H3K4me3 (green), cerebellum MAP (red) and cerebellum CAP (blue) in mouse. Plotted as for Figure 3D. (B) Sequence profiles of CAP, MAP, H3K4me3 and RNAPII for mouse (chr11: 104,982,000–105,056,000) in ES cells and sperm (left panel) and brain and cerebellum (right panel) depict the loss of RNAPII and H3K4me3 associated with a gain of DNA methylation in cerebellum. Sequence profiles are colour coded as for (A).
Figure 7. Distinct characteristics of normal and tumour-specific CGI methylation.
(A) MAP-seq profiles (red) for five colon mucosa (C3, C5, C6, C9 and C10) and five matched colorectal tumour (T3, T5, T6, T9 and T10) biopsy samples corresponding to human chr13: 78,052,000–78,123,500. CGIs (blue bars) and sites of hES H3K27 trimethylation (hES H3K27me3; black bars) are represented. See also Figure S7. (B) Bar plot representing the percentage of autosomal CGI methylation in colon (red bars), colon and tumour (dark grey) and tumour only (light grey) relative to gene position (categorised as for Figure 2E). (C) Bar plot indicating the percentage of all autosomal (blue) and methylated autosomal CGIs (red) which co-localise with domains of H3K27me3 in human ES cells. Percentages and number of CGIs (n) are displayed within the plot. Tumour specific denotes CGIs that are methylated in at least three colorectal carcinoma samples, but not in any of the normal colon samples.
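
The CpG observed/expected (o/e) values in Figure 2 and the CpG densities (CpGs per 100 bp) in Figure 3 can be illustrated with a minimal Python sketch. It assumes the commonly used definition o/e = (number of CpG × N) / (number of C × number of G) for a window of N bases; the captions do not state the exact formula the authors used, so this is an assumption. The function names are hypothetical, and the default window and step simply mirror the 400 bp window and 10 bp slide quoted for Figure 2C.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CpG * N) / (#C * #G).

    Assumed definition; returns 0.0 when the window has no C or no G.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cpg * n / (c * g)


def cpg_per_100bp(seq: str) -> float:
    """CpG density expressed as CpG dinucleotides per 100 bp."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count("CG") / len(seq)


def sliding_cpg_obs_exp(seq: str, window: int = 400, step: int = 10):
    """Return (start, o/e) pairs for each full window along the sequence."""
    return [
        (start, cpg_obs_exp(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "CG" * 250  # hypothetical 500 bp CpG-saturated sequence
    profile = sliding_cpg_obs_exp(toy)
    # Windows at or above 0.6 would pass the o/e threshold quoted in Figure 2B.
    passing = [start for start, oe in profile if oe >= 0.6]
    print(len(profile), len(passing), cpg_per_100bp(toy))
```

Sequence-based CGI prediction typically combines an o/e threshold with additional criteria such as GC content and minimum length; the sketch covers only the o/e and density calculations named in the captions.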

References

    1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
    2. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. The sequence of the human genome. Science. 2001;291:1304–1351.
    3. Cross SH, Charlton JA, Nan X, Bird AP. Purification of CpG islands using a methylated DNA binding column. Nat Genet. 1994;6:236–244.
    4. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39:457–466.
    5. Bird AP. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980;8:1499–1504.
