Nucleic Acids Res. 2005 Mar 14;33(5):1544-52.
doi: 10.1093/nar/gki296. Print 2005.

Integration of text- and data-mining using ontologies successfully selects disease gene candidates

Nicki Tiffin et al. Nucleic Acids Res. 2005.

Abstract

Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
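The selection step can be illustrated with a minimal sketch. The abstract does not spell out the matching rule, so this example assumes one plausible reading of the parameters named in the figure captions: a disease is represented by its n most strongly associated anatomy terms, and a gene is kept if its expression annotation covers all but at most m of them. The function name `select_candidates`, the gene identifiers and the term lists are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def select_candidates(
    disease_terms: List[str],
    gene_annotations: Dict[str, Set[str]],
    max_mismatch: int = 1,
) -> List[str]:
    """Keep genes annotated with all but at most `max_mismatch` disease terms.

    Assumption: this matching rule is an illustrative reading of
    "mismatch m" and "number of terms n", not the published algorithm.
    """
    required = len(disease_terms) - max_mismatch
    selected = []
    for gene, terms in gene_annotations.items():
        matched = sum(1 for term in disease_terms if term in terms)
        if matched >= required:
            selected.append(gene)
    return sorted(selected)


# Hypothetical data: n = 4 anatomy terms associated with one disease name.
disease_terms = ["heart", "muscle", "liver", "kidney"]
gene_annotations = {
    "GENE_A": {"heart", "muscle", "liver", "kidney"},  # 4 of 4 terms match
    "GENE_B": {"heart", "muscle", "liver"},            # 3 of 4 terms match
    "GENE_C": {"heart", "brain"},                      # 1 of 4 terms match
}

selected = select_candidates(disease_terms, gene_annotations, max_mismatch=1)
reduction = 100.0 * len(selected) / len(gene_annotations)
print(selected, f"{reduction:.1f}% of starting set")
```

With m = 1 the sketch keeps GENE_A and GENE_B; tightening to m = 0 keeps only GENE_A, which shows how the mismatch allowance trades set size against the risk of discarding the true disease gene.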

Figures

Figure 1
Schema outlining the method used to identify candidate disease genes.
Figure 2
Success of finding the known disease gene using mismatch m = 1 and number of terms n = 4, scored as successful = 1 or unsuccessful = 0 (closed triangle). Randomly selected genes were assigned to each disease name and their presence detected in the selected gene set. Detection rate is shown per disease, for 10 runs with random gene assignment (open circle). Likelihood of the known disease gene being selected by chance alone was calculated as the probability of the disease gene falling into the candidate gene set according to the size of the candidate disease gene set (open square).
Figure 3
Reduction in candidate gene set size using mismatch m = 1 and number of terms n = 4. Final set size is shown as a percentage of starting set size. Average and SD values are shown. *Known disease gene is not present in the selected candidate gene set.
Figure 4
Frequency of annotation per ‘eVOC Anatomy Term’ in the EnsMart database. The frequency of RefSeq genes at each node was the sum of all annotated RefSeq genes at the node and descendants of the node, compared to total number of annotated RefSeq genes.
Figure 5
Schema showing the relationship between the number of candidate genes selected (Candidate genes), number of terms associated with the disease name (Associated terms), number of terms used to annotate the disease gene (Annotated terms) and number of terms associated with disease name and also used to annotate the disease gene (Common terms). *Known disease gene is not present in the candidate gene set.
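
Two quantities defined in the captions for Figures 2 and 4 reduce to simple ratios, sketched below under stated assumptions. The chance likelihood is taken to be the selected set size divided by the starting set size, which assumes every starting gene is equally likely to be selected. The node frequency counts each gene once across a node and its descendants (a set union rather than a raw sum, which is an assumption made here to avoid double counting). The function names and the toy ontology are hypothetical and do not come from eVOC or EnsMart.

```python
from typing import Dict, List, Set


def chance_probability(selected_size: int, starting_size: int) -> float:
    """Probability that a uniformly chosen starting gene lands in the selected set."""
    if starting_size <= 0:
        raise ValueError("starting_size must be positive")
    return selected_size / starting_size


def node_frequency(
    node: str,
    children: Dict[str, List[str]],
    annotations: Dict[str, Set[str]],
) -> float:
    """Fraction of all annotated genes found at `node` or any of its descendants."""
    all_genes: Set[str] = set()
    for genes in annotations.values():
        all_genes |= genes
    if not all_genes:
        return 0.0

    found: Set[str] = set()
    visited: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guards against shared descendants being walked twice
        visited.add(current)
        found |= annotations.get(current, set())
        stack.extend(children.get(current, []))
    return len(found) / len(all_genes)


# Hypothetical toy ontology and annotations, for illustration only.
children = {"anatomy": ["heart", "brain"], "heart": ["ventricle"]}
annotations = {
    "heart": {"G1"},
    "ventricle": {"G2"},
    "brain": {"G1", "G3"},
}

print(chance_probability(10, 40))
print(node_frequency("heart", children, annotations))
```

In this toy example the heart node covers two of the three annotated genes, so broad terms near the root approach a frequency of 1 and contribute little to discriminating between candidates.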
