Cell. 2019 Jun 13;177(7):1888-1902.e21.

doi: 10.1016/j.cell.2019.05.031. Epub 2019 Jun 6.

Comprehensive Integration of Single-Cell Data

Affiliations

¹ New York Genome Center, New York, NY, USA.
² New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA.
³ Technology Innovation Lab, New York Genome Center, New York, NY, USA.
⁴ New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA. Electronic address: rsatija@qiuluzeuv.cn.

PMID: 31178118
PMCID: PMC6687398
DOI: 10.1016/j.cell.2019.05.031

Tim Stuart et al. Cell. 2019.

Abstract

Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function. Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations. Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns. Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

Keywords: integration; multi-modal; scATAC-seq; scRNA-seq; single cell; single-cell ATAC sequencing; single-cell RNA sequencing.

Conflict of interest statement

Declaration of interests

The authors declare no competing interests.

Figures

**Figure 1. Schematic overview of reference “assembly” integration in Seurat v3**
**(A)** Representation of two datasets, reference and query, each of which originates from a separate single-cell experiment. The two datasets share cells from similar biological states, but the query dataset contains a unique population (in black). **(B)** We perform canonical correlation analysis, followed by L2-normalization of the canonical correlation vectors, to project the datasets into a subspace defined by shared correlation structure across datasets. **(C)** In the shared space, we identify pairs of mutual nearest neighbors across reference and query cells. These should represent cells in a shared biological state across datasets (grey lines), and serve as “anchors” to guide dataset integration. In principle, cells in unique populations should not participate in anchors, but in practice we observe “incorrect” anchors at low frequency (red lines). **(D)** For each anchor pair, we assign a score based on the consistency of anchors across the neighborhood structure of each dataset. **(E)** We utilize anchors and their scores to compute “correction” vectors for each query cell, transforming its expression so it can be jointly analyzed as part of an integrated reference.
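
Panels B and C can be illustrated with a minimal sketch. It assumes the cells have already been projected into a shared low-dimensional space (the canonical correlation step is not shown), and it covers only L2-normalization and the mutual nearest neighbor search. The function names, the Euclidean distance, and the toy coordinates are illustrative assumptions, not the Seurat v3 implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def nearest(source, targets, k):
    """Indices of the k targets closest to `source` by Euclidean distance."""
    order = sorted(range(len(targets)), key=lambda j: math.dist(source, targets[j]))
    return set(order[:k])

def find_anchors(ref_emb, query_emb, k):
    """Return (ref_index, query_index) pairs that are mutual k-nearest neighbors."""
    ref = [l2_normalize(v) for v in ref_emb]
    query = [l2_normalize(v) for v in query_emb]
    ref_to_query = [nearest(r, query, k) for r in ref]
    query_to_ref = [nearest(q, ref, k) for q in query]
    return [
        (i, j)
        for i in range(len(ref))
        for j in sorted(ref_to_query[i])
        if i in query_to_ref[j]  # keep the pair only if the relation is mutual
    ]

# Toy data: two reference cells, and three query cells of which the last
# stands in for a population that is unique to the query.
ref_emb = [[2.0, 0.0], [0.0, 3.0]]
query_emb = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]
anchors = find_anchors(ref_emb, query_emb, k=1)
print(anchors)  # [(0, 0), (1, 1)]
```

In this toy case the third query cell has a nearest reference cell, but that reference cell prefers a different query cell, so no anchor is formed for it. This mirrors the expectation in panel C that unique populations should not participate in anchors.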

**Figure 2. Comparison of multi-dataset integration methods for scRNA-seq**
**(A-H)** UMAP plots of eight pancreatic islet cell datasets colored by dataset (A-D) and by cell type (E-H) after integration with Seurat v3, Seurat v2, mnnCorrect, and Scanorama. To challenge the methods’ robustness to non-overlapping populations, a single cell type was withheld from each dataset prior to integration. **(I-J)** Distribution of anchor scores and counts, separated by incorrect (different cell types in the anchor pair) and correct (same cell type in the anchor pair) anchors. Anchors are from the analysis in Figure S1A. **(K-L)** Metrics for evaluating integration performance across the four methods on two main properties: cell “mixing” across datasets and the preservation of within-dataset local structure (STAR Methods).

**Figure 3. Transferring cell state classifications across datasets**
**(A)** Schematic representation where identified anchors allow for the transfer of discrete labels between a reference and query dataset. **(B)** Confusion matrix for one cell type hold-out evaluation where pancreatic alpha cells were removed from the reference. Cell types with fewer than two cells in the query are not shown. Alpha cells in the query consistently receive the lowest classification score, and are labeled as “Unassigned”. **(C)** Classification benchmarking on 166 test/training datasets from human pancreatic islets and mouse retina. **(D)** Distribution of prediction scores for one cell type hold-out experiment (as in B). Mis-classification calls are associated with lower prediction scores. **(E)** Joint visualization of scRNA-seq data with classified scATAC-seq cells (left). We identified anchors between scRNA-seq data (reference) and a gene activity matrix derived from scATAC-seq (query) datasets from the mouse visual cortex, and transferred class annotations (right). **(F)** We created pseudo-bulk ATAC-seq profiles by pooling together cells for each cell type. Each cell type showed enriched accessibility near canonical marker genes. Chromatin accessibility tracks are normalized to sequencing depth (RPKM normalization) in each pooled group. Y-axes for each track ranged from 0 to different maxima, due to inherent differences in the maximum read depth at different loci. For each locus, the y-axis maximum shown is: *Neurod6* 1,500; *Gad2, Pvalb, Sst, Vip, Lamp5,* and *Id2* 1,000; *Lhx6* 600. **(G)** We searched for overrepresented DNA motifs present in PV-specific accessibility peaks, and identified the *Mef2c* and *Rora* motifs as the most highly enriched motifs (p < 10⁻²² and p < 10⁻⁹). **(H)** Both *Mef2c* and *Rora* also exhibit upregulated expression in PV interneurons from scRNA-seq.
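
The idea in panels A, B, and D (labels carried from reference to query through anchors, with a prediction score per cell) can be sketched as a weighted vote. This is a hedged illustration only: how anchor weights are computed is not described here, so the weights are taken as given inputs, and the function name and the 0.5 cutoff for “Unassigned” are hypothetical choices.

```python
from collections import defaultdict

def transfer_labels(anchor_weights, ref_labels, threshold=0.5):
    """Assign each query cell a label by weighted vote over its anchors.

    anchor_weights: one dict per query cell, mapping reference cell index
                    to a non-negative weight (assumed to be precomputed).
    ref_labels:     label of each reference cell.
    threshold:      hypothetical cutoff below which a cell is "Unassigned".
    Returns a list of (label, prediction_score) tuples.
    """
    results = []
    for weights in anchor_weights:
        votes = defaultdict(float)
        for ref_idx, w in weights.items():
            votes[ref_labels[ref_idx]] += w
        total = sum(votes.values())
        if total == 0:
            results.append(("Unassigned", 0.0))
            continue
        label, vote = max(votes.items(), key=lambda kv: kv[1])
        score = vote / total  # share of the total weight held by the winning label
        results.append((label if score >= threshold else "Unassigned", score))
    return results

ref_labels = ["beta", "beta", "delta", "gamma"]
anchor_weights = [
    {0: 0.6, 1: 0.3, 2: 0.1},    # votes concentrate on one label
    {1: 0.3, 2: 0.35, 3: 0.35},  # votes are split, as for a withheld cell type
]
predictions = transfer_labels(anchor_weights, ref_labels)
print([label for label, _ in predictions])  # ['beta', 'Unassigned']
```

A query cell whose type is absent from the reference has no consistent source of votes, so its top score stays low. That is one way to read the low prediction scores of the withheld alpha cells in panel B.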

**Figure 4. Imputing immunophenotypes in a transcriptomic atlas of the human bone marrow**
**(A)** Cross-validations for immunophenotype imputation, performed using a CITE-seq dataset of 35,543 bone marrow cells and 25 surface proteins. **(B)** Prediction accuracy as a function of the number of transcriptomic features used to determine anchors. **(C)** We integrated 274,932 bone marrow cells produced by the Human Cell Atlas and annotated the cell types. Using the CITE-seq bone marrow cells, we predicted protein expression levels in the integrated HCA dataset, and observed expression patterns consistent with the known cell types. **(D)** Predicted CD8+ CD69+ cells up-regulate a module of inflammatory cytokines and chemokines across all eight donors. Shown are averaged RNA expression values for each human donor. **(E)** We validated CD69+ marker genes identified in the scRNA-seq data by performing bulk RNA-seq on FACS-isolated CD8+ CD69+/− cells, which revealed a similar set of differentially expressed genes. **(F)** We ordered CD8+ memory cells by their CD69 expression in the HCA and CITE-seq datasets, and computed the autocorrelation for each gene along this CD69 axis (Moran’s I). CD69+ marker genes consistently showed a higher Moran’s I value in the HCA dataset, reflecting the increased statistical power accompanying an order-of-magnitude greater cell number.
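
For readers unfamiliar with the statistic in panel F, the following is a small sketch of Moran’s I using its standard definition, I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The choice of weights is an assumption made for illustration: cells that are adjacent in the ordering get weight 1 and all other pairs get 0. The weighting used in the paper is not stated in this caption.

```python
def morans_i(values, weights):
    """Moran's I for `values` under a square matrix of pairwise `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero total weight or constant values")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / denom)

def chain_weights(n):
    """Assumed weighting: 1 between cells adjacent in the ordering, 0 otherwise."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = chain_weights(6)
graded = morans_i([1, 1, 1, 0, 0, 0], w)       # expression tracks the ordering
alternating = morans_i([1, 0, 1, 0, 1, 0], w)  # neighbors always differ
print(round(graded, 3), round(alternating, 3))  # 0.6 -1.0
```

A gene whose expression changes smoothly along the ordering scores well above zero, while one that flips between neighbors scores below zero.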

**Figure 5. Spatial patterns of gene expression in the mouse brain**
**(A)** Schematic representation of data transfer between scRNA-seq and STARmap datasets. After identifying anchors using the subset of genes measured in both experiments, we subsequently transfer sequencing data to the STARmap cells, predicting new spatial expression patterns. **(B)** Leave-one-out cross-validation for 8 genes, exhibiting predicted expression patterns, and original STARmap measurements. **(C)** Gene expression patterns for *Rorb, Syt6, Lamp5* and *Sox10,* as measured by osmFISH, a highly sensitive single-molecule assay [Codeluppi et al., 2018], in the mouse somatosensory cortex. **(D)** Predicted expression patterns for four genes not originally profiled by STARmap, with external validation in Supplementary File 2. **(E)** Correlation between Moran’s I value, a measure of spatial autocorrelation, for each predicted gene expression pattern in two STARmap replicates. Marker genes for VLMC cells, endothelial cells, and perivascular macrophages are highlighted, reflecting rare cell subsets that were spatially restricted in only one replicate. **(F)** Horizontally compressed STARmap cells with predicted cell type transferred from the SMART-seq2 dataset. **(G)** Expression of cell type marker genes in each predicted STARmap cell type (both replicates combined).

Comment in

Integration of Single-Cell Genomics Datasets.
Adey AC. Cell. 2019 Jun 13;177(7):1677-1679. doi: 10.1016/j.cell.2019.05.034. PMID: 31199914

References

1. Achim K, Pettit J-B, Saraiva LR, Gavriouchkina D, Larsson T, Arendt D, Marioni JC, 2015. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nature Biotechnology 33 (5), 503–509. URL http://www.nature.com/doifinder/10.1038/nbt.3209
2. Alemany A, Florescu M, Baron CS, Peterson-Maduro J, Van Oudenaarden A, 2018. Whole-organism clone tracing using single-cell sequencing. Nature 556 (7699), 108–112. doi: 10.1038/nature25969
3. Arya S, Mount D, Kemp SE, Jefferis G, 2018. RANN: Fast Nearest Neighbour Search (Wraps ANN Library) Using L2 Metric. R package version 2.6. URL https://CRAN.R-project.org/package=RANN
4. Baglama J, Reichel L, Lewis BW, 2018. irlba: Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices. R package version 2.3.2. URL https://CRAN.R-project.org/package=irlba
5. Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, Ryu JH, Wagner BK, Shen-Orr SS, Klein AM, Melton DA, Yanai I, 2016. A Single-Cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst 3 (4), 346–360.e4.

Grants and funding

R21 HG009748/HG/NHGRI NIH HHS/United States