suppressPackageStartupMessages({
library(Seurat)
library(magrittr)
library(harmony)
})
This brief vignette demonstrates how to use Harmony with Seurat V2. This example closely follows the Seurat vignette: https://satijalab.org/seurat/v2.4/immune_alignment.html
To begin, download the sparse gene count matrices and move them somewhere convenient (e.g. data/).
load('data/pbmc_stim.RData')
Before running Harmony, make a Seurat object and following the standard pipeline through PCA.
IMPORTANT DIFFERENCE: In the Seurat integration tutorial, you need to define a Seurat object for each dataset. With Harmony integration, create only one Seurat object with all cells.
pbmc <- CreateSeuratObject(raw.data = cbind(stim.sparse, ctrl.sparse), project = "PBMC", min.cells = 5)
pbmc@meta.data$stim <- c(rep("STIM", ncol(stim.sparse)), rep("CTRL", ncol(ctrl.sparse)))
pbmc %<>% Seurat::NormalizeData()
pbmc@var.genes <- split(row.names(pbmc@meta.data), pbmc@meta.data$stim) %>% lapply(function(cells_use) {
temp_data <- SubsetData(pbmc, cells_use)
temp_data %<>% FindVariableGenes(do.plot = FALSE)
head(rownames(temp_data@hvg.info), 1000)
}) %>% unlist %>% unique
pbmc %<>% ScaleData()
pbmc %<>% RunPCA(pc.genes = pbmc@var.genes, pcs.compute = 20, do.print = TRUE, pcs.print = 1:5, genes.print = 5)
The RunHarmony function will integrate datasets defined by the "stim" column in the Seurat Object's meta.data
pbmc@meta.data
If you want to integrate on another variable, just add that column to the meta.data
pbmc@meta.data$my_new_var <- my_new_var_vector
The plot below shows the convergence of the Harmony objective function over each iteration of Harmony.
Each point is a single cluster step.
Different colors represent different Harmony iterations.
options(repr.plot.height = 3, repr.plot.width = 6)
system.time(pbmc %<>% RunHarmony("stim", theta = 2, plot_convergence = TRUE))
After Harmony integration, cells align along the Harmonized PCs. If they did not, you could increase the theta value above and rerun the analysis.
options(repr.plot.height = 6, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction.use = "harmony", pt.size = .1, group.by = "stim", do.return = T)
p2 <- VlnPlot(object = pbmc, features.plot = "Harmony1", group.by = "stim", do.return = TRUE)
plot_grid(p1,p2)
Genes correlated with the Harmonized PCs
PrintDim(object = pbmc, reduction.type = "harmony", dims.print = 1:5, genes.print = 10)
DimHeatmap(object = pbmc, reduction.type = "harmony", cells.use = 500, dim.use = 1:6, do.balanced = TRUE)
Similar to the linked vigenette, the harmony results can be used for integrated analysis. You simply replace the CCA dimensional reduction with Harmony.
system.time({
pbmc %<>% RunTSNE(reduction.use = "harmony", dims.use = 1:20, do.fast = T)
pbmc %<>% FindClusters(reduction.type = "harmony", resolution = 0.6, dims.use = 1:20, force.recalc = TRUE, print.output = FALSE)
})
p3 <- TSNEPlot(pbmc, do.return = T, pt.size = 0.5, group.by = "stim")
p4 <- TSNEPlot(pbmc, do.label = T, do.return = T, pt.size = 0.5)
plot_grid(p3, p4)
options(repr.plot.height = 10, repr.plot.width = 12)
FeaturePlot(object = pbmc, features.plot = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A", "CCL2", "PPBP"),
min.cutoff = "q9", cols.use = c("lightgrey", "blue"), pt.size = 0.5)