library(Seurat)
library(cowplot)
library(harmony)

Seurat V3 Interface¶

This brief vignette demonstrates how to use Harmony with Seurat V3. This example closely follows the Seurat vignette: https://satijalab.org/seurat/v3.0/immune_alignment.html

To begin, download the sparse gene count matrices and move them somewhere convenient (e.g. data/).

load('data/pbmc_stim.RData')

Initialize Seurat Object¶

Before running Harmony, make a Seurat object and following the standard pipeline through PCA.

IMPORTANT DIFFERENCE: In the Seurat integration tutorial, you need to define a Seurat object for each dataset. With Harmony integration, create only one Seurat object with all cells.

pbmc <- CreateSeuratObject(counts = cbind(stim.sparse, ctrl.sparse), project = "PBMC", min.cells = 5) %>%
    Seurat::NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% 
    ScaleData(verbose = FALSE) %>% 
    RunPCA(pc.genes = pbmc@var.genes, npcs = 20, verbose = FALSE)

Make sure that the dataset ID is in the object's metadata. Here, we define datasets with the variable stim.

pbmc@meta.data$stim <- c(rep("STIM", ncol(stim.sparse)), rep("CTRL", ncol(ctrl.sparse)))

There is a clear difference between the datasets in the uncorrected PCs

options(repr.plot.height = 5, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction = "pca", pt.size = .1, group.by = "stim", do.return = TRUE)
p2 <- VlnPlot(object = pbmc, features = "PC_1", group.by = "stim", do.return = TRUE, pt.size = .1)
plot_grid(p1,p2)

Run Harmony¶

The simplest way to run Harmony is to pass the Seurat object and specify which variable(s) to integrate out. RunHarmony returns a Seurat object, updated with the corrected Harmony coordinates. Let's set plot_convergence to TRUE, so we can make sure that the Harmony objective function gets better with each round.

options(repr.plot.height = 2.5, repr.plot.width = 6)
pbmc <- pbmc %>% 
    RunHarmony("stim", plot_convergence = TRUE)

Harmony 1/10
Harmony 2/10
Harmony 3/10
Harmony 4/10
Harmony 5/10
Harmony 6/10
Harmony 7/10
Harmony 8/10
Harmony converged after 8 iterations

To directly access the new Harmony embeddings, use the Embeddings command.

harmony_embeddings <- Embeddings(pbmc, 'harmony')
harmony_embeddings[1:5, 1:5]

Let's make sure that the datasets are well integrated in the first 2 dimensions after Harmony.

options(repr.plot.height = 5, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction = "harmony", pt.size = .1, group.by = "stim", do.return = TRUE)
p2 <- VlnPlot(object = pbmc, features = "harmony_1", group.by = "stim", do.return = TRUE, pt.size = .1)
plot_grid(p1,p2)

Downstream analysis¶

Many downstream analyses are performed on low dimensional embeddings, not gene expression. To use the corrected Harmony embeddings rather than PCs, set reduction = 'harmony'. For example, let's perform the UMAP and Nearest Neighbor analyses using the Harmony embeddings.

pbmc <- pbmc %>% 
    RunUMAP(reduction = "harmony", dims = 1:20) %>% 
    FindNeighbors(reduction = "harmony", dims = 1:20) %>% 
    FindClusters(resolution = 0.5) %>% 
    identity()

Computing nearest neighbor graph
Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 25894
Number of edges: 854286

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9057
Number of communities: 14
Elapsed time: 5 seconds

In the UMAP embedding, we can see more intricate structure. Since we used harmony embeddings, the UMAP embeddings are well mixed.

options(repr.plot.height = 4, repr.plot.width = 10)
DimPlot(pbmc, reduction = "umap", group.by = "stim", pt.size = .1, split.by = 'stim')

In this well mixed embedding, we can start identifying shared cell types with clustering analysis.

options(repr.plot.height = 4, repr.plot.width = 6)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = .1)

	harmony_1	harmony_2	harmony_3	harmony_4	harmony_5
AAACATACCAAGCT-1	3.959337	-0.9887445	-4.9379311	-6.118842	-1.8354619
AAACATACCCCTAC-1	-1.631977	-0.9337078	7.7300758	-8.842865	-2.9982345
AAACATACCCGTAA-1	5.798815	1.6570520	0.6685604	1.148935	-0.2856785
AAACATACCCTCGT-1	2.838795	-1.2805793	7.2660363	-1.711304	1.0466651
AAACATACGAGGTG-1	6.783708	-0.2450474	0.4836541	2.962785	-0.5428238