In [20]:
library(Seurat)
library(cowplot)
library(harmony)

Seurat V3 Interface

This brief vignette demonstrates how to use Harmony with Seurat V3. This example closely follows the Seurat vignette: https://satijalab.org/seurat/v3.0/immune_alignment.html

To begin, download the sparse gene count matrices and move them somewhere convenient (e.g. data/).

In [2]:
load('data/pbmc_stim.RData')

Initialize Seurat Object

Before running Harmony, make a Seurat object and following the standard pipeline through PCA.

IMPORTANT DIFFERENCE: In the Seurat integration tutorial, you need to define a Seurat object for each dataset. With Harmony integration, create only one Seurat object with all cells.

In [88]:
pbmc <- CreateSeuratObject(counts = cbind(stim.sparse, ctrl.sparse), project = "PBMC", min.cells = 5) %>%
    Seurat::NormalizeData(verbose = FALSE) %>%
    FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% 
    ScaleData(verbose = FALSE) %>% 
    RunPCA(pc.genes = pbmc@var.genes, npcs = 20, verbose = FALSE)

Make sure that the dataset ID is in the object's metadata. Here, we define datasets with the variable stim.

In [88]:
pbmc@meta.data$stim <- c(rep("STIM", ncol(stim.sparse)), rep("CTRL", ncol(ctrl.sparse)))

There is a clear difference between the datasets in the uncorrected PCs

In [90]:
options(repr.plot.height = 5, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction = "pca", pt.size = .1, group.by = "stim", do.return = TRUE)
p2 <- VlnPlot(object = pbmc, features = "PC_1", group.by = "stim", do.return = TRUE, pt.size = .1)
plot_grid(p1,p2)

Run Harmony

The simplest way to run Harmony is to pass the Seurat object and specify which variable(s) to integrate out. RunHarmony returns a Seurat object, updated with the corrected Harmony coordinates. Let's set plot_convergence to TRUE, so we can make sure that the Harmony objective function gets better with each round.

In [100]:
options(repr.plot.height = 2.5, repr.plot.width = 6)
pbmc <- pbmc %>% 
    RunHarmony("stim", plot_convergence = TRUE)
Harmony 1/10
Harmony 2/10
Harmony 3/10
Harmony 4/10
Harmony 5/10
Harmony 6/10
Harmony 7/10
Harmony 8/10
Harmony converged after 8 iterations

To directly access the new Harmony embeddings, use the Embeddings command.

In [101]:
harmony_embeddings <- Embeddings(pbmc, 'harmony')
harmony_embeddings[1:5, 1:5]
  1. 25894
  2. 20
A matrix: 5 × 5 of type dbl
harmony_1harmony_2harmony_3harmony_4harmony_5
AAACATACCAAGCT-1 3.959337-0.9887445-4.9379311-6.118842-1.8354619
AAACATACCCCTAC-1-1.631977-0.9337078 7.7300758-8.842865-2.9982345
AAACATACCCGTAA-1 5.798815 1.6570520 0.6685604 1.148935-0.2856785
AAACATACCCTCGT-1 2.838795-1.2805793 7.2660363-1.711304 1.0466651
AAACATACGAGGTG-1 6.783708-0.2450474 0.4836541 2.962785-0.5428238

Let's make sure that the datasets are well integrated in the first 2 dimensions after Harmony.

In [108]:
options(repr.plot.height = 5, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction = "harmony", pt.size = .1, group.by = "stim", do.return = TRUE)
p2 <- VlnPlot(object = pbmc, features = "harmony_1", group.by = "stim", do.return = TRUE, pt.size = .1)
plot_grid(p1,p2)

Downstream analysis

Many downstream analyses are performed on low dimensional embeddings, not gene expression. To use the corrected Harmony embeddings rather than PCs, set reduction = 'harmony'. For example, let's perform the UMAP and Nearest Neighbor analyses using the Harmony embeddings.

In [110]:
pbmc <- pbmc %>% 
    RunUMAP(reduction = "harmony", dims = 1:20) %>% 
    FindNeighbors(reduction = "harmony", dims = 1:20) %>% 
    FindClusters(resolution = 0.5) %>% 
    identity()
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 25894
Number of edges: 854286

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9057
Number of communities: 14
Elapsed time: 5 seconds

In the UMAP embedding, we can see more intricate structure. Since we used harmony embeddings, the UMAP embeddings are well mixed.

In [122]:
options(repr.plot.height = 4, repr.plot.width = 10)
DimPlot(pbmc, reduction = "umap", group.by = "stim", pt.size = .1, split.by = 'stim')

In this well mixed embedding, we can start identifying shared cell types with clustering analysis.

In [124]:
options(repr.plot.height = 4, repr.plot.width = 6)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = .1)