In [ ]:
suppressPackageStartupMessages({
    library(Seurat)
    library(magrittr)
    library(harmony)
})

Seurat V2 Interface

This brief vignette demonstrates how to use Harmony with Seurat V2. This example closely follows the Seurat vignette: https://satijalab.org/seurat/v2.4/immune_alignment.html

To begin, download the sparse gene count matrices and move them somewhere convenient (e.g. data/).

In [ ]:
load('data/pbmc_stim.RData')

Initialize Seurat Object

Before running Harmony, make a Seurat object and following the standard pipeline through PCA.

IMPORTANT DIFFERENCE: In the Seurat integration tutorial, you need to define a Seurat object for each dataset. With Harmony integration, create only one Seurat object with all cells.

In [4]:
pbmc <- CreateSeuratObject(raw.data = cbind(stim.sparse, ctrl.sparse), project = "PBMC", min.cells = 5)
pbmc@meta.data$stim <- c(rep("STIM", ncol(stim.sparse)), rep("CTRL", ncol(ctrl.sparse)))

pbmc %<>% Seurat::NormalizeData()
pbmc@var.genes <- split(row.names(pbmc@meta.data), pbmc@meta.data$stim) %>% lapply(function(cells_use) {
    temp_data <- SubsetData(pbmc, cells_use)
    temp_data %<>% FindVariableGenes(do.plot = FALSE)
    head(rownames(temp_data@hvg.info), 1000)
}) %>% unlist %>% unique
pbmc %<>% ScaleData()
pbmc %<>% RunPCA(pc.genes = pbmc@var.genes, pcs.compute = 20, do.print = TRUE, pcs.print = 1:5, genes.print = 5)
Scaling data matrix
[1] "PC1"
[1] "RPL21"  "RPS6"   "RPL3"   "RPL13A" "RPL7"  
[1] ""
[1] "TYROBP"   "C15orf48" "FCER1G"   "SOD2"     "CST3"    
[1] ""
[1] ""
[1] "PC2"
[1] "IL8"      "CD14"     "MARCKSL1" "S100A8"   "GPX1"    
[1] ""
[1] "NKG7" "GNLY" "GZMB" "PRF1" "CCL5"
[1] ""
[1] ""
[1] "PC3"
[1] "ISG15" "IFIT3" "IFIT1" "ISG20" "IFIT2"
[1] ""
[1] "NKG7" "GZMB" "GNLY" "IL8"  "CCL5"
[1] ""
[1] ""
[1] "PC4"
[1] "B2M"    "HLA-B"  "HLA-C"  "TMSB4X" "HLA-A" 
[1] ""
[1] "HBA1"  "HBA2"  "HBB"   "ALAS2" "SNCA" 
[1] ""
[1] ""
[1] "PC5"
[1] "CD79A"    "CD74"     "HLA-DQA1" "CD83"     "HLA-DPA1"
[1] ""
[1] "GIMAP7" "CD3D"   "ANXA1"  "GIMAP5" "GIMAP4"
[1] ""
[1] ""

Perform an integrated analysis

The RunHarmony function will integrate datasets defined by the "stim" column in the Seurat Object's meta.data 

    pbmc@meta.data

If you want to integrate on another variable, just add that column to the meta.data

    pbmc@meta.data$my_new_var <- my_new_var_vector


The plot below shows the convergence of the Harmony objective function over each iteration of Harmony. 
Each point is a single cluster step. 
Different colors represent different Harmony iterations. 
In [5]:
options(repr.plot.height = 3, repr.plot.width = 6)
system.time(pbmc %<>% RunHarmony("stim", theta = 2, plot_convergence = TRUE))
Run Harmony
Using top 20 PCs
Harmony 1/5
Harmony 2/5
Harmony 3/5
Harmony 4/5
Harmony 5/5
   user  system elapsed 
 15.832   1.304  17.443 

After Harmony integration, cells align along the Harmonized PCs. If they did not, you could increase the theta value above and rerun the analysis.

In [6]:
options(repr.plot.height = 6, repr.plot.width = 12)
p1 <- DimPlot(object = pbmc, reduction.use = "harmony", pt.size = .1, group.by = "stim", do.return = T)
p2 <- VlnPlot(object = pbmc, features.plot = "Harmony1", group.by = "stim", do.return = TRUE)
plot_grid(p1,p2)

Genes correlated with the Harmonized PCs

In [7]:
PrintDim(object = pbmc, reduction.type = "harmony", dims.print = 1:5, genes.print = 10)
[1] "Harmony1"
 [1] "RPL21"  "RPS6"   "RPS18"  "RPL13A" "RPL3"   "RPL13"  "RPS3"   "RPS14" 
 [9] "RPL7"   "RPS27A"
[1] ""
 [1] "TYROBP"   "C15orf48" "FCER1G"   "SOD2"     "TIMP1"    "CST3"    
 [7] "CD63"     "FTL"      "LGALS1"   "ANXA5"   
[1] ""
[1] ""
[1] "Harmony2"
 [1] "HLA-DRA"  "HLA-DRB1" "RPS9"     "NPC2"     "EMP3"     "LYZ"     
 [7] "C15orf48" "PABPC1"   "ANXA2"    "S100A10" 
[1] ""
 [1] "NKG7"     "GNLY"     "GZMB"     "CCL5"     "PRF1"     "CST7"    
 [7] "CLIC3"    "KLRD1"    "APOBEC3G" "GZMA"    
[1] ""
[1] ""
[1] "Harmony3"
 [1] "CCR7"   "RPL32"  "RPL18A" "RPL11"  "RPS12"  "RPS23"  "RPL13"  "RPLP2" 
 [9] "RPS18"  "RPS2"  
[1] ""
 [1] "NKG7"   "GZMB"   "GNLY"   "CCL5"   "CST7"   "PRF1"   "CLIC3"  "KLRD1" 
 [9] "GZMH"   "FGFBP2"
[1] ""
[1] ""
[1] "Harmony4"
 [1] "B2M"    "HLA-B"  "HLA-C"  "HLA-A"  "TMSB4X" "NKG7"   "CYBA"   "GZMB"  
 [9] "CCL5"   "MALAT1"
[1] ""
 [1] "HBA1"        "HBA2"        "HBB"         "ALAS2"       "SNCA"       
 [6] "HBD"         "CA1"         "HBM"         "AHSP"        "RP11-51J9.5"
[1] ""
[1] ""
[1] "Harmony5"
 [1] "CD74"     "CD79A"    "HLA-DQA1" "CD83"     "HLA-DPA1" "HLA-DQB1"
 [7] "MS4A1"    "HLA-DRA"  "HLA-DPB1" "HLA-DRB1"
[1] ""
 [1] "GIMAP7"   "CD3D"     "ANXA1"    "GIMAP5"   "GIMAP4"   "IL7R"    
 [7] "LEPROTL1" "TMEM66"   "LDHB"     "CD3E"    
[1] ""
[1] ""
In [8]:
DimHeatmap(object = pbmc, reduction.type = "harmony", cells.use = 500, dim.use = 1:6, do.balanced = TRUE)

Seurat tSNE Functions for Integrated Analysis Using Harmony Results

Similar to the linked vigenette, the harmony results can be used for integrated analysis. You simply replace the CCA dimensional reduction with Harmony.

In [9]:
system.time({
pbmc %<>% RunTSNE(reduction.use = "harmony", dims.use = 1:20, do.fast = T)
pbmc %<>% FindClusters(reduction.type = "harmony", resolution = 0.6, dims.use = 1:20, force.recalc = TRUE, print.output = FALSE)
})
   user  system elapsed 
197.306   4.676 205.191 
In [10]:
p3 <- TSNEPlot(pbmc, do.return = T, pt.size = 0.5, group.by = "stim")
p4 <- TSNEPlot(pbmc, do.label = T, do.return = T, pt.size = 0.5)
plot_grid(p3, p4)
In [11]:
options(repr.plot.height = 10, repr.plot.width = 12)
FeaturePlot(object = pbmc, features.plot = c("CD3D", "SELL", "CREM", "CD8A", "GNLY", "CD79A", "FCGR3A", "CCL2", "PPBP"), 
            min.cutoff = "q9", cols.use = c("lightgrey", "blue"), pt.size = 0.5)
In [ ]: