Here we show an example of how loom-annotated matrices of a 10x dataset can be loaded and analyzed in R using velocyto.R and pagoda2.

Data loading

Load the velocyto package:

library(velocyto.R)

Load loom data

# you can download 10X43_1.loom from the following URL: http://pklab.med.harvard.edu/velocyto/DG1/10X43_1.loom
ldat <- read.loom.matrices("10X43_1.loom")

Normalize and cluster cells using pagoda2

Using spliced expression matrix as input to pagoda2.

emat <- ldat$spliced
# this dataset has already been pre-filtered, but this is where one woudl do some filtering
emat <- emat[,colSums(emat)>=1e3]

Pagoda2 processing

Pagoda2 is used to generate cell embedding, cell clustering, as well as a more accurate cell-cell distance matrix. You can alternatively generate those using other tools, such as Seurat2, etc.

Create pagoda2 object, adjust variance:

library(pagoda2)
r <- Pagoda2$new(emat,modelType='plain',trim=10,log.scale=T)
3396 cells, 10995 genes; normalizing ... using plain model winsorizing ... log scale ... done.
r$adjustVariance(plot=T,do.par=T,gam.k=10)
calculating variance fit ... using gam 
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-18. For overview type 'help("mgcv-package")'.
388 overdispersed genes ... 388 persisting ... done.

Run basic analysis steps to generate cell embedding and clustering, visualize:

r$calculatePcaReduction(nPcs=100,n.odgenes=3e3,maxit=300)
Loading required package: irlba
r$makeKnnGraph(k=30,type='PCA',center=T,distance='cosine');
Loading required package: igraph

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union
reading points ... done (3396 points)
building the index ... 

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************

done (0.0523537s)
running queries with k=30 ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

done (0.225207s)
creating report data frame ... done
r$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
r$getEmbedding(type='PCA',embeddingType='tSNE',perplexity=50,verbose=T)
Loading required package: Rtsne
calculating distance ... pearson ... done
running tSNE using 30 cores:
Read the 3396 x 3396 data matrix successfully!
OpenMP is working...
Using no_dims = 2, perplexity = 50.000000, and theta = 0.500000
Computing input similarities...
Building tree...
Done in 32.45 seconds (sparsity = 0.064151)!
Learning embedding...
Iteration 50: error is 75.376272 (50 iterations in 6.44 seconds)
Iteration 100: error is 65.281519 (50 iterations in 6.35 seconds)
Iteration 150: error is 64.258071 (50 iterations in 6.55 seconds)
Iteration 200: error is 63.992296 (50 iterations in 5.88 seconds)
Iteration 250: error is 63.869374 (50 iterations in 5.66 seconds)
Iteration 300: error is 1.699583 (50 iterations in 5.46 seconds)
Iteration 350: error is 1.496087 (50 iterations in 5.47 seconds)
Iteration 400: error is 1.417420 (50 iterations in 5.31 seconds)
Iteration 450: error is 1.384008 (50 iterations in 5.33 seconds)
Iteration 500: error is 1.364940 (50 iterations in 5.38 seconds)
Iteration 550: error is 1.355332 (50 iterations in 5.45 seconds)
Iteration 600: error is 1.346593 (50 iterations in 5.51 seconds)
Iteration 650: error is 1.341220 (50 iterations in 5.55 seconds)
Iteration 700: error is 1.335850 (50 iterations in 5.50 seconds)
Iteration 750: error is 1.332704 (50 iterations in 5.52 seconds)
Iteration 800: error is 1.329928 (50 iterations in 5.48 seconds)
Iteration 850: error is 1.326857 (50 iterations in 5.40 seconds)
Iteration 900: error is 1.324685 (50 iterations in 5.68 seconds)
Iteration 950: error is 1.323039 (50 iterations in 5.73 seconds)
Iteration 1000: error is 1.321794 (50 iterations in 5.58 seconds)
Fitting performed in 113.24 seconds.

Plot embedding, labeling clusters (left) and “Xist” expression (which separates the male and female )

par(mfrow=c(1,2))
r$plotEmbedding(type='PCA',embeddingType='tSNE',show.legend=F,mark.clusters=T,min.group.size=10,shuffle.colors=F,mark.cluster.cex=1,alpha=0.3,main='cell clusters')
r$plotEmbedding(type='PCA',embeddingType='tSNE',colors=r$counts[,"Xist"],main='Xist')  
treating colors as a gradient with zlim: 0 1.192859 

Velocity estimation

Prepare matrices and clustering data:

emat <- ldat$spliced; nmat <- ldat$unspliced
emat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)]; # restrict to cells that passed p2 filter
# take cluster labels
cluster.label <- r$clusters$PCA[[1]]
cell.colors <- pagoda2:::fac2col(cluster.label)
# take embedding
emb <- r$embeddings$PCA$tSNE

In addition to clustering and the t-SNE embedding, from the p2 processing we will also take a cell-cell distance, which will be better than the default whole-transcriptome correlation distance that velocyto.R would normally use.

cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))

Filter genes based on the minimum average expresion magnitude (in at least one of the clusters), output total number of resulting valid genes:

emat <- filter.genes.by.cluster.expression(emat,cluster.label,min.max.cluster.average = 0.5)
nmat <- filter.genes.by.cluster.expression(nmat,cluster.label,min.max.cluster.average = 0.05)
length(intersect(rownames(emat),rownames(emat)))
[1] 3636

Estimate RNA velocity (using gene-relative model with k=20 cell kNN pooling and using top/bottom 2% quantiles for gamma fit):

fit.quantile <- 0.02
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells=20,cell.dist=cell.dist,fit.quantile=fit.quantile)
calculating cell knn ... done
calculating convolved matrices ... done
fitting gamma coefficients ... done. succesfful fit for 1196 genes
filtered out 161 out of 1196 genes due to low nmat-emat correlation
filtered out 137 out of 1035 genes due to low nmat-emat slope
calculating RNA velocity shift ... done
calculating extrapolated cell state ... 
Note: method with signature ‘CsparseMatrix#Matrix#missing#replValue’ chosen for function ‘[<-’,
 target signature ‘dgCMatrix#lgCMatrix#missing#numeric’.
 "Matrix#lsparseMatrix#missing#replValue" would also be valid
done

Visualize velocity on the t-SNE embedding, using velocity vector fields:

show.velocity.on.embedding.cor(emb,rvel.cd,n=300,scale='sqrt',cell.colors=ac(cell.colors,alpha=0.5),cex=0.8,arrow.scale=5,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=40,arrow.lwd=1,do.par=F,cell.border.alpha = 0.1)
delta projections ... sqrt knn ... transition probs ... done
calculating arrows ... grid estimates ... grid.sd= 1.220483  min.arrow.size= 0.02440966  max.grid.arrow.length= 0.03236221  done