There are two options for annotating unspliced/spliced molecules for velocyto: 1. Using velocyto.py loom files 2. Using dropEst output

Here we show how to load dropEst output matrices. We then use pagoda2 to obtain cell clusters/embedding, and then estimate/visualize velocity.

Data loading

dropEst includes -V option that writes out an additional file called cell.counts.matrices.rds, which contains separate matrices for three types of reads: 1. exonic - those molecules whose reads are all mapping to exonic regions 2. intronic - those molecules whose reads are all mapping to intronic regions 3. spanning - those molecules whose reads span both intronic and exonic regions of a gene

Here’s the example dropEst command that was ran for to generate these: ~/dropEst/dropest -V -C 6000 -m -g ~/dropEst/data/gtf/refflat_ucsc_mm10_exons.gtf.gz -c ~/dropEst/configs/indrop_v3.xml ~/inDropDatasets/20170409-Guo-SUB05381/04-indropEst/SCG_71_C2.*.aligned.bam

Load matrix rds file:

dat <- readRDS(url("http://pklab.med.harvard.edu/velocyto/mouseBM/cell.counts.matrices.rds"))

Load the velocyto package:

library(velocyto.R)
Loading required package: Matrix

Normalize and cluster cells using pagoda2

Using spliced expression matrix as input to pagoda2.

emat <- dat$exon
hist(log10(colSums(emat)),col='wheat',xlab='cell size')

# this dataset has already been pre-filtered, but this is where one woudl do some filtering
emat <- emat[,colSums(emat)>=1e3]

Pagoda2 processing

Pagoda2 is used to generate cell embedding, cell clustering, as well as a more accurate cell-cell distance matrix. You can alternatively generate those using other tools, such as Seurat2, etc.

Create pagoda2 object, adjust variance:

library(pagoda2)
r <- Pagoda2$new(emat,modelType='plain',trim=10,log.scale=T)
3017 cells, 9400 genes; normalizing ... using plain model winsorizing ... log scale ... done.
r$adjustVariance(plot=T,do.par=T,gam.k=10)
calculating variance fit ... using gam 
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-18. For overview type 'help("mgcv-package")'.
182 overdispersed genes ... 182 persisting ... done.

Run basic analysis steps to generate cell embedding and clustering, visualize:

r$calculatePcaReduction(nPcs=100,n.odgenes=3e3,maxit=300)
Loading required package: irlba
r$makeKnnGraph(k=30,type='PCA',center=T,distance='cosine');
Loading required package: igraph

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union
reading points ... done (3017 points)
building the index ... 

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************

done (0.0535099s)
running queries with k=30 ... 
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

done (0.179554s)
creating report data frame ... done
r$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
r$getEmbedding(type='PCA',embeddingType='tSNE',perplexity=50,verbose=T)
Loading required package: Rtsne
calculating distance ... pearson ... done
running tSNE using 30 cores:
Read the 3017 x 3017 data matrix successfully!
OpenMP is working...
Using no_dims = 2, perplexity = 50.000000, and theta = 0.500000
Computing input similarities...
Building tree...
Done in 28.92 seconds (sparsity = 0.073050)!
Learning embedding...
Iteration 50: error is 75.103968 (50 iterations in 6.35 seconds)
Iteration 100: error is 66.781239 (50 iterations in 4.89 seconds)
Iteration 150: error is 66.252379 (50 iterations in 5.43 seconds)
Iteration 200: error is 66.144961 (50 iterations in 5.52 seconds)
Iteration 250: error is 66.113366 (50 iterations in 5.39 seconds)
Iteration 300: error is 1.867365 (50 iterations in 5.54 seconds)
Iteration 350: error is 1.732481 (50 iterations in 5.78 seconds)
Iteration 400: error is 1.683884 (50 iterations in 5.53 seconds)
Iteration 450: error is 1.660513 (50 iterations in 5.12 seconds)
Iteration 500: error is 1.649438 (50 iterations in 5.38 seconds)
Iteration 550: error is 1.643250 (50 iterations in 5.49 seconds)
Iteration 600: error is 1.639473 (50 iterations in 5.45 seconds)
Iteration 650: error is 1.636999 (50 iterations in 5.39 seconds)
Iteration 700: error is 1.634399 (50 iterations in 5.32 seconds)
Iteration 750: error is 1.632796 (50 iterations in 5.05 seconds)
Iteration 800: error is 1.631622 (50 iterations in 5.36 seconds)
Iteration 850: error is 1.630181 (50 iterations in 5.45 seconds)
Iteration 900: error is 1.629088 (50 iterations in 5.42 seconds)
Iteration 950: error is 1.627652 (50 iterations in 5.55 seconds)
Iteration 1000: error is 1.626624 (50 iterations in 5.38 seconds)
Fitting performed in 108.80 seconds.

Plot embedding, labeling clusters (left) and “Xist” expression (which separates the male and female )

par(mfrow=c(1,2))
r$plotEmbedding(type='PCA',embeddingType='tSNE',show.legend=F,mark.clusters=T,min.group.size=10,shuffle.colors=F,mark.cluster.cex=1,alpha=0.3,main='cell clusters')
r$plotEmbedding(type='PCA',embeddingType='tSNE',colors=r$depth,main='depth')  
treating colors as a gradient with zlim: 989.8 3210.8