There are two options for annotating unspliced/spliced molecules for velocyto: 1. Using velocyto.py loom files 2. Using dropEst output

Here we show how to load dropEst output matrices. We then use pagoda2 to obtain cell clusters/embedding, and then estimate/visualize velocity.

`dropEst`

includes `-V`

option that writes out an additional file called `cell.counts.matrices.rds`

, which contains separate matrices for three types of reads: 1. exonic - those molecules whose reads are all mapping to exonic regions 2. intronic - those molecules whose reads are all mapping to intronic regions 3. spanning - those molecules whose reads span both intronic and exonic regions of a gene

Here’s the example `dropEst`

command that was ran for to generate these: `~/dropEst/dropest -V -C 6000 -m -g ~/dropEst/data/gtf/refflat_ucsc_mm10_exons.gtf.gz -c ~/dropEst/configs/indrop_v3.xml ~/inDropDatasets/20170409-Guo-SUB05381/04-indropEst/SCG_71_C2.*.aligned.bam`

Load matrix rds file:

`dat <- readRDS(url("http://pklab.med.harvard.edu/velocyto/mouseBM/cell.counts.matrices.rds"))`

Load the velocyto package:

`library(velocyto.R)`

`Loading required package: Matrix`

Using spliced expression matrix as input to pagoda2.

```
emat <- dat$exon
hist(log10(colSums(emat)),col='wheat',xlab='cell size')
```

```
# this dataset has already been pre-filtered, but this is where one woudl do some filtering
emat <- emat[,colSums(emat)>=1e3]
```

Pagoda2 is used to generate cell embedding, cell clustering, as well as a more accurate cell-cell distance matrix. You can alternatively generate those using other tools, such as Seurat2, etc.

Create pagoda2 object, adjust variance:

```
library(pagoda2)
r <- Pagoda2$new(emat,modelType='plain',trim=10,log.scale=T)
```

`3017 cells, 9400 genes; normalizing ... using plain model winsorizing ... log scale ... done.`

`r$adjustVariance(plot=T,do.par=T,gam.k=10)`

`calculating variance fit ... using gam `

```
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-18. For overview type 'help("mgcv-package")'.
```

`182 overdispersed genes ... 182 persisting ... done.`

Run basic analysis steps to generate cell embedding and clustering, visualize:

`r$calculatePcaReduction(nPcs=100,n.odgenes=3e3,maxit=300)`

`Loading required package: irlba`

`r$makeKnnGraph(k=30,type='PCA',center=T,distance='cosine');`

```
Loading required package: igraph
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
```

```
reading points ... done (3017 points)
building the index ...
```

```
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************
```

```
done (0.0535099s)
running queries with k=30 ...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
done (0.179554s)
creating report data frame ... done
```

```
r$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')
r$getEmbedding(type='PCA',embeddingType='tSNE',perplexity=50,verbose=T)
```

`Loading required package: Rtsne`

```
calculating distance ... pearson ... done
running tSNE using 30 cores:
Read the 3017 x 3017 data matrix successfully!
OpenMP is working...
Using no_dims = 2, perplexity = 50.000000, and theta = 0.500000
Computing input similarities...
Building tree...
Done in 28.92 seconds (sparsity = 0.073050)!
Learning embedding...
Iteration 50: error is 75.103968 (50 iterations in 6.35 seconds)
Iteration 100: error is 66.781239 (50 iterations in 4.89 seconds)
Iteration 150: error is 66.252379 (50 iterations in 5.43 seconds)
Iteration 200: error is 66.144961 (50 iterations in 5.52 seconds)
Iteration 250: error is 66.113366 (50 iterations in 5.39 seconds)
Iteration 300: error is 1.867365 (50 iterations in 5.54 seconds)
Iteration 350: error is 1.732481 (50 iterations in 5.78 seconds)
Iteration 400: error is 1.683884 (50 iterations in 5.53 seconds)
Iteration 450: error is 1.660513 (50 iterations in 5.12 seconds)
Iteration 500: error is 1.649438 (50 iterations in 5.38 seconds)
Iteration 550: error is 1.643250 (50 iterations in 5.49 seconds)
Iteration 600: error is 1.639473 (50 iterations in 5.45 seconds)
Iteration 650: error is 1.636999 (50 iterations in 5.39 seconds)
Iteration 700: error is 1.634399 (50 iterations in 5.32 seconds)
Iteration 750: error is 1.632796 (50 iterations in 5.05 seconds)
Iteration 800: error is 1.631622 (50 iterations in 5.36 seconds)
Iteration 850: error is 1.630181 (50 iterations in 5.45 seconds)
Iteration 900: error is 1.629088 (50 iterations in 5.42 seconds)
Iteration 950: error is 1.627652 (50 iterations in 5.55 seconds)
Iteration 1000: error is 1.626624 (50 iterations in 5.38 seconds)
Fitting performed in 108.80 seconds.
```

Plot embedding, labeling clusters (left) and “Xist” expression (which separates the male and female )

```
par(mfrow=c(1,2))
r$plotEmbedding(type='PCA',embeddingType='tSNE',show.legend=F,mark.clusters=T,min.group.size=10,shuffle.colors=F,mark.cluster.cex=1,alpha=0.3,main='cell clusters')
r$plotEmbedding(type='PCA',embeddingType='tSNE',colors=r$depth,main='depth')
```

`treating colors as a gradient with zlim: 989.8 3210.8 `