The goal of this tutorial is to demonstrate how to run ligand-receptor analysis. Our implementation of this analysis tests for ligands that are differentially expressed across donors and identifies co-expression gene modules in other cell types associated with the ligands.
First, we’ll load the package and the data as in the main Walkthrough tutorial.
library(scITD)
library(dplyr)
# counts matrix
pbmc_counts <- readRDS('/home/jmitchel/data/van_der_wijst/pbmc_counts_v2.rds')
# meta data matrix
pbmc_meta <- readRDS('/home/jmitchel/data/van_der_wijst/pbmc_meta_v2.rds')
# ensembl to gene name conversions
feature.names <- readRDS('/home/jmitchel/data/van_der_wijst/genes.rds')
Next, we will form the tensor and run the decomposition the same way as in the main Walkthrough tutorial.
# set up project parameters
param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T", "cMonocyte", "CD56(dim) NK", "B"),
ncores = 30, rand_seed = 10)
# create project container
pbmc_container <- make_new_container(count_data=pbmc_counts,
meta_data=pbmc_meta,
gn_convert = feature.names,
params=param_list,
label_donor_sex = TRUE)
# form the tensor from the data
pbmc_container <- form_tensor(pbmc_container, donor_min_cells=5,
norm_method='trim', scale_factor=10000,
vargenes_method='norm_var_pvals', vargenes_thresh=.1,
scale_var = TRUE, var_scale_power = 2)
#> parsing data matrix by cell/tissue type...
#> cleaning data...
#> Keeping 41 donors. All donors have at least 5 cells in each cell type included.
#> collapsing count matrices from cells to donors (aka pseudobulk operation)...
#> normalizing data...
#> calculating gene overdispersion factors...
#> selecting highly variable genes from each cell type...
#> scaling variance...
#> forming tensor...
#> Complete!
# run the tensor decomposition
pbmc_container <- run_tucker_ica(pbmc_container, ranks=c(5,10),
tucker_type = 'regular', rotation_type = 'hybrid')
Next, we will load up a resource of ligand-receptor cogante pairs from NicheNet. The data must be structured with ligands in 1 column and receptors in another with column names as “ligand” and “receptor”, respectively.
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
lr_network = lr_network %>% mutate(bonafide = ! database %in% c("ppi_prediction","ppi_prediction_go"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor, bonafide)
# structuring the data properly
lr_pairs <- as.matrix(lr_network)
lr_pairs <- lr_pairs[lr_pairs[,'bonafide']=='TRUE',]
lr_pairs <- lr_pairs[,c(1,2)]
print(head(lr_pairs))
#> ligand receptor
#> [1,] "CXCL1" "CXCR2"
#> [2,] "CXCL2" "CXCR2"
#> [3,] "CXCL3" "CXCR2"
#> [4,] "CXCL5" "CXCR2"
#> [5,] "PPBP" "CXCR2"
#> [6,] "CXCL6" "CXCR2"
Next, we will prepare the data for LR analysis by computing WGCNA gene modules from the pseudobulk expression data in each cell type. First, we’ll need to select soft thresholds for the modules. This parameter is usually chosen to be the power value where the SFT.R.sq starts to level off above 0.9.
pbmc_container <- prep_LR_interact(pbmc_container, lr_pairs, norm_method='trim', scale_factor=10000,
var_scale_power=2)
#> pickSoftThreshold: will use block size 741.
#> pickSoftThreshold: calculating connectivity for given powers...
#> ..working on genes 1 through 741 of 741
#> Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
#> 1 1 0.3730 9.72 0.944 379.00 380.000 418.0
#> 2 2 0.0353 1.08 0.733 203.00 201.000 250.0
#> 3 3 0.4680 -2.17 0.619 114.00 109.000 163.0
#> 4 4 0.6760 -2.13 0.673 66.40 60.600 114.0
#> 5 5 0.8840 -2.15 0.852 40.40 35.000 83.2
#> 6 6 0.9580 -2.06 0.953 25.60 20.600 65.1
#> 7 7 0.9230 -1.96 0.911 16.90 12.600 53.6
#> 8 8 0.9390 -1.83 0.934 11.60 7.900 45.6
#> 9 9 0.8950 -1.72 0.890 8.28 4.990 39.9
#> 10 10 0.9320 -1.62 0.931 6.12 3.290 35.5
#> 11 12 0.9240 -1.45 0.934 3.67 1.450 29.5
#> 12 14 0.9300 -1.34 0.944 2.44 0.691 25.5
#> 13 16 0.9270 -1.27 0.936 1.75 0.347 22.4
#> 14 18 0.9340 -1.21 0.945 1.33 0.180 19.9
#> 15 20 0.9300 -1.18 0.938 1.05 0.099 17.9
#> pickSoftThreshold: will use block size 741.
#> pickSoftThreshold: calculating connectivity for given powers...
#> ..working on genes 1 through 741 of 741
#> Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
#> 1 1 0.00655 0.781 0.695 380.000 381.000 417.00
#> 2 2 0.11900 3.580 0.833 204.000 203.000 246.00
#> 3 3 0.18700 4.430 0.654 113.000 112.000 152.00
#> 4 4 0.24400 4.760 0.426 65.200 63.700 97.70
#> 5 5 0.27900 4.450 0.202 38.900 37.400 65.10
#> 6 6 0.26500 3.310 0.055 23.900 22.400 46.30
#> 7 7 0.25400 1.010 0.696 15.200 13.800 34.30
#> 8 8 0.06960 0.503 0.955 9.980 8.690 26.10
#> 9 9 0.03070 0.325 0.873 6.740 5.610 20.40
#> 10 10 0.40700 -0.972 0.765 4.680 3.660 16.30
#> 11 12 0.90900 -1.570 0.962 2.440 1.670 11.50
#> 12 14 0.95600 -1.680 0.974 1.400 0.780 9.16
#> 13 16 0.95900 -1.700 0.977 0.865 0.382 7.61
#> 14 18 0.95300 -1.670 0.973 0.572 0.200 6.53
#> 15 20 0.92900 -1.660 0.947 0.398 0.108 5.72
#> pickSoftThreshold: will use block size 741.
#> pickSoftThreshold: calculating connectivity for given powers...
#> ..working on genes 1 through 741 of 741
#> Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
#> 1 1 0.0381 2.030 0.8200 382.00 382.000 433.0
#> 2 2 0.1600 4.150 0.6700 210.00 204.000 278.0
#> 3 3 0.2520 4.220 0.2660 121.00 111.000 195.0
#> 4 4 0.2240 3.020 0.0118 73.60 62.300 145.0
#> 5 5 0.1850 0.647 0.4060 46.60 36.100 112.0
#> 6 6 0.0109 0.143 0.6340 30.80 21.700 89.8
#> 7 7 0.2520 -0.618 0.4090 21.20 13.200 73.7
#> 8 8 0.8650 -1.250 0.8440 15.10 8.470 61.5
#> 9 9 0.9020 -1.310 0.8850 11.10 5.380 52.1
#> 10 10 0.9100 -1.280 0.8870 8.36 3.500 44.6
#> 11 12 0.9160 -1.230 0.8960 5.10 1.620 33.5
#> 12 14 0.8980 -1.220 0.8790 3.34 0.789 26.1
#> 13 16 0.9150 -1.170 0.9030 2.30 0.400 20.7
#> 14 18 0.8970 -1.170 0.8810 1.65 0.214 16.7
#> 15 20 0.8690 -1.180 0.8520 1.22 0.115 13.6
#> pickSoftThreshold: will use block size 741.
#> pickSoftThreshold: calculating connectivity for given powers...
#> ..working on genes 1 through 741 of 741
#> Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
#> 1 1 0.08260 2.950 0.889 390.00 392.000 449.0
#> 2 2 0.20700 4.360 0.576 218.00 215.000 295.0
#> 3 3 0.23300 4.080 0.211 127.00 120.000 206.0
#> 4 4 0.19900 2.870 -0.023 77.00 68.600 152.0
#> 5 5 0.06930 0.355 0.662 48.40 40.300 117.0
#> 6 6 0.00559 -0.110 0.826 31.60 24.300 92.3
#> 7 7 0.37100 -0.995 0.742 21.30 14.800 75.1
#> 8 8 0.76600 -1.510 0.869 14.90 9.260 62.4
#> 9 9 0.84500 -1.580 0.902 10.70 5.910 52.8
#> 10 10 0.87000 -1.550 0.903 7.89 3.850 45.3
#> 11 12 0.85900 -1.530 0.869 4.63 1.720 34.4
#> 12 14 0.83000 -1.500 0.860 2.94 0.822 27.0
#> 13 16 0.90100 -1.370 0.912 1.99 0.417 21.8
#> 14 18 0.87900 -1.350 0.879 1.42 0.212 17.9
#> 15 20 0.82600 -1.330 0.804 1.05 0.112 14.9
#> pickSoftThreshold: will use block size 741.
#> pickSoftThreshold: calculating connectivity for given powers...
#> ..working on genes 1 through 741 of 741
#> Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
#> 1 1 0.000011 -0.0334 0.7240 367.000 377.0000 415.00
#> 2 2 0.067800 2.7800 0.8260 197.000 202.0000 246.00
#> 3 3 0.119000 3.6100 0.6350 109.000 110.0000 151.00
#> 4 4 0.154000 3.8100 0.3530 61.400 61.4000 95.90
#> 5 5 0.155000 3.3200 0.0415 35.500 35.0000 62.90
#> 6 6 0.107000 2.0800 -0.1490 21.000 20.3000 42.50
#> 7 7 0.074700 1.7400 -0.1820 12.700 12.1000 29.60
#> 8 8 0.110000 -0.6600 0.7620 7.870 7.4100 21.20
#> 9 9 0.231000 -0.9890 0.9540 4.980 4.5800 15.60
#> 10 10 0.141000 -0.5230 0.9550 3.220 2.8500 11.70
#> 11 12 0.734000 -2.0100 0.9150 1.440 1.1900 7.01
#> 12 14 0.903000 -2.2300 0.9360 0.701 0.5230 4.48
#> 13 16 0.889000 -2.1500 0.8700 0.372 0.2380 3.01
#> 14 18 0.286000 -3.0500 0.0980 0.215 0.1140 2.38
#> 15 20 0.326000 -2.9800 0.2210 0.135 0.0561 2.09
Now, compute the WGCNA modules with the appropriate soft thresholds.
sft_thresh <- c(10,12,10,16,14)
invisible({capture.output({
pbmc_container <- get_gene_modules(pbmc_container,sft_thresh)
})})
After computing the modules, we can test for associations between them and ligands across pairs of cell type. Note, only pairs of different kinds of cell types are tested, and we require the cognate receptor to be expressed to a minimal extent in the target cell type (determined by percentile_exp_rec
).
invisible({capture.output({
lr_hmap <- compute_LR_interact(pbmc_container, lr_pairs, sig_thresh=.001,
percentile_exp_rec=0.85, add_ld_fact_sig=TRUE)
})})
lr_hmap
The main heatmap (with red color) shows the p-values of the linear model associations between the ligands (rows) and co-expression modules (columns). It only shows results where the receptor is expressed in the target cell type. The heatmap on the right (with purple color) shows the p-values of the linear model associations between the ligands and the various scITD factors.
Next, we can plot an example hit: TNF ligand from cMonocytes to TNFRSF1B receptor in CD8+ T cells
lig_mod_fact <- plot_mod_and_lig(pbmc_container,factor_select=1,mod_ct='CD8+ T',mod=3,lig_ct='cMonocyte',lig='TNF')
lig_mod_fact
We can also look at gene sets that are enriched in the target modules associated with a given ligand. For the example interaction above, we will look at modules 3 and 5 in the CD8+ T cells.
mod_enr <- plot_multi_module_enr(pbmc_container, ctypes=c('CD8+ T','CD8+ T'), modules=c(3,5), sig_thresh=.05,
db_use=c('GO'),max_plt_pval=.05,h_w=c(8,2))
mod_enr
These enriched gene sets may represent the downstream response to stimulation by the ligand, though further analysis is needed to confirm this. We recommend checking the gene sets for expected targets impacted by the ligand, which is encoded in NicheNet’s regulatory potential scores.