A guide to using cellexalvrR

Stefan Lang & Shamit Soneji

2020-09-17

Introduction

This document describes how to use cellexalvrR, an R package that accompanies CellexalVR which is a virtual reality environment to analyze single-cell RNAseq data. cellexalvrR has two functions:

  1. To aid the formatting and export of data that can be imported by CellexalVR.
  2. To perform backend calculations during a CellexalVR session.

Installation

The easiest way to install cellexalvrR is directly from github using the devtools package:

library(devtools)
install_github("sonejilab/cellexalvrR")

If you are installing this on your VR workstation then make sure you install cellexalvrR system-wide. Right-click the R icon, and then "Run as administrator" before issuing the commands above. Also make sure that you also have the Rtools installed on your Windows machine which provides C and C++ compilers.

Quick start

Exporting files from a Seurat object

We have a simple function to convert a Seurat ovject to a cellexalvr object prior to export. To demonstrate this we will use the their pbmc example. We have processed the data as per the vignette here, and we pick it up from where the UMAP/tSNE is made. You can get an rds file from here (unzip first) to follow along.


library(Seurat)
library(cellexalvrR)
pbmc <- readRDS("pbmc.rds")

# We will now calculate a UMAP and tSNE, but in each case embedding to 3 dimensions: 

pbmc <- RunUMAP(pbmc, dims = 1:10,n.components = 3)
Embeddings(pbmc,"umap")[1,]

pbmc <- RunTSNE(pbmc, dims = 1:10,dim.embed = 3)
Embeddings(pbmc,"tsne")[1,]

#Now we convert the object to a cellexalvr object using the cell identity and cluster as metadata for the cells (and any others you wish):
cvr <- as_cellexalvrR(pbmc,c("orig.ident","seurat_clusters"),specie="human")
cvr

#We're ready to export. Fist make a folder if you haven't already:
dir.create("PBMC")
export2cellexalvr(cvr,"PBMC")

#Done!

Copy the PBMC folder and contents to the "Data"" folder where you downloaded and unpacked CellexalVR. It will then be available for loading when you step into CellexalVR.

Using AnnData objects

Currently we do not have a package to export a Scanpy object directly to cellexalVR ready files (just yet), but a quick route is to use seurat-disk to import the h5ad file into a Seurat and use the steps above. Get the 3k PBMC h5ad file from here and unzip. It was made using this Scanpy tutorial where the only change we made was to embed the UMAP into 3 dims using sc.tl.umap(adata,n_components=3) :

library(Seurat)
library(SeuratDisk)
library(cellexalvrR)

#Convert("pbmc3k.h5ad", dest = "h5seurat", overwrite = TRUE)
#pbmc3k <- LoadH5Seurat("pbmc3k.h5seurat")

cvr <- as_cellexalvrR('pbmc3k.h5ad', c("leiden"), specie="human", velocity='none')

dir.create("PBMC3K")
export2cellexalvr(cvr,"PBMC3K")
#Done!

Using AnnData objects from an scvelo run

If you have used scvelo to caculate cell velocities then only a couple more options need to be added to the export. In this example we ran the demo with two extra lines:

sc.tl.umap(adata,n_components=3) # embed to 3 dims
sc.tl.leiden(adata) #get clusters

You can get a zip of the resulting h5ad file from here. The rest is almost the same as above (in R):

library(Seurat)
library(SeuratDisk)
library(cellexalvrR)

#Convert("pancreas_scvelo.h5ad", dest = "h5seurat", overwrite = TRUE)
#pancreas <- LoadH5Seurat("pancreas_scvelo.h5seurat")

cvr <- as_cellexalvrR("pancreas_scvelo.h5ad",c("cluster","leiden"),specie="human",velocity="scvelo",scale.arrow.travel=30)

# The deltas can be quite small, so in the line above we have 'scale.arrow.travel' which multiples delta by factor specified to increase the path of travel for each cell to make them more visible.

dir.create("Pancreas_scvelo")
export2cellexalvr(cvr,"Pancreas_scvelo")
#Done!

In general, to enable velocity visualisation in CellexalVR one supplies 6-column .mds files, where the last three columns denote the destination x/y/z coordinate for each cell

Exporting scATAC data from a Seurat object

We followed the example here, again embedding to 3 dimensions as shown above. A zip of the RDS file can be obtained here which is a seurat object. Exporting the data:


library(Seurat)
library(cellexalvrR)
pbmc.atac <- readRDS("pbmc.atac.rds")

# convert the object using the gene activity assay
cvr <- as_cellexalvrR(pbmc.atac,c("seurat_clusters"),specie="human",assay="ACTIVITY")
cvr

#We're ready to export. Fist make a folder if you haven't already:
dir.create("scATAC")
export2cellexalvr(cvr,"scATAC")

#Done!

Making and exporting a cellexalVR object from it's elements.

If you aren't using Seurat the following guide will show you how to make a cellexalVR oject from scratch using our in-built functions. The requirments are very simple and examples are shown below.

The data from Nestorowa et al can be downloaded from here. Unpack them and set your working directory to where they are.

First, load the library:

library(cellexalvrR)
#> Loading required package: Matrix
#> 
#> 
#> 

Then load the data:

load("exdata.RData")
load("facs.RData")
load("cell.ids.RData")
load("diff.proj.RData")
load("ddr.proj.RData")
load("tsne.proj.RData")

exdata is a sparse matrix of highly variable genes in log2 form. The first 10 columns and 10 rows look as so:

exdata[1:10,1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#>    [[ suppressing 10 column names 'HSPC_001', 'HSPC_002', 'HSPC_003' ... ]]
#>                                                                                
#> Clec1b         .         .        .         .        1.039889 1.172368 .       
#> Kdm3a          .         .        .         .        .        8.313856 7.042431
#> Coro2b         9.4755769 .        .         .        1.637906 1.172368 .       
#> 8430408G22Rik  .         .        .         .        .        .        .       
#> Clec9a         .         .        .         .        .        .        .       
#> Phf6           7.7333697 7.271829 9.4130405 2.677008 5.832172 7.856124 3.298544
#> Usp14          1.4788998 3.378561 8.6774851 1.659063 7.391105 3.091340 8.689676
#> Tmem167b       .         9.093340 0.8302297 .        1.637906 .        .       
#> Kbtbd7        10.0456005 2.861341 1.3538522 1.659063 2.385231 7.902221 .       
#> Rag2           0.5329056 2.047347 0.8302297 .        6.381515 .        .       
#>                                         
#> Clec1b        .        1.486056 .       
#> Kdm3a         2.732774 2.202400 2.209467
#> Coro2b        1.090491 .        .       
#> 8430408G22Rik .        .        .       
#> Clec9a        .        .        .       
#> Phf6          9.103079 6.695850 1.491864
#> Usp14         7.566973 2.202400 8.129944
#> Tmem167b      .        1.486056 1.491864
#> Kbtbd7        .        .        .       
#> Rag2          1.704399 7.365545 1.491864

Note the cell IDs are in the column names, and the gene names are the row names.

class(exdata)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"

facs is a matrix of cell surface marker intensities captured during index sorting:

head(facs)
#>               CD34      CD16     c.Kit      EPCR      Flk2     CD150      CD48
#> HSPC_001 0.6037796 0.4322826 1.3509535 1.5235524 0.3618872 0.3036440 0.4359241
#> HSPC_002 0.3652949 0.5923082 1.1668190 1.5575771 0.3985330 0.1802330 0.3413969
#> HSPC_003 1.1688350 0.3103070 1.2716602 0.1796609 0.2225457 0.2708689 1.4005640
#> HSPC_004 0.7234537 0.5098244 1.2890254 0.1047645 0.7382823 0.1746147 1.0903352
#> HSPC_006 0.7806266 0.2244971 0.8613574 0.1909399 0.7124364 0.4767371 1.1907472
#> HSPC_008 0.8303475 0.5984071 1.2414646 1.4069329 0.4826786 0.2322792 0.3781136
#>                Lin      Sca1
#> HSPC_001 0.2989843 1.3179900
#> HSPC_002 0.4922320 1.5137737
#> HSPC_003 0.4194964 0.6391425
#> HSPC_004 0.5070497 0.5363241
#> HSPC_006 0.3140937 0.5699483
#> HSPC_008 0.2573233 1.2604954

Cell IDs are in the row names, and the name of the surface protein in the columns.

cell.ids is a 1/0 matrix that assigns metadata to the each cell. In this case it shows the cell type each cell was sorted as:

head(cell.ids)
#>          LTHSC_broad LMPP_broad MPP_broad CMP_broad MEP_broad GMP_broad
#> HSPC_001           0          0         1         0         0         0
#> HSPC_002           1          0         0         0         0         0
#> HSPC_003           0          0         1         0         0         0
#> HSPC_004           0          1         0         0         0         0
#> HSPC_006           0          1         0         0         0         0
#> HSPC_008           0          1         0         0         0         0
#>          MPP1_broad MPP2_broad MPP3_broad STHSC_broad LTHSC LMPP MPP CMP MEP
#> HSPC_001          0          0          0           1     0    0   0   0   0
#> HSPC_002          0          0          0           0     1    0   0   0   0
#> HSPC_003          0          0          1           0     0    0   0   0   0
#> HSPC_004          0          0          0           0     0    1   0   0   0
#> HSPC_006          0          0          0           0     0    1   0   0   0
#> HSPC_008          0          0          0           0     0    1   0   0   0
#>          GMP MPP1 MPP2 MPP3 STHSC ESLAM HSC1 Projected
#> HSPC_001   0    0    0    0     0     0    0         0
#> HSPC_002   0    0    0    0     0     0    0         0
#> HSPC_003   0    0    0    0     0     0    0         0
#> HSPC_004   0    0    0    0     0     0    0         0
#> HSPC_006   0    0    0    0     0     0    0         0
#> HSPC_008   0    0    0    0     0     0    0         0

Cell IDs are in the row names, and the name of the meta data in the columns. A 1 is given if the cell belongs to a metadata class, 0 otherwise.

diff.proj,tsne.proj,and diff.proj are the results from three different dimension reduction methods applied to exdata, specifically DDRTree, tSNE and diffusion map respectively. Each is a three column matrix of x/y/z coordinates. For example:

head(diff.proj)
#>                   DC1          DC2          DC3
#> HSPC_001 -0.014760110 -0.024119177 -0.007998293
#> HSPC_002 -0.010255308 -0.019970919 -0.015754327
#> HSPC_003 -0.003649320  0.027297654  0.014202316
#> HSPC_004 -0.012653205 -0.002990825  0.025813077
#> HSPC_006 -0.005876981  0.021537658  0.035740656
#> HSPC_008 -0.015699148 -0.018777106  0.007916241

The first step is to put all the dimension reduction outputs into a single list:

proj.list <- list(diffusion=diff.proj,DDRtree=ddr.proj,tSNE=tsne.proj)

The next is to make a cellexalvr object by calling MakeCellexaVRObj and passing the required objects to it:

cellvr <- MakeCellexalVRObj(exdata,drc.list=proj.list,specie="mouse",cell.meta=cell.ids,facs.data=facs)

In the same step we also set the specie as mouse which ensures the correct transcription factor IDs are using during network construction if implemented in CellexalVR.

Calling the object name will detail it's contents:

cellvr
#> An object of class cellexalvrR 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 3 drc object(s)
#> Specie is set to mouse

The last step is to call export2cellexalvr which will write the neccessary files from the cellvr object to a specified directory:

dir.create("CellexalOut/")
export2cellexalvr(cellvr,"CellexalOut/")

All the files needed by CellexalVR are created at this point. The entire "CellexalOut/" folder should then be moved/copied to the "Data" folder in your CellexalVR setup. See the manual for more details including a schematic of the folder structure.

Making a cellexalvr object from scratch

While MakeCellexalVRObj is the most convenient way to make the object, sometimes you want to make one manually. This is done calling new:

cell.vr.scr <- new("cellexalvr",data=Matrix::Matrix(exdata,sparse=T),drc=list(tsne=tsne.proj))
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1,1): ''   
#> There are 0 user group(s) stored :
#> and 1 drc object(s)
#> Specie is set to NA

We can add another set of dimension reduction coordinates using the addMDS2cellexalvr function:

cell.vr.scr <- addDRC2cellexalvr(cell.vr.scr,ddr.proj,"DDRTree")
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1,1): ''   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA

To add metadata for each cell use addCellMeta2cellexalvr:

cell.vr.scr <- addCellMeta2cellexalvr(cell.vr.scr,cell.ids)
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA

To add FACS for each cell use addFACS2cellexalvr:

cell.vr.scr <- addFACS2cellexalvr(cell.vr.scr,facs)

Setting the specie is done using the set.specie function:

cell.vr.scr <- set.specie(cell.vr.scr,"mouse")
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to mouse

Making cell metadata from a data frame

CellexalVR requires metadata in the form of a 1/0 matrix, but many packages store it as a data frame. CellexalvrR has function to convert a data frame into a 1/0 matrix. First, lets make a data frame:

meta.df <- data.frame(CellType=sample(c("Type1","Type2","Type3"),10,replace=T),
                      Phase=sample(c("G1","G2M","S"),10,replace=T),
                      Treatment=sample(c("WT","Dox"),10,replace=T))
head(meta.df)
#>   CellType Phase Treatment
#> 1    Type3     S       Dox
#> 2    Type2    G1        WT
#> 3    Type1     S        WT
#> 4    Type3    G1       Dox
#> 5    Type1    G1        WT
#> 6    Type3     S        WT

We can now make a correctly formatted cell metadata matrix by applying the function make.cell.meta.from.df using only the fields we need, in this case the "CellType" and "Treatment" columns:

required.cell.metad <- make.cell.meta.from.df(meta.df,c("CellType","Treatment"))
head(required.cell.metad)
#>   CellType@Type3 CellType@Type2 CellType@Type1 Treatment@Dox Treatment@WT
#> 1              1              0              0             1            0
#> 2              0              1              0             0            1
#> 3              0              0              1             0            1
#> 4              1              0              0             1            0
#> 5              0              0              1             0            1
#> 6              1              0              0             0            1

It can be seen the field name is placed in the column heading preceeding a "@", and this is used by CellexalVR to form catagories on the menu system, so "CellType" and "Treatment" will be on separate tabs. This metadata matrix can now be added to a cellexalvrR object as decribed above using the addCellMeta2cellexalvr function.