1 Overview

cellassign assigns cells measured using single-cell RNA sequencing (scRNA-seq) to known cell types based on marker gene information. Unlike other methods for assigning cell types from scRNA-seq data, cellassign does not require labeled single-cell or purified bulk expression data; it only needs to know whether or not each gene is a marker for each cell type.

Inference is performed using TensorFlow. For more details, please see the manuscript.

2 Installation

cellassign depends on tensorflow, which can be installed as follows:

install.packages("tensorflow")
library(tensorflow)
install_tensorflow()

You can confirm that the installation succeeded by running:

# Note: tf$Session() belongs to the TensorFlow 1.x API
sess <- tf$Session()
hello <- tf$constant('Hello, TensorFlow!')
sess$run(hello)

Note that the tf object is created automatically when the tensorflow library is loaded, and it provides access to the TensorFlow interface.

For more details, see the RStudio page on tensorflow installation.

cellassign can then be installed through Bioconductor via

BiocManager::install('cellassign')

or the development version can be installed from GitHub using the devtools package:

devtools::install_github("Irrationone/cellassign")
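Before moving on, it can help to check which of the packages above are actually available in the current R library. The following is a minimal base-R sketch, not part of cellassign itself; the variable names `required` and `installed` are chosen here for illustration only.

```r
# Packages this vignette relies on (names taken from the install steps above)
required <- c("tensorflow", "cellassign")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report any package that still needs to be installed
if (all(installed)) {
  message("All required packages are available.")
} else {
  message("Missing packages: ", paste(required[!installed], collapse = ", "))
}
```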

3 Basic usage

We begin by illustrating basic usage of cellassign on some example data bundled with the package. First, load the relevant libraries:

library(SingleCellExperiment)
library(cellassign)

We use an example SingleCellExperiment consisting of 10 marker genes and 500 cells:

data(example_sce)
print(example_sce)
#> class: SingleCellExperiment 
#> dim: 10 500 
#> metadata(1): params
#> assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
#> rownames(10): Gene186 Gene205 ... Gene949 Gene994
#> rowData names(6): Gene BaseGeneMean ... DEFacGroup1 DEFacGroup2
#> colnames(500): Cell1 Cell2 ... Cell499 Cell500
#> colData names(5): Cell Batch Group ExpLibSize EM_group
#> reducedDimNames(0):
#> spikeNames(0):

The true cell types are stored for convenience in the Group column of the SingleCellExperiment's colData:

print(head(example_sce$Group))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

Also provided is an example gene-by-cell-type binary matrix, whose entries are 1 if a gene is a marker for a given cell type and 0 otherwise:

data(example_rho)
print(example_rho)
#>         Group1 Group2
#> Gene186      1      0
#> Gene205      0      1
#> Gene269      1      0
#> Gene526      1      0
#> Gene536      1      0
#> Gene575      0      1
#> Gene754      0      1
#> Gene773      0      1
#> Gene949      0      1
#> Gene994      1      0
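For your own data, a matrix of this form has to be constructed from known markers. The following is a minimal base-R sketch of one way to do so from a named list of marker genes; the gene and cell type names (GeneA, TypeA, and so on) are hypothetical placeholders, not taken from the example data.

```r
# Hypothetical marker genes for two hypothetical cell types
marker_list <- list(
  TypeA = c("GeneA", "GeneB"),
  TypeB = c("GeneB", "GeneC")
)

# All genes that are a marker for at least one cell type
genes <- sort(unique(unlist(marker_list, use.names = FALSE)))

# One column per cell type: 1 if the gene is a marker for it, 0 otherwise
rho <- sapply(marker_list, function(markers) as.integer(genes %in% markers))
rownames(rho) <- genes

print(rho)
```

Rows are genes and columns are cell types, matching the layout of example_rho above. A gene may be a marker for more than one cell type, as GeneB is here.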

We further require size factors for each cell. For the example data, these are stored in sizeFactors(example_sce); for your own data, we recommend computing them using the computeSumFactors function from the scran package.
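To illustrate what a size factor represents, the sketch below computes simple library-size factors in base R on a small simulated count matrix. This is a deliberately simpler stand-in, not the method used by computeSumFactors, and the toy `counts` matrix is made up for the example.

```r
set.seed(1)

# Toy gene-by-cell count matrix: 10 genes, 4 cells (simulated, for illustration)
counts <- matrix(
  rpois(40, lambda = 5),
  nrow = 10, ncol = 4,
  dimnames = list(paste0("Gene", 1:10), paste0("Cell", 1:4))
)

# Library size: total counts observed in each cell
lib_sizes <- colSums(counts)

# Scale so the size factors average to 1 across cells
s <- lib_sizes / mean(lib_sizes)
print(s)
```

A cell with a size factor above 1 was sequenced more deeply than average, so its raw counts are expected to be proportionally larger.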

We then call cellassign using the cellassign() function, passing in the above information:

s <- sizeFactors(example_sce)

fit <- cellassign(exprs_obj = example_sce, 
                  marker_gene_info = example_rho, 
                  s = s, 
                  learning_rate = 1e-2, 
                  shrinkage = TRUE,
                  verbose = FALSE)

This returns a cellassign object:

print(fit)
#> A cellassign fit for 500 cells, 10 genes, 2 cell types with 0 covariates
#>             To access cell types, call celltypes(x)
#>             To access cell type probabilities, call cellprobs(x)

We can access the maximum likelihood estimates (MLE) of cell type using the celltypes function:

print(head(celltypes(fit)))
#> [1] "Group1" "Group2" "Group2" "Group1" "Group1" "Group1"

and all MLE parameters using mleparams:

str(mleparams(fit))
#> List of 9
#>  $ delta  : num [1:10, 1:2] 2.32 0 2.5 2.9 2.89 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene186" "Gene205" "Gene269" "Gene526" ...
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ beta   : num [1:10, 1] 0.487 -0.255 -1.016 1.195 1.476 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:10] "Gene186" "Gene205" "Gene269" "Gene526" ...
#>   .. ..$ : NULL
#>  $ phi    : num [1:500, 1:10, 1:2] 1.86 1.87 1.86 1.86 1.86 ...
#>  $ gamma  : num [1:500, 1:2] 1.00 1.56e-145 1.01e-43 1.00 1.00 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ mu     : num [1:500, 1:10, 1:2] 22.6 80.9 11.5 15.5 15.8 ...
#>  $ a      : num [1:10(1d)] 1.03 1.08 1.13 1.19 1.26 ...
#>  $ theta  : num [1:2(1d)] 0.472 0.528
#>   ..- attr(*, "dimnames")=List of 1
#>   .. ..$ : chr [1:2] "Group1" "Group2"
#>  $ ld_mean: num 1
#>  $ ld_var : num 0.531

We can also visualize the assignment probabilities using the cellprobs function, which returns a matrix of probabilities with one row per cell and one column per cell type:

pheatmap::pheatmap(cellprobs(fit))
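A probability matrix of this shape can also be summarized without a heatmap. The following base-R sketch uses a small hand-written matrix in place of cellprobs(fit) and shows how hard labels could be derived by taking the most probable cell type per cell, how they compare with known labels, and how low-confidence cells might be flagged. The values, the `true_types` vector, and the 0.8 threshold are all assumptions made for illustration; this is not a description of how cellassign computes its own labels internally.

```r
# Toy probability matrix standing in for cellprobs(fit): 4 cells, 2 cell types
probs <- matrix(
  c(0.98, 0.02,
    0.10, 0.90,
    0.30, 0.70,
    0.85, 0.15),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Group1", "Group2"))
)

# Hypothetical known labels for the same 4 cells
true_types <- c("Group1", "Group2", "Group2", "Group1")

# Most probable cell type for each cell
assigned <- colnames(probs)[max.col(probs, ties.method = "first")]

# Confusion table of known versus assigned cell types
print(table(true = true_types, assigned = assigned))

# Flag cells whose highest probability falls below an arbitrary threshold
uncertain <- apply(probs, 1, max) < 0.8
print(which(uncertain))
```

With real output, the same pattern would apply with cellprobs(fit) in place of `probs` and example_sce$Group in place of `true_types`.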