Annotate cells to cell types using cellassign

Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.

cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2,
  X = NULL, B = 10, shrinkage = TRUE, n_batches = 1,
  dirichlet_concentration = 0.01, rel_tol_adam = 1e-04,
  rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20,
  learning_rate = 0.1, verbose = TRUE, sce_assay = "counts",
  return_SCE = FALSE, num_runs = 1)

Arguments

exprs_obj	Either a matrix representing gene expression counts or a `SummarizedExperiment`. See details.
marker_gene_info	Information relating marker genes to cell types. See details.
s	Numeric vector of cell size factors
min_delta	The minimum log fold change a marker gene must be over-expressed by in its cell type
X	Numeric matrix of external covariates. See details.
B	Number of bases to use for RBF dispersion function
shrinkage	Logical - should the delta parameters have hierarchical shrinkage?
n_batches	Number of data subsample batches to use in inference
dirichlet_concentration	Dirichlet concentration parameter for cell type abundances
rel_tol_adam	The change in Q function value (in pct) below which each optimization round is considered converged
rel_tol_em	The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged
max_iter_adam	Maximum number of ADAM iterations to perform in each M-step
max_iter_em	Maximum number of EM iterations to perform
learning_rate	Learning rate of ADAM optimization
verbose	Logical - should running info be printed?
sce_assay	The `assay` from the input `SingleCellExperiment` to use; this assay should always contain raw counts.
return_SCE	Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details.
num_runs	Number of EM optimizations to perform (the run with the maximum log marginal likelihood is used as the final result).

Value

An object of class cellassign. See details.

Details

Input format

exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts (do not log-transform or otherwise normalize).

marker_gene_info should be either:

A gene by cell type binary matrix, where a 1 indicates that a gene is a marker for a cell type and a 0 indicates that it is not
A list with names corresponding to cell types, where each entry is a vector of marker gene names. Such a list is converted to the matrix form above using the marker_list_to_mat function.
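
The two accepted forms of marker_gene_info carry the same information. The following is a minimal base R sketch of that correspondence, not the package's own conversion: the gene and cell type names are hypothetical, and marker_list_to_mat may differ in details such as column order or extra columns.

```r
# Hypothetical marker lists: names are cell types, entries are marker genes.
marker_list <- list(
  "B cells" = c("CD79A", "MS4A1"),
  "T cells" = c("CD3D", "CD3E", "CD2")
)

# All genes that appear as a marker for at least one cell type.
genes <- unique(unlist(marker_list, use.names = FALSE))

# One column per cell type; 1 if the gene is listed as a marker, 0 otherwise.
marker_mat <- sapply(marker_list, function(markers) as.integer(genes %in% markers))
rownames(marker_mat) <- genes

marker_mat
```

Building the matrix by hand like this is mainly useful for checking that every marker gene name also appears in the expression data before fitting.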

Cell size factors

If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
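
As an alternative to the scran default, the Examples section passes per-cell library sizes as s. The sketch below shows that computation on a toy gene-by-cell matrix in base R. The count values are hypothetical, and rescaling to a mean of 1 is a common convention rather than something this page requires.

```r
# Toy gene-by-cell count matrix (hypothetical values): 3 genes x 4 cells.
counts <- matrix(
  c(10, 0, 5,
    20, 1, 9,
     5, 0, 2,
    40, 3, 17),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library size per cell: column sums of a gene-by-cell matrix.
lib_size <- colSums(counts)

# Optional rescaling so that the size factors average to 1 (an assumption,
# not a documented requirement of cellassign).
s <- lib_size / mean(lib_size)

s
```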

Covariates

If X is not NULL, it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical covariates (i.e. those that are not factors or one-hot encoded) be standardized to have mean 0 and standard deviation 1.
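
A covariate matrix of this shape can be sketched in base R as follows. The metadata columns batch and n_detected are hypothetical. With a factor, the ~ 0 + formula yields one indicator column per level; whether a reference level should be dropped instead is not stated on this page, so treat the coding choice as an assumption.

```r
# Hypothetical per-cell metadata: one factor and one numeric covariate.
cell_info <- data.frame(
  batch = factor(c("a", "a", "b", "b", "c", "c")),
  n_detected = c(1200, 1500, 900, 2000, 1750, 1100)
)

# Standardize the numeric covariate to mean 0 and standard deviation 1.
cell_info$n_detected <- as.numeric(scale(cell_info$n_detected))

# "~ 0 + ..." builds the design matrix without an intercept column.
X <- model.matrix(~ 0 + batch + n_detected, data = cell_info)

dim(X)       # N cells by P covariates
colnames(X)
```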

cellassign

A call to cellassign returns an object of class cellassign. If the result is stored as fit, call fit$cell_type to access the maximum likelihood estimates (MLE) of the cell types, and fit$mle_params to access all MLE parameter estimates.

Returning a SingleCellExperiment

If return_SCE is TRUE, a call to cellassign will return the input SingleCellExperiment, with the following added:

A column cellassign_celltype to colData(sce) with the MAP estimate of the cell type
A slot sce@metadata$cellassign containing the cellassign fit

Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.

Examples

library(cellassign)

data(example_sce)
data(example_marker_mat)

# Fit on the marker genes only, using per-cell library sizes as size factors
fit <- cellassign(example_sce[rownames(example_marker_mat), ],
                  marker_gene_info = example_marker_mat,
                  s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)
