Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.

cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2,
  X = NULL, B = 10, shrinkage = TRUE, n_batches = 1,
  dirichlet_concentration = 0.01, rel_tol_adam = 1e-04,
  rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20,
  learning_rate = 0.1, verbose = TRUE, sce_assay = "counts",
  return_SCE = FALSE, num_runs = 1)

Arguments

exprs_obj

Either a matrix representing gene expression counts or a SummarizedExperiment. See details.

marker_gene_info

Information relating marker genes to cell types. See details.

s

Numeric vector of cell size factors

min_delta

The minimum log fold change by which a marker gene must be over-expressed in its cell type

X

Numeric matrix of external covariates. See details.

B

Number of bases to use for RBF dispersion function

shrinkage

Logical - should the delta parameters have hierarchical shrinkage?

n_batches

Number of data subsample batches to use in inference

dirichlet_concentration

Dirichlet concentration parameter for cell type abundances

rel_tol_adam

The change in Q function value (in pct) below which each optimization round is considered converged

rel_tol_em

The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged

max_iter_adam

Maximum number of ADAM iterations to perform in each M-step

max_iter_em

Maximum number of EM iterations to perform

learning_rate

Learning rate of ADAM optimization

verbose

Logical - should running info be printed?

sce_assay

The assay from the input SingleCellExperiment to use. This assay should always contain raw counts.

return_SCE

Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details.

num_runs

Number of EM optimizations to perform (the one with the maximum log-marginal likelihood value will be used as the final).

Value

An object of class cellassign. See details.

Details

Input format: exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts. Do not log-transform or otherwise normalize the counts.

marker_gene_info should either be

  • A gene by cell type binary matrix, where a 1 indicates that a gene is a marker for a cell type, and 0 otherwise

  • A list with names corresponding to cell types, where each entry is a vector of marker gene names. These are converted to the above matrix using the marker_list_to_mat function.
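The two accepted formats can be illustrated with a small base R sketch. It does not reproduce marker_list_to_mat; it only shows what a gene by cell type binary matrix built from a named list looks like. The cell type names and gene symbols are hypothetical and chosen for illustration.

```r
# Hypothetical marker list: names are cell types, entries are marker genes.
marker_list <- list(
  `B cells` = c("CD19", "MS4A1"),
  `T cells` = c("CD3D", "CD3E", "CD2")
)

# All genes that are a marker for at least one cell type.
genes <- sort(unique(unlist(marker_list)))

# One column per cell type; 1 where the gene is a marker, 0 otherwise.
marker_mat <- sapply(marker_list, function(markers) as.integer(genes %in% markers))
rownames(marker_mat) <- genes

print(marker_mat)
```

In practice the package's own marker_list_to_mat function performs this conversion, so a hand-built matrix is only needed when the marker information is already in tabular form.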

Cell size factors: If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
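When supplying s manually, one simple option is a library-size factor per cell. The sketch below is an assumption for illustration only: it is not the pooling-based method used by computeSumFactors, and the count matrix is made up. It assumes a cell (row) by gene (column) matrix, so the totals are row sums; for a SingleCellExperiment, where cells are columns, the equivalent would be column sums of the counts assay.

```r
# Hypothetical cell (row) by gene (column) matrix of raw counts.
counts <- matrix(
  c(10, 0, 5,
    20, 4, 6,
     2, 1, 0,
     8, 8, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("geneA", "geneB", "geneC"))
)

# Total counts per cell (library size).
lib_sizes <- rowSums(counts)

# Scale so the size factors average to 1 across cells.
s <- lib_sizes / mean(lib_sizes)

print(s)
```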

Covariates: If X is not NULL, it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical covariates (i.e. those that are neither factors nor one-hot-encoded) be standardized to have mean 0 and standard deviation 1.
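A minimal sketch of building such a covariate matrix in base R follows. The data frame cell_info and its columns pct_mito and n_genes are hypothetical; only model.matrix and scale are standard R functions.

```r
# Hypothetical per-cell covariates for N = 6 cells.
cell_info <- data.frame(
  pct_mito = c(2.1, 3.4, 1.8, 5.0, 2.7, 4.2),
  n_genes  = c(1200, 950, 1800, 700, 1500, 1100)
)

# Standardize each numeric covariate to mean 0 and standard deviation 1.
cell_info$pct_mito <- as.numeric(scale(cell_info$pct_mito))
cell_info$n_genes  <- as.numeric(scale(cell_info$n_genes))

# "~ 0 + ..." drops the intercept column, giving an N by P matrix.
X <- model.matrix(~ 0 + pct_mito + n_genes, data = cell_info)

print(dim(X))
```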

cellassign: A call to cellassign returns an object of class cellassign. With the result stored in fit, call fit$cell_type to access the maximum likelihood estimates (MLE) of cell types, and fit$mle_params to access all MLE parameter estimates.

Returning a SingleCellExperiment

If return_SCE is TRUE, a call to cellassign will return the input SingleCellExperiment, with the following added:

  • A column cellassign_celltype to colData(sce) with the MAP estimate of the cell type

  • A slot sce@metadata$cellassign containing the cellassign fit.

Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.
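For readers unfamiliar with the term, a MAP (maximum a posteriori) estimate assigns each cell to the type with the highest posterior probability. The sketch below illustrates that idea on a made-up probability matrix; the object probs and its values are hypothetical and are not produced by cellassign here.

```r
# Hypothetical cell (row) by cell type (column) assignment probabilities.
probs <- matrix(
  c(0.90, 0.10, 0.00,
    0.20, 0.50, 0.30,
    0.05, 0.05, 0.90),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), c("B cells", "T cells", "other"))
)

# For each cell, pick the column with the largest probability.
map_type <- colnames(probs)[max.col(probs, ties.method = "first")]
names(map_type) <- rownames(probs)

print(map_type)
```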

Examples

data(example_sce)
data(example_marker_mat)
fit <- em_result <- cellassign(example_sce[rownames(example_marker_mat),],
  marker_gene_info = example_marker_mat,
  s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
  learning_rate = 1e-2,
  shrinkage = TRUE,
  verbose = FALSE)