Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.

cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2,
  X = NULL, B = 10, shrinkage = TRUE, n_batches = 1,
  dirichlet_concentration = 0.01, rel_tol_adam = 1e-04,
  rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20,
  learning_rate = 0.1, verbose = TRUE, sce_assay = "counts",
  return_SCE = FALSE, num_runs = 1)



exprs_obj: Either a matrix representing gene expression counts or a SummarizedExperiment. See details.

marker_gene_info: Information relating marker genes to cell types. See details.

s: Numeric vector of cell size factors.

min_delta: The minimum log fold change by which a marker gene must be over-expressed in its cell type.

X: Numeric matrix of external covariates. See details.

B: Number of bases to use for the RBF dispersion function.

shrinkage: Logical - should the delta parameters have hierarchical shrinkage?

n_batches: Number of data subsample batches to use in inference.

dirichlet_concentration: Dirichlet concentration parameter for cell type abundances.

rel_tol_adam: The change in Q function value (in pct) below which each optimization round is considered converged.

rel_tol_em: The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged.

max_iter_adam: Maximum number of ADAM iterations to perform in each M-step.

max_iter_em: Maximum number of EM iterations to perform.

learning_rate: Learning rate of ADAM optimization.

verbose: Logical - should running info be printed?

sce_assay: The assay from the input SingleCellExperiment to use: this assay should always represent raw counts.

return_SCE: Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details.

num_runs: Number of EM optimizations to perform (the run with the maximum log-marginal likelihood value is used as the final result).


An object of class cellassign. See details.


Input format

exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts (do not log-transform or otherwise normalize).

marker_gene_info should be either

  • A gene by cell type binary matrix, where a 1 indicates that a gene is a marker for a cell type, and 0 otherwise

  • A list with names corresponding to cell types, where each entry is a vector of marker gene names. These are converted to the above matrix using the marker_list_to_mat function.
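The relationship between the two accepted forms can be illustrated in base R. The following is a minimal sketch only: the marker list and the helper list_to_binary_mat are hypothetical and are not part of cellassign, and the output of marker_list_to_mat may differ in details such as additional columns or gene ordering.

```r
# Hypothetical marker list: names are cell types, entries are marker gene names.
marker_list <- list(
  "B cells" = c("CD79A", "MS4A1"),
  "T cells" = c("CD3D", "CD3E", "CD2")
)

# Illustrative helper (an assumption, not a cellassign function):
# builds a gene (row) by cell type (column) binary matrix.
list_to_binary_mat <- function(markers) {
  genes <- sort(unique(unlist(markers)))
  # One column per cell type: 1 if the gene is a marker for it, 0 otherwise.
  mat <- vapply(markers,
                function(g) as.integer(genes %in% g),
                integer(length(genes)))
  rownames(mat) <- genes
  mat
}

marker_mat <- list_to_binary_mat(marker_list)
print(marker_mat)
```

In practice, passing the list directly as marker_gene_info leaves the conversion to the package.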

Cell size factors

If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
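The example at the end of this page supplies s explicitly as the per-cell total counts rather than relying on scran. A minimal sketch of that calculation on a toy gene-by-cell matrix follows; the data are hypothetical, and whether total counts are an adequate substitute for scran's size factors is an assumption that depends on the dataset.

```r
# Toy gene (row) by cell (column) count matrix; values are hypothetical.
counts <- matrix(
  c(5, 0, 2,   # cell1
    1, 3, 0,   # cell2
    0, 4, 4,   # cell3
    2, 2, 6),  # cell4
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# One size factor per cell: the total count in each column.
s <- colSums(counts)
print(s)
```

For a cell (row) by gene (column) matrix, the equivalent per-cell total would be rowSums(counts).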

Covariates

If X is not NULL, it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical covariates (i.e. those that are neither factors nor one-hot-encoded) be standardized to have mean 0 and standard deviation 1.
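A covariate matrix of this shape could be built as in the following minimal sketch. The data frame cell_info and its columns batch and n_genes are hypothetical; only model.matrix and scale are standard base R functions.

```r
# Hypothetical per-cell metadata for N = 6 cells.
cell_info <- data.frame(
  batch   = factor(c("a", "a", "b", "b", "c", "c")),
  n_genes = c(1200, 1500, 900, 2000, 1700, 1100)
)

# Standardize the numerical covariate to mean 0 and standard deviation 1.
cell_info$n_genes <- as.numeric(scale(cell_info$n_genes))

# "~ 0 + ..." drops the intercept column from the design matrix.
X <- model.matrix(~ 0 + batch + n_genes, data = cell_info)
print(dim(X))
print(colnames(X))
```

The resulting X has one row per cell and could then be passed as the X argument.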

cellassign

A call to cellassign returns an object of class cellassign. To access the MLE estimates of cell types, call fit$cell_type. To access all MLE parameter estimates, call fit$mle_params.

Returning a SingleCellExperiment

If return_SCE is TRUE, a call to cellassign returns the input SingleCellExperiment with the following added:

  • A column cellassign_celltype in colData(sce) containing the MAP estimate of the cell type

  • A slot sce@metadata$cellassign containing the cellassign fit

Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.


data(example_sce)
data(example_marker_mat)

fit <- em_result <- cellassign(example_sce[rownames(example_marker_mat),],
  marker_gene_info = example_marker_mat,
  s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
  learning_rate = 1e-2,
  shrinkage = TRUE,
  verbose = FALSE)
