Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.
cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2, X = NULL, B = 10, shrinkage = TRUE, n_batches = 1, dirichlet_concentration = 0.01, rel_tol_adam = 1e-04, rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20, learning_rate = 0.1, verbose = TRUE, sce_assay = "counts", return_SCE = FALSE, num_runs = 1)
| Argument | Description |
|---|---|
| exprs_obj | Either a matrix representing gene expression counts or a SummarizedExperiment. See details. |
| marker_gene_info | Information relating marker genes to cell types. See details. |
| s | Numeric vector of cell size factors |
| min_delta | The minimum log fold change a marker gene must be over-expressed by in its cell type |
| X | Numeric matrix of external covariates. See details. |
| B | Number of bases to use for RBF dispersion function |
| shrinkage | Logical - should the delta parameters have hierarchical shrinkage? |
| n_batches | Number of data subsample batches to use in inference |
| dirichlet_concentration | Dirichlet concentration parameter for cell type abundances |
| rel_tol_adam | The change in Q function value (in pct) below which each optimization round is considered converged |
| rel_tol_em | The change in log marginal likelihood value (in pct) below which the EM algorithm is considered converged |
| max_iter_adam | Maximum number of ADAM iterations to perform in each M-step |
| max_iter_em | Maximum number of EM iterations to perform |
| learning_rate | Learning rate of ADAM optimization |
| verbose | Logical - should running info be printed? |
| sce_assay | The assay of the input SingleCellExperiment to use. It should contain raw counts. |
| return_SCE | Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details. |
| num_runs | Number of EM optimizations to perform (the one with the maximum log-marginal likelihood value will be used as the final). |
An object of class cellassign. See details.
Input format
exprs_obj should be either a
SummarizedExperiment (we recommend the
SingleCellExperiment package) or a
cell (row) by gene (column) matrix of
raw RNA-seq counts (do not
log-transform or otherwise normalize).
marker_gene_info should be either:
- A gene by cell type binary matrix, where a 1 indicates that a gene is a marker for a cell type, and 0 otherwise.
- A list with names corresponding to cell types, where each entry is a
vector of marker gene names. These are converted to the above matrix using
the marker_list_to_mat function.
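To make the relationship between the two accepted formats concrete, the following base R sketch builds a gene by cell type binary matrix from a named list. The helper `list_to_binary_matrix` and the gene and cell type names are hypothetical and only illustrate the idea; in practice the package's own marker_list_to_mat performs the conversion and may differ in details such as gene ordering or additional columns.

```r
# Hypothetical marker list: names are cell types, entries are marker genes.
marker_list <- list(
  "B cell"   = c("CD79A", "MS4A1"),
  "T cell"   = c("CD3D", "CD3E", "CD2"),
  "Monocyte" = c("CD14", "LYZ")
)

# Illustrative helper (not part of cellassign): builds a gene x cell type
# matrix with 1 where a gene is a marker for a cell type and 0 otherwise.
list_to_binary_matrix <- function(marker_list) {
  genes <- sort(unique(unlist(marker_list, use.names = FALSE)))
  cell_types <- names(marker_list)
  mat <- matrix(0L, nrow = length(genes), ncol = length(cell_types),
                dimnames = list(genes, cell_types))
  for (ct in cell_types) {
    mat[marker_list[[ct]], ct] <- 1L
  }
  mat
}

marker_mat <- list_to_binary_matrix(marker_list)
print(marker_mat)
```

The rows of such a matrix are the genes used to subset the expression object, as the example on this page does with `rownames(example_marker_mat)`.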
Cell size factors
If the cell size factors s are
not provided they are computed using the
computeSumFactors function from
the scran package.
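If you prefer to supply s yourself, one simple option is library-size-based factors, in the spirit of the example on this page, which passes the per-cell column sums of the counts. The sketch below is an assumption-laden illustration in base R: the toy matrix is simulated, and scaling the totals to mean 1 is a common convention rather than something cellassign requires. It is not what computeSumFactors does.

```r
set.seed(42)

# Toy raw count matrix laid out like an SCE assay:
# genes in rows, cells in columns (simulated data, for illustration only).
counts <- matrix(rpois(5 * 8, lambda = 10), nrow = 5, ncol = 8,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

# Total counts per cell (library size).
lib_sizes <- colSums(counts)

# Library-size factors scaled so that their mean is 1.
s <- lib_sizes / mean(lib_sizes)
print(s)
```

The resulting vector has one positive entry per cell and could be passed as the s argument.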
Covariates
If X is not NULL then it should be
an N by P matrix
of covariates for N cells and P covariates.
Such a matrix would typically
be returned by a call to model.matrix
with no intercept. It is also highly
recommended that any numerical covariates (i.e. those that are not factors or
one-hot encoded) be standardized
to have mean 0 and standard deviation 1.
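As a minimal sketch of the recommendation above, the following base R code standardizes two numeric covariates and builds an N by P design matrix without an intercept. The covariate names `pct_mito` and `log_depth` and the simulated values are hypothetical and stand in for whatever per-cell covariates you have.

```r
set.seed(1)
N <- 12  # number of cells in this toy example

# Hypothetical per-cell numeric covariates (simulated for illustration).
cell_info <- data.frame(
  pct_mito  = runif(N, min = 0, max = 20),
  log_depth = rnorm(N, mean = 8, sd = 1)
)

# Standardize each numeric covariate to mean 0 and standard deviation 1.
cell_info$pct_mito  <- as.numeric(scale(cell_info$pct_mito))
cell_info$log_depth <- as.numeric(scale(cell_info$log_depth))

# "0 +" drops the intercept column, giving an N x P matrix (here P = 2).
X <- model.matrix(~ 0 + pct_mito + log_depth, data = cell_info)
print(dim(X))
```

The rows of X must be in the same cell order as the expression object passed as exprs_obj.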
cellassign
A call to cellassign returns an object
of class cellassign. To access the
maximum likelihood estimates (MLE) of cell types, call fit$cell_type.
To access all MLE parameter
estimates, call fit$mle_params.
Returning a SingleCellExperiment
If return_SCE is TRUE, a call to cellassign will return
the input SingleCellExperiment, with the following added:
- A column cellassign_celltype in colData(sce) with the MAP
estimate of the cell type.
- A slot sce@metadata$cellassign containing the cellassign fit.
Note that a SingleCellExperiment must be provided as exprs_obj
for this option to be valid.
data(example_sce)
data(example_marker_mat)

fit <- em_result <- cellassign(example_sce[rownames(example_marker_mat),],
                               marker_gene_info = example_marker_mat,
                               s = colSums(SummarizedExperiment::assay(example_sce, "counts")),
                               learning_rate = 1e-2,
                               shrinkage = TRUE,
                               verbose = FALSE)