cellassign.Rd
Automatically annotate cells to known types based on the expression patterns of a priori known marker genes.
cellassign(exprs_obj, marker_gene_info, s = NULL, min_delta = 2, X = NULL, B = 10, shrinkage = TRUE, n_batches = 1, dirichlet_concentration = 0.01, rel_tol_adam = 1e-04, rel_tol_em = 1e-04, max_iter_adam = 1e+05, max_iter_em = 20, learning_rate = 0.1, verbose = TRUE, sce_assay = "counts", return_SCE = FALSE, num_runs = 1)
Argument | Description |
---|---|
exprs_obj | Either a matrix representing gene expression counts or a SummarizedExperiment. See details. |
marker_gene_info | Information relating marker genes to cell types. See details. |
s | Numeric vector of cell size factors |
min_delta | The minimum log fold change by which a marker gene must be over-expressed in its cell type |
X | Numeric matrix of external covariates. See details. |
B | Number of bases to use for RBF dispersion function |
shrinkage | Logical - should the delta parameters have hierarchical shrinkage? |
n_batches | Number of data subsample batches to use in inference |
dirichlet_concentration | Dirichlet concentration parameter for cell type abundances |
rel_tol_adam | The change in Q function value (in percent) below which each optimization round is considered converged |
rel_tol_em | The change in log marginal likelihood value (in percent) below which the EM algorithm is considered converged |
max_iter_adam | Maximum number of ADAM iterations to perform in each M-step |
max_iter_em | Maximum number of EM iterations to perform |
learning_rate | Learning rate of ADAM optimization |
verbose | Logical - should running info be printed? |
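sce_assay | The name of the assay in exprs_obj to use when exprs_obj is a SummarizedExperiment (default "counts"). This assay should contain raw counts. |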
return_SCE | Logical - should a SingleCellExperiment be returned with the cell type annotations added? See details. |
num_runs | Number of EM optimizations to perform. The run with the highest log marginal likelihood is used as the final fit. |
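The rel_tol_adam and rel_tol_em descriptions define convergence as a change "in percent". A minimal sketch of that criterion follows; it is one reading of the argument descriptions above, not the package's internal code, and the sequence of log marginal likelihood values is a toy example.

```r
# Relative change in percent between successive objective values,
# following the "(in percent)" wording of rel_tol_adam / rel_tol_em
pct_change <- function(old, new) 100 * abs(new - old) / abs(old)

# Toy log marginal likelihood values from successive EM rounds
loglik <- c(-5000, -4200, -4150, -4149.9, -4149.899)

rel_tol_em <- 1e-4
changes <- pct_change(head(loglik, -1), loglik[-1])

# Index of the first round whose change fell below the tolerance
first_converged <- which(changes < rel_tol_em)[1] + 1
first_converged
```

With the default tolerance of 1e-4 percent, only the last step (a change of about 2.4e-5 percent) counts as converged.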
An object of class cellassign. See details.
Input format
exprs_obj should be either a SummarizedExperiment (we recommend the SingleCellExperiment package) or a cell (row) by gene (column) matrix of raw RNA-seq counts (do not log-transform or otherwise normalize).
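SummarizedExperiment assays store features as rows and samples as columns, so counts taken from one are gene by cell. To pass a plain matrix instead, it has to be transposed to the cell by gene orientation described above. A minimal sketch with a toy count matrix (the object names here are illustrative only):

```r
# Toy gene-by-cell count matrix, the orientation used by SummarizedExperiment assays
set.seed(1)
counts_gxc <- matrix(rpois(5 * 4, lambda = 3), nrow = 5, ncol = 4,
                     dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# A plain-matrix exprs_obj must be cell (row) by gene (column): transpose it
Y <- t(counts_gxc)

# Raw counts only: non-negative whole numbers, no log transform or normalization
stopifnot(all(Y >= 0), all(Y == round(Y)))
dim(Y)
```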
marker_gene_info should be either
- a gene by cell type binary matrix, where a 1 indicates that a gene is a marker for a cell type and 0 otherwise, or
- a list with names corresponding to cell types, where each entry is a vector of marker gene names. These are converted to the above matrix using the marker_list_to_mat function.
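To make the relationship between the two input forms concrete, the sketch below builds the binary gene by cell type matrix from a named marker list in base R. The helper markers_to_binary and the marker genes are hypothetical illustrations; this is not the implementation of marker_list_to_mat, which may offer additional options.

```r
# Hypothetical marker list: names are cell types, entries are marker genes
marker_list <- list(
  "B cells"     = c("CD19", "MS4A1"),
  "T cells"     = c("CD3E", "CD3D"),
  "Cytotoxic T" = c("CD3E", "CD8A")
)

# Hypothetical helper (not marker_list_to_mat): one row per distinct marker
# gene, one column per cell type, 1 where the gene marks that type
markers_to_binary <- function(markers) {
  genes <- sort(unique(unlist(markers, use.names = FALSE)))
  mat <- matrix(0L, nrow = length(genes), ncol = length(markers),
                dimnames = list(genes, names(markers)))
  for (ct in names(markers)) {
    mat[markers[[ct]], ct] <- 1L
  }
  mat
}

marker_mat <- markers_to_binary(marker_list)
marker_mat
```

A gene shared by several types (CD3E here) simply has a 1 in each of their columns.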
Cell size factors
If the cell size factors s are not provided, they are computed using the computeSumFactors function from the scran package.
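When s is supplied by hand, the example at the end of this page uses total counts per cell computed over all genes of the SingleCellExperiment (colSums on the gene by cell assay) and only then subsets to the marker genes. The sketch below does the same for a cell by gene matrix, where the per-cell total is rowSums. This is a library-size alternative, not the scran method described above; the data and marker gene names are toy values.

```r
# Toy cell-by-gene matrix of raw counts: 6 cells, 10 genes
set.seed(42)
Y_all <- matrix(rpois(6 * 10, lambda = 5), nrow = 6, ncol = 10,
                dimnames = list(paste0("cell", 1:6), paste0("gene", 1:10)))

# Library-size factors: total counts per cell over ALL genes.
# Rows are cells here, so use rowSums(); a gene-by-cell assay would use colSums().
s <- rowSums(Y_all)

# Only afterwards restrict the counts to the (hypothetical) marker genes
marker_genes <- c("gene2", "gene5", "gene7")
Y <- Y_all[, marker_genes, drop = FALSE]
```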
Covariates
If X is not NULL then it should be an N by P matrix of covariates for N cells and P covariates. Such a matrix would typically be returned by a call to model.matrix with no intercept. It is also highly recommended that any numerical covariates (i.e. those that are not factors or one-hot encodings) be standardized to have mean 0 and standard deviation 1.
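A minimal sketch of building such a matrix, using hypothetical per-cell metadata (a batch factor and a numeric age). Here "no intercept" is read as dropping the (Intercept) column that model.matrix adds, which keeps a reference level for the factor; a formula of the form ~ 0 + batch + age would instead expand batch into a full set of indicators. Which form suits your design is an assumption to check against your analysis.

```r
# Hypothetical per-cell metadata: a batch label and a numeric covariate
cell_meta <- data.frame(
  batch = factor(c("A", "A", "B", "B", "B", "A")),
  age   = c(30, 45, 52, 61, 38, 27)
)

# Standardize the numeric covariate to mean 0, standard deviation 1
cell_meta$age <- as.numeric(scale(cell_meta$age))

# Build the N x P design, then drop the intercept column model.matrix adds
X <- model.matrix(~ batch + age, data = cell_meta)
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]

dim(X)  # 6 cells x 2 covariates (batchB, age)
```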
cellassign
A call to cellassign returns an object of class cellassign. If fit is the returned object, the maximum likelihood estimates of the cell types are available as fit$cell_type, and all maximum likelihood parameter estimates as fit$mle_params.
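A hard cell type call is the most probable type for each cell. The sketch below illustrates that step on a toy cell by cell type probability matrix. The matrix is hypothetical: the exact contents of fit$mle_params are not described on this page, so it does not show where such probabilities are stored in a real fit.

```r
# Toy cell-by-cell-type matrix of assignment probabilities (rows sum to 1)
probs <- matrix(c(0.90, 0.10,
                  0.20, 0.80,
                  0.55, 0.45),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("cell", 1:3), c("B cells", "T cells")))

# Most probable type per cell: the column holding each row's maximum
map_type <- colnames(probs)[max.col(probs, ties.method = "first")]
names(map_type) <- rownames(probs)
map_type
```

With a real fit, table(fit$cell_type) gives the number of cells assigned to each type.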
Returning a SingleCellExperiment
If return_SCE is TRUE, a call to cellassign will return the input SingleCellExperiment with the following added:
- a column cellassign_celltype in colData(sce) containing the MAP estimate of each cell's type;
- an entry sce@metadata$cellassign in the metadata slot, containing the cellassign fit.
Note that a SingleCellExperiment must be provided as exprs_obj for this option to be valid.
library(cellassign)  # provides example_sce and example_marker_mat
data(example_sce)
data(example_marker_mat)
fit <- cellassign(example_sce[rownames(example_marker_mat), ],  # marker genes only
                  marker_gene_info = example_marker_mat,
                  s = colSums(SummarizedExperiment::assay(example_sce, "counts")),  # totals over all genes
                  learning_rate = 1e-2,
                  shrinkage = TRUE,
                  verbose = FALSE)