This document demonstrates how to use the PROSSTT method, and we hope it helps you get started.
Before simulating datasets, we estimate some essential parameters from a real dataset so that the simulated data are more realistic.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
estimate_result <- simmethods::PROSSTT_estimation(
ref_data = ref_data,
verbose = TRUE,
seed = 10
)
# Estimating parameters using PROSSTT
# Loading required package: amap
# Computing nearest neighbor graph
# Computing SNN
# Your data has 3 groups
See the result:
estimate_result[["estimate_result"]][["newick_tree"]]
# [1] "(group3:372.624762582395,(group1:82.620878305447,group2:82.620878305447):372.624762582395);"
The result is a tree in Newick format; cells are sampled along this tree to generate datasets with a trajectory. The tree is obtained by hierarchical clustering, which captures the relationships between the cell groups. If no group information is provided, as in the code above, the groups (clusters) are determined by the Seurat pipeline.
Users can also provide the group labels of the cells:
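To read the tree returned above, the leaf labels and branch lengths can be pulled out of the Newick string with base-R regular expressions. This is a minimal sketch that assumes simple labels (letters, digits, underscores) and no quoted names; a dedicated parser such as `ape::read.tree(text = ...)` is more robust for general trees.

```r
# Newick string copied from the estimation output above
newick <- "(group3:372.624762582395,(group1:82.620878305447,group2:82.620878305447):372.624762582395);"

# Leaf nodes appear as "label:length"
leaf_pairs <- regmatches(newick, gregexpr("[A-Za-z0-9_]+:[0-9.]+", newick))[[1]]
leaf_names <- sub(":.*", "", leaf_pairs)
leaf_lengths <- as.numeric(sub(".*:", "", leaf_pairs))

# Internal branch lengths follow a closing parenthesis, e.g. "):372.62"
internal_pairs <- regmatches(newick, gregexpr("\\):[0-9.]+", newick))[[1]]
internal_lengths <- as.numeric(sub("\\):", "", internal_pairs))

leaf_names
leaf_lengths
internal_lengths
```

Here group1 and group2 share a parent node and have equal branch lengths, so they are the most closely related groups, while group3 branches off directly from the root.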
group <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::PROSSTT_estimation(
ref_data = ref_data,
other_prior = list(group.condition = group),
verbose = TRUE,
seed = 10
)
# Estimating parameters using PROSSTT
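To illustrate how hierarchical clustering can turn group labels into a Newick tree, the sketch below clusters the mean log-expression profiles of three hypothetical groups with base-R `hclust()` and writes the result as a Newick string. The toy data, the average-linkage choice, and the helper `hclust_to_newick()` are all assumptions for illustration; simmethods does not necessarily compute its tree or branch lengths this way (its branch lengths above are not simple `hclust` height differences).

```r
set.seed(10)
# Hypothetical toy data: 50 genes x 30 cells, three groups of 10 cells
counts <- matrix(rpois(50 * 30, lambda = 5), nrow = 50)
counts[1:20, 21:30] <- counts[1:20, 21:30] + 20  # group3 up-regulates genes 1-20
groups <- rep(paste0("group", 1:3), each = 10)

# Mean log2(x + 1) expression of each group (genes x groups)
centroids <- sapply(split(seq_len(ncol(counts)), groups),
                    function(idx) rowMeans(log2(counts[, idx, drop = FALSE] + 1)))

# Cluster the group centroids
hc <- hclust(dist(t(centroids)), method = "average")

# Convert an hclust object to Newick; branch length = parent height - child height
hclust_to_newick <- function(hc) {
  build <- function(i) {
    parts <- vapply(hc$merge[i, ], function(child) {
      if (child < 0) {
        # Negative entries are leaves (original observations) at height 0
        sprintf("%s:%g", hc$labels[-child], hc$height[i])
      } else {
        # Positive entries refer to clusters formed at an earlier merge step
        sprintf("%s:%g", build(child), hc$height[i] - hc$height[child])
      }
    }, character(1))
    sprintf("(%s)", paste(parts, collapse = ","))
  }
  # The last merge step is the root of the tree
  paste0(build(nrow(hc$merge)), ";")
}

nwk <- hclust_to_newick(hc)
nwk
```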
After estimating parameters from a real dataset, we simulate datasets based on the learned parameters under different scenarios.
The reference data contain 160 cells and 4000 genes. If we simulate a dataset with the default parameters, we obtain new data of the same size as the reference data.
simulate_result <- simmethods::PROSSTT_simulation(
parameters = estimate_result[["estimate_result"]],
other_prior = NULL,
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
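PROSSTT, as described by its authors, places cells along the branches of a tree and draws counts from a negative binomial distribution whose mean changes with pseudotime. The toy base-R sketch below illustrates only that general idea. The one-split tree, the linear log-mean trends, and the common dispersion (`size = 2`) are simplifying assumptions, not PROSSTT's actual model or the simmethods implementation.

```r
set.seed(111)
n_cells <- 160
n_genes <- 200

# Hypothetical tree: a root branch that splits into two daughter branches
branch <- sample(c("root", "left", "right"), n_cells, replace = TRUE)
# Position of each cell within its own branch, uniform on [0, 1]
t_within <- runif(n_cells)
# Time spent on the root branch (daughter cells have traversed the whole root)
t_root <- ifelse(branch == "root", t_within, 1)
# Time spent on a daughter branch (zero for root cells)
t_daughter <- ifelse(branch == "root", 0, t_within)

# Gene-specific baseline and per-branch slopes of the log-mean
baseline <- rnorm(n_genes, mean = 1, sd = 0.5)
slope_root <- rnorm(n_genes)
slope_left <- rnorm(n_genes)
slope_right <- rnorm(n_genes)

# Genes x cells log-mean, continuous at the branch point
log_mu <- baseline + outer(slope_root, t_root) +
  outer(slope_left, t_daughter * (branch == "left")) +
  outer(slope_right, t_daughter * (branch == "right"))

# Negative binomial counts with an assumed common dispersion
counts <- matrix(rnbinom(length(log_mu), mu = exp(log_mu), size = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_len(n_cells))))
dim(counts)
```

Like the SCE result above, the toy matrix has genes in rows and cells in columns.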
In PROSSTT, we can set nCells and nGenes to specify the number of cells and genes.
Here, we simulate a new dataset with 1000 cells and 1000 genes:
simulate_result <- simmethods::PROSSTT_simulation(
parameters = estimate_result[["estimate_result"]],
return_format = "list",
other_prior = list(nCells = 1000,
nGenes = 1000),
seed = 111
)
# nCells: 1000
# nGenes: 1000
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 1000 1000
To visualize the trajectory of the simulated data, make sure the following R packages are installed:
if(!requireNamespace("dynwrap", quietly = TRUE)){install.packages("dynwrap")}
if(!requireNamespace("dyndimred", quietly = TRUE)){install.packages("dyndimred")}
if(!requireNamespace("dynplot", quietly = TRUE)){install.packages("dynplot")}
if(!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
if(!requireNamespace("tislingshot", quietly = TRUE)){devtools::install_github("dynverse/ti_slingshot/package/")}
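The repeated `requireNamespace()` checks can be folded into one helper. `install_if_missing()` below is a hypothetical convenience function, not part of simmethods or dynverse; it installs only the packages that cannot be loaded and accepts any installer function so that the GitHub package can be handled the same way.

```r
# Install only the packages in `pkgs` that cannot be loaded yet.
# `installer` is any function taking a character vector of package names;
# by default it is install.packages(), which installs from CRAN.
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!is_installed]
  if (length(missing) > 0) installer(missing)
  # Return the names that were passed to the installer, invisibly
  invisible(missing)
}
```

For example, `install_if_missing(c("dynwrap", "dyndimred", "dynplot"))` covers the CRAN packages, and `install_if_missing("tislingshot", installer = function(p) devtools::install_github("dynverse/ti_slingshot/package/"))` covers the GitHub one.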
First, wrap the data into a standard dynwrap object:
dyn_object <- dynwrap::wrap_expression(counts = t(result),
expression = log2(t(result) + 1))
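The two transformations passed to `wrap_expression()` are worth spelling out: the simulated count matrix has genes in rows and cells in columns (as the transpose in the code implies), while dynwrap expects cells in rows and genes in columns, and `log2(x + 1)` maps zero counts to zero. The tiny matrix below is hypothetical and only shows both steps.

```r
# Hypothetical genes x cells count matrix, shaped like the simulated count_data
counts_gc <- matrix(c(0, 3, 7, 1, 0, 15), nrow = 3,
                    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

counts_cg <- t(counts_gc)        # cells in rows, genes in columns
expr_cg <- log2(counts_cg + 1)   # log2(x + 1): zeros stay 0, 7 becomes 3, 15 becomes 4

counts_cg
expr_cg
```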
Next, we infer the trajectory using Slingshot, which has performed well in benchmark comparisons of trajectory inference methods:
model <- dynwrap::infer_trajectory(dataset = dyn_object,
method = tislingshot::ti_slingshot(),
parameters = NULL,
give_priors = NULL,
seed = 111,
verbose = TRUE)
# Executing 'slingshot' on '20230816_111806__data_wrapper__qIJvL2H1mS'
# With parameters: list(cluster_method = "pam", ndim = 20L, shrink = 1L, reweight = TRUE, reassign = TRUE, thresh = 0.001, maxit = 10L, stretch = 2L, smoother = "smooth.spline", shrink.method = "cosine")
# inputs: expression
# priors :
# Using full covariance matrix
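Slingshot, as described in its original publication, first builds a minimum spanning tree (MST) on cluster centroids to determine the lineage structure and then fits smooth curves along each lineage. The base-R sketch below illustrates only the MST step, using Prim's algorithm on Euclidean distances between centroids of hypothetical clusters. It is not the tislingshot implementation; the log above mentions a full covariance matrix, which suggests the real method does not use plain Euclidean distances.

```r
set.seed(111)
# Hypothetical 2-D embedding: 4 clusters of 25 cells, branching at cluster 2
centers <- rbind(c(0, 0), c(2, 0), c(4, 2), c(4, -2))
cl <- rep(1:4, each = 25)
emb <- centers[cl, ] + matrix(rnorm(100 * 2, sd = 0.3), ncol = 2)

# Cluster centroids (one row per cluster) and their pairwise distances
centroids <- t(sapply(1:4, function(k) colMeans(emb[cl == k, , drop = FALSE])))
d <- as.matrix(dist(centroids))

# Prim's algorithm: grow the tree from cluster 1, always adding the cheapest edge
prim_mst <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2,
                  dimnames = list(NULL, c("from", "to")))
  for (e in seq_len(n - 1)) {
    # Distances from nodes already in the tree to nodes not yet in it
    sub <- d[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to <- which(!in_tree)[idx[2]]
    edges[e, ] <- c(from, to)
    in_tree[to] <- TRUE
  }
  edges
}

mst <- prim_mst(d)
mst
```

Cluster 2 ends up with three edges, so a lineage-based method would treat it as the branch point between two lineages ending at clusters 3 and 4.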
Finally, we can plot the trajectory after performing dimensionality reduction:
dimred <- dyndimred::dimred_umap(dyn_object$expression)
dynplot::plot_dimred(model, dimred = dimred)
# Coloring by milestone
# Using milestone_percentages from trajectory
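UMAP is a nonlinear embedding; a common linear alternative for a quick look is principal component analysis. The base-R sketch below computes a two-dimensional PCA embedding of a hypothetical cells x genes log-expression matrix with `prcomp()`. Whether such a matrix can be passed directly to `dynplot::plot_dimred()` is not verified here, so treat it as an illustration of the dimensionality-reduction step only.

```r
set.seed(111)
# Hypothetical cells x genes log-expression matrix (100 cells, 50 genes)
expr <- log2(matrix(rpois(100 * 50, lambda = 5), nrow = 100,
                    dimnames = list(paste0("cell", 1:100), paste0("gene", 1:50))) + 1)

# Center genes, then project cells onto the first two principal components
pca <- prcomp(expr, center = TRUE, scale. = FALSE)
dimred_pca <- pca$x[, 1:2]

head(dimred_pca)
```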
For more details about trajectory inference and visualization, please see the dynverse documentation.