Here scMultiSim method will be demonstrated clearly and hope that this document can help you.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
estimate_result <- simmethods::scMultiSim_estimation(
ref_data = ref_data,
verbose = T,
seed = 10
)
# Estimating parameters using scMultiSim
# Loading required package: amap
# Computing nearest neighbor graph
# Computing SNN
# Your data has 3 groups
Users can also input the group information of cells:
group <- as.numeric(simmethods::group_condition)
estimate_result <- simmethods::scMultiSim_estimation(
ref_data = ref_data,
other_prior = list(group.condition = group),
verbose = T,
seed = 10
)
# Estimating parameters using scMultiSim
The estimation result contains an object with phylo
data structure:
plot(estimate_result[["estimate_result"]][["phylo"]])
The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data.
simulate_result <- simmethods::scMultiSim_simulation(
parameters = estimate_result$estimate_result,
other_prior = NULL,
return_format = "SCE",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 1
# Time spent: 0.17 mins
# Adding experimental noise...
# 50..100..150..Using atac_counts
# Time spent: 0.54 mins
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000 160
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
# cell_name group
# <character> <character>
# cell1 cell1 group1
# cell2 cell2 group1
# cell3 cell3 group1
# cell4 cell4 group2
# cell5 cell5 group2
# cell6 cell6 group2
simulate_result <- simmethods::scMultiSim_simulation(
parameters = estimate_result$estimate_result,
other_prior = list(nCells = 500,
nGenes = 1000),
return_format = "SCE",
seed = 111
)
# nCells: 500
# nGenes: 1000
# nGroups: 2
# nBatches: 1
# Time spent: 0.15 mins
# Adding experimental noise...
# 50..100..150..200..250..300..350..400..450..500..Using atac_counts
# Time spent: 0.45 mins
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 1000 500
In scMultiSim, we can not set nGroups
directly and should set prob.group
instead. For example, if we want to simulate 2 groups, we can type other_prior = list(prob.group = c(0.5, 0.5))
. Note that the sum of prob.group
numeric vector must equal to 1, so we can also set prob.group = c(0.3, 0.7)
.
The group number should be equal to that used or detected in the estimation step, otherwise the error may occur.
In addtion, if we want to simulate three or more groups, we should obey the rules:
prob.group
vector must always equal to the number of groups.prob.group
numeric vector must equal to 1.For demonstration, we will simulate two groups using the learned parameters.
simulate_result <- simmethods::scMultiSim_simulation(
parameters = estimate_result$estimate_result,
other_prior = list(prob.group = c(0.4, 0.6)),
return_format = "list",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 1
# Time spent: 0.18 mins
# Adding experimental noise...
# 50..100..150..Using atac_counts
# Time spent: 0.56 mins
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$group)
#
# group1 group2
# 64 96
In scMultiSim, we can set nBatches
directly to simulate cell batches.
For demonstration, we will simulate two batches.
simulate_result <- simmethods::scMultiSim_simulation(
parameters = estimate_result$estimate_result,
other_prior = list(nBatches = 2),
return_format = "list",
seed = 111
)
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 2
# Time spent: 0.17 mins
# Adding experimental noise...
# 50..100..150..Using atac_counts
# Time spent: 0.55 mins
# Adding batch effects...
## cell information
col_data <- simulate_result[["simulate_result"]][["col_meta"]]
table(col_data$batch)
#
# Batch1 Batch2
# 82 78
The parameter estimation step is the same as that demonstrated above. If you want to simulate datasets with trajectory, you should specify the parameter traj
in other_prior
to TRUE
.
simulate_result <- simmethods::scMultiSim_simulation(
parameters = estimate_result$estimate_result,
other_prior = list(traj = TRUE),
return_format = "list",
seed = 111
)
# Simulating datasets with trajectory.../n
# nCells: 160
# nGenes: 4000
# nGroups: 2
# nBatches: 1
# Time spent: 0.16 mins
# Adding experimental noise...
# 50..100..150..Using atac_counts
# Time spent: 0.99 mins
Before visualization, Make sure that you have already installed several R packages:
if(!requireNamespace("dynwrap", quietly = TRUE)){install.packages("dynwrap")}
if(!requireNamespace("dyndimred", quietly = TRUE)){install.packages("dyndimred")}
if(!requireNamespace("dynplot", quietly = TRUE)){install.packages("dynplot")}
if(!requireNamespace("tislingshot", quietly = TRUE)){devtools::install_github("dynverse/ti_slingshot/package/")}
First we should wrap the data into a standard object:
data <- simulate_result$simulate_result$count_data
dyn_object <- dynwrap::wrap_expression(counts = t(data),
expression = log2(t(data) + 1))
Next, we infer the trajectory using SlingShot which has been proved to be the most best method to do this:
model <- dynwrap::infer_trajectory(dataset = dyn_object,
method = tislingshot::ti_slingshot(),
parameters = NULL,
give_priors = NULL,
seed = 111,
verbose = TRUE)
# Executing 'slingshot' on '20230903_172409__data_wrapper__Wad2hj4sH8'
# With parameters: list(cluster_method = "pam", ndim = 20L, shrink = 1L, reweight = TRUE, reassign = TRUE, thresh = 0.001, maxit = 10L, stretch = 2L, smoother = "smooth.spline", shrink.method = "cosine")
# inputs: expression
# priors :
# Using full covariance matrix
Finally, we can plot the trajectory after performing dimensionality reduction:
dimred <- dyndimred::dimred_umap(dyn_object$expression)
dynplot::plot_dimred(model, dimred = dimred)
# Coloring by milestone
# Using milestone_percentages from trajectory
For more details about trajectory inference and visualization, please check dynverse.