How to set n_genes in SCTransform #7998

SihanLiUVa · 2023-11-09T19:16:20Z

Hi! Sorry for double posting the issue in sctransform and here. According to the issue#124 in satijalab/sctransform (satijalab/sctransform#124), I am trying to increase the number of genes in the function SCTransform. Here are my codes:

group1 <- SCTransform(group1_data,vars.to.regress = "nCount_RNA", ncells = 5000, vst.flavor = "v2", return.only.var.genes = FALSE, n_genes = 5000)
I noticed that in the argument of this function, the n_genes is supposed to pass to sctransform::vst, where the n_genes is used. However, i still got this message from my console.
Get Negative Binomial regression parameters per gene
Using 2000 genes

It seems still using default 2000 genes as input. So I wonder how to pass n_genes to vst according to the scTransform document. Thank you so much in advance!

samuel-marsh · 2023-11-09T20:20:45Z

Hi,

Not member of dev team but hopefully can be helpful. Can you provide the output of sessionInfo()?

Using Seurat 5.0.0, SeuratObject5.0.0, and sctransform 0.4.1 I cannot replicate your issue.

> test <- SCTransform(pbmc)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells

> test <- SCTransform(pbmc, n_genes = 3000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 3000 genes, 2700 cells

> test <- SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells

Best,
Sam

SihanLiUVa · 2023-11-09T20:26:12Z

Hey Sam:
Thank you so much for your input! I checked the version of packages and I have Seurat 5.0.0 and scTransform 0.4.1. I will try to generate a output ASAP and see how it works.
Thank you again!

SihanLiUVa · 2023-11-10T15:42:09Z

Hi! I tried the method you mentioned above but for some reason I still cannot change the n_genes. I am also using the same version of packages that you used. Here are my code and output

`> SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells
Found 71 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 12572 genes
Computing corrected count matrix for 12572 genes
Calculating gene attributes
Wall clock passed: Time difference of 10.35649 secs
Determine variable features
Centering data matrix
|===================================================================================================================================================================| 100%
Set default assay to SCT
An object of class Seurat
26286 features across 2700 samples within 2 assays
Active assay: SCT (12572 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: RNA
There were 24 warnings (use warnings() to see them)`

Would you please give me some suggestions about what could be wrong?

samuel-marsh · 2023-11-10T15:49:02Z

Can you post the output of sessionInfo()?

SihanLiUVa · 2023-11-10T15:50:24Z

Thank you for your quick response! Here is the output:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] patchwork_1.1.3 Matrix_1.6-1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2 readr_2.1.4
[9] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0 Seurat_5.0.0 SeuratObject_5.0.0 sp_2.1-1

loaded via a [1] Rtsne_0.16 [7] GenomicRanges_1.46.1 [13] ggrepel_0.9.4 [19] polyclip_1.10-6 [25] uwot_0.1.16 [31] fastmap_1.1.1 [37] igraph_1.5.1 [43] reshape2_1.4.4 [49] nlme_3.1-163 [55] timechange_0.2.0 [61] future_1.33.0 [67] hms_1.1.3 [73] reticulate_1.34.0 [79] BiocGenerics_0.40.0 [85] glmGamPoi_1.6.0 [91] tidyselect_1.2.0 [97] IRanges_2.28.0 [103] RCurl_1.98-1.13 [109] utf8_1.2.4 [115] digest_0.6.33 namespace (and not attached):
colorspace_2.1-0 deldir_1.0-9 ellipsis_0.3.2 ggridges_0.5.4 XVector_0.34.0
RcppHNSW_0.5.0 rstudioapi_0.15.0 spatstat.data_3.0-3 leiden_0.4.3 listenv_0.9.0
RSpectra_0.16-1 fansi_1.0.5 sparseMatrixStats_1.6.0 codetools_0.2-19 splines_4.1.2
spam_2.10-0 jsonlite_1.8.7 ica_1.0-3 cluster_2.1.4 png_0.1-8
shiny_1.7.5.1 sctransform_0.4.1 spatstat.sparse_3.0-3 compiler_4.1.2 httr_1.4.7
lazyeval_0.2.2 cli_3.6.1 later_1.3.1 htmltools_0.5.7 tools_4.1.2
dotCall64_1.1-0 GenomeInfoDbData_1.2.7 gtable_0.3.4 glue_1.6.2 RANN_2.6.1
Rcpp_1.0.11 Biobase_2.54.0 scattermore_1.2 vctrs_0.6.4 spatstat.explore_3.2-3
progressr_0.14.0 DelayedMatrixStats_1.16.0 lmtest_0.9-40 spatstat.random_3.1-6 globals_0.16.2
mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3 irlba_2.3.5.1 goftest_1.2-3
zlibbioc_1.40.0 MASS_7.3-55 zoo_1.8-12 scales_1.2.1 MatrixGenerics_1.6.0
promises_1.2.1 spatstat.utils_3.0-4 SummarizedExperiment_1.24.0 parallel_4.1.2 RColorBrewer_1.1-3
pbapply_1.7-2 gridExtra_2.3 stringi_1.7.12 S4Vectors_0.32.4 fastDummies_1.7.3
GenomeInfoDb_1.30.1 bitops_1.0-7 rlang_1.1.2 pkgconfig_2.0.3 matrixStats_1.0.0
lattice_0.22-5 ROCR_1.0-11 tensor_1.5 htmlwidgets_1.6.2 cowplot_1.1.1
parallelly_1.36.0 RcppAnnoy_0.0.21 plyr_1.8.9 magrittr_2.0.3 R6_2.5.1
generics_0.1.3 DelayedArray_0.20.0 pillar_1.9.0 withr_2.5.2 fitdistrplus_1.1-11
survival_3.5-7 abind_1.4-5 future.apply_1.11.0 crayon_1.5.2 KernSmooth_2.23-20
spatstat.geom_3.2-7 plotly_4.10.3 tzdb_0.4.0 grid_4.1.2 data.table_1.14.8
xtable_1.8-4 httpuv_1.6.12 stats4_4.1.2 munsell_0.5.0 viridisLite_0.4.2

samuel-marsh · 2023-11-10T15:59:48Z

hmm ok.

What happens if you run this using reproducible object. If you haven't already install seurat-data package (https://github.com/satijalab/seurat-data) and the pbmc3k dataset. Then in fresh R session, run the following code and post output.

library(Seurat)

pbmc <- pbmc3k.SeuratData::pbmc3k
pbmc <- UpdateSeuratObject(pbmc)

test <- SCTransform(object = pbmc, n_genes = 1000)

Best,
Sam

SihanLiUVa · 2023-11-10T16:14:51Z

Thanks! I ran the SCTransform on pbmc object and it worked!

>pbmc <- UpdateSeuratObject(pbmc)
Validating object structure
Updating object slots
Ensuring keys are in the proper structure
Warning: Assay RNA changing from Assay to Assay
Ensuring keys are in the proper structure
Ensuring feature names don't have underscores or pipes
Updating slots in RNA
Validating object structure for Assay ‘RNA’
Object representation is consistent with the most current Seurat version
> SCTransform(pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells
Found 34 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 12572 genes
Computing corrected count matrix for 12572 genes
Calculating gene attributes
Wall clock passed: Time difference of 7.973257 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Place corrected count matrix in counts slot
Set default assay to SCT
An object of class Seurat 
26286 features across 2700 samples within 2 assays 
Active assay: SCT (12572 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA

But it is interesting that it still not working on my own dataset.

> group1.data<- UpdateSeuratObject(group1.data)
Validating object structure
Updating object slots
Ensuring keys are in the proper structure
Ensuring keys are in the proper structure
Ensuring feature names don't have underscores or pipes
Updating slots in RNA
Validating object structure for Assay5 ‘RNA’
Object representation is consistent with the most current Seurat version
> SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
Found 102 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 21110 genes
Computing corrected count matrix for 21110 genes
Calculating gene attributes
Wall clock passed: Time difference of 28.04482 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Centering data matrix
  |===================================================================================================================================================================| 100%
Finished calculating residuals for counts
Set default assay to SCT
An object of class Seurat 
43492 features across 5615 samples within 2 assays 
Active assay: SCT (21110 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA
There were 24 warnings (use warnings() to see them)

So I wonder if there are something wrong with my dataset?

samuel-marsh · 2023-11-10T16:16:52Z

What happens if you recreate your object from scratch and then run SCTransform?

SihanLiUVa · 2023-11-10T16:20:43Z

Thank you for your help! It still doesn't work if I recreate the object. ;(

samuel-marsh · 2023-11-10T16:23:27Z

Hmmmm not sure then. Hopefully someone from Seurat team might be able to figure this one out. I’ll keep thinking but not sure why it’s not working.

SihanLiUVa · 2023-11-10T16:24:15Z

Again! Thank you so much for your help! :D

SihanLiUVa · 2023-11-10T16:43:39Z

Hi Sam:
I figure out the problem. It seems to be the n_genes can be set to an object with Class=Assay rather than Assay. So after I used this line:

options(Seurat.object.assay.version = "v3")
group1.data <- CreateSeuratObject(Read10X(data.dir = "../Raw_data/1_3_filtered_feature_bc_matrix"))

Then the SCTransform gives me correct result:

> SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 5000 cells
Found 45 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 21110 genes
Computing corrected count matrix for 21110 genes
Calculating gene attributes
Wall clock passed: Time difference of 23.46258 secs
Determine variable features
Centering data matrix
  |===================================================================================================================================================================| 100%
Place corrected count matrix in counts slot
Set default assay to SCT
An object of class Seurat 
53395 features across 5615 samples within 2 assays 
Active assay: SCT (21110 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: RNA
There were 12 warnings (use warnings() to see them)

SihanLiUVa · 2023-11-10T16:44:01Z

Thank you so much, Sam!

samuel-marsh · 2023-11-10T16:53:04Z

Actually wait sorry my brain was playing tricks. It looks like from your output it worked in latest version:

SCTransform(group1.data, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21110 by 5615
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 5000 cells
…

SihanLiUVa · 2023-11-10T16:56:08Z

Yeah you are right. When I create a object without 'options(Seurat.object.assay.version = "v3")', then the class will be 'Assay5' and the n_genes cannot be changed in SCTransform.
But if I create the object with v3, then the class is changed to 'Assay' and now it is compatible with n_genes.

samuel-marsh · 2023-11-10T17:05:08Z

Ok well we found issue now :). @saketkc hope you don't mind me tagging you here since I've seen you handle most sctransform related issues.

There is seems to be failure to pass n_genes when using V5 structure.
Reproducible example here:

library(Seurat)

> options(Seurat.object.assay.version = "v3")
> pbmc <- pbmc3k.SeuratData::pbmc3k
> pbmc <- CreateSeuratObject(counts = pbmc@assays$RNA$counts)
> test <- SCTransform(object = pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 1000 genes, 2700 cells


> options(Seurat.object.assay.version = "v5")
> pbmc <- pbmc3k.SeuratData::pbmc3k
> pbmc <- CreateSeuratObject(counts = pbmc@assays$RNA$counts)
> test <- SCTransform(object = pbmc, n_genes = 1000)
Running SCTransform on assay: RNA
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 12572 by 2700
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2700 cells

Best,
Sam

saketkc · 2023-11-10T17:10:26Z

Thanks for flagging it both! This is because the extra arguments currently not being passed internally (...) in StdAssay. I will fix it in v5.0.1.

samuel-marsh · 2023-11-10T17:11:47Z

Awesome. Thanks as always!!!

saketkc · 2023-11-10T20:57:51Z

Hi @SihanLiUVa and @samuel-marsh , thanks once again for noticing this. It should be fixed by e2a956a (currently in the develop branch).

Please let me know if you notice any other issues. Thanks!

saketkc mentioned this issue Nov 10, 2023

How to set n_genes in SCTransform satijalab/sctransform#173

Closed

saketkc self-assigned this Nov 10, 2023

saketkc closed this as completed Nov 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to set n_genes in SCTransform #7998

How to set n_genes in SCTransform #7998

SihanLiUVa commented Nov 9, 2023

samuel-marsh commented Nov 9, 2023

SihanLiUVa commented Nov 9, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

saketkc commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

saketkc commented Nov 10, 2023

How to set n_genes in SCTransform #7998

How to set n_genes in SCTransform #7998

Comments

SihanLiUVa commented Nov 9, 2023

samuel-marsh commented Nov 9, 2023

SihanLiUVa commented Nov 9, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

SihanLiUVa commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

saketkc commented Nov 10, 2023

samuel-marsh commented Nov 10, 2023

saketkc commented Nov 10, 2023