Split does not split counts in two layers #9635

Open
AndrewOkin opened this issue Jan 22, 2025 · 6 comments

@AndrewOkin

AndrewOkin commented Jan 22, 2025

Hello everyone,

I am trying to follow the vignette at https://satijalab.org/seurat/articles/seurat5_integration to integrate two datasets, but I have run into an issue with the split function.
I run the function as follows:
combined_seurat[["counts"]] <- split(combined_seurat[["counts"]], f = combined_seurat$dataset)
The Seurat object is split into two layers based on the label in "dataset", as follows:

An object of class Seurat 
10494 features across 16711 samples within 1 assay 
Active assay: counts (10494 features, 0 variable features)
 2 layers present: counts.mouse1, counts.mouse2

However, when I sum the counts in each layer, it looks like the counts have ended up in the second layer but not the first:

> sum(combined_seurat[["counts"]]$counts.mouse1)
[1] 0
> sum(combined_seurat[["counts"]]$counts.mouse2)
[1] 11310227

I also tried to use the SplitObject function in R and then merge the two objects later, but I get the same issue. Does anyone have any suggestions?

Below is how I create the Seurat object before the split function:

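# Load the packages used below
# (assumed: SeuratDisk provides Connect(), rhdf5 provides h5read())
library(Seurat)
library(SeuratDisk)
library(rhdf5)

# Open the h5Seurat file and read its metadata
combined_seurat <- Connect("C:/Users/data/combined_seurat.h5seurat", mode = "r")
metadata <- h5read("C:/Users/data/combined_seurat.h5seurat",
                   "/meta.data")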

# Extract the counts data
counts_data <- combined_seurat[["assays"]][["counts"]][["data"]]

# Extract the 'dataset' metadata from /meta.data
dataset_info <- combined_seurat[["meta.data"]][["dataset"]]

# Convert counts_data into a matrix
counts_matrix <- as.matrix(counts_data)

# Create a Seurat object
combined_seurat <- CreateSeuratObject(counts = counts_matrix, assay = "counts")

# Extract cell names from Seurat object
# cell_names <- colnames(combined_seurat)
cell_names <- h5read("C:/Users/data/combined_seurat.h5seurat", 
                     "/cell.names")
colnames(combined_seurat) <- cell_names


# Extract dataset info and ensure it's named according to cell names
dataset_info <- metadata[["dataset"]]
dataset_categories <- dataset_info$categories[dataset_info$codes + 1]
names(dataset_categories) <- cell_names  # Assign cell names as names of the vector


# Add dataset metadata to Seurat object
combined_seurat$dataset <- dataset_categories
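
One way to narrow this down is to check the per-dataset totals on the raw matrix before it goes into CreateSeuratObject. The sketch below is only an illustration: the small matrix, the `codes` vector, and the `categories` vector are hypothetical stand-ins for `counts_matrix`, `dataset_info$codes`, and `dataset_info$categories`, and it assumes the codes are 0-based, as the `+ 1` in the code above implies.

```r
# Hypothetical stand-ins for the poster's counts_matrix and dataset metadata
set.seed(42)
counts_matrix <- matrix(
  rpois(6 * 5, lambda = 3), nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5))
)
codes <- c(0L, 1L, 1L, 0L, 1L)        # assumed 0-based category codes
categories <- c("mouse1", "mouse2")   # assumed category labels

# Map 0-based codes to labels (R indexing is 1-based, hence the + 1)
dataset_labels <- categories[codes + 1L]
names(dataset_labels) <- colnames(counts_matrix)

# Total counts per dataset, computed directly on the input matrix
per_dataset_total <- tapply(colSums(counts_matrix), dataset_labels, sum)
print(per_dataset_total)
```

If one dataset already sums to 0 at this stage, the problem would be in how the matrix or the labels were read from the file rather than in split.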

@samuel-marsh
Collaborator

Hi,

I'm not a member of the dev team, but hopefully I can help. Can you please post the output of:

table(combined_seurat@meta.data[, "dataset"])

Best,
Sam

@AndrewOkin
Author

Thank you for the quick reply. I get the expected result:

> table(combined_seurat@meta.data[, "dataset"])

mouse1 mouse2 
  4491  12220 

So the cells are split as expected, but I think the raw counts aren't...

@samuel-marsh
Collaborator

Hi,

Ok that’s good. What happens if you split on assay “RNA” as is done in the tutorial?

Best,
Sam

@samuel-marsh
Collaborator

samuel-marsh commented Jan 22, 2025

I’m wondering if there is a potential issue with naming your assay “counts” while also having a layer named counts. If you recreate the object without setting the assay name (it will default to RNA), does that solve things?

@AndrewOkin
Author

Okay,

So I tried not specifying the assay in CreateSeuratObject:
combined_seurat <- CreateSeuratObject(counts = counts_matrix)

And then split according to RNA:

combined_seurat[["RNA"]] <- split(combined_seurat[["RNA"]],
                                  f = combined_seurat$dataset)

Now, though, neither mouse1 nor mouse2 has any raw counts:

> sum(combined_seurat[["nCount_RNA"]]$nCount_RNA.mouse1)
[1] 0
> sum(combined_seurat[["nCount_RNA"]]$nCount_RNA.mouse2)
[1] 0

But again the cells are split appropriately:

> table(combined_seurat@meta.data[, "dataset"])

mouse1 mouse2 
  4491  12220 

@samuel-marsh
Collaborator

Hi,

So the check for raw counts in that code would not be expected to return anything. The metadata (which contains nCount_RNA) does not get split; it stays at the level of the entire object.
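
This may also explain why the sums print 0 instead of raising an error. The sketch below uses a plain data frame as a hypothetical stand-in, on the assumption that `combined_seurat[["nCount_RNA"]]` returns a one-column data frame of per-cell metadata; the values and the extra `dataset` column are made up for illustration.

```r
# Hypothetical stand-in for the per-cell metadata of a Seurat object
meta <- data.frame(
  nCount_RNA = c(120, 340, 560, 90),
  dataset    = c("mouse1", "mouse2", "mouse2", "mouse1"),
  row.names  = paste0("cell", 1:4)
)

# No column has this name, so `$` returns NULL, and sum(NULL) is 0
missing_total <- sum(meta$nCount_RNA.mouse1)
print(missing_total)

# Grouping the per-cell totals by dataset gives the intended numbers
totals_by_dataset <- tapply(meta$nCount_RNA, meta$dataset, sum)
print(totals_by_dataset)
```

On a real object, something along the lines of `tapply(combined_seurat$nCount_RNA, combined_seurat$dataset, sum)` should give the per-dataset totals from the metadata.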

Here is an example of how to check raw counts. I picked ACTB as the gene because it is expressed pretty much universally, but any gene would do. If you try this code with your object (change the gene to "Actb" for the mouse gene symbol), does that work?

library(tidyverse)
#> Warning: package 'lubridate' was built under R version 4.4.1
library(Seurat)
#> Loading required package: SeuratObject
#> Loading required package: sp
#> 'SeuratObject' was built with package 'Matrix' 1.7.0 but the current
#> version is 1.7.1; it is recomended that you reinstall 'SeuratObject' as
#> the ABI for 'Matrix' may have changed
#> 
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, t

pbmc <- pbmc3k.SeuratData::pbmc3k
pbmc <- UpdateSeuratObject(pbmc)
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Warning: Assay RNA changing from Assay to Assay
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Validating object structure for Assay 'RNA'
#> Object representation is consistent with the most current Seurat version


pbmc$sample_id <- sample(c("sample1", "sample2"), size = ncol(pbmc),
                         replace = TRUE)

table(pbmc@meta.data[, "sample_id"])
#> 
#> sample1 sample2 
#>    1341    1359


FetchData(object = pbmc, vars = c("ACTB", "sample_id"), layer = "counts") %>% 
  filter(sample_id == "sample1") %>% 
  select(ACTB) %>% 
  colSums()
#>  ACTB 
#> 22587

FetchData(object = pbmc, vars = c("ACTB", "sample_id"), layer = "counts") %>% 
  filter(sample_id == "sample2") %>% 
  select(ACTB) %>% 
  colSums()
#>  ACTB 
#> 24776

Created on 2025-01-23 with reprex v2.1.1
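
The example above checks counts on an unsplit object. For an object that has already been split, the per-layer matrices can also be inspected directly. The following is a minimal sketch, assuming Seurat v5 with its default Assay5 class; the toy matrix, the object name `obj`, and the `dataset` labels are hypothetical and only mirror the names used in this thread.

```r
library(Seurat)

# Toy counts: 20 genes x 10 cells (hypothetical data)
set.seed(1)
counts <- Matrix::Matrix(
  matrix(rpois(20 * 10, lambda = 2), nrow = 20,
         dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10))),
  sparse = TRUE
)

obj <- CreateSeuratObject(counts = counts)
obj$dataset <- rep(c("mouse1", "mouse2"), each = 5)

# Split the RNA assay into one counts layer per dataset
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$dataset)

# Sum each layer by pulling its matrix with LayerData()
layer_names <- Layers(obj[["RNA"]])
layer_sums <- vapply(
  layer_names,
  function(l) sum(LayerData(obj, assay = "RNA", layer = l)),
  numeric(1)
)
print(layer_sums)
```

If the split worked, the layer sums should add up to the total of the original matrix, and each layer should match the column sums of the cells assigned to that dataset.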

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       macOS Monterey 12.7.5
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-01-23
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version date (UTC) lib source
#>  abind               1.4-8   2024-09-12 [1] CRAN (R 4.4.1)
#>  cli                 3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  cluster             2.1.8   2024-12-11 [1] CRAN (R 4.4.1)
#>  codetools           0.2-20  2024-03-31 [1] CRAN (R 4.4.0)
#>  colorspace          2.1-1   2024-07-26 [1] CRAN (R 4.4.0)
#>  cowplot             1.1.3   2024-01-22 [1] CRAN (R 4.4.0)
#>  data.table          1.16.4  2024-12-06 [1] CRAN (R 4.4.1)
#>  deldir              2.0-4   2024-02-28 [1] CRAN (R 4.4.0)
#>  digest              0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dotCall64           1.2     2024-10-04 [1] CRAN (R 4.4.1)
#>  dplyr             * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate            1.0.3   2025-01-10 [1] CRAN (R 4.4.1)
#>  farver              2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
#>  fastDummies         1.7.4   2024-08-16 [1] CRAN (R 4.4.1)
#>  fastmap             1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fitdistrplus        1.2-2   2025-01-07 [1] CRAN (R 4.4.1)
#>  forcats           * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
#>  fs                  1.6.5   2024-10-30 [1] CRAN (R 4.4.1)
#>  future              1.34.0  2024-07-29 [1] CRAN (R 4.4.0)
#>  future.apply        1.11.3  2024-10-27 [1] CRAN (R 4.4.1)
#>  generics            0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
#>  ggplot2           * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
#>  ggrepel             0.9.6   2024-09-07 [1] CRAN (R 4.4.1)
#>  ggridges            0.5.6   2024-01-23 [1] CRAN (R 4.4.0)
#>  globals             0.16.3  2024-03-08 [1] CRAN (R 4.4.0)
#>  glue                1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  goftest             1.2-3   2021-10-07 [1] CRAN (R 4.4.0)
#>  gridExtra           2.3     2017-09-09 [1] CRAN (R 4.4.0)
#>  gtable              0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
#>  hms                 1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools           0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  htmlwidgets         1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
#>  httpuv              1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
#>  httr                1.4.7   2023-08-15 [1] CRAN (R 4.4.0)
#>  ica                 1.0-3   2022-07-08 [1] CRAN (R 4.4.0)
#>  igraph              2.1.2   2024-12-07 [1] CRAN (R 4.4.1)
#>  irlba               2.3.5.1 2022-10-03 [1] CRAN (R 4.4.0)
#>  jsonlite            1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
#>  KernSmooth          2.23-26 2025-01-01 [1] CRAN (R 4.4.1)
#>  knitr               1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  later               1.4.1   2024-11-27 [1] CRAN (R 4.4.1)
#>  lattice             0.22-6  2024-03-20 [1] CRAN (R 4.4.0)
#>  lazyeval            0.2.2   2019-03-15 [1] CRAN (R 4.4.0)
#>  lifecycle           1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  listenv             0.9.1   2024-01-29 [1] CRAN (R 4.4.0)
#>  lmtest              0.9-40  2022-03-21 [1] CRAN (R 4.4.0)
#>  lubridate         * 1.9.4   2024-12-08 [1] CRAN (R 4.4.1)
#>  magrittr            2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  MASS                7.3-64  2025-01-04 [1] CRAN (R 4.4.1)
#>  Matrix              1.7-1   2024-10-18 [1] CRAN (R 4.4.1)
#>  matrixStats         1.5.0   2025-01-07 [1] CRAN (R 4.4.1)
#>  mime                0.12    2021-09-28 [1] CRAN (R 4.4.0)
#>  miniUI              0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
#>  munsell             0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
#>  nlme                3.1-166 2024-08-14 [1] CRAN (R 4.4.1)
#>  parallelly          1.41.0  2024-12-18 [1] CRAN (R 4.4.1)
#>  patchwork           1.3.0   2024-09-16 [1] CRAN (R 4.4.1)
#>  pbapply             1.7-2   2023-06-27 [1] CRAN (R 4.4.0)
#>  pbmc3k.SeuratData   3.1.4   2024-05-01 [1] local
#>  pillar              1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig           2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
#>  plotly              4.10.4  2024-01-13 [1] CRAN (R 4.4.0)
#>  plyr                1.8.9   2023-10-02 [1] CRAN (R 4.4.0)
#>  png                 0.1-8   2022-11-29 [1] CRAN (R 4.4.0)
#>  polyclip            1.10-7  2024-07-23 [1] CRAN (R 4.4.0)
#>  progressr           0.15.1  2024-11-22 [1] CRAN (R 4.4.1)
#>  promises            1.3.2   2024-11-28 [1] CRAN (R 4.4.1)
#>  purrr             * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R6                  2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
#>  RANN                2.6.2   2024-08-25 [1] CRAN (R 4.4.1)
#>  RColorBrewer        1.1-3   2022-04-03 [1] CRAN (R 4.4.0)
#>  Rcpp                1.0.14  2025-01-12 [1] CRAN (R 4.4.1)
#>  RcppAnnoy           0.0.22  2024-01-23 [1] CRAN (R 4.4.0)
#>  RcppHNSW            0.6.0   2024-02-04 [1] CRAN (R 4.4.0)
#>  readr             * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
#>  reprex              2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
#>  reshape2            1.4.4   2020-04-09 [1] CRAN (R 4.4.0)
#>  reticulate          1.40.0  2024-11-15 [1] CRAN (R 4.4.1)
#>  rlang               1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown           2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  ROCR                1.0-11  2020-05-02 [1] CRAN (R 4.4.0)
#>  RSpectra            0.16-2  2024-07-18 [1] CRAN (R 4.4.0)
#>  rstudioapi          0.17.1  2024-10-22 [1] CRAN (R 4.4.1)
#>  Rtsne               0.17    2023-12-07 [1] CRAN (R 4.4.0)
#>  scales              1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  scattermore         1.2     2023-06-12 [1] CRAN (R 4.4.0)
#>  sctransform         0.4.1   2023-10-19 [1] CRAN (R 4.4.0)
#>  sessioninfo         1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  Seurat            * 5.2.0   2025-01-13 [1] CRAN (R 4.4.0)
#>  SeuratObject      * 5.0.2   2024-05-08 [1] CRAN (R 4.4.0)
#>  shiny               1.10.0  2024-12-14 [1] CRAN (R 4.4.1)
#>  sp                * 2.1-4   2024-04-30 [1] CRAN (R 4.4.0)
#>  spam                2.11-0  2024-10-03 [1] CRAN (R 4.4.1)
#>  spatstat.data       3.1-4   2024-11-15 [1] CRAN (R 4.4.1)
#>  spatstat.explore    3.3-4   2025-01-08 [1] CRAN (R 4.4.1)
#>  spatstat.geom       3.3-4   2024-11-18 [1] CRAN (R 4.4.1)
#>  spatstat.random     3.3-2   2024-09-18 [1] CRAN (R 4.4.1)
#>  spatstat.sparse     3.1-0   2024-06-21 [1] CRAN (R 4.4.0)
#>  spatstat.univar     3.1-1   2024-11-05 [1] CRAN (R 4.4.1)
#>  spatstat.utils      3.1-2   2025-01-08 [1] CRAN (R 4.4.1)
#>  stringi             1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
#>  stringr           * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
#>  survival            3.8-3   2024-12-17 [1] CRAN (R 4.4.1)
#>  tensor              1.5     2012-05-05 [1] CRAN (R 4.4.0)
#>  tibble            * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr             * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect          1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  tidyverse         * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
#>  timechange          0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
#>  tzdb                0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
#>  uwot                0.2.2   2024-04-21 [1] CRAN (R 4.4.0)
#>  vctrs               0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite         0.4.2   2023-05-02 [1] CRAN (R 4.4.0)
#>  withr               3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun                0.50    2025-01-07 [1] CRAN (R 4.4.1)
#>  xtable              1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml                2.3.10  2024-07-26 [1] CRAN (R 4.4.0)
#>  zoo                 1.8-12  2023-04-13 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Best,
Sam
