GroupSingletons behavior #1655

andrewhill157 · 2019-06-07T18:02:42Z

I have two related questions about GroupSingletons:

1. For some larger datasets we have come across cases where, in addition to a reasonable number of singletons, there are also a number of clusters composed of only 2 or 3 cells, and these are not easily fixed by adjusting the resolution. Ideally, I'd like to merge these into their closest clusters in a manner similar to singletons (preferably at the same time as singletons). I was wondering if you would consider one of two options:

   Option 1: Add an extra argument to FindClusters that allows you to retain singleton assignments. This would then allow people to deal with them however they want.
   Option 2: Add something like a min.cluster.size argument to FindClusters that would be 2 by default to match current behavior, but would then merge all cells in clusters smaller than this size in the manner currently used to merge singletons. I have tested this out myself and would be happy to submit a PR if it sounds like something you'd consider.

2. At https://github.com/satijalab/seurat/blob/master/R/clustering.R#L453, presumably you are setting up a new variable, "new.ids", so that you can change singleton ids while maintaining the original cluster assignments when assessing connectivity. But then on https://github.com/satijalab/seurat/blob/master/R/clustering.R#L469 you are using "ids" when presumably you mean to use "new.ids". The way it works now, it seems that as you process singletons they are counted towards the connectivity calculations for the clusters they join, which could, in theory, make the results dependent on the ordering of the data. Probably very rarely an issue, but I wanted to mention it as it seems like it might not have been intentional.
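To make the first question concrete, here is a minimal base-R sketch of how the small clusters could be found from a vector of cluster assignments. The toy `ids` vector and the `min.cluster.size` threshold are assumptions for illustration only; `min.cluster.size` is the argument proposed in Option 2, not an existing FindClusters parameter.

```r
# Toy cluster assignments: a named vector with one entry per cell (hypothetical data)
ids <- c(rep("0", 50), rep("1", 30), rep("2", 3), rep("3", 2), "4")
names(ids) <- paste0("cell", seq_along(ids))

# Hypothetical threshold: clusters with fewer cells than this count as "small"
min.cluster.size <- 4

# Count cells per cluster and pick out the clusters below the threshold
cluster.sizes <- table(ids)
small.clusters <- names(cluster.sizes)[cluster.sizes < min.cluster.size]

# Cells that would need to be reassigned to a larger cluster
small.cells <- names(ids)[ids %in% small.clusters]

print(cluster.sizes)
print(small.clusters)
```

With `min.cluster.size <- 2` the same code selects only true singletons, which is why a default of 2 would match the current behavior.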


andrewwbutler · 2019-06-10T17:47:22Z

Hi,

1. Option 1 has recently been added to the develop branch as a group.singletons parameter which, if set to FALSE, will assign all singletons to their own "singleton" cluster. You could then process however you wanted. We would be happy to add a min.cluster.size parameter if you put together a PR.
2. I believe this was intentional when this was initially implemented several years ago as I was interested in testing out the stability of singleton assignments. I also might have been trying to prevent errors in the case where every cell was a singleton (datasets used to have far fewer cells :) ). However, thinking about this again now, it probably makes more sense to not include the processed singletons in the connectivity calculations of future singletons. That would also allow for a more efficient implementation of that function.
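As a rough illustration of the second point, the sketch below computes every singleton's connectivity against the original cluster memberships in one matrix product, so the result cannot depend on processing order. This is not Seurat's implementation: `MergeSingletonsFixed` is a hypothetical helper, the toy `SNN` is a plain base-R matrix rather than a sparse graph, and ties are broken by taking the first cluster rather than by random sampling.

```r
# Toy SNN graph: symmetric matrix with cell names on both dimensions (hypothetical data)
set.seed(42)
cells <- paste0("cell", 1:8)
SNN <- matrix(runif(64), nrow = 8, dimnames = list(cells, cells))
SNN <- (SNN + t(SNN)) / 2
ids <- setNames(c("a", "a", "a", "b", "b", "b", "s1", "s2"), cells)

# Hypothetical helper, not part of Seurat
MergeSingletonsFixed <- function(ids, SNN) {
  sizes <- table(ids)
  singletons <- names(sizes)[sizes == 1]
  clusters <- setdiff(names(sizes), singletons)
  if (length(singletons) == 0 || length(clusters) == 0) {
    return(ids)
  }
  single.cells <- names(ids)[ids %in% singletons]
  keep.cells <- names(ids)[ids %in% clusters]
  # One column per retained cluster; each column is scaled to sum to 1 so that
  # the matrix product below gives the mean edge weight to each cluster
  membership <- sapply(clusters, function(j) as.numeric(ids[keep.cells] == j))
  membership <- sweep(membership, 2, colSums(membership), "/")
  connectivity <- SNN[single.cells, keep.cells, drop = FALSE] %*% membership
  # Assign on a copy so the original ids are never modified during the computation
  new.ids <- ids
  new.ids[single.cells] <- clusters[max.col(connectivity, ties.method = "first")]
  new.ids
}

merged <- MergeSingletonsFixed(ids, SNN)

# Reversing the cell order should give the same assignment for every cell
perm <- rev(cells)
merged.perm <- MergeSingletonsFixed(ids[perm], SNN[perm, perm])
print(identical(merged[cells], merged.perm[cells]))
```

Because only cells from the retained clusters enter the product, a singleton that has already been merged never contributes to the connectivity of a later one.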

andrewhill157 · 2019-07-05T20:31:59Z

Sorry for the late reply on this and thanks for #1, very helpful! I have an initial version of #2 but am going to be travelling for a few weeks and haven't had a chance to do thorough tests, so I will hold off on submitting a PR until sometime after I get back. Thanks for considering! Hopefully I'll be back in touch with a PR soon, Andrew


zacharyzale · 2022-10-13T19:18:22Z

Hey, I modified the function as described in this post. I didn't have time to make a formal PR.

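# Signature inferred from the function body and the example call below;
# the default values (in particular small_size = 1) are assumptions
GroupSmall_clus <- function(ids, SNN, small_size = 1, group.singletons = TRUE, verbose = TRUE) {
  # identify singletons and other small clusters (size <= small_size)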
  singletons <- c()
  singletons <- names(x = which(x = table(ids) <= small_size))
  singletons <- intersect(x = unique(x = ids), singletons)
  #print(singletons)
  if (!group.singletons) {
    ids[which(ids %in% singletons)] <- "singleton"
    return(ids)
  }
  # calculate connectivity of singletons to other clusters, add singleton
  # to cluster it is most connected to
  cluster_names <- as.character(x = unique(x = ids))
  cluster_names <- setdiff(x = cluster_names, y = singletons)
  #cat("cluster_names", cluster_names, "\n")
  connectivity <- vector(mode = "numeric", length = length(x = cluster_names))
  names(x = connectivity) <- cluster_names
  new.ids <- ids
  for (i in singletons) {
    #cat("i single", i, "\n")
    i.cells <- names(ids[ids == i])
    #cat("i cells", i.cells, "\n")
    for (j in cluster_names) {
      #cat("j", j, "\n")
      j.cells <- names(ids[ids == j])
      #cat("j.cells", j, "\n")
      subSNN <- SNN[i.cells, j.cells]
      set.seed(1) # to match previous behavior, random seed being set in WhichCells
      if (is.object(x = subSNN)) {
        connectivity[j] <- sum(subSNN) / (nrow(x = subSNN) * ncol(x = subSNN))
      } else {
        connectivity[j] <- mean(x = subSNN)
      }
    }
    m <- max(connectivity, na.rm = TRUE)
    mi <- which(x = connectivity == m, arr.ind = TRUE)
    closest_cluster <- sample(x = names(x = connectivity[mi]), 1)
    ids[i.cells] <- closest_cluster
  }
  if (length(x = singletons) > 0 && verbose) {
    message(paste(
      length(x = singletons),
      "singletons identified.",
      length(x = unique(x = ids)),
      "final clusters."
    ))
  }
  return(ids)
}

# example usage

group_small_out <- GroupSmall_clus(ids = merged_pbmc$seurat_clusters, SNN = merged_pbmc@graphs$harm_wsnn)

mojaveazure closed this as completed Jul 26, 2019
