Generating FISHnCHIPs input files: gene-gene correlation matrices from Seurat object #1

Open
evaknichols opened this issue Jun 15, 2024 · 3 comments

Comments

@evaknichols

Hello, thank you for developing FISHnCHIPs! It's exactly the type of tool I was hoping for to help me design effective FISH panels. I really am super excited about this.

So far, I have successfully reproduced the environment and run the tutorial code in the Gene Panel Design Demo notebook, and I'm ready to try it on my own data. Would you be able to provide code for generating the input files (such as the gene-gene correlation matrices) starting from a Seurat object?

It's something I should be able to figure out eventually, but since I'm primarily an experimentalist, it will take me some time. Perhaps including this extra code will help other end users get back to the bench sooner too :).

Apologies in advance if I missed code that already exists elsewhere in the repository. Meanwhile, I will take a stab at it. Thank you!

@Xinrui0523

Hello Eva,

Glad to hear that you find it useful for designing effective FISH panels.

To generate the gene-gene correlation matrices from a Python dataframe, you can use the get_correlation_matrix() function in the ./FISHnCHIPs_GenePanelDesign_Tutorial/scripts/genepaneldesign_python.py script.

If you are working with a Seurat object in R, you can use the following function:

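# Computes a gene-gene correlation (GGC) matrix from a genes x cells matrix.
# Requires the Matrix and qlcMatrix packages to be installed.
get_correlation_matrix <- function(data, out.rds.filename = "./correlation_matrix.rds"){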
  print("Computing GGC")
  t_data <- Matrix::t(data)
  correlation_matrix <- qlcMatrix::corSparse(X = t_data, Y = NULL, cov = FALSE)
  dimnames(correlation_matrix) <- list(rownames(data), rownames(data))
  print("Done.")

  if(!is.null(out.rds.filename)){
    print(paste0("Saving the correlation matrix at: ", out.rds.filename))
    saveRDS(correlation_matrix, out.rds.filename)
  }

  return(correlation_matrix)
}
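For anyone curious what the corSparse() step computes, here is a minimal sketch of the same gene-gene Pearson correlation using only the Matrix package. sparse_gene_cor() is a hypothetical helper (not part of FISHnCHIPs or qlcMatrix); it relies on the identity cov = (X X^T - n mu mu^T) / (n - 1), so the cell dimension is only touched through a sparse cross-product. It assumes a sparse genes x cells matrix with no constant genes (a zero-variance gene would give NaN).

```r
library(Matrix)

# Hypothetical helper (not part of FISHnCHIPs or qlcMatrix): Pearson correlation
# between the rows (genes) of a sparse genes x cells matrix. The cell dimension
# is only touched through a sparse cross-product, so the input stays sparse.
sparse_gene_cor <- function(data) {
  n <- ncol(data)                                   # number of cells
  mu <- Matrix::rowMeans(data)                      # per-gene mean
  gram <- as.matrix(Matrix::tcrossprod(data))       # X X^T, genes x genes
  covmat <- (gram - n * outer(mu, mu)) / (n - 1)    # sample covariance
  sds <- sqrt(diag(covmat))                         # per-gene sample sd
  cormat <- covmat / outer(sds, sds)                # cov_ij / (sd_i * sd_j)
  dimnames(cormat) <- list(rownames(data), rownames(data))
  cormat
}

# Toy check: 5 genes x 40 cells, compared with base R's dense cor()
set.seed(1)
toy <- Matrix::rsparsematrix(5, 40, density = 0.4)
rownames(toy) <- paste0("gene", 1:5)
ggc <- sparse_gene_cor(toy)
all.equal(ggc, cor(t(as.matrix(toy))))
```

Whichever route is used, the result is a dense genes x genes matrix: for 24552 genes that is about 4.8 GB in double precision on its own, and the intermediates in this sketch add several more copies of that size.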

Here's an example, assuming you have a Seurat object named seurat_obj:

correlation_matrix <- get_correlation_matrix(seurat_obj@assays[["RNA"]]@data, out.rds.filename = "./correlation_matrix.rds")

We hope this helps you generate the necessary input files and gets you back to the bench sooner.

@evaknichols

evaknichols commented Jun 18, 2024

Thank you very much for pointing out the existing function, and for providing an R version. It worked :)

At the risk of sounding greedy :), do you have code for generating gcm_cellScaled_gene_centric1.zip (normalized by total counts per cell, multiplied by a scale factor), gcm_cellScaled_gene_centric2.zip (log-transformed), and gcm_cellScaled_gene_centric3.zip (z-score normalized), also from a Seurat object?

My attempts, especially for gcm_cellScaled_gene_centric3.zip, are not scaling well to the size of my data (24552 genes x 256124 cells).

Once I have these .CSV files, I'll be able to do the gene-centric panel design with my own data all the way through.

Thank you once again!

@Xinrui0523

Hi Eva,

The following R code can be used to normalize a large sparse matrix.

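# R code for calculating gcm_cellScaled: normalized by total counts per cell, multiplied by a scale factor.
# Expects a sparse genes x cells dgCMatrix (e.g. Seurat counts).
library(Matrix)
normalizer <- function(A, normFactor = 1e4, logNorm = FALSE, pseudocount = 1)
{
  # Divide each stored value by its column (cell) total, then multiply by normFactor
  A@x <- A@x / rep.int(Matrix::colSums(A), diff(A@p)) * normFactor
  if(logNorm){
    if(pseudocount == 1){
      # log2(0 + 1) = 0, so transforming only the nonzeros keeps A sparse
      A@x <- log2(A@x + 1)
    } else {
      # Any other pseudocount turns zeros into nonzeros, so the result is dense
      A <- log2(A + pseudocount)
    }
  }
  return(A)
}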

# Example
counts <- seurat_obj@assays[['RNA']]@counts
cell_scaled_counts <- normalizer(counts, normFactor = 1e4, logNorm = FALSE, pseudocount = 1)
cell_scaled_counts <- scrattch.io::large_matrix_to_dgCMatrix(cell_scaled_counts, chunk_size = 10000)
saveRDS(cell_scaled_counts, "./gcm_cellScaled.rds")
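To see why the rep.int(colSums(A), diff(A@p)) line in normalizer() works, here is a small self-contained sketch on a toy 4 genes x 3 cells matrix. The helpers scale_cells_sparse() and log2p1_sparse() are hypothetical names that split the per-cell scaling and the log step into two functions; the scaled result is compared with a dense sweep() computation.

```r
library(Matrix)

# Hypothetical helpers illustrating the two steps of per-cell normalization on a
# dgCMatrix. In a dgCMatrix the nonzero values sit column by column in A@x, and
# diff(A@p) is the number of stored values in each column (cell). Repeating each
# cell's total that many times lines the totals up with the values in A@x.
scale_cells_sparse <- function(A, normFactor = 1e4) {
  A@x <- A@x / rep.int(Matrix::colSums(A), diff(A@p)) * normFactor
  A
}

# log2(x + 1) applied to the stored values only: zeros stay zero because
# log2(0 + 1) = 0, so the matrix remains sparse.
log2p1_sparse <- function(A) {
  A@x <- log2(A@x + 1)
  A
}

# Toy counts: 4 genes x 3 cells (5 nonzero entries)
counts <- Matrix::sparseMatrix(i = c(1, 3, 2, 4, 1), j = c(1, 1, 2, 2, 3),
                               x = c(3, 1, 5, 5, 2), dims = c(4, 3))
scaled <- scale_cells_sparse(counts)
Matrix::colSums(scaled)     # each cell now sums to normFactor
dense_ref <- sweep(as.matrix(counts), 2, Matrix::colSums(counts), "/") * 1e4
all.equal(as.matrix(scaled), dense_ref)
logged <- log2p1_sparse(scaled)
```

A side benefit of working on A@x directly: a cell with zero total counts has no stored values, so it is simply left as zeros instead of producing a 0/0 division.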

For generating the z-scaled matrix, you can simply use the ScaleData() function in Seurat, adjusting block.size to balance computing speed against memory use.

gene_list <- rownames(seurat_obj)  # or the genes of interest
seurat_obj <- ScaleData(object = seurat_obj, features = gene_list, scale.max = 10, block.size = 1000, min.cells.to.block = 3000)

The z-scaled data will then be available in the scale.data slot:

gcm_zscaled <- seurat_obj@assays[['RNA']]@scale.data
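Note that scale.data is a dense matrix: at 24552 genes x 256124 cells that is roughly 50 GB of doubles, which is likely why the z-score step does not scale. If the goal is only to write gcm_cellScaled_gene_centric3 to disk, one alternative (a sketch, not Seurat's implementation) is to compute per-gene means and standard deviations from the sparse matrix and then z-score one block of cells at a time. zscore_genes_blockwise() and its consume callback are hypothetical names; the input is assumed to be the log-normalized matrix, which is what ScaleData() scales by default.

```r
library(Matrix)

# Hypothetical helper (not part of Seurat): z-score each gene (row) of a sparse
# genes x cells matrix, one dense block of cells at a time.
zscore_genes_blockwise <- function(data, consume, block_size = 1000,
                                   scale_max = 10) {
  n <- ncol(data)
  # Per-gene mean and sample sd (n - 1 denominator) from sparse row sums
  mu <- Matrix::rowMeans(data)
  sq_sum <- Matrix::rowSums(data^2)
  sds <- sqrt((sq_sum - n * mu^2) / (n - 1))

  for (s in seq(1, n, by = block_size)) {
    cols <- s:min(s + block_size - 1, n)
    block <- as.matrix(data[, cols, drop = FALSE])  # only this block is dense
    z <- (block - mu) / sds          # mu and sds recycle down each column, i.e. per gene
    z[z > scale_max] <- scale_max    # clip large values, in the spirit of scale.max
    consume(z, cols)                 # e.g. write the block to disk
  }
  invisible(NULL)
}

# Toy run: 6 genes x 25 cells, processed in blocks of 10 cells
set.seed(2)
toy <- Matrix::rsparsematrix(6, 25, density = 0.5)
blocks <- list()
zscore_genes_blockwise(toy, function(z, cols) blocks[[length(blocks) + 1]] <<- z,
                       block_size = 10)
full <- do.call(cbind, blocks)
```

In the toy run the blocks are just collected in memory. For the real data, consume could instead write t(z) with write.table(..., append = TRUE, col.names = FALSE), streaming a cells x genes CSV to disk block by block. Genes with zero variance would give NaN here, so they are best filtered out first.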

Hope that helps!
