Questions on batch removal tutorial #58

Open
Pazuzzilla opened this issue Nov 16, 2021 · 2 comments

Comments

@Pazuzzilla

Hi,
I'm running the batch removal tutorial provided in:
https://scgen.readthedocs.io/en/latest/tutorials/scgen_batch_removal.html

I'm interested in your software's ability to produce a corrected expression matrix after batch removal.
I need some clarification before moving on to my own dataset. In the dataset you provide, pancreas.h5ad, which was loaded in Python as train, I find:

>>> train
AnnData object with n_obs × n_vars = 2448 × 14693
    obs: 'n_cells-0', 'n_cells-1', 'n_cells-2', 'n_cells-3'
    var: 'celltype', 'sample', 'n_genes', 'batch', 'n_counts', 'louvain'
    uns: 'celltype_colors', 'louvain', 'neighbors', 'pca', 'sample_colors'
    obsm: 'PCs'
    varm: 'X_pca', 'X_umap'
    varp: 'distances', 'connectivities'

I don't usually work with AnnData objects, but if I understand correctly, we have expression values for 2448 genes over 14693 cells.
In the same object I have:

>>> train.raw.X
<14693x24516 sparse matrix of type '<class 'numpy.float32'>'
    with 55503411 stored elements in Compressed Sparse Row format>

Here we have expression values for 24516 genes over the same 14693 cells.
After the step

corrected_adata = model.batch_removal()

the dimensions are the same, but the values in corrected_adata.X differ from those in train.X.
So I assume that a subset of the genes was selected from the starting dataset, and that the corrected expression matrix is the one I find in corrected_adata.X. I wonder whether this filtering was done only to reduce the computational load in the tutorial, retaining a subset of significant genes, or whether a preprocessing step of this kind is mandatory.

Sorry if this is trivial, but it was not clear to me.
As an additional comment, I want to mention the code in the preprocessing step:

train = scgen.setup_anndata(train, batch_key="batch", labels_key="cell_type", copy=True)
I get the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'scgen' has no attribute 'setup_anndata'

Using instead:
train = scgen.SCGEN.setup_anndata(train, batch_key="batch", labels_key="cell_type", copy=True)
I get no error.

Is this the correct way to do it?
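
The AttributeError suggests that the tutorial and the installed version disagree on where `setup_anndata` lives. A script that has to run against either layout could look the function up in both places. This is only a sketch: `resolve_setup_anndata` is a hypothetical helper, not part of scgen, and the two stand-in objects below merely imitate the two locations mentioned in this issue.

```python
from types import SimpleNamespace


def resolve_setup_anndata(module):
    """Hypothetical helper: return setup_anndata from the module or from module.SCGEN."""
    fn = getattr(module, "setup_anndata", None)
    if fn is not None:
        return fn
    model_cls = getattr(module, "SCGEN", None)
    fn = getattr(model_cls, "setup_anndata", None)
    if fn is None:
        raise AttributeError("setup_anndata not found on module or module.SCGEN")
    return fn


# Stand-ins for the two layouts; these are NOT the real scgen package.
layout_a = SimpleNamespace(setup_anndata=lambda adata, **kw: "module-level")
layout_b = SimpleNamespace(
    SCGEN=SimpleNamespace(setup_anndata=lambda adata, **kw: "class-level")
)

print(resolve_setup_anndata(layout_a)(None))
print(resolve_setup_anndata(layout_b)(None))
```

With the real package, `resolve_setup_anndata(scgen)` would return whichever of the two entry points the installed version provides.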

@shangyf-stu

Hi Pazuzzilla,
I have the same interest as you! But I couldn't install scGEN smoothly.
I have tried many methods, and the corrected expression matrix they produce is gene-filtered. Can you tell me whether the genes in the expression matrix produced by this tool are also filtered?
Thanks a lot for your help.

@Pazuzzilla
Author

Since I didn't get a reply, I can only give my impression: the results are filtered, but I still don't understand by which criteria, since I didn't specify, for example, a number of highly variable genes to retain. It is not very clear from the example. I didn't proceed with this tool at the time, but I'm probably going to use it soon; if I learn more while using it, I will update this issue.
