This repository was archived by the owner on Apr 19, 2023. It is now read-only.

[SUGGESTION] Avoid correction of barcode names #288

Closed
cbravo93 opened this issue Jan 18, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@cbravo93

Is your feature request related to a problem? Please describe.
In 10x data, barcode names generally end in '-[0-9]' (e.g. ATGCTGCTCTA-1). I noticed that the pipeline removes this number, producing barcode-sample_id (e.g. ATGCTGCTCTA-Sample_1). However, for downstream analyses, and eventually for working with fragments files in multiome data, keeping the original number is very relevant.

Describe the solution you'd like
Would it be possible to return the cell names as barcode-number-sample_id? E.g. ATGCTGCTCTA-1-Sample_1

cbravo93 added the enhancement label on Jan 18, 2021
@cbravo93
Author

cbravo93 commented Jan 18, 2021

I also found a way to remove the '-1' from the fragments file; however, the fastest I managed was 1 min/file (for average runs with ~5K cells). This is also a bit risky if there is more than one GEM well.
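
For illustration, a minimal sketch of that kind of suffix stripping. It assumes the usual tab-separated fragments layout with the barcode in the fourth column and optional '#' header lines. The function names `strip_gem_well` and `rewrite_fragments` are hypothetical, and this is not the approach used in the thread:

```python
import gzip
import re
from typing import Iterable, Iterator

# Matches a trailing GEM-well suffix such as "-1" or "-12".
GEM_WELL_SUFFIX = re.compile(r"-[0-9]+$")


def strip_gem_well(lines: Iterable[str]) -> Iterator[str]:
    """Yield fragments lines with the GEM-well suffix removed from column 4."""
    for line in lines:
        if line.startswith("#"):
            # Header/comment lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: the cell barcode is the fourth tab-separated field.
        fields[3] = GEM_WELL_SUFFIX.sub("", fields[3])
        yield "\t".join(fields) + "\n"


def rewrite_fragments(src: str, dst: str) -> None:
    """Stream a gzip-compressed fragments file to a new gzip file."""
    # The output is plain gzip, not bgzip, so it would need to be
    # re-compressed with bgzip before it can be tabix-indexed.
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(strip_gem_well(fin))
```

Note that stripping the suffix merges fragments from different GEM wells that share a barcode sequence, which is the risk mentioned above.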

@cflerin
Member

cflerin commented Jan 19, 2021

Seems that this is coded here:

adata.obs.index = list(map(lambda x: re.sub(r"([ACGT]*)-.*", rf'\1-{tag}', x), adata.obs.index))

and three entries in the R version:

new.names <- gsub(
  pattern = "-([0-9]+)$",
  replace = paste0("-", args$`sample_id`),
  x = colnames(x = seurat)
)
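
To make the effect of the two snippets above concrete, here is a small sketch that applies both patterns to the example barcode from the issue. The `tag` value and the function names are hypothetical, and the second function is only a Python translation of the R `gsub` pattern, not pipeline code:

```python
import re

tag = "Sample_1"  # hypothetical sample id, for illustration only


def rename_python_style(barcode: str) -> str:
    # Same regex as the Python line quoted above: everything after the
    # first "-" following the ACGT run is replaced by the sample tag.
    return re.sub(r"([ACGT]*)-.*", rf"\1-{tag}", barcode)


def rename_r_style(barcode: str) -> str:
    # Python translation of the R gsub pattern: only a trailing
    # "-<digits>" is replaced by "-<tag>".
    return re.sub(r"-([0-9]+)$", f"-{tag}", barcode)


print(rename_python_style("ATGCTGCTCTA-1"))  # ATGCTGCTCTA-Sample_1
print(rename_r_style("ATGCTGCTCTA-1"))       # ATGCTGCTCTA-Sample_1
```

In both cases the GEM-well number is lost, which is the behaviour the issue asks to change.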

@dweemx
Contributor

dweemx commented Jan 19, 2021

Yes, I added this so that it's easier to identify the cells w/o having to mask them first.
But indeed, we could leave this index in place, I guess?

@cbravo93
Author

cbravo93 commented Jan 19, 2021

That would be great (or at least offering it as an option)! I found ways to work with the fragments file without it, but they slow things down significantly: while we normally work with single GEM wells ('-1'), I can't assume it will always be like this. Keeping the index would make it very straightforward :)

I guess this could also be problematic if you have aggregated runs in the 10x scRNA-seq results, where removing the '-[0-9]' can result in repeated barcodes? I have some data to test this.
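
The collision concern can be sketched with a toy example. The barcodes below are made up and stand in for an aggregated run in which the same sequence appears in two GEM wells:

```python
import re
from collections import Counter

# Hypothetical barcodes: the first two share a sequence but differ in GEM well.
barcodes = ["ATGCTGCTCTA-1", "ATGCTGCTCTA-2", "GGCATTACGTA-1"]

# Drop the trailing "-<digits>" suffix, as the old renaming effectively did.
stripped = [re.sub(r"-[0-9]+$", "", bc) for bc in barcodes]

# Any stripped barcode that now occurs more than once is a collision.
duplicates = [bc for bc, n in Counter(stripped).items() if n > 1]
print(duplicates)  # ['ATGCTGCTCTA']
```

Three distinct cells collapse to two distinct names once the suffix is gone.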

@dweemx
Contributor

dweemx commented Jan 20, 2021

@cbravo93 yes, indeed, that would be better and more robust for later. Let's append the sample name to the complete cell barcode.

@dweemx
Contributor

dweemx commented Jan 20, 2021

@cbravo93 this is fixed in the develop branch. By default, it will now append the sample to the complete cell barcode. The old behaviour is still available by setting a new param remove10xGEMWell in the publish scope of the config.
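
The two behaviours described in this comment could look roughly like the sketch below. This is not the pipeline's actual implementation: the function `label_barcode` and its `remove_10x_gem_well` argument are hypothetical, and only the param name remove10xGEMWell comes from the thread:

```python
import re


def label_barcode(barcode: str, sample_id: str,
                  remove_10x_gem_well: bool = False) -> str:
    """Append sample_id to a barcode, optionally dropping the GEM-well suffix."""
    if remove_10x_gem_well:
        # Old behaviour: strip the trailing "-<digits>" before tagging.
        barcode = re.sub(r"-[0-9]+$", "", barcode)
    # Default behaviour: keep the complete barcode and append the sample.
    return f"{barcode}-{sample_id}"


print(label_barcode("ATGCTGCTCTA-1", "Sample_1"))
# ATGCTGCTCTA-1-Sample_1
print(label_barcode("ATGCTGCTCTA-1", "Sample_1", remove_10x_gem_well=True))
# ATGCTGCTCTA-Sample_1
```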
