[SUGGESTION] Avoid correction of barcode names #288

cbravo93 · 2021-01-18T19:12:19Z

Is your feature request related to a problem? Please describe.
In 10X data, barcode names generally end in '-[0-9]' (e.g. ATGCTGCTCTA-1). I noticed that the pipeline removes this number, producing names of the form barcode-sample_id (e.g. ATGCTGCTCTA-Sample_1). However, the original number is very relevant for downstream analyses, and eventually for working with the fragments files from multiome runs.

Describe the solution you'd like
Would it be possible to return the cell names as barcode-number-sample_id? E.g. ATGCTGCTCTA-1-Sample_1

cbravo93 · 2021-01-18T19:47:43Z

I also found a way to remove the '-1' from the fragments file; however, the fastest I managed was 1 min/file (for average runs with ~5K cells). This is also a bit risky when there is more than one GEM well.
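For reference, a workaround of this kind could look roughly like the sketch below. It is not the approach actually used above (that code is not shown in this thread); it assumes the usual tab-separated fragments layout with the barcode in the fourth column, and `strip_gem_well` is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator

# Trailing GEM well suffix such as "-1" or "-12"
GEM_WELL_SUFFIX = re.compile(r"-[0-9]+$")


def strip_gem_well(lines: Iterable[str]) -> Iterator[str]:
    """Yield fragments lines with the GEM well suffix removed from the barcode.

    Assumes tab-separated columns: chrom, start, end, barcode, count.
    Header lines starting with '#' and short lines pass through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 4:
            yield line
            continue
        fields[3] = GEM_WELL_SUFFIX.sub("", fields[3])
        yield "\t".join(fields) + "\n"


example = [
    "# id=example\n",
    "chr1\t10\t20\tATGCTGCTCTA-1\t2\n",
]
cleaned = list(strip_gem_well(example))
```

For a real file, the same generator could be fed from `gzip.open(path, "rt")` and written back out line by line. Barcodes that differ only in their GEM well end up identical after this step, which is the risk mentioned above.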

cflerin · 2021-01-19T09:56:06Z

Seems that this is coded here:

vsn-pipelines/src/utils/bin/sc_file_converter.py

Line 130 in 91e5724

    adata.obs.index = list(map(lambda x: re.sub(r"([ACGT]*)-.*", rf'\1-{tag}', x), adata.obs.index))

and three entries in the R version:

vsn-pipelines/src/utils/bin/sc_file_converter.R

Lines 123 to 127 in 91e5724

    new.names <- gsub(
        pattern = "-([0-9]+)$",
        replace = paste0("-", args$`sample_id`),
        x = colnames(x = seurat)
    )
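To make the effect of the two quoted patterns concrete, here is a small Python sketch that applies them to example barcodes. The Python pattern is copied from the line quoted above; the second function is only a Python rendering of the R `gsub` call, and both function names are made up for illustration.

```python
import re


def python_converter_rename(barcode: str, tag: str) -> str:
    # Pattern as quoted from sc_file_converter.py: everything after the
    # first hyphen is replaced by the sample tag.
    return re.sub(r"([ACGT]*)-.*", rf"\1-{tag}", barcode)


def r_converter_rename(barcode: str, sample_id: str) -> str:
    # Python rendering of the quoted R gsub: only a trailing "-<digits>"
    # is replaced by "-<sample_id>".
    return re.sub(r"-([0-9]+)$", lambda m: f"-{sample_id}", barcode)


for bc in ("ATGCTGCTCTA-1", "ATGCTGCTCTA-2"):
    print(bc, python_converter_rename(bc, "Sample_1"), r_converter_rename(bc, "Sample_1"))
```

Both variants map `ATGCTGCTCTA-1` and `ATGCTGCTCTA-2` to the same name, `ATGCTGCTCTA-Sample_1`, so the GEM well number is lost in either converter.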

dweemx · 2021-01-19T10:14:00Z

Yes, I added this so that it's easier to identify the cells without having to mask them first.
But indeed, we could leave this index in place, I guess?

cbravo93 · 2021-01-19T10:44:47Z

That would be great (or at least having it as an option)! I found solutions to work with the fragments file without it, but they slow things down significantly: while it is true that we normally work with single GEM wells ('-1'), I can't assume it will always be like this. Keeping the index would make it very straightforward :)

I guess this could also be problematic with aggregated runs in the 10x scRNA-seq results, where removing the '-[0-9]' could result in repeated barcodes? I have some data to test this.
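The concern about aggregated runs can be checked with a few lines of Python. This is a minimal sketch on made-up barcodes, not a test on real data; `strip_and_tag` is a hypothetical stand-in for the renaming that drops the GEM well.

```python
import re
from collections import Counter


def strip_and_tag(barcode: str, sample_id: str) -> str:
    # Drop a trailing "-<digits>" GEM well suffix, then append the sample id.
    return re.sub(r"-[0-9]+$", "", barcode) + "-" + sample_id


# Made-up barcodes: the first two share a sequence but come from
# different GEM wells, as can happen in an aggregated run.
barcodes = ["ATGCTGCTCTA-1", "ATGCTGCTCTA-2", "GGCATTACGGA-1"]

renamed = [strip_and_tag(bc, "Sample_1") for bc in barcodes]
duplicates = [name for name, n in Counter(renamed).items() if n > 1]
print(duplicates)
```

Any non-empty `duplicates` list means two distinct cells would end up with the same name after renaming.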

dweemx · 2021-01-20T14:42:21Z

@cbravo93 yes, indeed, that would be better and more robust for later. Let's append the sample name to the complete cell barcode.

dweemx · 2021-01-20T23:09:39Z

@cbravo93 this is fixed in the develop branch. By default, the sample is now appended to the complete cell barcode. The old behaviour is still available by setting a new param, remove10xGEMWell, in the publish scope of the config.
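With the new default naming, the original 10x barcode can be recovered from a cell name by removing the sample suffix, which makes matching against a fragments file straightforward. The sketch below illustrates the idea only; it is not the pipeline's code, and both function names are hypothetical.

```python
def tag_cell_barcode(barcode: str, sample_id: str) -> str:
    # New default as described above: keep the complete barcode
    # (including the GEM well) and append the sample id.
    return f"{barcode}-{sample_id}"


def original_barcode(cell_name: str, sample_id: str) -> str:
    # Recover the 10x barcode by removing the "-<sample_id>" suffix.
    suffix = f"-{sample_id}"
    if not cell_name.endswith(suffix):
        raise ValueError(f"{cell_name!r} does not end with {suffix!r}")
    return cell_name[: -len(suffix)]


name = tag_cell_barcode("ATGCTGCTCTA-1", "Sample_1")
recovered = original_barcode(name, "Sample_1")
```

Because the suffix is removed by exact match rather than by splitting on hyphens, this also works when the sample id itself contains hyphens.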

cbravo93 added the enhancement New feature or request label Jan 18, 2021

dweemx added a commit that referenced this issue Jan 20, 2021

Fix #288 for R converter

659cf51

dweemx mentioned this issue Jan 20, 2021

Fix/288 cellbarcode sample suffix #290

Merged

cflerin mentioned this issue Jan 26, 2021

Develop for v0.25.0 #293

Merged

cflerin closed this as completed in 795d2c2 Jan 26, 2021
