sci-dash incorrect information #29

AgedMordorBlue · 2024-01-23T15:14:19Z

Hi Job,

I'm getting some weird output in my sci-dash:

The Total Input Reads column adds up to more than the total input read-pairs.
Many cell metrics (mean reads/cell, mean UMI/cell, etc.) are about 10 times lower than indicated in the STARsolo summary output.

I went back to the STARsolo summary file, and the values in the sci-dash don't match what is written there. These summary stats are much more in line with the sci-dash output of an earlier version of the pipeline. I've added the JSON and the STARsolo Summary.csv content of the same sample below.

Best,
Yani

sci-dash JSON:
"sample_succes": {
"5mm_dsDNAse": {
"n_pairs_success": 373470356,
"sequencing_saturation": 0.751197,
"estimated_cells": 5738,
"total_mapped_reads": 131048010,
"total_unique_reads": 111547480,
"total_multimapped_reads": 19500530,
"total_correct_reads_genes": 90306169,
"total_exonic_reads": 41117200,
"total_intronic_reads": 49189000,
"total_intergenic_reads": 40741810,
"total_mitochondrial_reads": 0,
"total_exonicAS_reads": 2446128,
"total_intronicAS_reads": 7610787,
"mean_reads_per_cell": 2414,
"mean_genes_per_cell": 210,
"mean_umis_per_cell": 376

STARsolo Summary:
Number of Reads,185672402
Reads With Valid Barcodes,1
Sequencing Saturation,0.751197
Q30 Bases in CB+UMI,1
Q30 Bases in RNA read,0.93507
Reads Mapped to Genome: Unique+Multiple,0.705802
Reads Mapped to Genome: Unique,0.600776
Reads Mapped to GeneFull_Ex50pAS: Unique+Multiple GeneFull_Ex50pAS,0.486374
Reads Mapped to GeneFull_Ex50pAS: Unique GeneFull_Ex50pAS,0.441959
Estimated Number of Cells,5738
Unique Reads in Cells Mapped to GeneFull_Ex50pAS,71273128
Fraction of Unique Reads in Cells,0.868553
Mean Reads per Cell,12421
Median Reads per Cell,9947
UMIs in Cells,17618322
Mean UMI per Cell,3070
Median UMI per Cell,2504
Mean GeneFull_Ex50pAS per Cell,1596
Median GeneFull_Ex50pAS per Cell,1442
Total GeneFull_Ex50pAS Detected,20151

STARsolo summary of prior run (I think the switch from GeneFull to GeneFull_Ex50pAS explains the difference between versions):
Number of Reads,190841919
Reads With Valid Barcodes,1
Sequencing Saturation,0.730686
Q30 Bases in CB+UMI,1
Q30 Bases in RNA read,0.934432
Reads Mapped to Genome: Unique+Multiple,0.818547
Reads Mapped to Genome: Unique,0.676697
Reads Mapped to GeneFull: Unique+Multiple GeneFull,0.545113
Reads Mapped to GeneFull: Unique GeneFull,0.483755
Estimated Number of Cells,5758
Unique Reads in Cells Mapped to GeneFull,79965182
Fraction of Unique Reads in Cells,0.866167
Mean Reads per Cell,13887
Median Reads per Cell,11178
UMIs in Cells,21385715
Mean UMI per Cell,3714
Median UMI per Cell,3041
Mean GeneFull per Cell,1792
Median GeneFull per Cell,1629
Total GeneFull Detected,17842
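As a cross-check, the per-cell means in Summary.csv can be recomputed from the totals in the same file. The sketch below assumes the plain `metric,value` layout shown above and uses floor division, which reproduces the reported Mean Reads per Cell and Mean UMI per Cell for both runs in this thread (e.g. 71273128 // 5738 = 12421). The values are a subset copied from the GeneFull_Ex50pAS summary; the function names are illustrative and not part of the pipeline.

```python
# Subset of the STARsolo Summary.csv shown above (GeneFull_Ex50pAS run).
SUMMARY_CSV = """\
Number of Reads,185672402
Estimated Number of Cells,5738
Unique Reads in Cells Mapped to GeneFull_Ex50pAS,71273128
UMIs in Cells,17618322
Mean Reads per Cell,12421
Mean UMI per Cell,3070
"""


def parse_summary(text):
    """Parse Summary.csv lines ("metric,value") into a dict of floats."""
    metrics = {}
    for line in text.strip().splitlines():
        key, value = line.rsplit(",", 1)  # metric names contain no commas
        metrics[key] = float(value)
    return metrics


def per_cell_means(metrics):
    """Recompute per-cell means from totals; the reads key depends on the solo feature."""
    cells = metrics["Estimated Number of Cells"]
    reads_key = next(k for k in metrics if k.startswith("Unique Reads in Cells Mapped to"))
    return {
        "mean_reads_per_cell": int(metrics[reads_key] // cells),
        "mean_umis_per_cell": int(metrics["UMIs in Cells"] // cells),
    }


metrics = parse_summary(SUMMARY_CSV)
means = per_cell_means(metrics)
print(means)  # {'mean_reads_per_cell': 12421, 'mean_umis_per_cell': 3070}
```

If the sci-dash values diverge from these recomputed means, the dashboard is likely aggregating over a different set of barcodes than STARsolo's called cells.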

J0bbie · 2024-01-24T13:39:39Z

Hi Yani,

Good catch! It was indeed computing the means/sums over all 'raw' barcodes, including ambient RNA, instead of only the filtered cells.
This was throwing the numbers off.
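To illustrate why this skews the numbers: averaging over every raw barcode, including ambient-RNA barcodes with only a handful of reads, pulls the mean far below the per-cell value. A minimal sketch with made-up counts (the barcodes and read numbers are hypothetical, not taken from this run):

```python
# Hypothetical reads per barcode: three called cells plus three ambient-RNA barcodes.
reads_per_barcode = {
    "AAAC": 12000, "CCGT": 9000, "GGTA": 15000,  # filtered (called) cells
    "TTAG": 40, "ACGA": 15, "CATC": 5,           # ambient RNA / empty barcodes
}
filtered_cells = {"AAAC", "CCGT", "GGTA"}


def mean_reads_per_cell(counts, barcodes):
    """Integer mean of read counts over the given barcodes."""
    barcodes = list(barcodes)
    return sum(counts[bc] for bc in barcodes) // len(barcodes)


raw_mean = mean_reads_per_cell(reads_per_barcode, reads_per_barcode)  # all raw barcodes
filtered_mean = mean_reads_per_cell(reads_per_barcode, filtered_cells)  # called cells only
print(raw_mean, filtered_mean)  # 6010 12000
```

In a real run the ambient barcodes usually far outnumber the called cells, so the gap can be much larger than in this toy example.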

I've fixed this in the latest commit and also made some other small changes to the sci-dash.

Just pull the latest code, delete the sci-dash folder of your run and start the Snakemake workflow again. It should regenerate only the sci-dash.

Let me know if this fixed it for you!

Best,

Job

gauravvaidya16 · 2024-01-29T13:23:05Z

Hi Job,

I am facing a similar issue: both samples show identical stats on the sci-dash, but their STARsolo summary files differ. Also, the successful read-pairs of the two samples combined are higher than the total input read-pairs.

Best,
Gaurav

Below are the sci-dash JSON and the StarSolo summaries for each sample:

"sample_succes": {
"Pmor_50percPEG": {
"n_pairs_success": 362401472,
"total_reads": 175944916,
"sequencing_saturation": 0.490957,
"perc_mapped_reads_genome": 0.560275,
"perc_unique_reads_genome_unique": 0.305331,
"perc_mapped_reads_gene": 0.126788,
"perc_unique_reads_gene_unique": 0.105158,
"estimated_cells": 8953,
"mean_reads_per_cell": 1458,
"mean_umi_per_cell": 729,
"mean_genes_per_cell": 560,
"total_exonic_reads": 8109995,
"total_intronic_reads": 7492202,
"total_intergenic_reads": 46385009,
"total_mitochondrial_reads": 0,
"total_exonicAS_reads": 1492351,
"total_intronicAS_reads": 3167266
}

"Pmor": {
  "n_pairs_success": 202924630,
  "total_reads": 175944916,
  "sequencing_saturation": 0.490957,
  "perc_mapped_reads_genome": 0.560275,
  "perc_unique_reads_genome_unique": 0.305331,
  "perc_mapped_reads_gene": 0.126788,
  "perc_unique_reads_gene_unique": 0.105158,
  "estimated_cells": 8953,
  "mean_reads_per_cell": 1458,
  "mean_umi_per_cell": 729,
  "mean_genes_per_cell": 560,
  "total_exonic_reads": 8109995,
  "total_intronic_reads": 7492202,
  "total_intergenic_reads": 46385009,
  "total_mitochondrial_reads": 0,
  "total_exonicAS_reads": 1492351,
  "total_intronicAS_reads": 3167266
}

STARsolo Summary for Pmor_50percPEG

STARsolo Summary for Pmor

J0bbie · 2024-01-31T09:04:13Z

I think I figured it out: it had to do with the similar sample names and the regular expression used to retrieve the STARsolo files (ad29488).

That is, Pmor and Pmor_50percPEG were retrieving the wrong statistics files because the wildcard search did not include the species.
Could you try again with the latest code and see if it makes more sense now?
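The failure mode described here can be reproduced with Python's `fnmatch.fnmatchcase`: a pattern built from the sample name alone (`Pmor*`) also matches the files of `Pmor_50percPEG`, while including the species in the pattern removes the ambiguity. The file names and the `mouse` species label below are hypothetical and do not reflect the pipeline's actual layout or the code in ad29488.

```python
from fnmatch import fnmatchcase

# Hypothetical Summary.csv names of the form <sample>_<species>_Summary.csv.
SUMMARY_FILES = [
    "Pmor_mouse_Summary.csv",
    "Pmor_50percPEG_mouse_Summary.csv",
]


def find_summaries(sample, species=None):
    """Return summary files for a sample; without species the pattern is only a prefix match."""
    if species is None:
        pattern = f"{sample}*_Summary.csv"  # "Pmor*" also matches "Pmor_50percPEG..."
    else:
        pattern = f"{sample}_{species}_Summary.csv"  # exact sample + species
    return [name for name in SUMMARY_FILES if fnmatchcase(name, pattern)]


print(find_summaries("Pmor"))           # ['Pmor_mouse_Summary.csv', 'Pmor_50percPEG_mouse_Summary.csv']
print(find_summaries("Pmor", "mouse"))  # ['Pmor_mouse_Summary.csv']
```

`fnmatchcase` is used instead of `fnmatch` so the match stays case-sensitive on every platform.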

gauravvaidya16 · 2024-02-15T16:04:05Z

Hi Job,

It fixed most of the stats, except that the successful read-pairs of the two samples combined are still higher than the total input read-pairs.
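A simple sanity check for this: the successful read-pairs summed over all samples of a sequencing run should never exceed the run's total input read-pairs. A minimal sketch using the two `n_pairs_success` values from the JSON above; `TOTAL_INPUT_PAIRS` is a placeholder, since the thread does not state the actual input count, and `check_read_pairs` is a hypothetical helper rather than part of the sci-dash.

```python
# n_pairs_success values copied from the sci-dash JSON in this thread.
n_pairs_success = {
    "Pmor_50percPEG": 362_401_472,
    "Pmor": 202_924_630,
}
# Placeholder: the real total input read-pairs is not given in the thread.
TOTAL_INPUT_PAIRS = 400_000_000


def check_read_pairs(success_by_sample, total_input_pairs):
    """Return (ok, combined); ok is False when samples claim more pairs than were sequenced."""
    combined = sum(success_by_sample.values())
    return combined <= total_input_pairs, combined


ok, combined = check_read_pairs(n_pairs_success, TOTAL_INPUT_PAIRS)
if not ok:
    print(f"Combined successful pairs ({combined:,}) exceed input pairs ({TOTAL_INPUT_PAIRS:,})")
```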

J0bbie · 2024-02-16T11:46:00Z

That indeed sounds a bit fishy. I'll try to check whether I'm counting some reads double somewhere.
Are you using hashing-barcodes for these samples by chance?

gauravvaidya16 · 2024-02-16T11:47:20Z

No, the samples are unhashed.
