Duplicate read IDs in RNA dataset: Bham2_Run #114

hasindu2008 · 2022-08-10T12:26:39Z

In the Bham2_run downloaded from https://s3.amazonaws.com/nanopore-human-wgs/rna/links/NA12878-DirectRNA_All.files.txt, 94143 read IDs appear twice. Is this an artifact of the single-fast5 to multi-fast5 conversion (that is, the same read being packed twice), or are these genuinely separate reads to which MinKNOW assigned the same read ID?
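
For anyone wanting to reproduce the count, here is a minimal sketch of one way to find repeated read IDs. It assumes the read IDs have already been extracted from the fast5 files into an iterable of strings (for example, a text file with one ID per line); the function name `count_duplicate_ids` and the sample IDs are hypothetical and not part of the dataset or any tool.

```python
from collections import Counter
from typing import Dict, Iterable


def count_duplicate_ids(read_ids: Iterable[str]) -> Dict[str, int]:
    """Return the read IDs that occur more than once, mapped to their counts.

    Assumption: each item is one read ID, possibly with surrounding
    whitespace or a trailing newline; blank lines are ignored.
    """
    stripped = (rid.strip() for rid in read_ids)
    counts = Counter(rid for rid in stripped if rid)
    return {rid: n for rid, n in counts.items() if n > 1}


# Hypothetical sample IDs, used only to illustrate the call.
sample_ids = ["read-a", "read-b", "read-a\n", "read-c"]
duplicates = count_duplicate_ids(sample_ids)
print(f"{len(duplicates)} read ID(s) appear more than once")
```

Because an open text file is itself an iterable of lines, the same function can be passed a file handle directly. This only counts repeated IDs; telling a doubly packed read apart from two distinct reads that share an ID would additionally require comparing the raw signal of each pair.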

hasindu2008 · 2022-08-24T14:01:07Z

@mattloose A similar duplication is present in two cDNA runs, namely Bham1 and Bham2.

mattloose · 2022-08-24T14:09:14Z

These will not be real separate reads. I presume these are errors in how the run data was originally compiled. @mitenjain did you process these?
