Double reads in BAMInputFormat #206

Open
serash opened this issue Oct 9, 2019 · 0 comments

serash commented Oct 9, 2019

I used BAMInputFormat to read a BAM file in Spark:

val tmprdd = sc.newAPIHadoopFile(input, classOf[BAMInputFormat], classOf[LongWritable], classOf[SAMRecordWritable], conf_).map(_._2.get())

The number of entries is incorrect: the RDD contains 222 more reads than the original BAM file. The extra reads all appear to be duplicates, with exactly the same read appearing twice in the RDD.
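
One way to confirm that the surplus consists of exact duplicates is to reduce each read to an identifying key and count how often each key occurs. The following is a minimal sketch in plain Scala collections, not code from this project. `ReadKey`, `DuplicateCheck`, and the choice of fields (read name, flags, contig, alignment start) are hypothetical and only assumed to be enough to tell reads apart.

```scala
// Hypothetical key type: the fields assumed to identify a read uniquely.
final case class ReadKey(name: String, flags: Int, contig: String, start: Int)

object DuplicateCheck {
  // Returns every key that occurs more than once, with its occurrence count.
  def duplicates(keys: Seq[ReadKey]): Map[ReadKey, Int] =
    keys.groupBy(identity).collect {
      case (key, occurrences) if occurrences.size > 1 => key -> occurrences.size
    }

  // Number of surplus records: occurrences beyond the first of each key.
  def surplus(keys: Seq[ReadKey]): Int =
    duplicates(keys).values.map(_ - 1).sum
}
```

On the RDD itself, the same idea could be applied by building the key from each SAMRecord (for example from getReadName, getFlags, getReferenceName and getAlignmentStart) and counting keys with reduceByKey. If the surplus computed this way equals the 222 extra reads, every extra entry is a repeat of a read that is already present.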
