Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make public SamReader.Type methods to allow other implementations #1144

Merged

Conversation

magicDGS
Copy link
Member

Description

Currently, SamReader cannot be implemented for other "types" of data, unless the getTypemethod returns any of the supported type implementations (SAM/BAM/CRAM/SRA). Other abstractions are impossible to implement unless some of the method of Type are implemented, and this is not possible because they aren't public.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@codecov-io
Copy link

codecov-io commented Jun 11, 2018

Codecov Report

Merging #1144 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##              master     #1144   +/-   ##
===========================================
  Coverage     68.053%   68.053%           
  Complexity      7936      7936           
===========================================
  Files            538       538           
  Lines          32582     32582           
  Branches        5529      5529           
===========================================
  Hits           22173     22173           
  Misses          8188      8188           
  Partials        2221      2221
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/SamReader.java 81.25% <ø> (ø) 0 <0> (ø) ⬇️
...sjdk/samtools/util/Md5CalculatingOutputStream.java 69.231% <0%> (-7.692%) 8% <0%> (-1%)
...samtools/util/AsyncBlockCompressedInputStream.java 76% <0%> (+4%) 13% <0%> (+1%) ⬆️

@nh13
Copy link
Member

nh13 commented Jun 11, 2018

Is there an overwhelming need to support types outside of HtsJdk or should new formats be contributed to HtsJdk? Can you link to where this is being used? Presumably who ever wrote and then the folks who reviewed thought having this be non-public was a good idea and so they should weigh in?

@magicDGS
Copy link
Member Author

The SamReader is a public API, that can be extended by downstream projects, but in practice cannot be implemented unless the new reader leaves in the same package due to this methods being private (SamReader.type() should return a proper type).

My use case is to read a FASTQ file with a SamReader to treat them as unmapped SAM/BAMs. My current solution is a layer on the GATKRead interface from gatk4 - but I am interested in moving it down to the htsjdk SAMRecord implementation, and I would like to use the SamReader interface for integration with methods/classes from htsjdk and gatk4.

If you are interested in the reason for reading FASTQ as a SAM file, I have a software for mangaing sources of reads in a SAM-spec compliant way (ReadTools). Not having the availability to implement custom formats makes difficult to integrate with the library - that's why I am trying to push the htsjdk3 version for so long. I would love to contribute with some SamReader implementations to htsjdk, but I always found difficult to introduce this kind of big changes (e.g., the long-waiting PR for tribble writting support #822). And this stops my development (I've been waiting for some updates to implement properly the long-read read sources such as PacBio HDF5 files, expecting htsjdk3 to move forward). That's why I opt for small compatible changes to the API that allow downstream projects to develop their own readers/writers/factories without intrusion on the htsjdk code.

Finally, API users from SamReader.type() method cannot retrieve any useful information except the fileExtension, which is the only public one. On the other hand, the most useful methods - indexExtension (e.g., to create an index name following conventions for the actual file) and name (for logging) are not public. This change does not open any private method that is important to hide, but it opens the possibility to implement a public interface that otherwise cannot be used.

@magicDGS
Copy link
Member Author

magicDGS commented Jun 11, 2018

Another note is that trying to manage FASTQ records/files as SAM/BAM is something that htsjdk community is not interested in. See my PR #572, where I proposed a Read interface common for SAMRecord and FastqRecord (and even a RawRead interface after comments) - but at the end only the refactoring was accepted.

@yfarjoun
Copy link
Contributor

@nh13 Given that we rejected the other reasonable ways to add readers to htsjdk, this seems like a fine solution while we wait for htsjdk3. Do you know of a specific objection this change, or are you simply concerned that there might be a problem that isn't being thought of?

@nh13
Copy link
Member

nh13 commented Jun 11, 2018

Going from non-public adds support burden to this project, and so each time it is done, the maintainers should weigh-in in my opinion. I was trying to highlight that opinion. But don't let me block this as long as the maintainers are ok with it.

@lbergelson
Copy link
Member

@nh13 These classes are already extensible though, so it doesn't really save much support burden by making the rest of the methods public. I agree that in general we should try to make more things private, but the current API's aren't that well designed for extensibility.

@magicDGS I don't have any objections to this. I would like to revisit the unified Fastq / SAM record unified interface in htsjdk3 because I'm increasingly thinking that it makes sense.

@lbergelson lbergelson merged commit 5eb8ee3 into samtools:master Jun 11, 2018
@nh13
Copy link
Member

nh13 commented Jun 11, 2018

@lbergelson I am not arguing that the burden is too high, just that those that maintain it should weigh-in 👍

@magicDGS magicDGS deleted the dgs_better_SamReader_extensibility branch June 12, 2018 13:54
@magicDGS magicDGS restored the dgs_better_SamReader_extensibility branch June 28, 2018 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants