Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an IntervalCodec that use useful for sorting large sets of #1288

Merged
merged 4 commits into from
Feb 20, 2019

Conversation

nh13
Copy link
Member

@nh13 nh13 commented Feb 16, 2019

intervals.

Description

This will be useful for using with a SortingCollection when we are sorting over MANY intervals. My intention is to submit a PR to Picard to use this codec and SortingCollection when lifting over an interval list, as I sometimes run out of memory with LiftoverIntervalList even with 32GB of memory.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@codecov-io
Copy link

codecov-io commented Feb 16, 2019

Codecov Report

Merging #1288 into master will increase coverage by 0.092%.
The diff coverage is 70.33%.

@@               Coverage Diff               @@
##              master     #1288       +/-   ##
===============================================
+ Coverage     67.495%   67.587%   +0.092%     
- Complexity      8150      8178       +28     
===============================================
  Files            558       561        +3     
  Lines          33364     33409       +45     
  Branches        5608      5611        +3     
===============================================
+ Hits           22519     22580       +61     
+ Misses          8657      8641       -16     
  Partials        2188      2188
Impacted Files Coverage Δ Complexity Δ
...c/main/java/htsjdk/samtools/util/IntervalList.java 73.754% <0%> (+5.318%) 74 <0> (+3) ⬆️
src/main/java/htsjdk/samtools/BAMRecordCodec.java 79.839% <100%> (ø) 25 <0> (ø) ⬇️
...dk/samtools/util/IntervalCoordinateComparator.java 58.333% <58.333%> (ø) 6 <6> (?)
.../main/java/htsjdk/samtools/util/IntervalCodec.java 79.412% <79.412%> (ø) 8 <8> (?)
.../java/htsjdk/samtools/util/IntervalListWriter.java 88% <88%> (ø) 5 <5> (?)
...rc/main/java/htsjdk/samtools/util/BinaryCodec.java 69.725% <0%> (+0.459%) 57% <0%> (+1%) ⬆️
src/main/java/htsjdk/samtools/util/IOUtil.java 57.653% <0%> (+0.51%) 117% <0%> (+2%) ⬆️
... and 3 more

@nh13
Copy link
Member Author

nh13 commented Feb 16, 2019

I may need to also make a IntervalListWriter so we don't need the whole interval list in memory... Sigh...

@nh13 nh13 self-assigned this Feb 16, 2019
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh13 Thank you, this seems useful. I have two very minor comments and then it's good to merge 👍

This may be useful when writing intervals to a file when storing all the
intervals in memory is not possible.
@nh13
Copy link
Member Author

nh13 commented Feb 18, 2019

@lbergelson commit bd872b5 is needed for this PR to be fully useful for the PR in Picard. Thanks for the review!

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh13 A few more comments on the new changes. 👍 when they're resolved.


// Write out the intervals
try {
final IntervalListWriter writer = new IntervalListWriter(file, this.header);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common refrain, why not use a try-with-resources to autoclose this thing?

/** Creates a new writer, writing a header to the file.
* @param file a file to write to. If exists it will be overwritten.
*/
public IntervalListWriter(final File file) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now top new level code can you convert it use Path instead of File.

public class IntervalListWriter implements Closeable {

private final BufferedWriter out;
private final FormatUtil format = new FormatUtil();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is just moving existing code, but this use of FormatUtil seems unnecessary. Wouldn't Integer.toString() serve the same purpose with less overhead?


final IntervalList actualList = IntervalList.fromFile(tempFile);

Assert.assertEquals(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should assert that the header is correct as well.

return retval;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an extra }. It's not compiling.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh13 Thanks. Going to merge this and then follow up with a pr that fixes the path issue.

public IntervalListWriter(final File file, final SAMFileHeader header) {
out = IOUtil.openFileForBufferedWriting(file);
public IntervalListWriter(final Path path, final SAMFileHeader header) {
out = IOUtil.openFileForBufferedWriting(path.toFile());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of defeats the purpose since it means that you can't actually use generic Paths, only ones that represent files. I hadn't realized that we were missing path equivalents of this method though. I've opened #1294 and #1296 to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants