Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRAM: Immutable Block #1231

Merged
merged 34 commits into from
Dec 10, 2018
Merged

CRAM: Immutable Block #1231

merged 34 commits into from
Dec 10, 2018

Conversation

jmthibault79
Copy link
Contributor

@jmthibault79 jmthibault79 commented Nov 28, 2018

Description

Refactoring CRAM code for future maintainability

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@@ -169,21 +168,13 @@ private static Slice buildSlice(final List<CramCompressionRecord> records,
writer.writeCramCompressionRecords(records, slice.alignmentStart);

bitOutputStream.close();
slice.coreBlock = Block.buildNewCore(bitBAOS.toByteArray());
slice.coreBlock = Block.buildNewCoreBlock(bitBAOS.toByteArray());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed for consistency with other static methods

switch (method) {
case RAW:
return compressedContent;
case GZIP:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an opportunity for refactoring the compression classes here. Leaving alone for this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely should do that in a subsequent round. Especially since ExternalCompressor already has compress methods for each compression type, and the compress/decompress implementations should live together. Maybe BlockCompressionMethod should just have compress/uncompress methods directly, and this method and ExternalCompressor can go away.

@@ -1,323 +0,0 @@
/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to htsjdk.samtools.cram.structure.block.Block

* limitations under the License.
* ****************************************************************************
*/
package htsjdk.samtools.cram.structure.block;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from package htsjdk.samtools.cram.structure

@@ -0,0 +1,234 @@
/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block is now the abstract base class for the 2 concrete implementations RawBlock and CompressibleBlock

* @param rawContent the uncompressed content of the block
* @return a new file header {@link RawBlock} object
*/
public static RawBlock buildNewFileHeaderBlock(final byte[] rawContent) {
Copy link
Contributor Author

@jmthibault79 jmthibault79 Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternate implementation: FileHeaderBlock, CoreBlock, etc classes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of thinking the same thing, but are there any other differences in their behavior (besides the raw/compressed) type of block and the content type) - would there be other useful methods with different implementations ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do keep these, maybe the subclass constructors should be protected so this is basically the block factory and is the only way to create blocks ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way to make this work well (including the static factory) and I'll be committing in a moment.

@codecov-io
Copy link

codecov-io commented Nov 28, 2018

Codecov Report

Merging #1231 into master will increase coverage by 0.055%.
The diff coverage is 76.282%.

@@               Coverage Diff               @@
##              master     #1231       +/-   ##
===============================================
+ Coverage     69.323%   69.378%   +0.055%     
- Complexity      8105      8107        +2     
===============================================
  Files            543       544        +1     
  Lines          32480     32437       -43     
  Branches        5501      5496        -5     
===============================================
- Hits           22516     22504       -12     
+ Misses          7755      7732       -23     
+ Partials        2209      2201        -8
Impacted Files Coverage Δ Complexity Δ
...java/htsjdk/samtools/cram/structure/Container.java 81.25% <ø> (ø) 7 <0> (ø) ⬇️
.../samtools/cram/compression/ExternalCompressor.java 40% <ø> (ø) 4 <0> (ø) ⬇️
...amtools/cram/encoding/writer/CramRecordWriter.java 89.063% <ø> (ø) 26 <0> (ø) ⬇️
...va/htsjdk/samtools/cram/structure/ContainerIO.java 65.385% <100%> (+4.779%) 13 <0> (ø) ⬇️
...amtools/cram/structure/block/BlockContentType.java 100% <100%> (ø) 4 <1> (?)
...s/cram/structure/block/BlockCompressionMethod.java 100% <100%> (ø) 4 <1> (?)
...k/samtools/cram/structure/block/ExternalBlock.java 100% <100%> (ø) 2 <2> (?)
...c/main/java/htsjdk/samtools/cram/build/CramIO.java 83.721% <100%> (ø) 20 <0> (ø) ⬇️
...ain/java/htsjdk/samtools/cram/structure/Slice.java 44.545% <100%> (ø) 20 <2> (ø) ⬇️
...samtools/cram/compression/ExternalCompression.java 77.273% <33.333%> (-11.299%) 14 <3> (+3)
... and 9 more

@jmthibault79 jmthibault79 added cram Waiting for Review This PR is waiting for a reviewer to respond labels Nov 28, 2018
final byte[] compressedContent) {
super(type);

if (type == BlockContentType.EXTERNAL && contentId == Block.NO_CONTENT_ID) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is causing many test failures - I'm investigating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check and opened #1232

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few stream of consciousness questions and comments since - see what you think.

switch (method) {
case RAW:
return compressedContent;
case GZIP:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely should do that in a subsequent round. Especially since ExternalCompressor already has compress methods for each compression type, and the compress/decompress implementations should live together. Maybe BlockCompressionMethod should just have compress/uncompress methods directly, and this method and ExternalCompressor can go away.

* @param rawContent the uncompressed content of the block
* @return a new file header {@link RawBlock} object
*/
public static RawBlock buildNewFileHeaderBlock(final byte[] rawContent) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of thinking the same thing, but are there any other differences in their behavior (besides the raw/compressed) type of block and the content type) - would there be other useful methods with different implementations ?

* @param rawContent the uncompressed content of the block
* @return a new file header {@link RawBlock} object
*/
public static RawBlock buildNewFileHeaderBlock(final byte[] rawContent) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do keep these, maybe the subclass constructors should be protected so this is basically the block factory and is the only way to create blocks ?

@cmnbroad
Copy link
Collaborator

cmnbroad commented Dec 3, 2018

@jmthibault79 Was just looking at this again - is it ready for review (wasn't sure if you were still making changes) ?

@jmthibault79
Copy link
Contributor Author

@cmnbroad ready for re-review now, after the subclasses commit

@cmnbroad
Copy link
Collaborator

cmnbroad commented Dec 4, 2018

@jmthibault79 I still feel like we don't have the Block hierarchy factored quite correctly yet. It doesn't seem like any useful polymorphism results from having the class hierarchy based on content type, type casting is still required, and the actual "content type" of a given block is redundantly represented three different ways now (in the class hierarchy itself; in the base class BlockContentType instance field; and in the static BlockContentType constant maintained by each subclass). Since the call sites all seem to know what content type they're reading or writing (i.e., SliceIO knows it wants to extract, say, a Slice header), most of this seems to be about enforcing/validating the compression/content-type combination for each block type.

What do you think about this proposal: go back to just Block (possibly with protected subclass CompressibleBlock since there is some useful polymorphism there wrt/compressed vs. raw content), where Block has factory methods like it does now for creating/validating each of type of block from either raw content or from a stream. The factory methods would enforce or validate the correct combination of "Block" params (compressible, etc.), and possibly substitute Compressible block as appropriate - but the only class type callers would only ever deal with is Block.

Then in subsequent PRs, we can create classes that model things like SliceHeader, with fields and methods specific to it, and constructors for those classes read/write from/to plain old Block. (Edit: or maybe those constructors should take a input stream or raw content, and themselves call the static factories in Block).

@jmthibault79
Copy link
Contributor Author

I see your point about the hierarchy. Here's another attempt, with ExternalDataBlock as the only subclass, because it has the meaningful distinction of being associated with a Content ID.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmthibault79 I think the hierarchy is a lot better now, but I still have some suggestions for potential changes. Hopefully they make sense - if not let me know what you think since I don't want to hold this up indefinitely.

src/main/java/htsjdk/samtools/cram/build/CramIO.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/cram/structure/SliceIO.java Outdated Show resolved Hide resolved
Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly there now, a few more minor comments and this should be good to merge.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more suggestions, for either this PR or for the next one, since we've beat this one to a pulp. @lbergelson let us know if you also want to review this.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cmnbroad cmnbroad merged commit 1126e5c into master Dec 10, 2018
@cmnbroad cmnbroad deleted the jt_Block branch December 10, 2018 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cram Waiting for Review This PR is waiting for a reviewer to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants