Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More CRAM Encoding tests + updates #1203

Merged
merged 9 commits into from
Nov 6, 2018
Merged

More CRAM Encoding tests + updates #1203

merged 9 commits into from
Nov 6, 2018

Conversation

jmthibault79
Copy link
Contributor

@jmthibault79 jmthibault79 commented Oct 25, 2018

Description

External*CodecTests
ByteArrayLenCodec test
ITF8 and LTF8 updates

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@@ -41,6 +69,48 @@ public static int readUnsignedITF8(final InputStream inputStream) throws IOExcep
return ((b1 & 15) << 28) | inputStream.read() << 20 | inputStream.read() << 12 | inputStream.read() << 4 | (15 & inputStream.read());
}

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up to group the read methods together

* @param value the value to be written out
* @return the bytes holding ITF8 representation of the value
*/
public static byte[] writeUnsignedITF8(final int value) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved down to group the write methods together

/**
* Writes an unsigned (32 bit) integer to an {@link OutputStream} encoded as ITF8. The sign bit is interpreted as a value bit.
*
* @param value the value to be written out
* @param buffer the {@link ByteBuffer} to write to
*/
public static void writeUnsignedITF8(final int value, final ByteBuffer buffer) {
public static int writeUnsignedITF8(final int value, final ByteBuffer buffer) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easy to return the length, so why not

@@ -46,7 +46,7 @@
*/
BYTE_ARRAY_STOP,
/**
* http://en.wikipedia.org/wiki/Beta_Code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different unrelated Beta Code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..good to know....

@codecov-io
Copy link

codecov-io commented Oct 25, 2018

Codecov Report

Merging #1203 into master will increase coverage by 0.018%.
The diff coverage is 93.878%.

@@              Coverage Diff               @@
##              master    #1203       +/-   ##
==============================================
+ Coverage     68.712%   68.73%   +0.018%     
- Complexity      8062     8068        +6     
==============================================
  Files            542      542               
  Lines          32728    32734        +6     
  Branches        5537     5537               
==============================================
+ Hits           22488    22498       +10     
+ Misses          8041     8039        -2     
+ Partials        2199     2197        -2
Impacted Files Coverage Δ Complexity Δ
...ava/htsjdk/samtools/cram/structure/EncodingID.java 100% <ø> (ø) 1 <0> (ø) ⬇️
...ain/java/htsjdk/samtools/cram/io/CramIntArray.java 88.889% <ø> (ø) 4 <0> (ø) ⬇️
...k/samtools/cram/encoding/ExternalIntegerCodec.java 58.333% <0%> (+8.333%) 3 <0> (ø) ⬇️
.../samtools/cram/build/CompressionHeaderFactory.java 90.876% <100%> (ø) 64 <0> (ø) ⬇️
...htsjdk/samtools/cram/encoding/EncodingFactory.java 34.483% <100%> (ø) 10 <0> (ø) ⬇️
...ng/huffman/codec/CanonicalHuffmanByteEncoding.java 43.75% <50%> (ø) 5 <0> (?)
...huffman/codec/CanonicalHuffmanIntegerEncoding.java 85.714% <50%> (ø) 10 <1> (?)
src/main/java/htsjdk/samtools/cram/io/ITF8.java 94.505% <94.366%> (-2.046%) 23 <17> (+1)
src/main/java/htsjdk/samtools/cram/io/LTF8.java 96.032% <96.429%> (-1.509%) 19 <19> (+1)
...va/htsjdk/samtools/sra/SRAIndexedSequenceFile.java 62.162% <0%> (-2.703%) 6% <0%> (-1%)
... and 5 more

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still making my way through this and am not finished yet, just checkpointing my questions so far.

* write out [bits 13-20]
* write out [bits 5-12]
* write out [bits 1-8]
*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc...

*/
public class ITF8 {

public static final int MAX_BYTES = 5;
public static final int MAX_BITS = 8 * MAX_BYTES;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be package-protected (no access modifier) rather than public, since they're only used by tests ? Or are they useful outside of this package ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be useful outside of this? But I have no specific use case, so I can make them pkg-priv


return value;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative factoring to share a single implementation across the different source types that I mentioned when we talked can be seen here. Since you can't use primitive types a generic type param in Java, it requires boxing (on the other hand, small integer values are interned), so I'd be inclined to not actually make this change now since it might have perf implications.

One other interesting issue that I was reminded of when I did this was how the cram code propagates checked exceptions all over the place. Most of the rest of htsdjk wraps checked exceptions in unchecked exceptions so they don't have to be declared and caught everywhere. I'd propose that we start migrating the cram code to the htsjdk pattern, and have these methods catch IOException and re-throw it wrapped in a RuntimeIOException. Then we can start removing all of the throws clauses and try/catch blocks as we move up the stack.

*/
public class LTF8 {

public static final int MAX_BYTES = 9;
public static final int MAX_BITS = 8 * MAX_BYTES;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about package protected here as above.

@@ -46,7 +46,7 @@
*/
BYTE_ARRAY_STOP,
/**
* http://en.wikipedia.org/wiki/Beta_Code
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..good to know....

import java.util.HashMap;
import java.util.Map;

public class ByteArrayLenCodecTest extends HtsjdkTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class is called ByteArrayLenCodecTest, but ByteArrayLenCodec is a private class inside of ByteArrayLenEncoding. That makes me think this should be called `ByteArrayLenEncodingTest. But I'm not sure if the tests should change, or the actual classes ? Is this is part of the weirdness you were talking about yesterday ?

bale.fromByteArray(byteArrayOutputStream.toByteArray());
return bale;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmthibault79 It seems like this code is very similar to the actual code in ByteArrayLenEncoding (but its slightly different ?) I wonder if we shouldn't refactor ByteArrayLenEncoding and ByteArrayLenCodec so we can test them directly instead ? Or is that what you're planning to do in the next round ? It almost seems like we should wait for that if its the latter.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this - seeing these tests certainly helped me get some insight into the existing class modeling (!) Just a couple of questions and minor comments - the new tests look like a big improvement.

} catch (final IOException e) {
// this should never happened but still:
throw new RuntimeException(e);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay thanks.

public void codecTest(final byte[] values) throws IOException {

try (final ByteArrayOutputStream os = new ByteArrayOutputStream()) {
final BitCodec<byte[]> writeCodec = new ExternalByteArrayCodec(os, null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for my own education, the External in ExternalByteArrayCodec indicates that its writes to external blocks (?), which implies that its byte-oriented as opposed to bit-oriented ? But the class implements BitCodec, so it has to take ByteArrayOutputStream and ByteArrayInputStream as constructor side-inputs since it doesn't consume the BitStreams defined by the interface methods ? Wow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Separating these is one of the goals of an upcoming refactor sweep.

@@ -15,7 +15,7 @@
* @return array of integers from the input stream
* @throws IOException as per java IO contract
*/
public static int[] array(final InputStream inputStream) throws IOException {
public static int[] array(final InputStream inputStream) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc @throws can be removed for these now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks

@@ -32,7 +32,7 @@
* @return the number of bits written out
* @throws IOException as per java IO contract
*/
public static int write(final int[] array, final OutputStream outputStream) throws IOException {
public static int write(final int[] array, final OutputStream outputStream) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

* @return the bytes holding ITF8 representation of the value
*/
public static byte[] writeUnsignedITF8(final int value) {
final ByteBuffer buffer = ByteBuffer.allocate(10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this code was just moved from elsewhere, but is 10 meaningful here ? Although harmless, it seems like random over-allocation by half. Is there any way possible this can use more than the max possible # of bytes in an ITF8 encoding for an int, which is 5?. Can this use the new MAX_BYTES constant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. I'll see how the tests respond.

list.add((byte)(1 << i));
list.add((byte)(-(1 << i) + 2));
list.add((byte)(-(1 << i) + 1));
list.add((byte)-(1 << i));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. On to the refactoring!

@lbergelson lbergelson self-assigned this Nov 5, 2018
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you @jmthibault79 and @cmnbroad

@lbergelson lbergelson merged commit 47bda38 into master Nov 6, 2018
@lbergelson lbergelson deleted the jt_encoding_ut branch November 6, 2018 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants