Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit Tests and fixes for a few classes in samtools.cram.io #1198

Merged
merged 7 commits into from
Oct 19, 2018

Conversation

jmthibault79
Copy link
Contributor

@jmthibault79 jmthibault79 commented Oct 16, 2018

Description

Increasing test coverage for the samtools.cram.io package, and fixes to bugs found by these tests

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

- add IOTestCases and CramIntTest
- add 64 bit CramInt
- byte[] and ByteBuffer tests and fixes
- add CramLong and CramLongTest
@@ -40,7 +40,10 @@ public static int int32(final byte[] data) {
* @return an integer value read from the buffer
*/
public static int int32(final ByteBuffer buffer) {
return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These shifts were being attempted in a byte context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test case that fails without this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does my 0xff comment work for you on this point?

@@ -7,7 +7,7 @@
/**
* Methods to read and write CRAM array of integers data type.
*/
public class CramArray {
public class CramIntArray {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed because this gave me the impression of a generic array type, though it's limited to integer. I can revert this if you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

* Methods to read and write CRAM long values as given in the file format specification.
*/

public class CramLong {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All methods correspond to those in CramInt

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it looks like longs are a valid data type in the spec, but are unused by the implementation ? Its strange that we've never encountered a problem decoding CRAMs produced by other software, unless they don't use it either. It may be that longs are always written as LTF8 ? I think we should try to resolve that before we commit code and tests that aren't used anywhere. If we are missing functionality, we can keep this and then create a ticket that we need to hook it up. Do you know if/how htslib uses these ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through cram_io.h in htslib, they only seem to use ITF8, LFT8, and int32.

CramInt in htsjdk is only used for Block checksums and Container Header sizes and checksums. Most other such values are ITF8/LTF8.

So you're right that this does seem unused. I can pull this change out of this PR into a different branch we can bring back if we find a later use for it.

@@ -1,29 +0,0 @@
package htsjdk.cram.io;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from htsjdk.cram.io to htsjdk.samtools.cram.io like the class it's testing

ByteBuffer bb = ByteBuffer.wrap(bytes);

// a manually-advanced byte buffer
final int manualBufferSize = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To exercise the read from byte[] method

return params;
}

static List<Integer> int32Tests() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from ITF8Test for reuse

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice consolidation!

};
}

static List<Long> int64Tests() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved from LTF8Test for reuse

}

// special cases:
list.add(Long.MAX_VALUE);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two were not in the original test cases

@codecov-io
Copy link

codecov-io commented Oct 16, 2018

Codecov Report

Merging #1198 into master will increase coverage by 0.058%.
The diff coverage is 91.667%.

@@               Coverage Diff               @@
##              master     #1198       +/-   ##
===============================================
+ Coverage     68.414%   68.472%   +0.058%     
- Complexity      8019      8025        +6     
===============================================
  Files            542       542               
  Lines          32701     32704        +3     
  Branches        5530      5530               
===============================================
+ Hits           22372     22393       +21     
+ Misses          8125      8109       -16     
+ Partials        2204      2202        -2
Impacted Files Coverage Δ Complexity Δ
...ain/java/htsjdk/samtools/cram/io/CramIntArray.java 88.889% <0%> (ø) 4 <0> (?)
...ain/java/htsjdk/samtools/cram/structure/Block.java 62.5% <100%> (ø) 30 <0> (ø) ⬇️
src/main/java/htsjdk/samtools/cram/io/CramInt.java 85% <100%> (+43.824%) 5 <1> (+2) ⬆️
...n/java/htsjdk/samtools/cram/structure/SliceIO.java 88.889% <100%> (ø) 15 <0> (ø) ⬇️
...jdk/samtools/cram/structure/ContainerHeaderIO.java 86.842% <100%> (ø) 6 <0> (ø) ⬇️
...va/htsjdk/samtools/sra/SRAIndexedSequenceFile.java 64.865% <0%> (+2.703%) 7% <0%> (+1%) ⬆️
...samtools/util/AsyncBlockCompressedInputStream.java 76% <0%> (+4%) 13% <0%> (+1%) ⬆️
...a/htsjdk/samtools/cram/io/ExternalCompression.java 88.571% <0%> (+20%) 11% <0%> (+2%) ⬆️

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay - more tests. A few questions/comments.

@@ -7,7 +7,7 @@
/**
* Methods to read and write CRAM array of integers data type.
*/
public class CramArray {
public class CramIntArray {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

* Methods to read and write CRAM long values as given in the file format specification.
*/

public class CramLong {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it looks like longs are a valid data type in the spec, but are unused by the implementation ? Its strange that we've never encountered a problem decoding CRAMs produced by other software, unless they don't use it either. It may be that longs are always written as LTF8 ? I think we should try to resolve that before we commit code and tests that aren't used anywhere. If we are missing functionality, we can keep this and then create a ticket that we need to hook it up. Do you know if/how htslib uses these ?

public void runTest(List<Integer> ints) throws IOException {

int[] inputArray = ints.stream().mapToInt(Integer::intValue).toArray();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use try-with-resources for these to ensure they're closed even if an exception is thrown.

ByteArrayOutputStream baos = new ByteArrayOutputStream();
CramIntArray.write(inputArray, baos);

ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try-with-resources

return (0xFF & (int) buffer.get()) |
(0xFF & (int) buffer.get()) << 8 |
(0xFF & (int) buffer.get()) << 16 |
(0xFF & (int) buffer.get()) << 24;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in this class I'd be in favor of renaming the int32 overloads in to readInt32 so they're symmetric with the writeInt32 ones. WDYT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that

* @return a long value read
* @throws IOException as per java IO contract
*/
public static long int64(final InputStream inputStream) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same naming comment here if we keep these.

}

baos.close();
bais.close();
Copy link
Collaborator

@cmnbroad cmnbroad Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add something to verify that these methods are really respecting little endianness.

I'd prefer to see this factored so there is one test method per method (or one test method per read/write overload/pair), rather than mixing them all up, with common code factored out as appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return params;
}

static List<Integer> int32Tests() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice consolidation!

- split CramInt tests into multiple
- try-with-resources
- rename to readInt32
@jmthibault79
Copy link
Contributor Author

Comments addressed

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbergelson more tests @jmthibault79 LGTM - thx!

return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24;
public static int readInt32(final ByteBuffer buffer) {
return (0xFF & (int) buffer.get()) |
(0xFF & (int) buffer.get()) << 8 |
Copy link
Contributor

@pshapiro4broad pshapiro4broad Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to int should be unnecessary here, as byte is promoted to int before the bit shift operation is applied.

Also am not sure why 0xff & is necessary as buffer.get() returns byte. I must be missing something here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove the int casts - good catch.

0xFF & is still necessary because int-promotion extends the negative sign into higher bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I now see why this is necessary, I forgot that byte isn't an unsigned type. Why, Java? Why?

}

private static <T> Object[][] asDataProvider(List<T> list) {
Object[][] params = new Object[list.size()][] ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space

Suggested change
Object[][] params = new Object[list.size()][] ;
Object[][] params = new Object[list.size()][];


private static <T> Object[][] asDataProvider(List<T> list) {
Object[][] params = new Object[list.size()][] ;
for (int i=0; i<params.length; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space

Suggested change
for (int i=0; i<params.length; i++)
for (int i = 0; i < params.length; i++)


public class ExternalCompressionTest extends HtsjdkTest {
public static final File BZIP2_FILE = new File("src/test/resources/htsjdk/samtools/cram/io/bzip2-test.bz2");
public static final byte [] TEST_BYTES = "This is a simple string to test compression".getBytes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odd spacing, I would expect to not have a space before []

Suggested change
public static final byte [] TEST_BYTES = "This is a simple string to test compression".getBytes();
public static final byte[] TEST_BYTES = "This is a simple string to test compression".getBytes();

@@ -40,7 +40,10 @@ public static int int32(final byte[] data) {
* @return an integer value read from the buffer
*/
public static int int32(final ByteBuffer buffer) {
return buffer.get() | buffer.get() << 8 | buffer.get() << 16 | buffer.get() << 24;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test case that fails without this change?

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jmthibault79 jmthibault79 merged commit 37069a3 into master Oct 19, 2018
@jmthibault79 jmthibault79 deleted the jt_io_ut branch October 19, 2018 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants