-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cram implementation reports validation errors at container granularity #1091
Cram implementation reports validation errors at container granularity #1091
Conversation
17dd6f2
to
b3e4f77
Compare
Codecov Report
@@ Coverage Diff @@
## master #1091 +/- ##
==============================================
+ Coverage 68.7% 68.703% +0.003%
- Complexity 8060 8061 +1
==============================================
Files 542 542
Lines 32728 32725 -3
Branches 5537 5536 -1
==============================================
- Hits 22484 22483 -1
Misses 8043 8043
+ Partials 2201 2199 -2
|
@@ -106,7 +106,7 @@ public CRAMIterator(final SeekableStream seekableStream, final CRAMReferenceSour | |||
this.containerIterator = containerIterator; | |||
|
|||
firstContainerOffset = containerIterator.getFirstContainerOffset(); | |||
records = new ArrayList<SAMRecord>(10000); | |||
records = new ArrayList<SAMRecord>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many records are expected in each container? This could have some performance effect, but I understand that there is no explanation for the magic number. Should it be extracted as an default constant at the class level or maybe to an argument to the iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magicDGS hello, thanks for your review, I've extracted it as constant at the class level
10k is/used to be container default size but you are right of course.
…On Tue, 20 Feb 2018, 08:26 Daniel Gómez-Sánchez, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/main/java/htsjdk/samtools/CRAMIterator.java
<#1091 (comment)>:
> @@ -106,7 +106,7 @@ public CRAMIterator(final SeekableStream seekableStream, final CRAMReferenceSour
this.containerIterator = containerIterator;
firstContainerOffset = containerIterator.getFirstContainerOffset();
- records = new ArrayList<SAMRecord>(10000);
+ records = new ArrayList<SAMRecord>();
How many records are expected in each container? This could have some
performance effect, but I understand that there is no explanation for the
magic number. Should it be extracted as an default constant at the class
level or maybe to an argument to the iterator?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1091 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAoz5FYaGogrKr0Jfg5LVsc1I4V_R8ROks5tWsgXgaJpZM4SLsja>
.
|
…erator.next() -Added corresponded tests
ea5521b
to
410011b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass, with a few requests.
@@ -43,6 +43,7 @@ | |||
|
|||
public class CRAMIterator implements SAMRecordIterator { | |||
private static final Log log = Log.getInstance(CRAMIterator.class); | |||
private static final int DEFAULT_CONTAINER_CAPACITY = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are existing constants named DEFAULT_RECORDS_PER_SLICE
and DEFAULT_SLICES_PER_CONTAINER
(defined in CRAMContainerStreamWriter
!). We're planning to do a refactoring of this code soon to fix a lot of this (i.e., those constants and much of this code should live in the Container and Slice classes, and since its reading an existing container, the size of the list can be based on the size of the container being read rather than constant). For now, if anything, I'd say to just use the existing DEFAULT_RECORDS_PER_SLICE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad hello, thanks for your review, all requested changes are done
* | ||
* @param samRecord - validated record | ||
*/ | ||
public void validateRecord(SAMRecord samRecord){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be public, or even a separate method at all since its only called once.
*/ | ||
public void validateRecord(SAMRecord samRecord){ | ||
final List<SAMValidationError> validationErrors = samRecord.isValid(); | ||
SAMUtils.processValidationErrors(validationErrors, samRecordIndex, validationStringency); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that in this context, samRecordIndex bears any relationship to the record being processed.
|
||
|
||
/** | ||
* @author [email protected], EPAM Systems, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test file, so it doesn't matter, but block comments like "author" need to go after the main javadoc or else the entire javadoc text becomes part of the author block.
|
||
public class CRAMIteratorTest extends HtsjdkTest { | ||
private static final File refFile = new File("src/test/resources/htsjdk/samtools/cram/ce.fa"); | ||
private static final File cramFile = new File("src/test/resources/htsjdk/samtools/cram/ce#supp.3.0.cram"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this file has some records that fail validation. It would be better to make a copy of it with a name that indicates that, and reinforce it by using a variable name here that reflects it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, although many of the test are written this same way, it would be preferable for each test to do its own setup rather than relying on class-level state.
public class CRAMIteratorTest extends HtsjdkTest { | ||
private static final File refFile = new File("src/test/resources/htsjdk/samtools/cram/ce.fa"); | ||
private static final File cramFile = new File("src/test/resources/htsjdk/samtools/cram/ce#supp.3.0.cram"); | ||
ReferenceSource source = new ReferenceSource(refFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't publish a style guide, but for new code we try to use "final" everywhere possible (paramaters, locals, etc.).
getCramFileIterator(cramFile, source, ValidationStringency.STRICT); | ||
|
||
while (cramIter.hasNext()) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't publish a style guide, but we generally try to put opening braces on the line with the previous statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other comment I forgot to make; since this test is expected to throw an exception, it would be a good idea to use a try/finally block to close the reader.
|
||
private SAMRecordIterator getCramFileIterator(File cramFile, | ||
ReferenceSource source, | ||
ValidationStringency valStrigency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valStrigency -> valStringency
} | ||
|
||
@Test(expectedExceptions = SAMException.class) | ||
public void shouldThrowExceptionIfCRAMFileContainsInvalidRecods() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to throwOnRecordValidationFailure
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay on reviewing this. The changes are pretty nicely simplified now - just a couple of test cleanup comments.
final ReferenceSource source = new ReferenceSource(refFile); | ||
final SAMRecordIterator cramIteratorOverInvalidRecords = | ||
getCramFileIterator(cramFileWithInvalidRecs, source, ValidationStringency.STRICT); | ||
try{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try-with-resources
final File cramFileWithInvalidRecs = new File("src/test/resources/htsjdk/samtools/cram/ce#containsInvalidRecords.3.0.cram"); | ||
final ReferenceSource source = new ReferenceSource(refFile); | ||
final SAMRecordIterator cramIteratorOverInvalidRecords = | ||
getCramFileIterator(cramFileWithInvalidRecs, source, ValidationStringency.STRICT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code up to this point is duplicated in both test methods. Can you factor that out (maybe into getCramFileIterator
).
return iterator.next(); | ||
SAMRecord samRecord = iterator.next(); | ||
if (validationStringency != ValidationStringency.SILENT) { | ||
SAMUtils.processValidationErrors(samRecord.isValid(), samRecordIndex++, validationStringency); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since samRecordIndex
is a class not a local variable, can you add a comment to the declaration saying that it is only used when Validation Stringency is not SILENT, and otherwise it isn't valid. (this was always true even before this PR, but it would be good to make that clear).
public class CRAMIteratorTest extends HtsjdkTest { | ||
|
||
@Test(description = "This test checks that records validation is deferred until they are retrieved") | ||
public void notThrowOnOpeningContainerWithInvalidRecords() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this to "noValidationFailureOnContainerOpen".
getCramFileIterator(cramFileWithInvalidRecs, source, ValidationStringency.STRICT); | ||
|
||
Assert.assertTrue(cramIteratorOverInvalidRecords.hasNext()); | ||
cramIteratorOverInvalidRecords.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAMRecordIterator is closable, so you should be able to use try-with-resources here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmnbroad Hello! Thank you for review, requested changes are done
@AntonMazur I think this is pretty close now, but it fails to compile on travis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AntonMazur Sorry again for the long delay on getting this branch reviewed. It looks pretty good with the last set of changes, however there is one comment thats a bit awkwardly written. Since the tests didn't run last time this was submitted, it would be good to update that, and then rebase onto current master since this branch is pretty far behind. This will rerun the CI tests to make sure everything passes, then we can get this merged. Alternatively, I'd be happy to make that last change for you if you prefer - just let me know. Thanks again for doing this.
@AntonMazur Just pinging you one more time on this. I'll plan to update and rebase this branch if I don't hear back from you soon, since I thinks pretty close, but has conflicts that need to be resolved. |
f80ba81
to
cba9220
Compare
…ation_errors_at_container_granularity # Conflicts: # src/main/java/htsjdk/samtools/CRAMIterator.java
6439592
to
cb13b65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @cmnbroad Can you make the fields we talked about final in a follow up?
Description
Associated issue: #1076
Motivation
ArrayList<SAMRecord> records;
Solution
The first problem was solved by deletion of validation from cramIterator.nextContainer() method, and addition it in cramIterator.next(), for validation records when they retrieves.
And the second problem was solved by deletion of objects reservation.
Checklist