-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple VCFSimpleHeaderLine of same type #1531
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1531 +/- ##
============================================
Coverage 69.407% 69.407%
- Complexity 8924 8945 +21
============================================
Files 602 604 +2
Lines 35521 35649 +128
Branches 5904 5924 +20
============================================
+ Hits 24654 24743 +89
- Misses 8533 8555 +22
- Partials 2334 2351 +17
|
Hey @lbergelson is this a PR your team would be willing to review? Many thanks if so! |
@clintval Yeah, we're definitely going to get to it. Reviews have been really slow lately because working from home in the pandemic is killing my productivity. |
@mjhipp Thanks for taking a stab at this. While it does fix the duplicate line problem, it changes current behavior in a way that we should at least think about. Currently, any ID header line other than INFO/FORMAT (i.e., ALT, META, SAMPLE, PEDIGREE, or any other custom line) can be retrieved via The real issue is that If we're going to break compatibility to fix this, I'd be inclined to favor doing the more complete fix though it's likely to be a bit more work and might impact other methods. Or maybe we should just resurrect the old PR and wait for that. @lbergelson any thoughts ? Having said all that, we should fix this somehow... |
A nested map is one possibility for this. Where the outer map is from line type ("ALT", "PEDIGREE", etc) to an inner map, and the inner map is ID to metadata line. This would scale for any reasonable amount of line types, without having to hard code them (defining This would require a similar change to methods, but without synthetic key names. Would add a new map at the top: private final Map<String, Map<String, VCFSimpleHeaderLine>> mIdMetaData = new LinkedHashMap<String, Map<String, VCFSimpleHeaderLine>>(); Adding a new line would involve checking for the which would change the /**
* @param key the header key or field type
* @param id the header id
* @return the meta data line, or null if there is none
*/
public VCFSimpleHeaderLine getIdHeaderLine(final String key, final String id) {
return mIdMetaData.get(key).get(id);
} (probably on multiple lines with a null check in there)
With the above changes, could possibly also change the existing |
I made the changes above. I think now the only backwards compatibility issue is that |
if (mOtherMetaData.containsKey(key)) { | ||
return mOtherMetaData.get(key); | ||
} else if (mIdMetaData.containsKey(key)) { | ||
// Get the first item in the linked hash map | ||
Map<String, VCFSimpleHeaderLine> fieldMetaData = mIdMetaData.get(key); | ||
if (fieldMetaData.keySet().size() > 0) { | ||
return fieldMetaData.get(fieldMetaData.keySet().iterator().next()); | ||
} | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for backwards compatibility, where previously a VCFSimpleHeaderLine
could be accessed by this method (only one per field type). It may be better to let this return null
and force users to use getIdHeaderLine()
to find VCFSimpleHeaderLine
s. This would be more consistent with getOtherHeaderLines
/getIdHeaderLines
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this method makes sense by itself, it introduces a new inconsistency with getOtherHeaderLines()
, since it has a different idea of what "other" means (it returns lines that will never be returned by the existing method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjhipp Again, thanks for attempting to fix this. I added a couple of more inline comments on your most recent commit, but I'm a bit skeptical that we can fix this issue without introducing a different set of new issues (especially given that we don't have good unit tests for the existing methods).
Before we go any further we should get @lbergelson's thoughts on whether we want trade off some new inconsistency/weirdness in order to fix the duplicate line issue. If so, I think we'd need to add a bunch more unit tests to verify the behavior of whatever tradeoffs we make, but I'll hold off on further comment until he chimes in.
if (mOtherMetaData.containsKey(key)) { | ||
return mOtherMetaData.get(key); | ||
} else if (mIdMetaData.containsKey(key)) { | ||
// Get the first item in the linked hash map | ||
Map<String, VCFSimpleHeaderLine> fieldMetaData = mIdMetaData.get(key); | ||
if (fieldMetaData.keySet().size() > 0) { | ||
return fieldMetaData.get(fieldMetaData.keySet().iterator().next()); | ||
} | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this method makes sense by itself, it introduces a new inconsistency with getOtherHeaderLines()
, since it has a different idea of what "other" means (it returns lines that will never be returned by the existing method).
mIdMetaData.forEach((k, v) -> lines.addAll(v.values())); | ||
return lines; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class already has a public method List<VCFIDHeaderLine> getIDHeaderLines()
that returns all ID lines. I think it will be confusing to also have this new method, with a slightly different name and signature that returns only the ID header lines that are not INFO, FORMAT, etc.
After further discussing this internally and thinking more about it, I'm inclined to wait for a more complete fix to VCFHeader and address the numerous issues all at once rather than continue trying to make incremental fixes. I've resurrected and rebased my old #835 branch and will resubmit a new draft PR so we can try to find a path forward and get that merged. |
Description
Closes #277
Closes #500
Currently, it is not possible to add multiple
VCFSimpleHeaderLine
s to a VCFHeader if they are of the same type (see issues linked above). A specific example, is addingALT
lines, which are included in 4.2 spec (that shows a header with multiple ALT lines on page 11).To fix this, I added a new map and methods for working with
SimpleVCFHeaderLine
s that are not of the default types.I have seen another fix to this (#835), but it was closed due to large scope. I am hoping this change is small and useful enough to be considered. Thanks!
Things to think about before submitting: