-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tribble/Tabix index path support #810
Tribble/Tabix index path support #810
Conversation
@magicDGS Something went wrong here, tests are failing all over. |
I figured it out already, I'm trying to find the problem. |
Very awful error from my side, sorry. The Now I guess that it's solved. Could you have a quick review, @lbergelson? |
Codecov Report
@@ Coverage Diff @@
## master #810 +/- ##
===============================================
- Coverage 64.867% 64.846% -0.021%
- Complexity 7175 7187 +12
===============================================
Files 526 527 +1
Lines 31731 31769 +38
Branches 5424 5424
===============================================
+ Hits 20583 20601 +18
- Misses 8997 9019 +22
+ Partials 2151 2149 -2
Continue to review full report at Codecov.
|
This is still WIP, because I found some other paths in the indexing code that should use Path for fully support. So this should wait for review, @lbergelson. |
82a554f
to
f293192
Compare
This should be ok now, @lbergelson. Previously failing tests are passing in my computer. |
Can you have a look to this one @lbergelson? This is going to be important for my upcoming PR with tribble writing support with indexing on the fly. Thanks a lot! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magicDGS This is useful work. Thanks for doing this. I have a a few comments. I think it's probably worth breaking implementations of Index to avoid having weird defaults methods there, but lets see if @droazen thinks otherwise. He's usually more conservative about these things than I am.
Would it be possible for you to add a test that creates an index on a non-file path? There's an example of how to do this using an in memory filesystem called jimfs in AbstractFeatureReaderTest
.
The basic idea is that you create a new jimfs filesystem, then you can get paths from that filesystem (ex: "jimfs://filesystemname/path/to/my/in/memory/file"). You may have to copy a file from the local system into jimfs if you want to create an index from it, but that's easy to do using the standard Files operations.
Just be sure that the jimfs filesystem gets closed after your test is done, because it will stay in memory until the jvm shuts down otherwise.
* @return a non-null File representing the index | ||
*/ | ||
public static File indexFile(final File file) { | ||
return indexFile(file.getAbsoluteFile(), STANDARD_INDEX_EXTENSION); | ||
} | ||
|
||
/** | ||
* Return the name of the tabix index file for the provided vcf {@code filename} | ||
* Return the name of the index file for the provided {@code filename} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot to change filename -> path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Does not actually create an index | ||
* @param filename name of the vcf file | ||
* @param path name of the path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"name of the path" sounds like it's something distinct from the path itself, which is strange because "name of the file" is fine. Maybe just "the path"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -201,7 +213,11 @@ public boolean isCurrentVersion() { | |||
} | |||
|
|||
public File getIndexedFile() { | |||
return indexedFile; | |||
return getIndexedPath().toFile(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should deprecate this method because it's no longer safe to call in all instances. It also needs a comment saying that it can fail with an exception if the path can't be represented as a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -251,7 +271,7 @@ private void writeHeader(final LittleEndianOutputStream dos) throws IOException | |||
dos.writeInt(MAGIC_NUMBER); | |||
dos.writeInt(getType()); | |||
dos.writeInt(version); | |||
dos.writeString(indexedFile.getAbsolutePath()); | |||
dos.writeString(indexedPath.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use indexedPath.toURI().toString()
since it's more likely to roundtrip correctly for non-file paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -67,7 +70,7 @@ | |||
private final static long NO_TS = -1L; | |||
|
|||
protected int version; // Our version value | |||
protected File indexedFile = null; // The file we've created this index for | |||
protected Path indexedPath = null; // The file we've created this index for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically a breaking change. I'm going to say it's fine though. I don't believe there are any subclasses of AbstractIndex in the wild that we're going to be breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I haven't realize that it is breaking compatibility. If we do not want to break compatibility, I can kept the field with the deprecated annotation and set it when the used constructor is a File
. Let me know if I should do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's tricky. Anything field that's visible to a subclass of a non-final class is technically a breaking change since someone could be relying on it if they implement a subclass. It's why I'm so strongly in favor of making classes final and of making variable private with accessors if they need to be accessed.
@@ -56,13 +57,18 @@ | |||
MathUtils.RunningStat stats = new MathUtils.RunningStat(); | |||
long basesSeen = 0; | |||
Feature lastFeature = null; | |||
File inputFile; | |||
// TODO: actually this field is not needed | |||
Path inputPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a breaking change, but I don't see the harm in it since I really doubt anyone is subclassing this thing. 👍 to delete the field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, because it is package protected, I do not think that is going to harm at all (unless someone is subclassing in the same package outside HTSJDK).
If we are breaking compatibiity anyway, as it is unused, why don't we just remove the field (no the constructor arg)? It is only used for get the index creators...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the unused field in the commit for breaking compatibility.
* @param indexPath Where to write the index. | ||
* @throws IOException if the index is unable to write to the specified path. | ||
*/ | ||
public default void write(final Path indexPath) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better idea to have the default throw OperationUnsupportedException
than to have it call toFile()
which will work most of the time but sometimes fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* | ||
* @param featurePath | ||
*/ | ||
public default void writeBasedOnFeaturePath(Path featurePath) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort of want to break compatibility and force everyone to implement the Path functions. It would be so much cleaner to have the file versions be default implementations that call the path version. @droazen What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect no on in the wild implements Index. At least it's not done in gatk/picard/igv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using OperationUnsupportedException for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the interface in the commit for breaking compatibility.
* | ||
* @param tabixPath Where to write the index. | ||
*/ | ||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the old file version delegate to this implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* @param featurePath Path being indexed. | ||
*/ | ||
@Override | ||
public void writeBasedOnFeaturePath(final Path featurePath) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I added two commits, @lbergelson:
Regarding the test, I don't know if I will have time today. I will try, but if not I will add them by tomorrow. You can review the current changes anyway, in case that something else should be changed. |
I've just realized that something is broken with the changes, @lbergelson. Do you have an idea of what is happening? It looks that is breaking indexing on the fly for gzip VCF result (not the indexing itself). The most likely reson is the change to delegate the file constructor to the path one for files, because I can add a simple constructor to |
Ops, I realized that it was an error not related with the BGZ output stream. That's what happens when you program right after waking up... Fixed in the next commit (and removed the commit with the BGZ changes). |
1f23883
to
777d9a2
Compare
777d9a2
to
b12c9fa
Compare
@@ -219,8 +219,8 @@ public void write(final Path tabixPath) throws IOException { | |||
*/ | |||
@Override | |||
public void writeBasedOnFeaturePath(final Path featurePath) throws IOException { | |||
if (Files.isRegularFile(featurePath)) return; | |||
write(IOUtil.getPath(Tribble.tabixIndexPath(featurePath))); | |||
if (!Files.isRegularFile(featurePath)) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the refactoring problem, I overseen this. @lbergelson, I think that this behavior should be documented and/or at least log a warning to avoid getting crazy about what happened with the index. I will rather prefer if it fails with and IO/Tribble exception, but that will break compatibility with public methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, good catch. That's terrible behavior. I wonder why it was done that way? Definitely needs to at least log a warning. I would also prefer your suggestion to fail with a TribbleException, but I think we'd need to investigate if that would break existing downstream code somehow. For now, could you have it log a warning, and open a new ticket to change the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added warning log here and in AbstractIndex
.
@@ -73,41 +73,42 @@ | |||
/** | |||
* Writes the index into a file. | |||
* | |||
* Default implementation delegates in {@link #write(Path)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to nitpick the language, but delegates to instead of delegates in is more idiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, I really appreciate to have comments to improve my English skills. Done!
|
||
/** | ||
* Write an appropriately named and located Index file based on the name and location of the featureFile. | ||
* If featureFile is not a normal file, the index will silently not be written. | ||
* | ||
* Default implementation delegates in {@link #writeBasedOnFeaturePath(Path)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@magicDGS Looks good. 👍 Once we have the tests for the two index types. |
Three more commits:
Thanks for reviewing, @lbergelson - Back to you! |
👍 |
Thanks! |
Description
Motivation
While trying to index a
Path
(no convertible toFile
) using theIndex
interface, the client should:LittleEndianOutputStream
by its own. This may lead to wrong indexes because they aren't block-compressed.In addition, this is part of a bigger change for include support of writing in whatever filesystem (and also reading).
Changes
Tribble
and unused imports.Index
interface: methods with default values usePath.toFile()
AbstractIndex
andTabixIndex
Checklist