Add InputStream support to VCFFileReader #723

Open
btiernay opened this issue Oct 12, 2016 · 5 comments
@btiernay

This would enable streaming VCF files from remote locations or even stdin. A common use case is reading a slice of a VCF file from an S3 bucket without first transferring the entire file locally. The current File requirement is quite limiting for these kinds of uses.

@lbergelson
Member

Hi @btiernay. One alternative to streaming from stdin would be to implement or find a java.nio.file.spi.FileSystemProvider for S3 buckets. We've implemented support for reading VCFs from anything that can be represented by a java.nio.file.Path, and there's an existing provider implementation for Google's object store. The nice thing about doing it that way is that index support comes automatically, so you can use the VCF index to read just the chunks of the file you want.
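As a rough illustration of the provider approach, the sketch below uses only the JDK to list the URI schemes that the installed FileSystemProvider implementations handle. The class name ProviderCheck and its helper methods are hypothetical, and whether an "s3" scheme shows up depends entirely on which third-party provider jars (if any) are on the classpath.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of htsjdk: inspects the providers the JVM can see.
public class ProviderCheck {

    /** Returns the URI scheme of every FileSystemProvider visible to this JVM. */
    static List<String> installedSchemes() {
        return FileSystemProvider.installedProviders().stream()
                .map(FileSystemProvider::getScheme)
                .collect(Collectors.toList());
    }

    /** True if some installed provider handles the given scheme (case-insensitive). */
    static boolean supportsScheme(String scheme) {
        return installedSchemes().stream()
                .anyMatch(s -> s.equalsIgnoreCase(scheme));
    }

    public static void main(String[] args) {
        // The default "file" provider is always present; others depend on the classpath.
        System.out.println("Installed schemes: " + installedSchemes());
        System.out.println("s3 supported: " + supportsScheme("s3"));
    }
}
```

If "s3" is not listed, a Path for an S3 URI cannot be created until a provider for that scheme is added to the classpath.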

@lindenb
Contributor

lindenb commented Apr 26, 2017

@btiernay @lbergelson related PR: #837

@delocalizer
Contributor

@lbergelson a while back in this thread you mentioned functionality for reading VCFs (and indexes?) from a Path. Is this the work in the yf_vcf_from_path branch, and is there a timeline for merging it to master? We'd love this feature.

@lbergelson
Member

@delocalizer You can currently load a VCF file and its index from a Path by using something along the lines of:

AbstractFeatureReader.getFeatureReader(path.toAbsolutePath().toUri().toString(), new VCFCodec())

It takes a String for complicated legacy reasons, but it will resolve the string with the appropriate FileSystem. That overload will try to autodiscover the index, but there are other overloads that let you explicitly pass in an index, or not require one, or pass in other options.
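To make the String round trip concrete, here is a minimal JDK-only sketch of turning a Path into the kind of URI string passed above, and of resolving such a string back to a Path through whichever FileSystem provider owns its scheme. This is not htsjdk's actual resolution code. The class and method names are hypothetical, and example.vcf.gz is a placeholder file name that does not need to exist.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the Path -> String -> Path round trip.
public class PathStringRoundTrip {

    /** Builds the URI string form of a Path, as in the call shown above. */
    static String toResourceString(Path path) {
        return path.toAbsolutePath().toUri().toString();
    }

    /** Resolves a URI string back to a Path using the provider for its scheme. */
    static Path resolve(String resource) {
        // Paths.get(URI) looks up the installed provider matching the URI scheme
        // and throws FileSystemNotFoundException if none is available.
        return Paths.get(URI.create(resource));
    }

    public static void main(String[] args) {
        Path original = Paths.get("example.vcf.gz"); // placeholder file name
        String asString = toResourceString(original);
        Path resolved = resolve(asString);
        System.out.println(asString);
        System.out.println(resolved.equals(original.toAbsolutePath()));
    }
}
```

The same round trip applies to non-default file systems as long as a provider for the scheme is installed, which is why a plain String argument is enough to reach remote storage.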

We use this in GATK's FeatureDataSource, which also does other things like automatic codec discovery and asynchronous prefetching from remote paths.

@delocalizer
Contributor

Thanks @lbergelson; this is good!
