UpdateVcfSequenceDictionary should handle stdin and stdout gracefully #507

Open
nh13 opened this issue Apr 12, 2016 · 25 comments

Comments

@nh13
Collaborator

nh13 commented Apr 12, 2016

Currently it will blow up trying to build the writer:

Exception in thread "main" java.lang.IllegalArgumentException: Must specify file or stream output type.
    at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:407)
    at picard.vcf.UpdateVcfSequenceDictionary.doWork(UpdateVcfSequenceDictionary.java:92)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
@ronlevine
Contributor

@yfarjoun Can you either give me picard write permission or assign me to this task? It would be much easier if I had write permission; I'd prefer to work in a branch rather than a fork.

@yfarjoun
Contributor

I'm pretty sure you have write access.

@ronlevine
Contributor

ronlevine commented Jan 27, 2017

@yfarjoun I do not have write access. I tried pushing my branch and got the following:

remote: Permission to broadinstitute/picard.git denied to ronlevine.
fatal: unable to access 'https://github.com/broadinstitute/picard.git/': The requested URL returned error: 403

I am not listed as a member: https://github.com/broadinstitute/picard/network/members

@ronlevine
Contributor

@nh13 I can see a use case for stdout but is there one for stdin?

@nh13
Collaborator Author

nh13 commented Jan 27, 2017

Yes, streaming support would be desirable as I don't see any reason it would need to read the input twice.
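
To illustrate the single-pass point, here is a minimal sketch that uses only the JDK. It is not Picard or HTSJDK code: the class name StreamingContigRewriter and its rewrite method are hypothetical, and it assumes an uncompressed text VCF whose new ##contig lines have already been extracted from the sequence dictionary. It drops the old ##contig lines and writes the new ones just before the #CHROM line, reading each input line exactly once, so the input can be a pipe.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper (not Picard code): swaps ##contig header lines in one pass. */
final class StreamingContigRewriter {

    /**
     * Copies a plain-text VCF from in to out, dropping existing ##contig lines and
     * writing newContigLines immediately before the #CHROM column header.
     * Every input line is read exactly once, so in may be a pipe such as System.in.
     */
    static void rewrite(InputStream in, OutputStream out, List<String> newContigLines)
            throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("##contig=")) {
                continue; // old dictionary entry: drop it
            }
            if (line.startsWith("#CHROM")) {
                // The column header ends the metadata block, so emit the new contigs first.
                for (String contig : newContigLines) {
                    writer.write(contig);
                    writer.write('\n');
                }
            }
            writer.write(line);
            writer.write('\n');
        }
        writer.flush(); // flush only; the caller owns (and closes) both streams
    }
}
```

Calling StreamingContigRewriter.rewrite(System.in, System.out, contigLines) would then work behind cat or any other producer. A real implementation would still need HTSJDK to parse the dictionary and to handle compressed input.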

@yfarjoun
Contributor

yfarjoun commented Jan 28, 2017 via email

@ronlevine
Contributor

ronlevine commented Jan 28, 2017

I'm not sure what you are referring to by the "ssh link".

I cloned picard

git clone https://github.com/broadinstitute/picard.git

then cd'ed to the picard directory

MITAdmins-Air:picard ronlevine$ git remote -v
origin	https://github.com/broadinstitute/picard.git (fetch)
origin	https://github.com/broadinstitute/picard.git (push)
stable	git@github.com:broadinstitute/gsa-stable.git (fetch)
stable	git@github.com:broadinstitute/gsa-stable.git (push)
unstable	git@github.com:broadinstitute/gsa-unstable.git (fetch)
unstable	git@github.com:broadinstitute/gsa-unstable.git (push)

I created a branch and could not push it to the origin repository.

@yfarjoun
Contributor

yfarjoun commented Jan 28, 2017 via email

@ronlevine
Contributor

Cloned git@github.com:broadinstitute/picard.git, created a branch, made changes, then tried to push them, but was denied:

MITAdmins-Air:picard ronlevine$ git push -u origin rhl_uvsd_stdin_stdout_507
Enter passphrase for key '/Users/ronlevine/.ssh/id_rsa': 
ERROR: Permission to broadinstitute/picard.git denied to ronlevine.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@yfarjoun
Contributor

yfarjoun commented Jan 28, 2017 via email

@ronlevine ronlevine self-assigned this Jan 28, 2017
@ronlevine
Contributor

@yfarjoun 👍

@ronlevine
Contributor

ronlevine commented Jan 29, 2017

@nh13 Can you give me an explicit example of a command line using stdin? I assume the following is the stdout case:
java -jar ./build/libs/picard.jar UpdateVcfSequenceDictionary I=input.vcf SD=sequenceDictionary.vcf O=/dev/stdout

@nh13
Collaborator Author

nh13 commented Jan 29, 2017

Should be simple to make one up using cat or the like.

@yfarjoun
Contributor

yfarjoun commented Jan 29, 2017 via email

@ronlevine
Contributor

ronlevine commented Jan 31, 2017

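@yfarjoun I grabbed the following lines of code and substituted InputVCFFile.vcf as the input file:

// Open the VCF, then expose its descriptor as a /dev/fd path via the private "fd" field.
final FileInputStream stream = new FileInputStream("InputVCFFile.vcf");
final Field fdField = FileDescriptor.class.getDeclaredField("fd");
fdField.setAccessible(true);
// Despite its name, inputStream is a File that points at /dev/fd/<number>.
final File inputStream = new File("/dev/fd/" + fdField.getInt(stream.getFD()));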

VCFFileReader successfully reads the header during its construction, but its iterator is unable to read the rest of the file.

@yfarjoun
Contributor

yfarjoun commented Jan 31, 2017 via email

@ronlevine
Contributor

ronlevine commented Jan 31, 2017

Exactly. HTSJDK's TribbleIndexedFeatureReader.readHeader, which the VCFFileReader constructor uses, closes the stream. A stream cannot be closed and then reopened. The sequence dictionary is read the same way. There is also a file type check for the sequence dictionary, which will fail with this approach because the file name would be /dev/fd/number.
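
One generic way around a reader that closes its input is to hand it a wrapper whose close() does nothing. The sketch below uses only the JDK and is an illustration, not HTSJDK code: NonClosingInputStream and HeaderPeek are hypothetical names, and readHeaderAndClose merely imitates a header parser that closes what it is given. It reads byte by byte and relies on mark/reset, because a parser that buffers ahead would swallow record bytes that the shared stream can never replay.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper (not an HTSJDK class): close() is ignored so a pipe survives header parsing. */
final class NonClosingInputStream extends FilterInputStream {
    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally a no-op: whoever owns the underlying stream closes it later.
    }
}

/** Hypothetical stand-in for a header parser that closes its input when it is done. */
final class HeaderPeek {
    /**
     * Collects the leading '#' lines, then closes the stream it was given.
     * Requires mark/reset support (for example a BufferedInputStream underneath).
     * Bytes are treated as ASCII, which is an assumption of this sketch.
     */
    static List<String> readHeaderAndClose(InputStream in) throws IOException {
        List<String> header = new ArrayList<>();
        try (InputStream s = in) {
            while (true) {
                s.mark(1);
                int first = s.read();
                if (first != '#') {
                    if (first != -1) {
                        s.reset(); // un-read the first byte of the first record
                    }
                    return header;
                }
                StringBuilder line = new StringBuilder("#");
                int c;
                while ((c = s.read()) != -1 && c != '\n') {
                    line.append((char) c);
                }
                header.add(line.toString());
            }
        }
    }
}
```

Passing new NonClosingInputStream(shared) to the parser leaves shared open and positioned at the first record. Passing shared directly leaves it closed, which matches the failure described above.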

@yfarjoun
Contributor

yfarjoun commented Jan 31, 2017 via email

@ronlevine
Contributor

ronlevine commented Jan 31, 2017

That's the plan. Handling stdout is trivial. I think the stdin issues should be handled in HTSJDK: an option to not close the stream after reading the header, and determining the file type by examining the first line of the file instead of the file extension.
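
As a rough sketch of content-based type detection (JDK only, not HTSJDK's API; VcfFormatSniffer and its Format enum are hypothetical names), the leading bytes can be inspected and then rewound with mark/reset, so the same stream is still readable from its first byte. The magic values are the usual ones: text VCF begins with "##fileformat=VCF", gzip and BGZF data begin with bytes 0x1f 0x8b, and uncompressed BCF begins with "BCF".

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sniffer (not part of HTSJDK): classifies a stream by its leading bytes. */
final class VcfFormatSniffer {
    enum Format { VCF_TEXT, GZIP_OR_BGZF, BCF_UNCOMPRESSED, UNKNOWN }

    private static final byte[] VCF_MAGIC =
            "##fileformat=VCF".getBytes(StandardCharsets.US_ASCII);

    /** Peeks at the first bytes, then rewinds so the caller can parse from byte zero. */
    static Format sniff(BufferedInputStream in) throws IOException {
        in.mark(VCF_MAGIC.length);
        byte[] head = new byte[VCF_MAGIC.length];
        int n = 0;
        // A pipe may deliver fewer bytes per read, so loop until full or end of stream.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset(); // nothing is consumed from the caller's point of view

        if (n >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Format.GZIP_OR_BGZF; // could be a bgzipped VCF or a compressed BCF
        }
        if (n >= 3 && head[0] == 'B' && head[1] == 'C' && head[2] == 'F') {
            return Format.BCF_UNCOMPRESSED;
        }
        if (n == VCF_MAGIC.length && Arrays.equals(head, VCF_MAGIC)) {
            return Format.VCF_TEXT;
        }
        return Format.UNKNOWN;
    }
}
```

Because the check never looks at a file name, it behaves the same for input.vcf, /dev/stdin, or /dev/fd/number. Telling a bgzipped VCF from a compressed BCF would require decompressing the first block, which this sketch does not attempt.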

@yfarjoun
Contributor

yfarjoun commented Jan 31, 2017 via email

@ronlevine
Contributor

There is an existing HTSJDK issue, Add InputStream support to VCFFileReader.

@sooheelee
Contributor

Listen, I may be off base here but can I ask why we don't have a separate centralizing I/O actor, like BWA does? Given the larger file sizes now, would this type of feature make sense? If not, why not?

@ronlevine
Contributor

Does it handle input streams? It would have to decode that stream type on the fly. samtools/htsjdk#837 is handling the input stream. Can you point me to the BWA I/O actor?

@sooheelee
Contributor

Someplace in https://github.com/lh3/bwa. Sorry I can't be more specific.

@mjhipp
Contributor

mjhipp commented Mar 4, 2024

Adding a comment that HTSJDK has tried to add this functionality (samtools/htsjdk#1245) but there are issues with the reader that still persist (#1776).

The issue linked specifies SortVcf, but it is actually present in other tools that use the underlying htsjdk VCF input reader in a similar way.
