Read VCF From StringIO or Buffer? #47

Closed
danielecook opened this issue Apr 14, 2017 · 7 comments

@danielecook

I'm writing an AWS Lambda function that fetches a VCF remotely and converts it to JSON, using cyvcf2 to read and format the input. I want to avoid writing the fetched VCF to disk just so that cyvcf2 can read it back. Is there a way for cyvcf2 to read a VCF from a string buffer (or from the stdout of a subprocess)?

@brentp
Owner

brentp commented Apr 14, 2017

it can accept data from stdin, but not from a StringIO or buffer.

Can you make your program accept stdin?

there might be a way to make htslib accept a file handle, but right now all the file processing is internal to htslib, so it just takes a string (the file name). Let me know if you can't get it to work and I'll have a look:

https://github.com/samtools/htslib/blob/develop/htslib/hfile.h#L71

@WortJohn

WortJohn commented Dec 8, 2017

I read it from stdin in a Python script, like this:

parser.add_argument('infile', help='vcf file from VarDict, "stdout" or "a file"', nargs='?', default=sys.stdin)

but it raises "TypeError: coercing to Unicode: need string or buffer, file found". @brentp

@brentp
Owner

brentp commented Dec 8, 2017

you have to use the string "/dev/stdin" or "-"

@WortJohn

WortJohn commented Dec 8, 2017

Thanks for your help. I solved it by changing sys.stdin to "-".
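
A minimal sketch of that fix, for anyone who hits the same TypeError: the argparse default becomes the string "-" rather than the `sys.stdin` file object, and that string is what gets passed to `cyvcf2.VCF`. The names `build_parser` and `main` are hypothetical and used only for illustration; cyvcf2 is imported inside `main` so the parser can be exercised without it installed.

```python
import argparse


def build_parser():
    """Build a parser whose input defaults to stdin, expressed as the string "-"."""
    parser = argparse.ArgumentParser(description="Read a VCF from a file or stdin")
    parser.add_argument(
        "infile",
        nargs="?",
        default="-",  # the string "-", not the sys.stdin file object
        help='VCF path, or "-" to read from stdin',
    )
    return parser


def main(argv=None):
    """Print every variant from the chosen input (requires cyvcf2)."""
    from cyvcf2 import VCF

    args = build_parser().parse_args(argv)
    for variant in VCF(args.infile):
        print(variant, end="")
```

With `main()` called at the bottom of a script, the VCF can either be piped in on stdin or given as a path argument.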

@wdecoster
Contributor

Hi @brentp, as this issue is quite old - is this something that was eventually implemented?
I am in a similar scenario: I have a base64-encoded VCF in memory, which I decode, write to disk, parse with cyvcf2, and then remove the file. Since that is both a potential security and privacy issue, is this something that I could avoid?

Thanks,
Wouter

@brentp
Owner

brentp commented May 30, 2024

Hi @wdecoster I was just able to get this to work:

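import os
import cyvcf2

# VCF columns must be tab-separated; the tabs are written as \t escapes here.
vcf_string = """##fileformat=VCFv4.1
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t17363\t.\tC\tT\t62\tPASS\tAC=2
"""

# Create an OS-level pipe: r is the read end, w is the write end.
r, w = os.pipe()

# Write the VCF text into the pipe; closing the write end signals EOF to the reader.
with os.fdopen(w, 'w') as w_fd:
    w_fd.write(vcf_string)

# Hand the read end to cyvcf2 as a file object and iterate over the variants.
with os.fdopen(r, 'r') as r_fd:
    vcf = cyvcf2.VCF(r_fd)
    for v in vcf:
        print(v, end="")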

perhaps this can work for you?
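
One caveat for larger inputs: a pipe has a finite capacity (commonly 64 KiB on Linux), so writing the whole VCF before anything reads from the pipe will block once that buffer is full. A possible variation, sketched below, feeds the pipe from a background thread. `open_pipe_reader` and `iter_variants` are hypothetical helper names, and the cyvcf2 part assumes, as in the snippet above, that `cyvcf2.VCF` accepts a file object; it has not been checked against every cyvcf2 version.

```python
import os
import threading


def open_pipe_reader(data: bytes):
    """Return a readable binary file object that streams `data` through an OS pipe."""
    read_fd, write_fd = os.pipe()

    def feed():
        try:
            # Leaving the with-block closes the write end, which signals EOF.
            with os.fdopen(write_fd, "wb") as writer:
                writer.write(data)
        except BrokenPipeError:
            pass  # the reader was closed before all data was consumed

    # The writer runs concurrently, so a full pipe only pauses this thread.
    threading.Thread(target=feed, daemon=True).start()
    return os.fdopen(read_fd, "rb")


def iter_variants(vcf_bytes: bytes):
    """Yield variants parsed from in-memory VCF bytes (requires cyvcf2)."""
    from cyvcf2 import VCF  # assumption: VCF() accepts a file object

    with open_pipe_reader(vcf_bytes) as reader:
        for variant in VCF(reader):
            yield variant
```

For the base64 case, the decoded bytes could be passed straight in, e.g. `iter_variants(base64.b64decode(encoded))`, so nothing is written to disk.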

@wdecoster
Contributor

Yes, thank you! That works wonderfully :-)
