Is the variants endpoint of the htsget protocol supported? #1187

Closed
jimmyhli opened this issue Sep 28, 2020 · 37 comments

@jimmyhli

jimmyhli commented Sep 28, 2020

The documentation doesn't seem to mention support for the variants endpoint of an htsget API server.

I assume the variants support of htsget will just follow the same as https://github.com/igvteam/igv.js/wiki/Variant-Track?

image

btw, the sourceType is likely outdated

@jrobinso
Contributor

jrobinso commented Sep 29, 2020

There is no htsget support for variants; if it works, it's an accident. Are there any servers out there? htsget has been the definition of a slow burner.

BTW the 'ga4gh' API is not supported anymore; that's the old ga4gh schema, not htsget.

@brainstorm

brainstorm commented Mar 9, 2021

Are there any servers out there?

Yes, there are: https://htsget.ga4gh.org/variants/service-info

/cc @jb-adams @ohofmann

@jrobinso
Contributor

jrobinso commented Jun 9, 2021

@brainstorm Sorry maybe I wasn't clear. Are there any public endpoints I can use? I don't see how the json you reference addresses the question but maybe I'm missing something. I'm not interested in running a server myself.

@brainstorm

brainstorm commented Jun 10, 2021

Fair point. Plus, it seems like the documentation link in that JSON result is 404'ing (/cc @jb-adams), which makes understanding how those endpoints work unnecessarily cumbersome.

A clear and stable example that can be reached easily from the service-info endpoint would help casual developers on the public server a ton, @jb-adams.

That being said, here you have an example of use while the service-info documentation for this public endpoint gets fixed:

https://htsget.ga4gh.org/variants/1000genomes.phase1.chrY

The way I got to that test URL is by looking at the htsget-refserver config over here:

https://github.com/ga4gh/htsget-refserver/blob/develop/deployments/ga4gh/prod/config-server.json#L57

Notice that the regexp allows you to query other chromosomes.

Hope it's clearer now? This should help test #1344 and fix our UMCCR backend in umccr/data-portal-client#64.

@jrobinso
Contributor

@brainstorm That's helpful but I don't know how you can derive that from the info json.

As far as I know there are only 2 services described in the API, reads and variants (http://samtools.github.io/hts-specs/htsget.html). If the endpoint you list above follows the recommended pattern, I suppose the "id" of the data is 1000genomes.phase1.chrY.

What I'm really looking for now is a "reads" endpoint that is stable and that I can test against. I think Google had one at one time but I can't find it anymore. I might guess that https://htsget.ga4gh.org/reads/<id> might work, but of course one needs to know an <id>.

@brainstorm

brainstorm commented Jun 10, 2021

Yeah, the reads endpoint as you were getting at follows pretty much the same logic with regexps from the config file I referred to.

I.e., check this one for reads:

https://htsget.ga4gh.org/reads/giab.NA12878.NIST7086.1

I had exactly the same questions with the public htsget reference server, so it clearly needs a better/clearer way to onboard new htsget client developers, /cc @jb-adams

@jrobinso
Contributor

OK that worked, finally something to test against. It's been more than 3 years since that was implemented and there isn't much evidence of it being used, but it's nice to get a minimal test for reads working again.

@jb-adams

The documentation page is now being hosted on the same server as the htsget service itself. Please try this new documentation link: https://htsget.ga4gh.org/docs/index.html

The page gives a breakdown of "Reads Datasets" and "Variants Datasets" and some of the ids the reference server supports.

Please let me know if this documentation is sufficient to construct test queries, or if more info is needed.

@brainstorm

brainstorm commented Jun 10, 2021

@jb-adams, the service info endpoint documentation URL is still 404'ing:

Screen Shot 2021-06-10 at 10 20 27 pm

Would you mind redeploying the official GA4GH public htsget server according to the new docs to avoid confusion?

@jb-adams

@brainstorm good catch :) I will try to redeploy the server today

@jrobinso
Contributor

@brainstorm @jb-adams Thanks for the pointers to test data and variant spec. I'm looking at the variant service now. Right away I notice it has the same, in my opinion, fundamental problem as the reads service. I raised this years ago, before it was even part of GA4GH. It was going to be discussed but I never learned of the results if any. The basic problem is there is no way to discover what referenceNames are present in a dataset, and calling the service with a wrong reference name is a 400 error! This is a very nasty thing to do to clients. As everyone knows there are 2 common reference naming conventions, "chr1" and "1", and using the "wrong" one WRT any given dataset will throw an error. Ditto, I assume, querying for a genomic range that is outside of the dataset. This is not an error IMO, there's simply nothing "there" so an object noting an empty result should be returned.

The example, BTW, uses "chr1" but that is going to give a 400 error with any of the example datasets.

I realize this isn't the right forum to raise this. Where would the right forum be?
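
A common client-side workaround for the two naming conventions is to retry with the alternate name when the first attempt is rejected. The sketch below is one way to do that, not the workaround igv.js actually uses. `referenceNameCandidates` and `queryWithAliases` are hypothetical helpers, the chrM/MT pairing is an assumption, and the `query` callback stands in for whatever function issues the htsget request and resolves to an object with a `status` field.

```javascript
// Hypothetical helper: returns the given reference name followed by its
// counterpart in the other common convention ("chr1" <-> "1").
// The chrM <-> MT pairing is an assumption based on common usage.
function referenceNameCandidates(name) {
    if (name.startsWith('chr')) {
        const bare = name.slice(3);
        return [name, bare === 'M' ? 'MT' : bare];
    }
    return [name, name === 'MT' ? 'chrM' : 'chr' + name];
}

// Tries each candidate in turn and returns the first one the server does not
// reject with a 400. `query` is supplied by the caller and is assumed to
// resolve to an object with a numeric `status`.
async function queryWithAliases(name, query) {
    for (const candidate of referenceNameCandidates(name)) {
        const result = await query(candidate);
        if (result.status !== 400) {
            return { referenceName: candidate, result };
        }
    }
    return null; // neither naming convention was accepted
}

console.log(referenceNameCandidates('chr8')); // [ 'chr8', '8' ]
console.log(referenceNameCandidates('8'));    // [ '8', 'chr8' ]
```

This costs an extra round trip whenever the first guess is wrong, which is exactly the overhead that a discovery mechanism in the spec would avoid.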

@jrobinso
Contributor

jrobinso commented Jun 12, 2021

@jb-adams. Where is the official doc? The 1.4.1 page you reference above does not seem to describe a "ticket". Coming in through the GA4GH home page I get a pointer to https://samtools.github.io/hts-specs/htsget.html which does describe a "ticket".

I'm finding strange behavior with the reference variant service and don't know if it's a misunderstanding on my part of the spec, a bug in the reference server, or both. Does the reference server have a git project where issues can be raised? A couple of confusing things, I'm getting different "tickets" if I do or do not include a reference range in the initial query. I would expect to get back, perhaps, more URLs if I do not than if I do, but they are actually different in form. The second problem I have is retrieving the vcf header, the parameter "class=header" does not seem to work. Anyway this isn't the place to discuss that, but where is?

Thanks for all your hard work!

@jrobinso
Contributor

@jb-adams final info for this evening, this is what I'm trying to get working

https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8?format=VCF&referenceName=8&start=128732400&end=128770475

Just can't get there. Following the URLs from the ticket gets me a portion of the VCF file containing that region, with random bits before and after, but no VCF header. It's almost like it's just a raw tabix query. The "class=header" directive doesn't seem to do anything.
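
The query URL above follows the pattern endpoint + id + query parameters. A small helper can assemble it; this is only a sketch, and `buildHtsgetUrl` is a hypothetical name, not an igv.js or htsget-refserver function.

```javascript
// Hypothetical helper: builds an htsget ticket URL from a base endpoint,
// a dataset id, and optional query parameters.
function buildHtsgetUrl(endpoint, id, params = {}) {
    // Tolerate endpoints given with or without a trailing slash.
    const base = endpoint.endsWith('/') ? endpoint : endpoint + '/';
    const url = new URL(base + encodeURIComponent(id));
    for (const [key, value] of Object.entries(params)) {
        // Skip parameters the caller left unset.
        if (value !== undefined && value !== null) {
            url.searchParams.set(key, String(value));
        }
    }
    return url.toString();
}

const ticketUrl = buildHtsgetUrl(
    'https://htsget.ga4gh.org/variants/',
    '1000genomes.phase1.chr8',
    { format: 'VCF', referenceName: '8', start: 128732400, end: 128770475 }
);
console.log(ticketUrl);
```

Requesting this URL returns a ticket (JSON), not the VCF data itself; the data comes from the URLs listed inside the ticket.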

@victorskl

victorskl commented Jun 12, 2021

@jrobinso, please see if this helps.

referenceName=1 is equivalent to the bcftools view --regions flag behind the scenes, IIRC.

  • Get by ID
curl -s -X GET "https://htsget.ga4gh.org/variants/giab.NA12878" | jq
  • Get ticket for Region 1. Response contains header block and body block for client to fetch.
curl -s -X GET "https://htsget.ga4gh.org/variants/giab.NA12878?format=VCF&referenceName=1&start=0&end=57224045" | jq
  • Fetch header
curl -s -H "HtsgetBlockClass: header" -H "HtsgetCurrentBlock: 0" -H "HtsgetTotalBlocks: 2" -X GET "https://htsget.ga4gh.org/variants/data/giab.NA12878" --output giab_NA12878__header.vcf
  • Check header size
wc -c giab_NA12878__header.vcf
    5724 giab_NA12878__header.vcf
  • Fetch body
curl -s -H "HtsgetCurrentBlock: 1" -H "HtsgetTotalBlocks: 2" -X GET "https://htsget.ga4gh.org/variants/data/giab.NA12878?end=57224045&referenceName=1&start=0" --output giab_NA12878__body.vcf
  • Check body size
wc -c giab_NA12878__body.vcf
 27594276 giab_NA12878__body.vcf
  • Assemble the blocks
cat giab_NA12878__header.vcf giab_NA12878__body.vcf > giab_NA12878.vcf
  • Check/view them
bcftools stats giab_NA12878.vcf | less
bcftools view giab_NA12878.vcf | less
  • Or, open it in IGV desktop

igv_htsget_vcf
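
The curl steps above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the igv.js implementation: `fetchTicketBlocks` and `concatBlocks` are hypothetical names, and `fetchFn` is injected (the global `fetch` in a browser or a recent Node would be the assumed choice) so the logic can be exercised without a network. It assumes a ticket whose `htsget.urls` entries each carry a `url` and optional `headers`.

```javascript
// Joins byte chunks in order, mirroring `cat header.vcf body.vcf`.
function concatBlocks(chunks) {
    const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.length;
    }
    return out;
}

// Fetches every block listed in the ticket, sending the headers the server
// asked for, and returns the concatenated bytes. Blocks are requested
// sequentially so they stay in ticket order.
async function fetchTicketBlocks(ticket, fetchFn) {
    const chunks = [];
    for (const entry of ticket.htsget.urls) {
        const response = await fetchFn(entry.url, { headers: entry.headers || {} });
        if (!response.ok) {
            throw new Error(`Block request failed (${response.status}): ${entry.url}`);
        }
        chunks.push(new Uint8Array(await response.arrayBuffer()));
    }
    return concatBlocks(chunks);
}
```

A caller would first request the ticket URL, parse the JSON, and then pass the parsed ticket to `fetchTicketBlocks(ticket, fetch)`.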

@brainstorm

I realize this isn't the right forum to raise this. Where would the right forum be?

It depends. For the htsget spec itself, I've just opened samtools/hts-specs#578 for your remarks and good points about refName discoverability for clients.

For things that pertain to the particular reference implementation of htsget, I'd head to:

https://github.com/ga4gh/htsget-refserver

I agree with you that htsget should allow some sort of basic discovery/enumeration of refNames. I'm just wondering whether we should include this as a basic (and incomplete) mechanism for refNames only, or whether there'll be feature creep as other facets of the formats and datasets become candidates for discovery... with the multiple rabbit holes that'd entail.

@jmarshall

jmarshall commented Jun 12, 2021

@jrobinso wrote:

The basic problem is there is no way to discover what referenceNames are present in a dataset
[…]
The second problem I have is retrieving the vcf header, the parameter "class=header" does not seem to work

As you've realised in this second comment, …?class=header requests are htsget's mechanism for referenceName discovery (as was advocated for by @jrobinso in samtools/hts-specs#311 (comment) 😄).

If class=header isn't causing the returned ticket to represent just a VCF header, that would appear to be an issue in the server implementation you're testing against.

Making a header query gave me approximately the sort of ticket I was expecting:

$ curl -s 'https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8?class=header' | python3 -mjson.tool
{
    "htsget": {
        "format": "VCF",
        "urls": [
            {
                "url": "https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header",
                "headers": {
                    "HtsgetBlockClass": "header",
                    "HtsgetCurrentBlock": "0",
                    "HtsgetTotalBlocks": "1"
                },
                "class": "header"
            }
        ]
    }
}

i.e., a single URL representing the VCF headers.

However when I then tried to retrieve that URL (using those HtsgetXYZ HTTP headers) — expecting to get the actual VCF headers — I instead got another ticket, which does seem broken.
This was user error: I edited the curl command to add -H HtsgetBlockClass etc etc but did not notice that the URL returned in the ticket subtly differs from the first URL (by an additional …/data/…). With the URL also adjusted,

$ curl -H 'HtsgetBlockClass: header' -H 'HtsgetCurrentBlock: 0' -H 'HtsgetTotalBlocks: 1' \
       'https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header'

returns text VCF headers as expected.

(It's slightly odd that the different URL returned within the ticket still has ?class=header on the end, and in this example the same VCF header data is returned if you edit it out.)

@jrobinso
Contributor

@jmarshall thanks for the response, and for the "class=header" implementation. Yes I did request that, and it's an indirect way to get reference names for alignments, which works perfectly for me. However the reference names are not in a VCF header, at least they aren't required to be.
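
When a VCF header does include `##contig` lines, the reference names can be read from it; when it does not, the result is simply empty and the client has to fall back to guessing. A minimal sketch of that extraction, with `contigNamesFromVcfHeader` as a hypothetical helper name:

```javascript
// Hypothetical helper: extracts contig IDs from VCF header text.
// Returns an empty array when the header has no ##contig lines,
// which the VCF format allows.
function contigNamesFromVcfHeader(headerText) {
    const names = [];
    for (const line of headerText.split(/\r?\n/)) {
        if (!line.startsWith('##contig=<')) continue;
        // ID may be the first key or follow other keys inside <...>.
        const match = line.match(/[<,]ID=([^,>]+)/);
        if (match) names.push(match[1]);
    }
    return names;
}

const sampleHeader = [
    '##fileformat=VCFv4.1',
    '##reference=GRCh37',
    '##contig=<ID=8>',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO'
].join('\n');

console.log(contigNamesFromVcfHeader(sampleHeader)); // [ '8' ]
```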

@jrobinso
Contributor

@victorskl Thanks, yes that is helpful. I will give this another try next week.

@jrobinso
Contributor

@brainstorm re "rabbit holes": a call with a referenceName that isn't in the dataset is an actual error, and if it's an error the legal names should be discoverable. I've long had a workaround in place, obviously, and there is a solution now for BAM/CRAM with the header option, so I'll let this one die.

@victorskl

victorskl commented Jun 12, 2021

For header only, I tried this way and it works for me, too.

  • Get ticket for header only
curl -s "https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8?class=header" | jq
{
  "htsget": {
    "format": "VCF",
    "urls": [
      {
        "url": "https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header",
        "headers": {
          "HtsgetBlockClass": "header",
          "HtsgetCurrentBlock": "0",
          "HtsgetTotalBlocks": "1"
        },
        "class": "header"
      }
    ]
  }
}
  • Fetch VCF header data (include Htsget* headers hinted by server + data fetch endpoint is /variants/data/)
curl -s -H "HtsgetBlockClass: header" -H "HtsgetCurrentBlock: 0" -H "HtsgetTotalBlocks: 1" -X GET "https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header" --output 1000genomes.phase1.chr8__header__only.vcf
  • Check header file size
wc -c 1000genomes.phase1.chr8__header__only.vcf
   11442 1000genomes.phase1.chr8__header__only.vcf
  • Check headers
bcftools view -h 1000genomes.phase1.chr8__header__only.vcf | less

However the reference names are not in a VCF header, at least they aren't required to be.

Yep; I reckon this is due to the underlying dataset itself?

curl -s "https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8" | jq

i.e. note that .chr8 in the filename corresponds to

...
##reference=GRCh37
##contig=<ID=8>
...

Would you mind trying the giab.NA12878 dataset?

  • Get ticket (header only)
curl -s -X GET "https://htsget.ga4gh.org/variants/giab.NA12878?class=header" | jq
{
  "htsget": {
    "format": "VCF",
    "urls": [
      {
        "url": "https://htsget.ga4gh.org/variants/data/giab.NA12878?class=header",
        "headers": {
          "HtsgetBlockClass": "header",
          "HtsgetCurrentBlock": "0",
          "HtsgetTotalBlocks": "1"
        },
        "class": "header"
      }
    ]
  }
}
  • Fetch header
curl -s -H "HtsgetBlockClass: header" -H "HtsgetCurrentBlock: 0" -H "HtsgetTotalBlocks: 1" -X GET "https://htsget.ga4gh.org/variants/data/giab.NA12878?class=header" --output giab.NA12878__header__only.vcf
  • Check file size
wc -c giab.NA12878__header__only.vcf
    5724 giab.NA12878__header__only.vcf
  • View headers
bcftools view -h giab.NA12878__header__only.vcf | less

And this dataset works better, I reckon? At least, we use this variant dataset for our Htsget + Passport experiment.

htsget_passport_vcf

HTH

@jrobinso
Contributor

I have this working via node unit tests; however, it's not working in the browser because there don't seem to be CORS headers on the responses, at least for this URL. @jb-adams I will raise an issue on the test server repo.

curl -i 'https://htsget.ga4gh.org/reads/giab.NA12878.NIST7086.1?class=header'
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Mon, 14 Jun 2021 17:30:54 GMT
Content-Type: application/vnd.ga4gh.htsget.v1.2.0+json; charset=utf-8
Content-Length: 226
Connection: keep-alive

{"htsget":{"format":"BAM","urls":[{"url":"https://htsget.ga4gh.org/reads/data/giab.NA12878.NIST7086.1?class=header","headers":{"HtsgetBlockClass":"header","HtsgetCurrentBlock":"0","HtsgetTotalBlocks":"1"},"class":"header"}]}}

jrobinso added a commit that referenced this issue Jun 14, 2021
* Support variant endpoint (see issue #1187)
* htsget refactoring
@jrobinso
Contributor

Variant support has been added. The configuration below should work once CORS is implemented on the reference server.

    var options = {
        genome: "hg19",
        locus: "chr8:128,734,098-128,763,217",
        tracks: [
            {
                type: 'variant',
                name: 'giab.NA12878',
                sourceType: 'htsget',
                format: "vcf",
                url: 'https://htsget.ga4gh.org/variants/',
                id: 'giab.NA12878'
            },
            {
                type: 'alignment',
                name: 'giab.NA12878.NIST7086.1',
                sourceType: 'htsget',
                format: 'bam',
                url: 'https://htsget.ga4gh.org/reads/',
                id: 'giab.NA12878.NIST7086.1'
            }
        ]
    };

    igv.createBrowser(div, options)

@brainstorm

@jb-adams Time to review and merge ga4gh/htsget-refserver#24 ? :)

@jrobinso
Contributor

jrobinso commented Jun 15, 2021

@victorskl @brainstorm I'm not sure I'm happy with the form of the config above. I'm mulling over changing it to a complete URL string, rather than a url + ID; this would be more consistent with other URL-based sources. I would of course try to figure out how to do it in a backward-compatible way, but with almost no servers extant I think @victorskl might be the only actual user of htsget through igv.js at the moment. Any thoughts?

So for example.

            {
                type: 'variant',
                name: 'giab.NA12878',
                sourceType: 'htsget',
                format: "vcf",
                url: 'https://htsget.ga4gh.org/variants/giab.NA12878'
            }

@victorskl

victorskl commented Jun 15, 2021

[...] changing it to a complete URL string, rather than a url + ID [...] Any thoughts?

Just a reminder: currently it is url, endpoint and id according to the Wiki entry.

I think this is OK. It will become a complete url config instead of a (base) url, endpoint and id split. Track config is typically loaded dynamically in most apps, I reckon. At least, this is the case in our data portal client, which has a UI dialog that lets the user select a BAM or VCF file, etc. Hence the endpoint (base URL) is a more or less constant config somewhere, and string concatenation is inevitable. It will probably happen in a function if this becomes an absolute url.

@jrobinso
Contributor

I will support the following options going forward, not backward compatible but we're allowed to break it once.

{
   type: 'alignment',
   sourceType: 'htsget',
   format: 'bam',
   url: 'https://htsget.ga4gh.org/reads/giab.NA12878.NIST7086.1',
   name: 'NA12878'
}
{
   type: 'alignment',
   sourceType: 'htsget',
   format: 'bam',
   endpoint: 'https://htsget.ga4gh.org/reads/',
   id: 'giab.NA12878.NIST7086.1',
   name: 'NA12878'
}
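
Both forms above identify the same resource, so a client can reduce them to a single URL before making the request. The following is a sketch of that normalization under the assumption that `url` takes precedence when both are given; `resolveHtsgetUrl` is a hypothetical name and this is not the igv.js source.

```javascript
// Hypothetical helper: accepts either { url } or { endpoint, id } and
// returns the full htsget URL for the dataset.
function resolveHtsgetUrl(config) {
    if (config.url) {
        return config.url;
    }
    if (config.endpoint && config.id) {
        // Tolerate endpoints given with or without a trailing slash.
        const base = config.endpoint.endsWith('/') ? config.endpoint : config.endpoint + '/';
        return base + config.id;
    }
    throw new Error('htsget track config needs either "url" or "endpoint" plus "id"');
}

const fromUrl = resolveHtsgetUrl({
    url: 'https://htsget.ga4gh.org/reads/giab.NA12878.NIST7086.1'
});
const fromParts = resolveHtsgetUrl({
    endpoint: 'https://htsget.ga4gh.org/reads/',
    id: 'giab.NA12878.NIST7086.1'
});
console.log(fromUrl === fromParts); // true
```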

@jb-adams

| @jb-adams Time to review and merge ga4gh/htsget-refserver#24 ? :)

I merged this into develop, but I am not seeing any CORS headers in the response when I run the server under default settings.

e.g. a GET request to http://localhost:3000/reads/service-info does not respond with a header of Access-Control-Allow-Origin: http://localhost

Is this something I need to configure on my end? I am simply running the new build with all defaults (no config)

@jb-adams

I'm finding strange behavior with the reference variant service and don't know if it's a misunderstanding on my part of the spec, a bug in the reference server, or both. Does the reference server have a git project where issues can be raised?

Yes, any issues with the reference server can be raised here -> https://github.com/ga4gh/htsget-refserver/issues.

If it's an issue with the htsget spec itself (such as no way to discover referenceNames), then that issue is best raised on the hts-specs repo (where the htsget spec is housed) -> https://github.com/samtools/hts-specs/issues

The reference server aims to stay closely aligned with the spec, and not implement any features that are not described in the spec

A couple of confusing things, I'm getting different "tickets" if I do or do not include a reference range in the initial query.

So long as you are able to concatenate each filepart returned by each ticket, and the final concatenated result is a valid VCF, this is expected behaviour. Non-reference range queries are simpler to process, and generally the server just refers the client to the data source (such as an S3 URL), because no VCF processing is required. For reference range requests, the server itself needs to process and stream back only the requested ranges, and the server does this by splitting tickets according to genomic reference.

The second problem I have is retrieving the vcf header, the parameter "class=header" does not seem to work. Anyway this isn't the place to discuss that, but where is?

This may be an error with the server, and it would be good to raise an issue on the repo

@jb-adams

@jmarshall @jrobinso I'm not quite seeing why the ?class=header parameter request is said to be broken.

curl -s 'https://htsget.ga4gh.org/variants/1000genomes.phase1.chr8?class=header' | python3 -mjson.tool
{
    "htsget": {
        "format": "VCF",
        "urls": [
            {
                "url": "https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header",
                "headers": {
                    "HtsgetBlockClass": "header",
                    "HtsgetCurrentBlock": "0",
                    "HtsgetTotalBlocks": "1"
                },
                "class": "header"
            }
        ]
    }
}

If I construct a request in Postman using the provided URL and apply all 3 HtsgetXYZ HTTP headers, I receive the VCF header for the underlying object:

image

@jrobinso
Contributor

@jb-adams I think that was user error on my part; @jmarshall might have found a different issue. I had two false expectations: (1) I expected to receive the header, not a "ticket" for the header, and (2) I expected to receive the header as text, not as bgz-compressed data. It's now working fine with my newly calibrated expectations, with the exception of the CORS issue raised elsewhere.

@jrobinso
Contributor

@jb-adams RE the different URLs for whole file vs range, a word of caution. Many organizations strip range headers from outgoing client requests. If the entire file is requested, a URL for the entire file (rather than multiple parts with range requests) will have a higher chance of success. This is so common that we actually run a service for IGV desktop to work around it, and the service still gets many hits.

@jmarshall

@jb-adams: This was user error on my part too, as described in an update to #1187 (comment). This was my fault, but for pedagogical purposes you might consider making the different URLs used by the example server a bit more visually obviously different!

@jrobinso: We considered and rejected having this request return the header directly rather than via a ticket; see samtools/hts-specs#322 (comment) onwards. The htslib client implementation sniffs the returned data to determine whether it is a ticket, but other clients may not; so it was considered that clients making an htsget request — therefore expecting a ticket — would be reasonably surprised and disgruntled to receive something other than a ticket!

My curl command did receive the header as text, and when I asked for BCF I got “file format: 'BCF' not supported”. Was what you got either binary BCF data after decompression or definitely bgzipped? Might it instead be that the text response had been plain gzipped by a proxy on the way back to you or something like that? (If so, that's still something unexpected that the spec might need to mention…)

As for range headers: refget uses range headers quite heavily, and one anticipated htsget server implementation is to return a ticket with an array of urls pointing to the same large complete vcf.gz file but differing in the "headers": { "Range": "…" } range headers used to return different blocks from it. So it might be useful to file an hts-specs issue pointing out this concern and saying something about what kind of organisations are doing this crazy thing 😄
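
The bgzipped-versus-plain-gzip question above can be answered by inspecting the first bytes of the response. The sketch below assumes the BGZF layout from the SAM/BAM specification, where a BGZF block is a gzip member with an extra field carrying a "BC" subfield, and it simplifies by assuming that subfield comes first in the extra field. `classifyCompression` is a hypothetical helper name.

```javascript
// Hypothetical helper: classifies a byte buffer as 'bgzf', 'gzip' or
// 'uncompressed' from its leading bytes.
function classifyCompression(bytes) {
    // gzip magic (0x1f 0x8b) followed by the deflate method (0x08).
    const isGzip = bytes.length >= 3 &&
        bytes[0] === 0x1f && bytes[1] === 0x8b && bytes[2] === 0x08;
    if (!isGzip) return 'uncompressed';

    // BGZF sets the FEXTRA flag and stores a 'B','C' subfield of length 2.
    // Simplification: only the first subfield (bytes 12..15) is checked.
    const hasExtraField = bytes.length >= 16 && (bytes[3] & 0x04) !== 0;
    const hasBgzfSubfield = hasExtraField &&
        bytes[12] === 0x42 && bytes[13] === 0x43 &&
        bytes[14] === 0x02 && bytes[15] === 0x00;
    return hasBgzfSubfield ? 'bgzf' : 'gzip';
}

// The 28-byte BGZF end-of-file marker block.
const bgzfEof = Uint8Array.from([
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00
]);
console.log(classifyCompression(bgzfEof)); // bgzf
```

Note that an HTTP client may already have undone transfer-level gzip (Content-Encoding) before the bytes reach this check, so the result describes the payload as the client sees it.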

@jrobinso
Contributor

jrobinso commented Jun 15, 2021

@jmarshall Sorry if I'm causing confusion; it's the BAM header that is bgzipped, not the variant file. I expected to get a plain-text SAM header from the /reads/ endpoint with class=header, but OTOH the request is for "bam" format, so my expectation was not reasonable. It is bgzipped, but that's what I asked for.

I don't think stripping range and other headers is that unusual; "crazy" is a value judgement. The consortium of Boston hospitals here does it, for example: client requests are all run through a proxy (something called "Squid", for instance), and non-whitelisted headers are stripped. The range header is almost never whitelisted by default. We have managed to get our servers whitelisted from this stripping, but it's a hassle. One value add htsget potentially provides over just hosting indexed files is a means to do range queries on BAM and VCF files without the use of "range" headers; if the ticket returns URLs with these headers, that value add is negated. This is not a spec thing, it's an implementation consideration: if implementing a server that is going to provide public data, I would avoid "range" headers, or any header that is not necessary, because the callback requests won't necessarily have them.
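
A client that wants to anticipate this problem can check whether a ticket depends on Range headers before following it. This is a sketch only: `ticketUsesRangeHeaders` is a hypothetical helper, and the second ticket below is an invented example (with a placeholder URL) of the range-based server design discussed above, not output from any real server.

```javascript
// Hypothetical helper: reports whether any URL in the ticket requires a
// Range request header (header names are compared case-insensitively).
function ticketUsesRangeHeaders(ticket) {
    return ticket.htsget.urls.some((entry) =>
        Object.keys(entry.headers || {}).some((name) => name.toLowerCase() === 'range'));
}

// Ticket shape returned by the reference server in this thread.
const refserverTicket = {
    htsget: {
        format: 'VCF',
        urls: [{
            url: 'https://htsget.ga4gh.org/variants/data/1000genomes.phase1.chr8?class=header',
            headers: { HtsgetBlockClass: 'header', HtsgetCurrentBlock: '0', HtsgetTotalBlocks: '1' },
            class: 'header'
        }]
    }
};

// Invented example of a ticket that points at byte ranges of one file.
const rangeTicket = {
    htsget: {
        format: 'VCF',
        urls: [
            { url: 'https://example.org/data.vcf.gz', headers: { Range: 'bytes=0-4095' } },
            { url: 'https://example.org/data.vcf.gz', headers: { Range: 'bytes=65536-131071' } }
        ]
    }
};

console.log(ticketUsesRangeHeaders(refserverTicket)); // false
console.log(ticketUsesRangeHeaders(rangeTicket));     // true
```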

@jmarshall

Thank you for clarifying that that bit was about reads and BAM, not variants and VCF. It's slightly unfortunate that you can't currently ask for ?class=header&format=SAM — the historical reason SAM is not a valid value for htsget's format is that it's uncompressed and would be a silly choice for sending large amounts of sequencing data over the wire. (OTOH the good news is that you are guaranteed to be able to extract the list of referenceNames from a BAM header even if it does not contain @SQ SAM headers.)

That was indeed a tongue-in-cheek value judgement, hence the smiley. If you would like the htsget spec maintainers to be aware of those header-related implementation considerations and for them to consider adding a note discouraging such implementations, please raise your second paragraph as an hts-specs issue.

@victorskl

I merged this into develop, but I am not seeing any CORS headers in the response when I run the server under default settings.

e.g. a GET request to http://localhost:3000/reads/service-info does not respond with a header of Access-Control-Allow-Origin: http://localhost

Is this something I need to configure on my end? I am simply running the new build with all defaults (no config)

Thanks for merging, Jeremy @jb-adams

Nope; you do not need to configure anything if you just run it locally with default settings. The default setting allows CORS from http://localhost.

Please try as follows:

git clone https://github.com/ga4gh/htsget-refserver.git
cd htsget-refserver
git checkout develop
go build -o ./htsget-refserver ./cmd
./htsget-refserver

Verify CORS with curl as follows:

  • Note that response header contains Access-Control-Allow-Origin: http://localhost
curl -s -v -H "Origin: http://localhost" -X GET http://localhost:3000/reads/service-info | jq

*   Trying ::1:3000...
* Connected to localhost (::1) port 3000 (#0)
> GET /reads/service-info HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.77.0
> Accept: */*
> Origin: http://localhost
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: http://localhost
< Content-Type: application/vnd.ga4gh.htsget.v1.2.0+json; charset=utf-8
< Vary: Origin
< Date: Wed, 16 Jun 2021 00:40:37 GMT
< Content-Length: 627
<
{ [627 bytes data]
* Connection #0 to host localhost left intact
{
  "id": "htsgetref.reads",
  "name": "GA4GH htsget reference server reads endpoint",
  "type": {
    "group": "org.ga4gh",
    "artifact": "htsget",
    "version": "1.2.0"
  },
  "description": "Stream alignment files (BAM/CRAM) according to GA4GH htsget protocol",
  "organization": {
    "name": "Global Alliance for Genomics and Health",
    "url": "https://ga4gh.org"
  },
  "contactUrl": "mailto:[email protected]",
  "documentationUrl": "https://ga4gh.org",
  "createdAt": "2020-09-01T12:00:00Z",
  "updatedAt": "2020-09-01T12:00:00Z",
  "environment": "test",
  "version": "1.4.1",
  "htsget": {
    "datatype": "reads",
    "formats": [
      "BAM"
    ],
    "fieldsParameterEffective": true,
    "tagsParametersEffective": true
  }
}
curl -s -v -H "Origin: http://localhost" \
  -H "Access-Control-Request-Method: GET" \
  -H "Access-Control-Request-Headers: X-Requested-With" \
  -X OPTIONS \
  http://localhost:3000/reads/service-info

*   Trying ::1:3000...
* Connected to localhost (::1) port 3000 (#0)
> OPTIONS /reads/service-info HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.77.0
> Accept: */*
> Origin: http://localhost
> Access-Control-Request-Method: GET
> Access-Control-Request-Headers: X-Requested-With
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: X-Requested-With
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Origin: http://localhost
< Access-Control-Max-Age: 300
< Vary: Origin
< Vary: Access-Control-Request-Method
< Vary: Access-Control-Request-Headers
< Date: Wed, 16 Jun 2021 00:43:41 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact
  • Simulate a browser preflight OPTIONS request from a different origin, e.g. http://example.com
  • Note that the response headers do not contain Access-Control-Allow-*
  • This is the situation that gets blocked by the CORS preflight request in the browser
curl -s -v -H "Origin: http://example.com" \
  -H "Access-Control-Request-Method: GET" \
  -H "Access-Control-Request-Headers: X-Requested-With" \
  -X OPTIONS \
  http://localhost:3000/reads/service-info

*   Trying ::1:3000...
* Connected to localhost (::1) port 3000 (#0)
> OPTIONS /reads/service-info HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.77.0
> Accept: */*
> Origin: http://example.com
> Access-Control-Request-Method: GET
> Access-Control-Request-Headers: X-Requested-With
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Vary: Origin
< Vary: Access-Control-Request-Method
< Vary: Access-Control-Request-Headers
< Date: Wed, 16 Jun 2021 00:47:33 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact

HTH
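
The difference between the curl transcripts above comes down to whether the response carries an Access-Control-Allow-Origin header matching the caller's origin. A simplified sketch of that check follows; `isOriginAllowed` is a hypothetical helper, and it ignores parts of the real browser algorithm such as credentialed requests and the method/header checks done during preflight.

```javascript
// Hypothetical helper: decides whether response headers (a plain object of
// name -> value) permit the given origin. Simplified relative to browsers.
function isOriginAllowed(responseHeaders, origin) {
    const entry = Object.entries(responseHeaders)
        .find(([name]) => name.toLowerCase() === 'access-control-allow-origin');
    if (!entry) return false; // no CORS header: the browser blocks the response
    const allowed = entry[1];
    return allowed === '*' || allowed === origin;
}

// Headers taken from the localhost transcript above.
const localhostResponse = {
    'Access-Control-Allow-Origin': 'http://localhost',
    'Content-Type': 'application/vnd.ga4gh.htsget.v1.2.0+json; charset=utf-8',
    'Vary': 'Origin'
};

// Headers taken from the http://example.com preflight transcript above.
const otherOriginResponse = {
    'Vary': 'Origin, Access-Control-Request-Method, Access-Control-Request-Headers',
    'Content-Length': '0'
};

console.log(isOriginAllowed(localhostResponse, 'http://localhost'));     // true
console.log(isOriginAllowed(otherOriginResponse, 'http://example.com')); // false
```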

@jrobinso
Contributor

Hi all, this thread has now diverged a bit, but variants are supported now in master so I'm closing this. See dev/htsget.html for a browser example, and test/testHtsgetReader.js for some (minimal) node unit tests.

@jrobinso jrobinso added this to the 2.9.0 milestone Jun 16, 2021
@jimmyhli
Author

jimmyhli commented Jun 16, 2021

Thanks everyone for your work into this. I clearly missed out on a long and interesting conversation 😂
