htsget: 'class' protocol attributes for header & body (#322)
htsget protocol responses may include the class attribute to distinguish which URL parts constitute the BAM/CRAM/VCF header and which the data body. Clients may also request only one or the other.
Cristina Yenyxe Gonzalez Garcia authored and mlin committed Apr 24, 2019
1 parent ad374ab commit 476afba
Showing 1 changed file with 44 additions and 5 deletions.
49 changes: 44 additions & 5 deletions htsget.md
@@ -170,6 +170,27 @@ The server SHOULD reply with an `UnsupportedFormat` error if the requested forma
</td></tr>
<tr markdown="block"><td>

`class`
_optional string_
</td><td>

Request different classes of data.
By default, i.e., when `class` is not specified, the response will represent a complete read or variant data stream, encompassing SAM/CRAM/VCF headers, body data records, and EOF marker.

If `class` is specified, its value MUST be one of the following:
<table>
<tr><td>

`header`
</td><td>

Request the SAM/CRAM/VCF headers only.

The server SHOULD respond with an `InvalidInput` error if any htsget query parameters other than `format` are specified at the same time as `class=header`.
</td></tr>
</table>
</td></tr>
<tr markdown="block"><td>
`referenceName`
_optional_
</td><td>
@@ -307,6 +328,17 @@ _optional object_

For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL.
</td></tr>
<tr markdown="block"><td>

`class`
_optional string_
</td><td>

For file formats whose specification describes a header and a body, the class indicates which of the two will be retrieved when querying this URL. The allowed values are `header` and `body`.

The URLs in the response MUST either all have a `class` attribute or all omit it.
If `class` fields are not supplied, no assumptions can be made about which data blocks contain headers, body records, or parts of both.
</td></tr>
</table>

</td></tr>
@@ -329,24 +361,28 @@ An example of a JSON response is:
   "format" : "BAM",
   "urls" : [
     {
-      "url" : "data:application/vnd.ga4gh.bam;base64,QkFNAQ=="
+      "url" : "data:application/vnd.ga4gh.bam;base64,QkFNAQ==",
+      "class" : "header"
     },
     {
-      "url" : "https://htsget.blocksrv.example/sample1234/header"
+      "url" : "https://htsget.blocksrv.example/sample1234/header",
+      "class" : "header"
     },
     {
       "url" : "https://htsget.blocksrv.example/sample1234/run1.bam",
       "headers" : {
         "Authorization" : "Bearer xxxx",
         "Range" : "bytes=65536-1003750"
-      }
+      },
+      "class" : "body"
     },
     {
       "url" : "https://htsget.blocksrv.example/sample1234/run1.bam",
       "headers" : {
         "Authorization" : "Bearer xxxx",
         "Range" : "bytes=2744831-9375732"
-      }
+      },
+      "class" : "body"
     }
   ]
 }
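
As a reading aid for the response-side `class` field, here is a minimal sketch of how a client might check the all-or-none rule and separate `header` entries from `body` entries. The function name `split_by_class` is hypothetical and not part of the specification; it takes the `urls` array of a ticket as a list of dicts.

```python
# Hypothetical helper, not defined by htsget: checks the rule that either
# all or none of the URL entries carry a `class`, then splits them.
ALLOWED_CLASSES = {"header", "body"}


def split_by_class(urls):
    """Return (header_entries, body_entries), or None if no entry has a class.

    None means no assumption can be made about which data blocks contain
    headers, body records, or parts of both.
    """
    tagged = [u for u in urls if "class" in u]
    if not tagged:
        return None
    if len(tagged) != len(urls):
        raise ValueError("either all or none of the URLs must have a class")
    for u in urls:
        if u["class"] not in ALLOWED_CLASSES:
            raise ValueError(f"unexpected class: {u['class']!r}")
    headers = [u for u in urls if u["class"] == "header"]
    bodies = [u for u in urls if u["class"] == "body"]
    return headers, bodies


# The `urls` array from the example response in this hunk (new version).
example_urls = [
    {"url": "data:application/vnd.ga4gh.bam;base64,QkFNAQ==", "class": "header"},
    {"url": "https://htsget.blocksrv.example/sample1234/header", "class": "header"},
    {
        "url": "https://htsget.blocksrv.example/sample1234/run1.bam",
        "headers": {"Authorization": "Bearer xxxx", "Range": "bytes=65536-1003750"},
        "class": "body",
    },
    {
        "url": "https://htsget.blocksrv.example/sample1234/run1.bam",
        "headers": {"Authorization": "Bearer xxxx", "Range": "bytes=2744831-9375732"},
        "class": "body",
    },
]

headers, bodies = split_by_class(example_urls)
```

Returning `None` for an untagged ticket, rather than two empty lists, keeps "no class information" distinct from "no header blocks".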
@@ -364,7 +400,10 @@ An example of a JSON response is:
3. Client fetches the data blocks using the URLs and headers.
4. Client concatenates data blocks to produce local blob.

-While the blocks must be finally concatenated in the given order, the client may fetch them in parallel.
+While the blocks must be finally concatenated in the given order, the client may fetch them in parallel and/or reuse cached data from URLs that have previously been downloaded.
+
+When making a series of requests to fetch reads or variants within different regions of the same `<id>` resource, clients may wish to avoid re-fetching the SAM/CRAM/VCF headers each time, especially if they are large.
+If the ticket contains `class` fields, the client may reuse previously downloaded and parsed headers rather than re-fetching the `header`-class URLs.
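
The header reuse described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference client: `assemble`, `fetch`, and `header_cache` are hypothetical names, `fetch` stands in for whatever the client uses to retrieve one URL entry (including its `headers`), and the sketch assumes the `header`-class entries form one contiguous run in the ticket.

```python
def assemble(resource_id, urls, fetch, header_cache):
    """Concatenate data blocks in ticket order, reusing cached header bytes.

    Assumptions (not from the spec): `fetch(entry)` returns the bytes for one
    URL entry; `header_cache` is a plain dict keyed by resource id; all
    header-class entries sit next to each other in `urls`.
    """
    # Reuse is only possible when every entry carries a class.
    has_classes = bool(urls) and all("class" in u for u in urls)
    cached = header_cache.get(resource_id) if has_classes else None

    parts = []
    header_parts = []
    header_emitted = False
    for entry in urls:
        if has_classes and entry["class"] == "header":
            if cached is not None:
                # Emit the cached header once, in place of the header run.
                if not header_emitted:
                    parts.append(cached)
                    header_emitted = True
                continue
            block = fetch(entry)
            header_parts.append(block)
        else:
            block = fetch(entry)
        parts.append(block)

    # First classed ticket for this resource: remember its header bytes.
    if has_classes and cached is None and header_parts:
        header_cache[resource_id] = b"".join(header_parts)
    return b"".join(parts)


# Demo with a fake fetcher; "h1", "b1", "b2" are placeholder URLs.
calls = []

def fake_fetch(entry):
    calls.append(entry["url"])
    return entry["url"].encode()

cache = {}
ticket1 = [{"url": "h1", "class": "header"}, {"url": "b1", "class": "body"}]
ticket2 = [{"url": "h1", "class": "header"}, {"url": "b2", "class": "body"}]
blob1 = assemble("sample1234", ticket1, fake_fetch, cache)
blob2 = assemble("sample1234", ticket2, fake_fetch, cache)
```

In the demo, the second call fetches only the `body` URL. A real client would probably also key the cache on the requested format, since the header bytes differ between formats.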

### HTTPS data block URLs

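
Returning to the request side of this change (the `class` query parameter added in the first hunk), a client could mirror the server's `InvalidInput` rule before sending anything. The sketch below is an assumption-laden illustration: `build_query` and its argument names are hypothetical, and it only builds the query string, not a full URL.

```python
from urllib.parse import urlencode


def build_query(fmt=None, data_class=None, **other_params):
    """Build an htsget query string, pre-checking the `class=header` rule.

    Hypothetical client-side helper. The table in this diff lists `header`
    as the only allowed request value, so anything else is rejected here.
    """
    if data_class is not None and data_class != "header":
        raise ValueError("request class must be 'header' when specified")
    extra = {k: v for k, v in other_params.items() if v is not None}
    if data_class == "header" and extra:
        # Mirrors the case where the server SHOULD reply with InvalidInput.
        raise ValueError("only `format` may accompany class=header")

    params = {}
    if fmt is not None:
        params["format"] = fmt
    if data_class is not None:
        params["class"] = data_class
    params.update(extra)
    return urlencode(params)


header_query = build_query(fmt="BAM", data_class="header")
region_query = build_query(fmt="BAM", referenceName="chr1")
```

Here `header_query` is `format=BAM&class=header`. Combining `data_class="header"` with `referenceName` raises `ValueError` locally instead of waiting for the server's error response.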
