Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods sam_hdr_get and sam_hdr_set #1052

Closed
wants to merge 2 commits into from

Conversation

valeriuo
Copy link
Contributor

The samFile::bam_header can be used by applications to carry around a pointer to the header structure, corresponding to that particular file. These methods allow for access to this internal pointer.

@valeriuo
Copy link
Contributor Author

Rebased. To be reviewed and squashed.

@valeriuo valeriuo requested a review from daviesrob March 19, 2020 11:17
@jmarshall
Copy link
Member

jmarshall commented Mar 19, 2020

This field was only recently introduced for use (internally) with iterators on SAM files (4c2c536). Does letting user code change it interfere with that?

Why can't it be cleared via sam_hdr_set(fp, NULL, 1)?

What is the reason for now providing this functionality to user code?

@valeriuo
Copy link
Contributor Author

What is the reason for now providing this functionality to user code?

It is needed by samtools/samtools#1211, as a more efficient way of carrying around a reference to a header struct for a dynamic number of output files.

This field was only recently introduced for use with iterators on SAM files (4c2c536). Does letting user code change it interfere with that?

It doesn't in this case. It is only used here on the write side, whereas the iterator functionality implies r mode. The user code had the means of changing this field before, as the samFile struct is public.

Why can't it be cleared via sam_hdr_set(fp, NULL, 1)?

I didn't see a use case for this, but can be amended.

@jmarshall
Copy link
Member

The user code had the means of changing this field before, as the samFile struct is public.

struct htsFile is certainly not public. As the maintainer note in htslib/hts.h says, it's not an opaque struct merely because various inline functions (mostly historically?) access its fields. That note should be accompanied by a big USER CODE DO NOT USE THIS -- USE THE ACCESSOR FUNCTIONS INSTEAD comment, but by accident that was never added (though it should be).

It is needed by samtools/samtools#1211

It is desired by the samtools PR, but as this adds general API functionality to HTSlib it needs to be considered on its own merits.

The legacy samtools API embedded its header struct in its samfile_t, implicitly read and wrote the headers during samopen(), and implicitly used that header in its samread()/samwrite() functions. The HTSlib API right from the start pulled the headers out, requiring an explicit sam_hdr_read/_write() call and the headers to be explicitly passed to sam_read1/_write1().

This is a big step towards reverting that (the next step will be for sam_write1(fp, NULL, b) to use sam_hdr_get() implicitly, on the grounds that it is convenient), so the pros and cons of “implicitly attached headers” need to be consciously weighed.

@daviesrob
Copy link
Member

It could be argued that the samtools API was better. I've never entirely seen the point of having to pass in the header to sam_read1() and sam_write1(), especially as they don't work properly (at least for SAM files) if you supply anything other than the header that was at the top of the file. And of course you can't pass it to the iterators, which is why it had to be added to htsFile for SAM files.

Maybe a better interface would be a sam_open_write_with_header(name, hdr, thread_pool) which combines sam_open(), hts_set_thread_pool() and sam_hdr_write(), and also allows you to call sam_write1(fp, NULL, b)? That would somewhat de-faff writing sam files and make header-related misuse less likely. The open function would need a better name, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants