Write JSON representation for the returned BAM header #2

Closed
brainstorm opened this issue Apr 5, 2022 · 0 comments
Labels: gsoc (Google Summer of Code 2022 first good issues)

brainstorm commented Apr 5, 2022

Hello prospective GSoC 2022 candidate!

This challenge, should you take it on, consists of extending the method `bam_header_as_json` so that it returns the BAM header as well-formed, properly formatted JSON. The exact shape of the final JSON is largely up to you, since there can be as many representations as there are use cases (and perhaps even opinions?). Some prior knowledge of bioinformatics formats is valuable, but it is not required for this task.
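To make the target concrete, here is a minimal sketch in Python (the issue does not prescribe a language, and the repo's own code may look quite different). It assumes the header is already available as SAM header text, where each line is a record type such as `@HD` or `@SQ` followed by tab-separated `TAG:VALUE` fields, and `@CO` lines carry free text. `sam_header_to_json` is a hypothetical helper for illustration, not the repo's `bam_header_as_json`.

```python
import json


def sam_header_to_json(header_text: str) -> str:
    """Hypothetical helper: turn SAM header text into indented JSON.

    Output shape: {"HD": [{...}], "SQ": [{...}, ...], "CO": ["...", ...]}
    Every value is kept as a string; no type conversion is attempted.
    """
    records = {}
    for line in header_text.splitlines():
        if not line.startswith("@"):
            continue  # skip blank or non-header lines
        record_type, _, rest = line.partition("\t")
        key = record_type[1:]  # "@SQ" -> "SQ"
        if key == "CO":
            # @CO comments are free text, not TAG:VALUE pairs
            records.setdefault(key, []).append(rest)
            continue
        fields = {}
        for field in filter(None, rest.split("\t")):
            # Split on the first ':' only, so values such as command
            # lines that contain ':' survive intact
            tag, _, value = field.partition(":")
            fields[tag] = value
        records.setdefault(key, []).append(fields)
    return json.dumps(records, indent=2)
```

Keeping every value as a string and every record type as a list is the simplest lossless choice, but it still knows nothing about the fields themselves (for example, that `LN` is a number or that a header has at most one `@HD` line).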

If you manage to run the example in this repo, the output should look like this (the whole header crammed into a single `message` JSON key):

[Screenshot, 2022-04-05: current output, with the entire header as one string under the `message` key]

That is far from ideal in terms of representation and formatting, isn't it? Think about the applications that might consume that output and need to pick out one specific detail, such as the length of a single reference sequence: at present, that would be impractical, to say the least.

According to the official hts-specs SAM specification, the SAM header (SAM being the uncompressed text counterpart of BAM) is made up of several record types and fields, some of which Google has even formalized as protocol buffer structures. Is a direct, one-to-one representation useful and flexible enough to support exporting to multiple formats? How would you design an IR (Internal Representation) for those fields so that future format exporters can read from that definition, with both performance and ease of use in mind?
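One possible answer, sketched in Python under the assumption that the header arrives as SAM text: parse it once into a small set of typed records, then write each exporter against those records instead of against raw text. `HeaderIR`, `ReferenceSequence`, `parse_header` and `to_json` are hypothetical names (not part of this repo or any library), and only a subset of the spec's tags gets dedicated fields here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReferenceSequence:
    name: str    # @SQ SN (required by the spec)
    length: int  # @SQ LN (required), parsed to an integer
    other: Dict[str, str] = field(default_factory=dict)  # remaining tags, e.g. M5, UR


@dataclass
class HeaderIR:
    version: Optional[str] = None     # @HD VN
    sort_order: Optional[str] = None  # @HD SO (other @HD tags ignored in this sketch)
    references: List[ReferenceSequence] = field(default_factory=list)
    read_groups: List[Dict[str, str]] = field(default_factory=list)  # one dict per @RG
    programs: List[Dict[str, str]] = field(default_factory=list)     # one dict per @PG
    comments: List[str] = field(default_factory=list)                # @CO free text


def _tags(fields: List[str]) -> Dict[str, str]:
    # "TAG:VALUE" -> {"TAG": "VALUE"}, splitting on the first ':' only
    return dict(f.split(":", 1) for f in fields if ":" in f)


def parse_header(text: str) -> HeaderIR:
    ir = HeaderIR()
    for line in text.splitlines():
        record_type, *fields = line.split("\t")
        if record_type == "@CO":
            ir.comments.append("\t".join(fields))
            continue
        tags = _tags(fields)
        if record_type == "@HD":
            ir.version = tags.get("VN")
            ir.sort_order = tags.get("SO")
        elif record_type == "@SQ":
            # Raises KeyError if a malformed @SQ line lacks SN or LN
            name, length = tags.pop("SN"), int(tags.pop("LN"))
            ir.references.append(ReferenceSequence(name, length, tags))
        elif record_type == "@RG":
            ir.read_groups.append(tags)
        elif record_type == "@PG":
            ir.programs.append(tags)
    return ir


def to_json(ir: HeaderIR) -> str:
    # One exporter among many; other formats could walk the same IR
    return json.dumps(asdict(ir), indent=2)
```

The design choice is to give frequently queried fields (reference names and lengths, sort order) typed attributes while keeping every other tag in a plain dictionary, so nothing is lost and a new exporter only has to walk `HeaderIR` rather than re-parse header text.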

If testing or deploying your solution in this AWS-coupled code gets in the way, feel free to create your own repository and present the solution there, together with some tests.

Happy hacking!

brainstorm added the gsoc (Google Summer of Code 2022 first good issues) label on Apr 5, 2022