This repository has been archived by the owner on Jun 21, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Point to a real S3 object, still need to solve 'failed to fill whole buffer' though, I suspect it might be the Cursor? Reading https://rust-lang.github.io/rfcs/0980-read-exact.html, /cc @zaeleus
9e7a200
The cursor position is moved, in this case, to the end when writing to the buffer; so there's nothing for the reader to consume. Try resetting the position to the start before reading:
s3_obj_buffer.set_position(0)
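To illustrate the point about the cursor position, here is a minimal, std-only sketch. The helper name `write_then_read` and the sample payload are made up for this example and are not taken from the commit; only `Cursor`, `write_all`, `set_position` and `read_exact` come from the standard library.

```rust
use std::io::{self, Cursor, Read, Write};

/// Writes `payload` into an in-memory cursor, then tries to read it back.
/// When `rewind` is false the cursor is left at the end of the written data.
fn write_then_read(payload: &[u8], rewind: bool) -> io::Result<Vec<u8>> {
    let mut buffer = Cursor::new(Vec::new());
    buffer.write_all(payload)?; // the position is now payload.len()

    if rewind {
        buffer.set_position(0); // move back to the start before reading
    }

    let mut out = vec![0u8; payload.len()];
    buffer.read_exact(&mut out)?; // fails with UnexpectedEof if nothing is left
    Ok(out)
}

fn main() {
    let payload = b"BAM\x01"; // arbitrary sample bytes

    // Without rewinding, read_exact has nothing to consume.
    let err = write_then_read(payload, false).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);

    // After set_position(0) the same read succeeds.
    let bytes = write_then_read(payload, true).unwrap();
    assert_eq!(bytes.as_slice(), &payload[..]);
}
```

The `UnexpectedEof` error in the first case is the one whose message reads "failed to fill whole buffer".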
9e7a200
Oh, good catch, thanks, that did the trick:
9e7a200
The `io::Result<String>` return type of `read_header` is a bit rough for pretty-printing. I'm thinking about PR-ing Noodles so that `read_header` returns a `struct` instead, which could then be easily consumed by SerDe. Is that a change you'd welcome? ;)
/cc @victorskl
9e7a200
`bam::Reader::read_header` intentionally returns a raw string. The implementation of `std::str::FromStr` for `sam::Header` can be used to parse the header.
Serialization is done via `std::fmt::Display`.
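As a rough illustration of that parse/serialize pattern, the sketch below uses a hypothetical stand-in type, `MiniHeader`, that only understands a single `@HD` line with a `VN` field. It is not noodles' `sam::Header` and makes no claim about its API; it only shows how `FromStr` and `Display` pair up for a round trip.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a parsed header; NOT noodles' `sam::Header`.
#[derive(Debug, PartialEq)]
struct MiniHeader {
    version: String,
}

impl FromStr for MiniHeader {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Expect a single tab-delimited line such as "@HD\tVN:1.6".
        let mut fields = s.trim_end().split('\t');
        if fields.next() != Some("@HD") {
            return Err(String::from("missing @HD record"));
        }
        let version = fields
            .find_map(|f| f.strip_prefix("VN:"))
            .ok_or_else(|| String::from("missing VN field"))?;
        Ok(MiniHeader { version: version.to_string() })
    }
}

impl fmt::Display for MiniHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Write the header back out as tab-delimited text.
        writeln!(f, "@HD\tVN:{}", self.version)
    }
}

fn main() {
    let raw = "@HD\tVN:1.6\n";
    let header: MiniHeader = raw.parse().expect("valid header");
    assert_eq!(header.version, "1.6");
    // Display reproduces the original text.
    assert_eq!(header.to_string(), raw);
}
```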
9e7a200
Thanks @zaeleus, I understand that; I went through a similar `std::fmt::Display` situation with YaSerDe... but my question was more w.r.t. `serde::ser::Serialize`. Namely, if I try to serialize the SAM header to JSON, I'm met with an unimplemented-trait error (understandably).
My guess is that you'd like this to be a future third-party crate impl (NoodlesSerDe?) instead of bolted into your crate(s)?
I'm just trying to get a pulse on how lean vs. feature-rich you want Noodles to be or become. Since you are the maintainer, I will respect your philosophy either way; I have an intuition about the tradeoffs and just want to see where you lean at the moment.
Or perhaps there's already a way to do this? If that's the case I'm learning as I go through your codebase, so bear with me. Also, there's no rush at all to reply on a weekend, btw ;)
/cc @victorskl
9e7a200
Ah, I see what you're asking. No, there is currently no implementation of this in noodles. You would have to write your own serializer for each header type. This itself isn't hard but may be tedious, given the number of optional fields in each type.
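To give a feel for what such a hand-written serializer involves, here is a std-only sketch that emits JSON for one record with an optional field. The `ReferenceSequence` struct, its field names and the `to_json` method are hypothetical and much simpler than any real header type; with serde one would implement `serde::Serialize` instead of building the string by hand.

```rust
/// Hypothetical, simplified record; the field names are illustrative only.
struct ReferenceSequence {
    name: String,
    length: u32,
    md5: Option<String>,
}

/// Escapes the characters that must not appear raw inside a JSON string.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ReferenceSequence {
    /// Builds a compact JSON object by hand.
    fn to_json(&self) -> String {
        let mut fields = vec![
            format!("\"name\":\"{}\"", escape(&self.name)),
            format!("\"length\":{}", self.length),
        ];
        // Optional fields are emitted only when present; every extra
        // optional field needs its own branch like this one.
        if let Some(md5) = &self.md5 {
            fields.push(format!("\"md5\":\"{}\"", escape(md5)));
        }
        format!("{{{}}}", fields.join(","))
    }
}

fn main() {
    let sq = ReferenceSequence {
        name: String::from("chr1"),
        length: 248_956_422,
        md5: None,
    };
    println!("{}", sq.to_json());
}
```

The per-field branching is where the tedium comes from once a type carries many optional fields.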
This seems too niche to include as a standard feature. Is there an application or spec that reads SAM headers as something serialized other than tab-delimited text?
9e7a200
Agreed. I've been discussing with @chris-zen how hard it would be to annotate the structs for SerDe in a third-party crate. Not urgent at all, just curiosity at this point.
Indeed, other groups have used Apache Parquet in the past, such as ADAM, Presto/Athena UDFs and also other formats such as ORC, which could be used by Spark bioinformatics tooling.
In a less "big data" mindset, simple examples that spit out JSON can be fairly amenable for small WASM prototypes, genome visualizers and whatnot. Hope that makes sense?