-
Notifications
You must be signed in to change notification settings - Fork 369
Consider parsing models with binrw
#117
Comments
I wouldn't really agree about the expert part, but I can tell you what I think. About binrw, I actually already checked it out and commented briefly: #79 (comment) It basically wants to be using a file that's seekable, it didn't seem like it had first class support for stuff like memory mapping. I'd also say I don't really like the derive macro style for parsers. It's great when the format aligns nicely with your structures, but I think it's hard to work with when there's a mismatch. And if you commit to that sort of thing, you don't necessarily know until later on that there's going to be a problem because (presumably) it worked okay for your first parser or you wouldn't have stuck with it. In that same thread I linked to, you can see the start of an attempt to port the Having to manually thread the parsed input is irritating but there are ways to avoid it a lot of the time and it's not something that'll actually stop you from doing what you want. I also looked at
|
The documentation states that
which seems fine to me. We'd just memory-map the file, wrap a
I've generally had a good experience with them; I find a declarative structure much easier to parse through than parsing code where it's applicable. I think it's a worthwhile experiment to see if we can fit GGML/GGMJ/GGJT into Regarding If |
Sorry, I wasn't clear. That's what I was talking about when I said it didn't have first class support (not working with it directly, requiring a wrapper). I wasn't saying it couldn't work at all.
Tell me about it! https://github.com/KerfuffleV2/mdoexperiments Also, I wasn't just being modest when I said I wasn't a parsing expert. I honestly haven't done much parsing stuff with binary files and Rust. So maybe my lack of enthusiasm for I usually like stuff I can control more directly though.
The two things aren't really mutually exclusive. Could use |
The ggml file format is really simple to parse. Every format has <20 Adding a 3rdparty library would make the code harder to understand. As for code reuse, using callback like |
The reason I'd like to investigate a declarative format is that it makes it really simple to see the differences between the formats, and to write code that can read/write them. You can see the data as it's laid out in the file without having to read through all of the parsing code, at least in theory. |
finally i ran out of time and end up parse it myself hard coded 😅 shame on me. |
Something I've noticed in PRs is that we all keep reimplementing the same read/write logic for models (#83, #84, #85, #114). I'd like to minimise this duplication by using
binrw
to describe the file format in a declarative fashion, making it trivial to support the multiple formats and to read and write to each of them.I believe that this should be possible and that there aren't any showstoppers, but I could be wrong about this. @KerfuffleV2 is probably the relevant domain expert on parsing models right now - would appreciate your input!
The text was updated successfully, but these errors were encountered: