-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Schema types rebased to use codegen types for the data #107
Conversation
... This is probably about to get plowed under. We're planning significant interface changes for a v0.7 release, and that will affect the codegen outputs in this PR significantly. I don't think saving the snapshot of the old (current) codegen will provide any significant utility, so, this will get rebased and so in the future, the git history will only show the new codegen (after v0.7). I'll just leave this PR up as it stands for a while, but it'll become unmergable in the near future, and I might not get back to it for a while. |
…r the fully validated data which is implemented by retaining and accessing into the raw data.
Move the existing setup from the schema-schema "demo" dir to here; and rig it up with go generate conventions that I'm hoisting back from mvdan's https://github.com/ipld/go-ipld-adl-hamt/blob/master/gen.go . Move the parse tests with it.
filenames change due to #105 . gofmt also applied for the first time. from here on out: `go generate` should just cause these files to be automagically updated and formatted.
a123674
to
8681ab7
Compare
Takebacksies: I'm going to try merge this shortly after all. I reevaluated my concern that this was going to be an annoyingly large hunk to carry around in git: it is not. Measurements indicated it's roughly a hundred kilobytes in .git/objects when all is said and done. I daresay we can live with that. (We also reevaluated how big the interface changes for v0.7 will be, and it's less than originally planned, meaning the delta we expect in the near future for that is also much less than feared, even further reducing the concern about git history bloat.) |
I've updated the code to actually be nicely regenerable using standard |
Since #121, presence of fields is actually checked... but that code also doesn't understand implicit fields yet, which makes us need a lot of filler. Also the lack of the "members" field for unions? That was just plain wrong. Good think we're catching things like that now.
Cannot quite wire that up yet because of some other still incomplete features.
e47543b
to
ad7aa3e
Compare
We have here a new schema package which is based on the codegen'd golang types which recognize documents matching the schema-schema.
Meaning we're one step quite drastically closer to self-hosting tools for handling IPLD Schemas, among other things.
This comes in two packages:
schemadmt
" package -- where "dmt" stands for "Data Model Tree" (think: like AST, except this one is divorced from syntax) -- is 99% codegen, and a few helper methods manually attached to that.This package (like any codegen output) can be combined with your choice of IPLD codec in order to parse a schema!
schema2
" package -- temporary name: this will replace the currentschema
package when done -- which has types created from theschemadmt
data, but further validating it, and adding even more helpful accessor methods.This is necessary to do as a separate package because some of the validations it performs are actually graph properties (meaning they're unsuitable for validation by the schema rules alone). We also add a bunch of methods which do cyclic lookups, so, again this means pointers are suitable and the data model tree alone needs decorators to enable this.
This is a draft at present. However, it's already very close to ready to replace the current
schema
package.When it's ready to do so, the rest of the codegen shouldn't change very noticably at all -- we're still advertising all the same access methods as before.
Forgive me for tooting my own horn a bit here, but... rewriting this against the codegen'd types was just a beautiful experience.
The autocompletion for navigating the data was helpful, and the type checking helped steer things to correct completion on several occasions.
Being able to write a type switch for the union members, for example, and getting nice compile-time errors if you got a wrong type in there? Brilliant.
The code quality increased drastically.
In one case, the increased strictness also guided directly to the creation of a TypeReference type, which is now the thing to use for map keys whenever dealing with a mixture of named and anonymous types --
and now all of the new schema handling code works with this type, and in doing so, did almost all of the work necessary to actually get us anonymous types (previously not implemented); and all of the places where type names vs inline defintions are converted to type references are super clear.
Caveat to reviewers: I did burn out on trying to DRY in some of this code. I know.
The
BuildTypeSystem
function and all of its validation logic is fairly alarming (in particular, some of the deeper cases in unions which require looking at member type info).These can almost certainly be refactored in the future -- but I think I'd like to go about this by getting more test coverage first; merging a checkpoint; and then tearing at that in separate PRs later, with those test cases to insure the simplifications actually keep all the details correct.
Some parts of the logic probably can be DRY'd be a little application of cleverness; some parts probably can't, but even those might be better to break down into smaller functions just to keep variable scopes smaller and readability improved.
For this first pass, though, grinding things out in the most skullduggerish way possible seemed best. I figure lexical linearity is easier to review than cleverness.
(We may also want to benchmark the
BuildTypeSystem
function and/or read its assembly to make sure it's reasonably fast.We'll probably have codegen outputs invoking this function in the future at init time, if we want codegen packages to have their full type info available.
This isn't done yet, but is probably a feature in our near future.)
Some variable naming is also admittedly atrocious.
I tried to keep some kind of convention going with "dmt" suffixes for any variables that are the raw (well, relatively raw; they're still schema-lensed) data trees, but it's not always consistent, especially around typenames.
There's also an awful lot of "something2" variables where a type switch destructures something and I couldn't immediately think of a better name as I was hacking along.
These could also get better in future refactors (and maybe this will be particularly opportune to do at the same time as some of the refactors suggested in the previous paragraph, which would presumably lead to smaller functions).
Some other things still in the "future work" category:
I expect to rebase this several times before it reaches a finished state. (Hopefully, we might also cram some codegen output size reduction patches in before this as well. But we'll see.) So, early review welcome at this point, but do expect changes.