Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema/dmt: first pass at a parser #253

Merged
merged 1 commit into from
Sep 28, 2021
Merged

schema/dmt: first pass at a parser #253

merged 1 commit into from
Sep 28, 2021

Conversation

mvdan
Copy link
Contributor

@mvdan mvdan commented Sep 15, 2021

(see commit message)

@mvdan mvdan requested a review from warpfork September 15, 2021 19:38
Copy link
Collaborator

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My skim of the parser has been fairly fast, but it looks like a parser, and it looks like it has quite good error handling (line and offsets, etc) as a bonus... and if it parses the schema-schema, that is indeed quite a bunch of practical exercise and test already. Amazing!

@warpfork
Copy link
Collaborator

It looks like we don't test that the parser of the schema-schema DSL is functionally equivalent to the schema-schema JSON load, yet?

Is it time to test that? It might be as easy as feeding to the two of them to the printer.Sprint feature and then doing a big string diff.

@mvdan
Copy link
Contributor Author

mvdan commented Sep 15, 2021

That sounds like a good idea. I'd probably say that's second PR territory, because it's likely going to find a few more bugs, and this PR is big enough as it is :)

@warpfork
Copy link
Collaborator

SGTM. My thumb is up!

@mvdan
Copy link
Contributor Author

mvdan commented Sep 17, 2021

Waiting on ipld/ipld#137 (comment) to get a decision.

schema/dmt/parse.go Outdated Show resolved Hide resolved
@warpfork
Copy link
Collaborator

I would probably move the whole Parse function family to another package. schema/dsl? It returns a schema/dmt.Schema, but most of the rest of its thinking is framed in terms of the DSL. It's also too easy to see schema/dmt.Parse and think "ah, that must take JSON or some other standard codec and parse to Data Model", which is of course not what this code does.

@mvdan
Copy link
Contributor Author

mvdan commented Sep 20, 2021

I don't have a strong opinion on the names of schema vs DMT vs DSL, but I do think we should avoid adding more packages into the mix. go-ipld-prime already has dozens of packages, and I fear that adding more packages and placing the high-level APIs there will just further confuse the new users.

If you're worried about getting confused by the fully qualified name schemadmt.Parse, perhaps schemadmt.ParseDSL? It reads somewhat redundant to me, as practically all parsers read some form of language syntax specific to their purpose, but ultimately I can't argue against being explicit in APIs.

@warpfork
Copy link
Collaborator

warpfork commented Sep 20, 2021

I'm not so worried about more packages. Worried about undiscoverability if high-level APIs end up hidden in deeply nested packages: yes. But we already have that problem with schema/dmt; I think putting the DSL parser stuff in schema/dsl is precisely equally bad, and not worse. (And yet has bonuses of explicitness, and draws an accurate graph that the DSL depends on the DMT, and not vice-versa.)

The solution to discoverability issues for this stuff seems to me to usually shake out to making a top-level package that just focuses purely on the discoverability concern. To pursue that: one option would be to make an ipld.ParseSchemaDSL function all the way in the repo root. The other option: is to try to put discoverability routes in the schema package... but the trouble there is of course we kind of lost that battle already, because I didn't realize how much Fun this was going to be when I started the schema package ages ago. (I wonder if it would be a good idea to try to reclaim the ground, though, with copious aliasing, like I did for the repo root package. It's possible.)

schemadmt.ParseDSL would still also read better to me than schemadmt.Parse, and definitely be a step in the right direction. But I think I still at would least weakly favor packages.

@mvdan
Copy link
Contributor Author

mvdan commented Sep 20, 2021

To pursue that: one option would be to make an ipld.ParseSchemaDSL function all the way in the repo root.

Oh, I like that. I had been thinking about what to name a "parse and compile schema" API, and where to put it. Now it seems clear it should go in the root package. If anyone wants something more specific, such as parsing the DSL but keeping it in DMT non-compiled form, then they could drop down to one of the sub-packages.

@warpfork
Copy link
Collaborator

Yep. And the root ipld package, fwiw, does already have a dependency on the schema package. So it's not at all unreasonable to add a ipld.ParseSchemaDSL there.

(A little more backstory: The ipld package already uses schema in tasks like feature-detecting typed nodes for purposes like the Marshal functions. To tie together some threads, this was made possible in #228, and usages of the schema package such as those Marshal functions appeared in #232 . I mention all this because it's the success of 228 that makes me float the idea we could pivot the schema package similarly, if we decided to do so. 😃 )

@mvdan
Copy link
Contributor Author

mvdan commented Sep 20, 2021

I still don't have a strong opinion on whether to split DSL from DMT. You're right that they could be split, and they sit in different levels. On the other hand, the new DMT package will have a very small API, and I'm not a fan of splitting a package with a small API into two packages with even smaller APIs. They would both be relatively small amounts of code, too.

I think I lean slightly towards splitting the packages, if anything because schemadsl.Parse feels like a slightly better name than schemadmt.ParseDSL. In such a split, would we then keep "compile" as schemadmt.Compile?

To get to the tip of the bikeshedding: most end users will want to go from a schema file to a compiled Schema they can use with e.g. bindnode. So they really want a "Parse and Compile" API. But I really don't like ipld.ParseAndCompileSchema. Maybe ipld.LoadSchema or ipld.BuildSchema?

I mention all this because it's the success of 228 that makes me float the idea we could pivot the schema package similarly, if we decided to do so.

My intuition is that we should probably not pull that trick more than once in this module. The existing codebase has enough packages and dependency tree magic as it is :)

Plenty of TODOs scattered around,
but it's complete enough that it can parse the schema-schema.

We include ParseBytes for the upcoming go:embed usecases,
and ParseFile for the common non-embed usecase.

The parser is largely inspired by github.com/ipld/go-ipld-schema/parser,
with some notable differences.

First, it follows a top-down recursive descent design,
with methods such as "consume token" and "peek token".
This also enables us to provide consistently good errors.

Second, it's not line-based, using bufio.Reader instead.
This keeps the code simpler, as we aren't forced to tokenize by line,
and we also avoid potential mistakes when slicing tokens by hand.

Third, it parses straight into the new bindnode DMT types,
which allows for Parse and Compile to be chained together.

Note that the parser lives in schema/dsl,
which sits on top of schema/dmt with the Go types and schema definition.
This is a nice separation at a conceptual level,
and also means one can use the DMT without pulling in the DSL parser.

Finally, we bump the ipld submodule to get Eric's schema-schema changes
that mean implicit scalar values are no longer quoted in the DSL.

Updates #188.
@mvdan
Copy link
Contributor Author

mvdan commented Sep 28, 2021

Moved the Parse API with its test to schema/dsl. Following schema/dmt being named schemadmt, the package is called schemadsl.

Note that Compile remains in schemadmt. That seems like the right place to me: it has nothing to do with the DSL.

Copy link
Collaborator

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks even better than ever -- let's merge it!

@warpfork warpfork merged commit df8e909 into ipld:master Sep 28, 2021
@mvdan mvdan deleted the dmt-parse branch October 15, 2021 10:39
@aschmahmann aschmahmann mentioned this pull request Dec 1, 2021
80 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants