-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major refactor: extract datamodel package. #228
Conversation
Yeah, that Back to the top though:
Can you justify this goal? Maybe there's some example code you could write, or point to that uses just the root package, or uses it for the majority of work that would make this a worthy goal? I know it's an internal view looking at the diff here but it really seems that the clarity from this change comes from the package separation, but now you also have many of those pieces hoisted up to the root, kind of undoing that clarity. You also have the mix of root vs specific package usage as I pointed out inline which suggests that maybe the distinction isn't even that clear or worthwhile? What would you lose if you removed the root package hoisting entirely? Are they just there now to fill in a blank space left after refactoring? |
The aliasing process is working extremely well. I've added aliasing and stubs to make the I've also added a few more aliases back into the root package, for error types, and some of the storage function types (namely, I'm testing that this is effective by attempting to run go-ipld-adl-hamt against this new version of the library, and it's... it's working. I've actually got go-ipld-adl-hamt tests passing with this refactored library, without any changes, and that's even before re-running its codegen(!). Aliases are very very cool. I think I might actually be able to land this, and tag it as a |
No. :) I'm just stating that as pretty much axiom. (Oh, okay, fine...) I've seen a lot of cases lately where someone asks for a feature that we already have, it's just hidden in a sub-package, and the discovery process just... doesn't happen, or gets lost in the maze. Or, that someone mistakes an internal API for one they should be using directly. Or someone want to add something to a very core API, but really what they're wanting is a convenience function, and it's not obvious that putting that in the most obvious place would be bad 🙃 I want to reduce the event frequency of all three of those things to near zero. I think "put the functions people want to use, and should see first, in the root package" is the way to do it.
True, for some of these values, it will be temporary. For example: I'm going to be including aliases to a bunch of error types in this PR. They will be transitional. I think I may remove those in a near future release. (As you say, having something like those symbols able to be referred to as more than one name... seems actively bad for clarity.) In some cases I think the alias can stay, and it's good docs.
So, yes :) I haven't filled in anything new in the root package yet. I'm going to hold off on that until proving this can fully land. But I want to, very very soon. I have a bunch of uncommitted work already that wants to go into this new space. |
Goal: make the root package of this repo into a place where we can put helper methods and powerful features (which depend on other packages!), rather than having it hamstrung by being the very bottom of the package dependency graph, and thus forcing all the interesting features and useful synthesis functions into difficult-to-discover subpackages. Also, the result just frankly reads well.
Evidently, it fits there without issue. (The name stutter here is nonzero, but just calling it "Context" alone seems... worse, given how much meaning that word has in golang.)
Many of these will be transitional only, I think, but we'll see what the appropriate timescale is for that. It may be fine to leave them for quite a while. With these aliases, I've been able to substitute this branch in to... approximately all of our downstreams I know about, and all their tests still pass. That includes packages that have codegen outputs committed already (!). That includes go-ipfs, and thus includes all of the transitives between us and it. I think that means we're truly able to claim this will be a "no breaking changes" scenario when we merge this and it next comes time to tag a release. (Golang's alias feature is kind of incredible. I'm really happy.)
We can do all our work without it, necessarily. Let's do so. This could have happened opportunistically, as it was forced, too -- in fact, that's what the prior commits were all doing -- and we'd just expect that to happen more and more as we add new features and helper methods and glue into the root package. However, it's just as easy (in fact, probably easier) to do it proactively, all at once, right now.
I chased the remaining uses of the root package within our repo down. You were right @rvagg , it instantly made things better. The new one is definitely smoother. |
These are based on manual review of the change in the godoc's list of exported symbols. (I haven't actually seen them used anywhere in downstreams in a way that their omission is a breaking change, but there's always a chance.) DeepEqual actually completely belongs here. ADL... I'm dubious on. I don't really think that should be in the datamodel package, either, actually. But let's roll with it. ErrHashMismatch... well, all those other errors already coming along for "a transitionary period", so we might as well apply that pattern consistently, and include this one too.
Also, fix up its docs. Phew, those had old and badly misleading links.
I've started other work further downstream of this which is now starting to add new helper methods and easy-use stuff to the root package, and I'm still really happy with this. (Using those, in turn, in further downstream projects, has already taken a bunch of stuff that was previous five lines of annoying glue, and turned it into one-liners that read something like And I'm also still confident this is going to not be a breaking change for downstreams. (Almost unbelievably.) Gonna merge and keep rolling forward with this, and adding the new facades of my dreams. 🎉 |
Goal: make the root package of this repo into a place where we can put
helper methods and powerful features (which depend on other packages!),
rather than having it hamstrung by being the very bottom of the package
dependency graph, and thus forcing all the interesting features and
useful synthesis functions into difficult-to-discover subpackages.
This in turn should make a lot of UX improvements possible.
Also, extracting a bunch of the key interfaces to a package called
datamodel
has a result that just frankly reads well. I was surprised how instantly I liked it. Probably a good sign.First of all, I want to apologize for the monster diff.
Second of all, I want you to know, reader, that all this came from the desire for a
ToString
method.Yes, I wanted to printf-debug. And yes... it lead me to do... this.
I jest. But also tell the truth. But also, honestly think this has become a critically important refactor
in order to unblock new features, unblock fixing some nasty UX bugs, radically increase usability and discoverability,
and ultimately, actually make the whole system easier to gracefully evolve with fewer breaking changes in the long run.
How did I get here? How will this really cause so many victories? Read on.
ToString
I'm not kidding about the
ToString
thing. This example will actually be instructive.I was toodling along trying to write some code using this library.
As is not uncommon, I wanted to "printf" some data, because in the heat of hammering out new code,
I just wanted to see what the data was doing.
Ow. This is actually not a feature we have a good function for right now.
One of the best things we can do is probably pipe the data in question into a json codec, and print that (which totally works, but... you know, uufdah).
And I actually wanted more than that: I was working with schema'd data, and the number-one, most key thing I wanted in my printf debug was... the type info.
I think to myself: okay, this is not the first time I've had this want.
It's certainly not going to be the last time.
Other people using IPLD and these libraries are going to want this too.
And when we do write a function for this, it really ought to be darned easy to discover.
So I start trying to write that function.
I think maybe I'll try to make it a package-scope function, and put it in the root package.
This seems right, right? It should be applicable on any Node (we want it to be able to do a debug render of anything, not have to hope node implementations provide a good printf implementation of themselves), so that means it's a function, not a method.
And putting it in the root package seems desirable, because, well, discoverability.
And then I stub my toes, repeatedly, instantly.
schema.TypedNode
, so I can simply print its type name.ipld
package can't importschema
because A) it's not supposed to, for logical scope reasons, and B) it'd be an import cycle.ipld
package can't import a codec. Cycles! Also, good lord, the bloat. HavingNode
depend onjson
is a critical fail; absolutely not.Okay shoot. So, the package graph here has rapidly squeezed me into:
If this was the only time this had happened, I'd say maybe "meh" and let's hack out a kludge for this case, and move on.
It is not the only time this has happened. We have repeatedly hit toe-destroyers like this when wanting to provide basic features.
We sometimes dodge it by adding the feature in another deeper package somewhere, to break the cycle problems,
but even then, it's often a weakened solution, and other users of the library don't discover it quickly, either, because it's tucked away somewhere deep.
So what happens if we slide all the key interfaces currently in the root package over to a new package, for
datamodel
...and then add features like
ToString
to the now-vacated root package, and let them go wild importingdatamodel
, andschema
, maybe even some string escaping utils fromjson
for the joy of handy debugging tools?Does that work? Does it help?
It does, and it does.
Another example feature: Copy
I'd like to offer a
Copy
operation. (Yes, nevermind that most of this library is oriented around immutability, so that sounds pointless. Somehow, it still comes up. In particular, when you want to copy some datamodel content into a different node implementation, for memory layout reasons. But we digress; that's unnecessary detail.)This doesn't hit exactly the same problem that
ToString
did -- it doesn't end up wanting to refer to any features beyond essential data model stuff.It hits on another problem instead: more of a semantic confusion one.
Copy
operation that does feature detection: if it sees a node has some fastpath for this, and use it if so, and fall back to a data-model copy walk if not. (The fastpath here is basicallyNodeAssembler.AssignNode
already, but nevermind that.)Copy
operation that's just the fallback: it only does the dumb walk-and-copy, and never usesAssignNode
. Why? Because there's a lot ofAssignNode
implementations out there that have to do this fallback. (It's in codegen outputs a lot.) We should have that function somewhere reusable.So this isn't a hard blocker. We could just name them different things.
But what do we call them?
And what are the odds of a user making a mistake, if they're right next to each other?
And how bad are the consequences of that mistake? (Spoiler: kinda bad. You probably get an infinite recursion bug if you pick the wrong one as your util to use inside your
AssignNode
implementation.)Now imagine how we need
Copy(dest NodePrototype, src Node) (Node, error)
andCopyIntoAssembler(dest NodeAssembler, src Node) error
and other variants like that,all adding more clutter to the namespace too, for the sake of offering all the needed performance-conscious vs convenience options. (Not imaginary: we need these.)
Now what do you suppose are the odds of a user either making a mistake, or becoming just plain exhausted, trying to pick the "right" function?
Splitting out the
datamodel
package offered a good solution to this too.The dumb walk-and-copy
Copy
function goes in thedatamodel
package;the smart feature-detecty
Copy
function goes in the rootipld
package, where we'll be adding all our other new smart feature-detecty functions.Neat. Solved. Clear. Consistent.
(More notes coming about the feature-detecty stuff coming up in a few sections, too.)
Making things discoverable
People frequently ask for "helper" functions, or facades, a something do just "quick do x".
This usually goes the same way as the
ToString
story above:So let's stop having that problem.
The root of all of these problems is because we put the
ipld.Node
interface in our root package, and because everything depends on it, it's a huge albatross for putting anything else in the root package.If we just move that to a subpackage -- here, now to the
datamodel
package -- everything goes peachy.Now we can add whatever facades and helpers we want into the root package, and we can assume people will find them.
Imported the root package will have a big transitive dependency sprawl -- but we can advise people to use the specific subpackages if they care.
And also, we can pull all this off with minimal breakage to existing users, because we can spam aliases.
This seems like a win-win-win to me.
Evolving
Now about that biggest claim: that ultimately, in the long run, this is actually going to make
the whole system become easier to evolve, with fewer breaking changes.
Yep.
Okay, the magic for that isn't done in this commit. But I'm clearing the way for it.
The future is feature detection. The
ToString
function (again, key: function, not method) is an example of what I'd like to do more of.Right now, we lean way too heavily on the Node interface.
It's a fairly monolithic interface.
It's a pain in the butt to implement all its methods, if you want to make a new one.
(And while that's supposed to be rare(ish), it certainly happens. Also: holy smokes, it just makes codegen emit huge hulks of stubs.)
But most problematically, we basically can't ever change it again at this point.
If we add anything, or change anything, it's a huge ecosystem-breaking change.
Anyone who's made an ADL? Broken. Anyone who's used codegen? Broken.
There's enough Node implementations in the wild already that we're completely ossified at this point.
(Fortunately, we seem to have hit pretty close to the mark in designing the Node interface, before ossification set in.
Still: nothing's perfect; and yes, we do still want to add features sometimes.)
So what should we do about this? Feature detection.
We will want to start writing more operations as package-scope functions, which take their subject Node as a parameter.
Inside those functions, we can do interface casts to see if the Node supports the operation, or, if it has any fastpaths for that operation (ooh!),
or, maybe it's something we can fall back to synthesizing out of one of the Node's other supported operations.
This will rock in several ways:
KeysIterator
function, and it can peek at the Node, and see if it has explicit support for that feature; if it does, we use that; if it doesn't, we use the general MapIterator, and it's probably slower (it might be allocating data we're just throwing away), but it'll work.LookupBy*
methods onNode
right now, for various performance reasons? What if we could offer all those, but also, if you were implementing node, you could just bother with one, and let our operations do feature-detection and synthesize the rest? We can do that now. We might actually be able to take methods out of theNode
contract in the future.I think this pattern of feature detecting operation functions will unlock a ton of usability improvements in short order.
And while we didn't technically have to reorganize the package graph to unlock this... honestly, I think it really helps.
I'm not sure I would've been smart enough to see this outcome without having embarked on the reorganization, and I suspect future users would've also had just as many discoverability hurdles. Fixing that is worth a lot.
Frankly, it just reads nice.
I was really happy reading
datamodel.Node
anddatamodel.Kind
all over the place.It's what we say when we're describing these things to other humans in conversation, and often even in written prose.
It's good that the code now says the same things.
Some other targets-of-opportunity
errors
I moved a few error types around slightly during this work.
A couple things moved from
datamodel
toschema
, because 100% of their references were over there.A couple things disappeared, because it turned out there weren't even being used.
Oh, and
ErrHashMismatch
moved tolinking
, because that's the only place that used it.These just happened to be especially easy to notice since I was already fixing package graphs.
I don't think any of these changes will be particularly disruptive;
frankly, our error types haven't been consistent enough yet that I think anyone in their right mind could be relying on them.
linking
I moved the
LinkSystem
type to thelinking
package.This just seems like common sense.
I think a lot of package imports at the tops of files started immediately making sense when I did this.
There's still an alias to
LinkSystem
in the root package, so this shouldn't be disruptive.codecs
I finally got rid of those dang
Marshal
andUnmarshal
functions that exposedrefmt
API types from thecodec
package.Turned out only one or two tests were using them, and they weren't that hard to update to use the modern
Encode
andDecode
functions.I don't think anyone else outside the repo has been using these for a good while already, so this shouldn't be disruptive.
basicnode
I moved
node/basic
tonode/basicnode
. Same naming convention asbindnode
users. Means last segment of import path and actual package name are the same, which is a happier thing.This is probably actually a pretty huge breakage from the POV of many downstreams.
I should either revert this bit, or we should add a whole facade package.
(Or we could decide to bite the bullet and power through it. But I don't see that as a likely choice.)
Phew
That's it, phew.
This proposal may seem a bit sudden. It is. I was going to do a spike to prototype it, and then ended up thinking "what the heck, let's just Do The Thing and see if it works" (frankly expecting to fail), and lo and behold, it did work. So now here with are with a cool thing, and the choices are: run with it, or, throw it away. (Maintaining it as a branch is not on the table.)
I suspect this might be a bit controversial, just because of the sheer diff size. That said: please don't trust your first instincts from that. Golang's alias features really give us a lot of power to make a change this big, and yet make it smoothly.
And yet, because it's big, I'd also like to try to land it fairly decisively. Almost any other work in this repo will generate merge conflicts with this now.
I'm pretty confident this is a good direction forward. All the tests are green already. Codegen's even updated. Let's go?