Individual breaches, Clay edition #1169

philipcmonk · 2019-05-14T06:58:03Z

This rather a large PR; I'm happy to walk reviewers through it in person if that's easier. I think most people have some context for these changes.

A few changes here, most of which are due to a single issue. There was a 1-line fix or a 2000-line fix. I chose the latter, of course. The most important changes are these:

`%drip`

(only the changes in sys/vane/behn.hoon)

While debugging an indivdual breach bug, it turned out the issue was that clay was getting the sunk notification before ames, so clay's subscribers were trying to restart syncs before ames cleared its state -- which clears the attempted restarts. One solution would be to reverse the order of the sunk notifications (the 1-line fix), but the real issue is that when clay gives a subscription update, it shouldn't do that in the same event.

This is a textbook case for what we've informally called "toss". In reality, there are two related things we've called "toss". I've named one of them "drip", and that's the one I've implemented. "toss" means to "pass" in a later event (often the next event, but not necessarily if other stuff is in the queue). "drip" means to "give" in a later event.

Eventually, these should be implemented in arvo alongside give and pass. Then, in places where you want to give in the next event, you change:

[duct %give %writ ~]

For

[duct %drip %writ ~]

For now, I've implemented it in behn, with a "slip" calling convention, so it becomes:

[duct %slip %b %drip !>([%writ ~])]

Note that the sign will come into the receiving vane from behn now. If all instances of that gift will be dripped through behn, the the receiving vane can switch the associated sign from the sending vane to behn. If not all will be dripped, then it must include a sign entry for both vanes. This wrinkle will go away when drip is implemented in arvo.

Clay drip

All subscription updates from clay (%writ and %wris gifts) are now dripped through behn. Thus, a commit can't fail because a subscriber crashed. Additionally, this make it much more likely that a commit or a merge will take more than one arvo event (since both involve several round-trips to ford, involving drips). It has always been the case that a commit could take more than one event, but it was pretty rare so some code assumed it didn't.

Relevant to individual breaches, this makes it so that any clay changes resulting from a sunk notification trigger kiln in the next event, so Ames has already had time to handle the sunk notification.

Transactions in clay

Several parts of clay assumed that a commit would take exactly one arvo event. Since that's no longer the case, we needed a way to keep an in-progress clay commit and maintain a queue of pending commits or merges which will be started after the current one finishes. In other words, we needed "macrotransactions" -- transactions which take more than one arvo event.

We use the "clad" monad, which is very similar to the "ph" monad, to provide this transaction layer. All effects are stored until the transaction is completed, when they are all emitted at once. In case of upgrade, if you don't write any special upgrade logic in +load, the current write transaction will continue un-upgraded until it finishes (so a copy of the old kernel will stay until then). If you changed enough that you prefer to kill and restart the current transaction, you may do so with confidence that you won't leave it in an inconsistent state.

Overall, I believe this makes clay drastically less fragile. In particular, the data flow in the code is much more readable, and I'm no longer afraid that small changes will introduces subtle bugs in far-away sections of the code. Also, I was able to deleted hundreds of lines of comments describing the invariants required in the code because they were no longer necessary.

This is the vast majority of the changes in clay.hoon.

Individual breaches for clay

This implements the "forget they ever existed" interpretation of breaches for clay. In particular, we cancel all clay subscriptions to that ship, wipe our clay memory of it, and clear the ford cache entirely. This code is in clay.hoon in +handle-task under case %sunk (you can just grep for "sunk"). There are some related changes in lib/hood/kiln.hoon to properly restart syncs when the other ship sinks.

The main thing I've tested is that when a parent breaches, the child stays synced to their new instance.

Limitations:

This doesn't fix the ames/jael bootstrapping issues. In practice, this means that to breach a parent, you must either (1) make sure the old parent keeps running until it pushes the jael breach notification to all its children or (2) have all its children switch to another authority for jael, eg with |eth-manage %look. The first is not possible if you destroyed your ship too badly, and second isn't 100-year-compatible. This will be my next task.

add +bake

This reverts commit 2a0b2d7.

…aches

%acme: use cert from clay files

jtobin · 2019-05-15T11:35:27Z

@philipcmonk I've retargeted this at master, which apparently conflicts in a few places. I took a quick peek to see if I could just handle said conflicts outright, but they look substantial enough that I didn't touch anything for the time being.

philipcmonk · 2019-05-15T18:06:06Z

Transactions can only produce effects and not direct state changes, because once you're in clad the clay state is COW-frozen at what it was at the beginning of the transaction, right?

They can produce both effects and values. In this case, for example, we produce an updated dome (desk state) and rang (global object store). The code that evaluates the monad (+take-commit/+take-merge) can then integrate that with your state. In this case, we replace the dome (which means we need to have a write lock on the dome) and we +uni:by the rang (which means if we want to garbage-collect the rang, we should probably do it between transactions).

(I'm still side-eyeing the opaque wireless continuation-pile model of a side-effecting asynchronous computation, but it does seem reasonable for something morally equivalent to a single ford build).

Wireless computing is the future. Well, at least when you're inside the monad (the monad interpreter uses wires, of course). Although, I think I might add a path field to the notes we can produce, which will be appended to the wire for convenient |verb. But our old habit of including state in wires was very bad not only because it was slow, but because it left all the non-upgradable state lying around the system. With this approach, at least all the non-upgradable state is in the vane responsible for it, which can deal with it appropriately.

philipcmonk · 2019-05-15T18:06:39Z

@jtobin whoops, I'll merge from there.

…reaches

ohAitch · 2019-05-15T18:45:10Z

They can produce both effects and values. In this case, for example, we produce an updated dome (desk state) and rang (global object store). The code that evaluates the monad (+take-commit/+take-merge) can then integrate that with your state. In this case, we replace the dome (which means we need to have a write lock on the dome) and we +uni:by the rang (which means if we want to garbage-collect the rang, we should probably do it between transactions).

"Produces an effect which is evaluated by clay itself" is what I was gesturing at wrt indirect state changes, nod

But our old habit of including state in wires was very bad not only because it was slow, but because it left all the non-upgradable state lying around the system. With this approach, at least all the non-upgradable state is in the vane responsible for it, which can deal with it appropriately.

My concern is not upgradable state, but upgradable code: it used to be possible to add a printf in the middle of a transaction(or expand a step's input type according to an interface change), now it is not. I agree that the /next-state-name/<@p>/<@p>/<@ud>/<@asdf> pattern is undesirable once there are more than a couple steps, and should be replaced with /[%next-state-name]/<transaction-id> (map transaction-id formal-data) pair; but clad also gets rid of explicitly named states, leaving no affordances to rearrange the state flow graph or even the code handling each transition.

philipcmonk · 2019-05-15T18:55:47Z

Adding printfs without restarting a transaction is an interesting use I hadn't thought of, but in general upgrading code isn't always possible in the middle of a transaction, and our usual approach has been "ignore any ships in the middle of a transaction and hope they don't sink", which is worse. My guess is that in practice restarting the transaction on upgrade will be fine for debugging.

belisarius222 · 2019-05-15T20:03:20Z

I think I introduced the unix sync bug when I redid the mime cache a few months ago. Sorry! I didn't realize it broke syncing to Unix.

The reason I messed with the mime cache was to reduce restart time. When you restart Urbit (at least on master), I believe it rebuilds every mounted file in the filesystem.

This meant restarting a ship could take in the tens of seconds, if I recall correctly. I feel lile there should be a way to unbork unix syncs without losing the caching.

joemfb · 2019-05-15T20:20:31Z

It should be noted that this PR is now also a proposal to merge next into master, as this work was done on top of next.

philipcmonk · 2019-05-15T22:53:32Z

The reason I messed with the mime cache was to reduce restart time. When you restart Urbit (at least on master), I believe it rebuilds every file in the filesystem.

Ah, you're correct. It looks like sync portion of restarts take about 5-20 seconds without the mime cache, and about 2 seconds with it. I re-added it, updating it on every %ergo as described above. This fixes the bug, gives us the better start times, and does so more consistently (there were times before when the cache wouldn't be full unless you changed files, so startup would be very slow). Fun fact: hoon files are read in as /text/plain but written as /text/x-hoon, which broke the cache lookup often.

As an aside, I dislike how ford's build times seem to be extremely variable. Maybe it's missing caches for spurious reasons?

philipcmonk · 2019-05-15T22:55:50Z

It should be noted that this PR is now also a proposal to merge next into master, as this work was done on top of next.

Is next deprecated? Should everything branch off of and PR to master?

jtobin · 2019-05-16T00:38:47Z

It should be noted that this PR is now also a proposal to merge next into master, as this work was done on top of next.

Is next deprecated? Should everything branch off of and PR to master?

next is officially deprecated. I'll actually go ahead and delete the branch soon enough.

ohAitch · 2019-05-16T00:42:15Z

There seems to be a fairly frequent oscillation between master being stable+deployed and having a "devel" branch, and deploys being tied to stable releases with master containing whatever code is desired for deploy sometime soon. Is there a collection anywhere of arguments for the respective development models?

…

On Wednesday, 15 May 2019, Jared Tobin ***@***.***> wrote: It should be noted that this PR is now also a proposal to merge next into master, as this work was done on top of next. Is next deprecated? Should everything branch off of and PR to master? next is officially deprecated. I'll actually go ahead and delete the branch soon enough. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1169?email_source=notifications&email_token=AAOFPBU3BB7IN3TUNQR6KHLPVSUJRA5CNFSM4HMW3DB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVQKLFY#issuecomment-492873111>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAOFPBRKKH4VPTUPBUJR43LPVSUJRANCNFSM4HMW3DBQ> .

belisarius222 · 2019-05-16T01:03:08Z

Some of Clay's Ford builds have around 1500 sub-builds, which has a tendency to flush the Ford cache. This is a common problem in caching, and systems that can handle this sort of thing well are called scan-resistant. Ford's cache replacement is a basic clock algorithm and is not scan-resistant.

jtobin · 2019-05-16T01:34:38Z

There seems to be a fairly frequent oscillation between master being stable+deployed and having a "devel" branch, and deploys being tied to stable releases with master containing whatever code is desired for deploy sometime soon. Is there a collection anywhere of arguments for the respective development models?

Not to derail the conversation around the actual contribution here too much, but the idea is to move to the latter scheme you've mentioned -- i.e. to scrap the non-master branches that have semantic operational meaning ascribed to them (so: next, hotfix) and simply target PRs at master. Master should always build and generally work, but it may not at any given moment correspond to a proper release, or what is deployed. Releases are to be accomplished by tags, as they are at present.

(Hotfixes, which are really just "unscheduled releases," can be accomplished by applying commits to the relevant release tag in a distinct branch or what have you.)

This generally just feels simpler than the "dev branch" model, which has felt superfluous to me in all projects I've worked on that use it, but it arguably requires a technical release manager of sorts to make sure things go smoothly. I've volunteered for that job, so hopefully things go smoothly.. 😄

joemfb

LGTM!

Sorry for the delayed review. FWIW, I have no objections to merging next into master now. The only issue I can see is that %crud error printing in %dill now assumes the tweaked event-replacement log from cc-release, and that's a minor inconvenience at most.

A tangent on restart and caching: it should be generally knowledge that all the vere caches are completely cleared on restart. This is done out of an abundance of caution after the livenet memory issues from earlier this year.

These isolated clay transactions seem like a big improvement. From a style standpoint, it feels like more of the data transformation could be factored out of the bind invocations, which would bring them closer together and make the I/O and state transitions even clearer. And it would be nice to bundle up +clad with its interpreter(s) (or to factor out some kind of generic interpreter). But these are just nits ... Nice work!

philipcmonk · 2019-05-16T21:08:27Z

And it would be nice to bundle up +clad with its interpreter(s) (or to factor out some kind of generic interpreter).

Addressed to some extent in #1174.

Clad improvements

philipcmonk and others added 30 commits April 4, 2019 10:56

add +wrap-test-http

42bcfd6

wip

126bf88

wip better azimuth mock

555032c

better azimuth mock

0734958

stash

9cc3179

monads, monads everywhere

4a1cd1d

add +bake

1be2c1f

Merge pull request #1148 from urbit/philip/bake

0333528

add +bake

remove do notation rune

eac24b8

add =%

2a0b2d7

wip

75f30e4

add ;<

ce45aae

add ;<

c66008f

Revert "add =%"

634d933

This reverts commit 2a0b2d7.

Implement %pem mark

2f5c0be

Implement installing certs from clay using %acme

789ee53

wrappable eth node mock

e369560

port ph tests to monadic style

04f1ed9

remove non-monadic ph tests and organize

1ec78c7

add docs for ph monad

c4090f4

travis

9c30081

renames

4d280b1

travis

f8ace59

Merge remote-tracking branch 'origin/next' into philip/individual-bre…

22b3775

…aches

travis

8d61a64

slow

d4a9b7a

mainnet azimuth

a737436

four minute ping to eth

9a83984

Explicitly note %install-from-clay as temporary

717bfeb

Merge pull request #1151 from urbit/acme-local

e770e9f

%acme: use cert from clay files

jtobin changed the base branch from next to master May 15, 2019 11:28

style

8a56dd8

philipcmonk force-pushed the philip/individual-breaches branch from a425322 to c9c371d Compare May 15, 2019 18:27

Merge remote-tracking branch 'origin/master' into philip/individual-b…

8cd2614

…reaches

philipcmonk force-pushed the philip/individual-breaches branch from c9c371d to 8cd2614 Compare May 15, 2019 18:37

joemfb mentioned this pull request May 15, 2019

Unsunk ship issues #1168

Closed

re-add cross-commit mime cache

e615814

joemfb approved these changes May 16, 2019

View reviewed changes

philipcmonk added 2 commits May 16, 2019 11:50

add paths to notes for documentation

cba357f

factor out the essential clad evaluation

e412bf8

philipcmonk mentioned this pull request May 16, 2019

Clad improvements #1174

Merged

jtobin and others added 2 commits May 17, 2019 20:56

Merge branch 'master' into philip/individual-breaches

98a3bcc

Merge pull request #1174 from urbit/philip/more-clay-monads

959b474

Clad improvements

jtobin merged commit ec1b5a2 into master May 17, 2019

jtobin deleted the philip/individual-breaches branch May 17, 2019 21:31

philipcmonk mentioned this pull request May 23, 2019

Clay cleanup #1175

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Individual breaches, Clay edition #1169

Individual breaches, Clay edition #1169

philipcmonk commented May 14, 2019

jtobin commented May 15, 2019

philipcmonk commented May 15, 2019

philipcmonk commented May 15, 2019

ohAitch commented May 15, 2019

philipcmonk commented May 15, 2019

belisarius222 commented May 15, 2019 •

edited

Loading

joemfb commented May 15, 2019

philipcmonk commented May 15, 2019

philipcmonk commented May 15, 2019

jtobin commented May 16, 2019

ohAitch commented May 16, 2019 via email

belisarius222 commented May 16, 2019

jtobin commented May 16, 2019

joemfb left a comment

philipcmonk commented May 16, 2019

Individual breaches, Clay edition #1169

Individual breaches, Clay edition #1169

Conversation

philipcmonk commented May 14, 2019

%drip

Clay drip

Transactions in clay

Individual breaches for clay

jtobin commented May 15, 2019

philipcmonk commented May 15, 2019

philipcmonk commented May 15, 2019

ohAitch commented May 15, 2019

philipcmonk commented May 15, 2019

belisarius222 commented May 15, 2019 • edited Loading

joemfb commented May 15, 2019

philipcmonk commented May 15, 2019

philipcmonk commented May 15, 2019

jtobin commented May 16, 2019

ohAitch commented May 16, 2019 via email

belisarius222 commented May 16, 2019

jtobin commented May 16, 2019

joemfb left a comment

Choose a reason for hiding this comment

philipcmonk commented May 16, 2019

`%drip`

belisarius222 commented May 15, 2019 •

edited

Loading