Skip to content
This repository has been archived by the owner on Aug 14, 2019. It is now read-only.

Clay cleanup #1175

Merged
merged 10 commits into from
May 24, 2019
Merged

Clay cleanup #1175

merged 10 commits into from
May 24, 2019

Conversation

philipcmonk
Copy link
Contributor

I'm a programmer today because a man at my church gave me a copy of Linux on a CD when I was a kid and told me to learn C. He also told me a story of when he was a kid. There was a small anthill where he was playing in his backyard. He tried destroying the anthill with a stick, but it kept coming back. Rather than figure out exactly how to remove this anthill surgically, he poured a flammable liquid (gasoline? I forget) in the anthill. He figured he'd pour until it was full, then light it. But it took a lot longer than he thought, and he poured in a lot of gasoline. Finally, he lit it, and he says it blew up much of his backyard. He didn't have ant problems for a long time.

This is the approach I've taken with Clay, both in #1169 and here. There was some bug in +wake, but that part of the code was the most impenetrable part of clay. The commit/merge flow was more fragile, but at least you could tell what it was doing. But +wake was a 200 line impenetrable, stateful function made worse by being almost duplicated in +start-request, except that one had some small but important differences. Rather that try to find the bug, I re-wrote both of these into a single pure function, +try-fill-sub, which is still long and somewhat complex, but it's stateless, drastically more readable than it was, and commented every few lines. As a bonus, the bug is gone. I don't know exactly what the bug was, but there were three or four I noticed and fixed as I was refactoring it.

All that to say, I was hoping this would be a small PR that showed a clean difference between the inside-out state machines of old clay for validating foreign updates and their monadic equivalents. But alas, it's a 1000 lines. Should be a lot easier to follow than the last PR, though. Most of the monadifying is in two commits. This is foreign simple requests and full updates: 80e22ab, and this is mounting a desk: 5a1cccd

In summary, this contains four changes:

  • Convert the "foreign requests" flow to the clad transaction monad. Foreign requests are when we ask for a particular piece of information from another ship's clay, and we need to validate the result.
  • Convert the "foreign updates" flow to the clad transaction monad. Foreign updates are when we ask for the entire state of a foreign desk (and validate it). This is needed for merging, for example.
  • Convert the "mount" flow to the clad transaction monad. This involves a round-trip to ford to serialize all our files to the %mime mark. Because this populates the mime cache, this has to have a lock on the desk state (dome), so we put it in the same queue as committing and merging. This should make our sync with unix more robust.
  • Refactor +wake and +start-request to share code, be more readable, and be simpler. In particular, we now use the same code to check if we can fill a subscription when we first receive it as when we try to update it later on. No monads, here, just good old-fashioned renaming and refactoring. Found several bugs; killed them.

This puts a bow on the changes I wanted to make to Clay in this pass, so I'll move on to Jael next. I'm much more confident in Clay than I was before, and I wouldn't be afraid to hack on it now. One change I would love is change the foreign update flow to only send the data we don't already have. Right now, it has a conservative heuristic for guessing whether the subscriber has the blobs referenced in the update, and it sends them if it doesn't know. But there's plenty of real-world situations where the subscribee doesn't know that the subscriber already has the blobs. It'd be nice to have an additional round-trip, so we first send just the hashes of the data, and the susbscriber asks for the data of whichever of the hashes it doesn't have yet.

@@ -3499,7 +3687,7 @@
^- (map path lobe)
?: =(0 yon) ~
:: we use %z for the check because it looks at all child paths.
?: |(?=(~ for) (may-read u.for %z yon pax)) ~
?. |(?=(~ for) (may-read u.for %z yon pax)) ~
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joemfb, I'm not sure, but I think this line was what caused the non-determinism of the %child-sync test before. If stuff (setting permissions, starting the merge, etc) happened in a particular order, it triggered this bug so the child wouldn't get updates from its parent.

Copy link
Member

@joemfb joemfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted some minor PR nits. Still absorbing the overall changes.

sys/vane/clay.hoon Outdated Show resolved Hide resolved
sys/vane/clay.hoon Outdated Show resolved Hide resolved
sys/vane/clay.hoon Outdated Show resolved Hide resolved
@philipcmonk
Copy link
Contributor Author

Thanks, fixed those three and also a bug where I was cancelling some ducts more than once when a ship sinks.

@jtobin
Copy link
Contributor

jtobin commented May 23, 2019

      $d  ~|  %totally-temporary-error-please-replace-me  !!
      $p  ~|  %requesting-foreign-permissions-is-invalid  !!
      $t  ~|  %requesting-foreign-directory-is-vaporware  !!
      $u  ~|  %prolly-poor-idea-to-get-rang-over-network  !!
      $v  ~|  %weird-shouldnt-get-v-request-from-network  !!
      $z  ~|  %its-prolly-not-reasonable-to-request-ankh  !!

satisfying

@belisarius222
Copy link
Contributor

Now that the foreign-update code is monadified, will it work if we reload the vane in the middle of a foreign-update transaction? I'm wondering if there are any edge cases where we've received some Ames message that we won't receive again, and if we throw away our partially done foreign update after receiving the message, we'll lose the update forever.

@philipcmonk
Copy link
Contributor Author

If you don't put in any special +load handling, this will currently just let the transaction complete (so you'll keep a pointer to the old kernel until it's done). You're right, though, that we should save the ames messages of the currently running transaction so that we can restart it if needed. This also applies to the foreign-request flow.

I guess the general principle is "if you need to be able to restart a transaction, you must save all the initial arguments to the transaction". In this case, for %info, %merg, and %mont we save the whole task. For foreign-update and foreign-request it's probably better to save [index=@ud (unit rand)] from deserializing the ames message. If you have a queue of requests, one way to implement this is to not remove from the queue until the transaction is completed.

Copy link
Member

@joemfb joemfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

FWIW, I've read this through I couple of times and it appears to make sense, but I'm still not sure that I could write it myself ... I need to take +clad for a drive before I'll be able to review these more confidently

Copy link
Contributor

@belisarius222 belisarius222 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I didn't inspect every line closely, but the structure looks reasonable and you addressed the one concern I had.

@jtobin jtobin merged commit 90e44af into master May 24, 2019
@jtobin jtobin deleted the philip/clay-foreign branch May 24, 2019 11:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants