Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name disambiguation duct tape and wood glue #905

Merged
merged 83 commits into from
Nov 25, 2020

Conversation

jjcnn
Copy link
Contributor

@jjcnn jjcnn commented Nov 13, 2020

This is the PR that connects Disambiguation.ml to the checker and runner pipelines. In addition, the PR adds the executable src/runners/disambiguate_state_json.ml which is able to migrate existing contract states from the current ambiguous names to disambiguated names, either using IPC or using state jsons.

Note that there is still an outstanding issue regarding the migration tool. The issue is illustrated by the testcase simple-dex_1, which hashes a custom ADT value using sha256, and stores the hash as a map key in the contract state. The ADT value now uses a different constructor name, so the computed hash changes, which may make it impossible to perform the migration in the first place. I'll be looking into that next, but this PR should not be merged until we're certain it won't affect existing contracts.

It's pretty important that we get this one right, so please be thorough when reviewing this PR. It's a big one, and not doubt extremely tedious, but most of the files have been changed contain relatively straightforward changes. I have listed the important changes below. Please let me know if you have any questions or comments at all.

Overall design:

Once disambiguation has taken place, all names in the AST are of type GlobalName.t (see below), which are either simple names or names qualified by the address that defines the name (<defining address>.<name>).

Three types of entities must now be referenced using qualified names:

  • User-defined type names (type TName = ...)
  • User-defined type constructor names (type ... = | CName of ...)
  • Top-level library variables (let libvarname = ....). Library variables must be qualified because their local name might be qualified by a namespace (X.libvarname), and after disambiguation the namespace disappears, which means that if the name is kept unqualified, then it may now be shadowed by a locally defined (simply-named) variable libvarname.

The syntax of a Scilla contract does not change, since all disambiguation happens behind the scenes. However, because messages containing user-defined ADT values are constructed using global names (otherwise the recipient contract won't be able to parse the value correctly), the Zilliqa javascript API, and therefore all dApps, now needs to use qualified ADT names when invoking transitions that take user-defined ADT values as parameters.

In principle we can avoid this by introducing a convention that if a parameter has the type of a user-defined ADT, then we assume that any unqualifed constructor name is defined in the contract's own library. However, this doesn't solve the problem completely, because the incoming name may be qualified by a namespace, which we are unable to parse because the parser expects global names. Also, if we are to do this we will have to be 100% certain that outgoing messages are constructed correctly, because the recipient has to take it on trust that incoming messages are meaningful.

Important changes:

  • FlattenedLiteral (literals using the old names of the form <namespace>.<name>) has been removed. In checker phases before Disambiguation the module has been replaced by LocalLiteral, which in effect does the same thing as FlattenedLiteral. In phases after Disambiguation the module has been replaced by GlobalLiteral, which uses globally unambiguous names in the form <defining address>.<name>.

  • Additionally, a GlobalName contains a user-friendly string for use in error reporting. The string is the same as the flattened name. Importantly, the user-friendly string is localised, which means that the name might be different depending on whether you use the string from the definition of the name or from the use of the name. For instance, DatatypeDictionary is populated using localised error strings from the place where the datatypes are defined, but often the error message needs to use the localised name from where a contract uses the datatype. This is especially important when using the functions get_all_adts and get_all_ctrs in DatatypeDictionary, since we often don't have access to the use of a name in those cases. I'll create an issue on this, though I don't know how to fix it easily.

  • The eliminate_namespaces pass (in RunnerUtil.ml) that used to flatten names into "namespace.name" has been removed, because Disambiguate.ml now performs that task.

  • The disambiguation pass has been rewritten because the version that was merged initially used, unsurprisingly, the wrong scoping rules. The changes look more complex than they actually are, though. The big change is how the various name dictionaries are constructed and used.

  • Disambiguation requires the current _this_address and the extlibs entry from init.json as parameters, so the code that imports libraries has been rewritten. We still allow init jsons without a _this_address entry for offline development, in which case we use the contract's filename as _this_address. (During this work I have discovered that we actually read each init.json twice, which is not the end of the world, but which ought not to happen. I'll create an issue on this).

  • I have had to remove a few more .mli files, because their associated .ml files are parameterised. I don't think there is a way to capture the various type signatures when the underlying implementation is a functor rather than an unparameterised module.

  • As usual I can't figure out how to edit ParserFaults.messages, so I have just removed the conflicting entries.

Migration of existing contract states:

  • The executable disambiguate_state_json.ml contains a copy of most of the parsing code for jsons, because the input to the migration script uses the flattened, localised names from before. The disambiguation happens during parsing rather than as a separate pass, because that was the simplest way to do it. Output is performed using the code that is to be used in production.

  • The migration tool can accept either an input state json or a socket for IPC access. In both cases the script can output a migrated state json, but an input state json is required in this case, because the _balance field cannot be accessed using IPC. When migrating the mainnet we won't have access to an input state json, but since we won't need an output state json, and since the value of the _balance field does not need to be migrated, this should be fine.

  • The migration tool as been tested by migrating all the state jsons in our test suite using IPC by hand. I'm pretty sure it works except for the hashing problem I mentioned earlier, but I would really like to test it on the mainnet contracts just to be on the safe side.

Downstream changes:

  • The javascript API and any other code that sends messages to contracts needs to change so that the fully qualified names are used in user-defined ADT parameters. I know that Unstoppable Domains has a contract in which one transition accepts a user-defined ADT value, so we need to explicitly tell them to update any off-chain code that invokes that transition (I wouldn't be surprised if they don't actually use that transition - it looks pretty useless).

  • We also need to make sure to announce this change to the entire community, so that their offchain apps can be updated accordingly if indeed they use transitions that accept user-defined ADT parameters.

jjcnn added 30 commits June 2, 2020 22:59
…ithub.com/Zilliqa/scilla into name_disambiguation_duct_tape_and_wood_glue
@jjcnn
Copy link
Contributor Author

jjcnn commented Nov 15, 2020

Addendum to the migration tool:

  • The migration tool also migrates init jsons. This is needed in case a contract uses a user-defined ADT as a contract parameter. I can't imagine that's a common occurrence, but better safe than sorry.

@jjcnn
Copy link
Contributor Author

jjcnn commented Nov 16, 2020

As of Friday 13 Nov 2020 there were no contracts on the mainnet that applied a hash function to a user-defined ADT value, so the hashing issue will not cause any problems for us.

"errors": [
{
"error_message":
"Entry sort_uint32 in library TestLib3 conflicts with entry in library TestLib1",
"error_message": "Variable sort_uint32 imported from multiple sources",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a private discussion: it is possible to keep the error message more informative at least for offline development.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 2836148

It's not an ideal way to generate error messages, but I don't want to copy the code that performs the duplicate entry check.

@@ -1,8 +1,7 @@
{
"errors": [
{
"error_message":
"Following a field definition, the parser expects another field definition, transition, procedure or end of file.\n",
"error_message": "Syntax error, state number 289",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also discussed in private, I'm leaving a comment so that we don't forget about this issue.

{ "constructor": "EarmarkedCoin", "tags": [ "Money", "NotMoney" ] }
"earmarked-coin.EarmarkedCoin": [
{
"constructor": "earmarked-coin.EarmarkedCoin",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an issue the namespace earmarked-coin contains a hyphen, is it?

Copy link
Contributor Author

@jjcnn jjcnn Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a namespace - it's an "offline address". :-)

Since this is output from the checker for offline development it will at most make a difference to IDEs and similar tools, and they can just ban the use of hyphens in file names if they want to.

Once the contract is being deployed the constructor will be named "<_this_address>.EarmarkedCoin", which definitely (= almost certainly) works.

Copy link
Contributor

@anton-trunov anton-trunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjcnn @vaivaswatha I went through the whole thing and left some comments. It's a technical approval from me, although I'd prefer @vaivaswatha to have a pass and give the final approval.

@vaivaswatha
Copy link
Contributor

Sorry @anton-trunov @jjcnn I've been busy with polynetwork support and haven't been able to look at this PR. I'll review it over the weekend and should have comments (if any) before next week.

@jjcnn jjcnn merged commit aced3e8 into master Nov 25, 2020
@jjcnn jjcnn deleted the name_disambiguation_duct_tape_and_wood_glue branch November 25, 2020 21:47
@jjcnn
Copy link
Contributor Author

jjcnn commented Nov 25, 2020

(During this work I have discovered that we actually read each init.json twice, which is not the end of the world, but which ought not to happen. I'll create an issue on this).

It turns out that we actually don't. I got confused about reading the init file of the module we are checking/deploying/invoking, and the init files of the libraries that that module imports.

No issue created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants