CIP-???? | Deterministic universal almost-unique Plutus Constructors #608
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| --- | ||
| CIP: ? | ||
| Title: Deterministic universal almost-unique Plutus Constructors | ||
| Category: Plutus | ||
| Status: Proposed | ||
| Authors: | ||
| - Niels Mündler <[email protected]> | ||
| Implementors: [Niels Mündler <[email protected]>] | ||
| Discussions: | ||
| - https://github.com/cardano-foundation/CIPs/pull/608 | ||
| Created: 2023-10-20 | ||
| License: CC-BY-4.0 | ||
| --- | ||
| 
     | 
||
| <!-- Existing categories: | ||
| 
     | 
||
| - Meta | For meta-CIPs which typically serves another category or group of categories. | ||
| - Wallets | For standardisation across wallets (hardware, full-node or light). | ||
| - Tokens | About tokens (fungible or non-fungible) and minting policies in general. | ||
| - Metadata | For proposals around metadata (on-chain or off-chain). | ||
| - Tools | A broad category for ecosystem tools not falling into any other category. | ||
| - Plutus | Changes or additions to Plutus | ||
| - Ledger | For proposals regarding the Cardano ledger (including Reward Sharing Schemes) | ||
| - Catalyst | For proposals affecting Project Catalyst / the Jörmungandr project | ||
| 
     | 
||
| --> | ||
| 
     | 
||
| Note: I will use record / Plutus Data interchangeably throughout the document. | ||
| 
     | 
||
| ## Abstract | ||
| Plutus Constructor IDs are currently heavily focused around their origin in Haskell. They are usually used to distinguish different constructors of a single declared datatype. | ||
                
       | 
||
| In contrast, one may introduce universally recognized datatypes that are identified by a unique constructor id and can be expected to behave in a specified way (i.e. contain specific fields with specific types). | ||
| For this purpose, we introduce a generic way to compute an almost-unique, deterministic and universal constructor id for objects based on their name and field types. | ||
| Note that not every language is expected to adopt this standard as a default (e.g. Haskell-like languages might not have much use for it). | ||
| 
         How is Haskell different here?
         In Python it is common to declare Sum Types / Unions after the declaration of the specific types, i.e.
class A
class B
class C
AB = Union[A, B]
ABC = Union[A, B, C]
         In Haskell the definition of the Sum Type / Union at the same time declares the involved alternatives, hence all involved alternatives are known at the time of declaration (and known to be distinct)
data AB = A | B
data ABC = A | B | C -- I guess this would throw an error for redeclaring the constructors A and B? | 
||
| Rather, it is a recommended choice for cases where interoperable datatypes with unique constructor ids are useful to an application (e.g. oracles) or a language design (e.g. imperative languages). | ||
| 
     | 
||
| ## Motivation: why is this CIP necessary? | ||
| 
     | 
||
| The current approach to constructor ids is heavily focused on the Haskell-like way of defining record types. | ||
| An object can be one of a predefined set of entities, distinguished by constructor ids. For example, the optional Redeemer type is either `Some Redeemer` or `None`. | ||
| Because we know that any value of this optional type is one of these two alternatives, only two numbers (0/1) are required to distinguish them. | ||
| If we introduce a third constructor (e.g. `Some Datum`), potentially all other constructor ids change and the two implementations are no longer compatible. | ||
| 
     | 
||
| Moreover, there are other Plutus language frontends (such as OpShin) that allow freely declaring objects and mixing them into Union types, which is akin to the imperative style of declaring classes. | ||
| This makes it possible, for example, to declare a universally accepted type `Nothing` that can be freely mixed with `Redeemer` and `Datum` into `Union[Nothing, Redeemer, Datum]`. | ||
| The only requirement to ensure that this works properly is that all records that are mixed into the Union have distinct constructor ids. | ||
| Assigning distinct constructor ids is currently done manually, which is tedious and a potential source of errors. | ||
| 
     | 
||
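The distinctness requirement on Union members can be illustrated with a small check. This is only a sketch: the classes `Nothing`, `Redeemer` and `Datum`, the `CONSTR_ID` class attribute and the helper `check_distinct_constr_ids` are hypothetical names chosen for illustration, not the API of OpShin or any other frontend.

```python
from typing import Union, get_args


class Nothing:
    CONSTR_ID = 0  # hand-picked id (illustrative)


class Redeemer:
    CONSTR_ID = 1


class Datum:
    CONSTR_ID = 1  # accidentally reuses the id of Redeemer


def check_distinct_constr_ids(union_type) -> None:
    """Raise TypeError if two members of a Union share a constructor id."""
    seen = {}
    for member in get_args(union_type):
        cid = member.CONSTR_ID
        if cid in seen:
            raise TypeError(
                f"{seen[cid].__name__} and {member.__name__} share constructor id {cid}"
            )
        seen[cid] = member
```

With manually assigned ids, `check_distinct_constr_ids(Union[Nothing, Redeemer])` passes, while adding `Datum` to the Union raises a `TypeError`. This is the kind of clash that manual assignment makes easy to introduce.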
| ## Specification | ||
| <!-- The technical specification should describe the proposed improvement in sufficient technical detail. In particular, it should provide enough information that an implementation can be performed solely on the basis of the design in the CIP. This is necessary to facilitate multiple, interoperable implementations. This must include how the CIP should be versioned. If a proposal defines structure of on-chain data it must include a CDDL schema in it's specification.--> | ||
| The deterministic, universal and almost-unique Plutus constructor ids are computed recursively based on the type definition of a record. | ||
| We first compute a string `ustr(X)` based on the type definition of X. Then we compute the SHA-256 hash of the UTF-8 encoding of this string and interpret the resulting digest as a big-endian encoded integer. | ||
| The integer is taken modulo 2^32. The result is the almost-unique, universal, deterministic constructor id of the Plutus datum. | ||
| 
         Any discussion on what happens in case of collision?
         Yes collision seems bad here. I don't think it could lead to an attack, but I'm not 100% sure.
         In case of collision during declaration of Sum Type, the compiler has to deny compilation and ask the user to manually declare a constructor id for the involved types. I think this is rare enough to have practically no impact on computation. Regarding attacks I don't think this schema is more vulnerable than any other schema. In the current Plutus schema constructor id overlaps are practically omnipresent (though never in any sum types that occur in the compiled contract) | 
||
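The hashing step described in the specification can be sketched as follows, together with a rough estimate of how likely collisions are. The function names `constr_id_from_ustr` and `collision_probability` are illustrative assumptions, and the estimate uses the standard birthday bound under the assumption that hashed ids are uniformly distributed.

```python
import hashlib
import math


def constr_id_from_ustr(ustr: str, bits: int = 32) -> int:
    """Hash a ustr string into a constructor id in [0, 2**bits)."""
    digest = hashlib.sha256(ustr.encode("utf8")).digest()
    # Reading the raw digest as big-endian equals int(hexdigest, 16).
    return int.from_bytes(digest, "big") % (2 ** bits)


def collision_probability(n_types: int, bits: int = 32) -> float:
    """Birthday-bound estimate that any two of n_types ids coincide,
    assuming the ids are uniformly distributed over 2**bits values."""
    pairs = n_types * (n_types - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)
```

Under these assumptions, 1000 globally declared types collide with a probability of roughly 0.01% at 32 bits. The `bits` parameter makes it easy to compare this with the modulo 2^64 variant mentioned in the discussion.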
| 
     | 
||
| The following function describes how to compute `ustr(X)` for a type recursively. | ||
| 
     | 
||
| ``` | ||
| ustr(bytes) := "bytes" | ||
| ustr(integer) := "int" | ||
| // This covers the case where the structure of the object is not known from the perspective of the class, i.e. when any BuiltinData is allowed | ||
| ustr(PlutusData) := "any" | ||
| 
         why not "data"?
         Probably an oversight but not too relevant | 
||
| // This covers the case where the type of the elements in the list is not known in advance | ||
| ustr(list) := "list" | ||
| 
         why isn't that 
         More generally, this CIP is committing to a type-definition language that might not be appropriate for everyone, as witnessed by quirks like this. Moreover, we already have at least two type-definition languages that we could use: 
 Why not use one of those?
         Hm, I had this comment as well, but either I failed to hit "comment" or GitHub lost it (it's been glitchy lately for me).
         I think this is a valid point. I am currently looking towards making this compatible with the CIP57 definitions. However, CIP57 does not make reproducibility a big thing (i.e. the concrete ordering of JSON map elements usually does not matter), whereas here it is relevant - a re-definition or at least specification of a "canonical" blueprint from which to hash is unavoidable. | 
||
| 
     | 
||
| ustr(list<X>) := "list<" + ustr(X) + ">" | ||
| 
         We're not parsing these so I think it's fine, but probably worth clarifying that it's not a problem if e.g. type names contain  | 
||
| 
     | 
||
| ustr(map<X,Y>) := "map<" + ustr(X) + "," + ustr(Y) + ">" | ||
| 
     | 
||
| ustr(union<X,Y,...,Z>) := "union<" + ustr(X) + "," + ustr(Y) + "," + ... + "," + ustr(Z) + ">" | ||
| 
     | 
||
| ustr(constr(name)<id, fields[f1:X,f2:Y,...,fn:Z]>) := | ||
| 
         So 
         No, ustr converts only types into strings, concrete values are not relevant. | 
||
| "cons[" + name + "](" + str(id) + ";" | ||
| + f1 + ":" + ustr(X) + "," + f2 + ":" + ustr(Y) + "," + ... + "," + fn + ":" + ustr(Z) + | ||
| ")" | ||
| ``` | ||
| 
     | 
||
| Where `name` and `f1` to `fn` refer to the name of the record and the names of its fields respectively. | ||
| 
         Why include names? If one group of developers has 
         Commented later, but: you have to have either the name or the id of the constructor, otherwise you can't distinguish two constructors with the same fields. And the whole point of this proposal is to not set the ids, so it has to be names.
         I mean, you can get the hash of a constructor from the structure of the data type (  | 
||
| Since the constructor id of a record is not yet known while it is being computed, the constructor id string of that record is set to `_` for this computation. | ||
| 
         If it's not known, why include it in there in the first place? The entire concept feels awkwardly circular even though you get out of the infinite recursion with that wildcard.
         I agree it's a bit weird... but I do want to distinguish between classes with same names and fields but different constructor ids to avoid nasty surprises to the user. class A: B = A class A: class X: If it is pulled out of the ustr for constructors then we lose modularity of the function 🤔 | 
||
| As an example, the constructor id of record `A` with fields `b` (record `B`, constructor id 5 with one integer field `i`) and `c` (integer) is derived from `ustr(A) = ustr(constr(A)<_,fields[b:B,c:integer]>) = "cons[A](_;b:" + ustr(constr(B)<5,fields[i:integer]>) + ",c:int)" = "cons[A](_;b:cons[B](5;i:int),c:int)"`. | ||
| 
     | 
||
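The recursive definition of `ustr` can be sketched in Python as below. The dataclasses `Prim`, `ListOf`, `MapOf`, `UnionOf` and `Constr` are a hypothetical way to describe types, introduced only for this illustration, and are not taken from the reference implementation. The sketch follows the `cons[...]` rule literally, so a nested record with a declared constructor id renders that id, while a record whose id is still unknown renders `_`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Prim:
    label: str  # one of "bytes", "int", "any", "list"


@dataclass(frozen=True)
class ListOf:
    elem: "Ty"


@dataclass(frozen=True)
class MapOf:
    key: "Ty"
    value: "Ty"


@dataclass(frozen=True)
class UnionOf:
    members: Tuple["Ty", ...]


@dataclass(frozen=True)
class Constr:
    name: str
    fields: Tuple[Tuple[str, "Ty"], ...]
    constr_id: Optional[int] = None  # None means "not yet known", rendered as "_"


Ty = Union[Prim, ListOf, MapOf, UnionOf, Constr]


def ustr(t: Ty) -> str:
    """Render a type description as its ustr string."""
    if isinstance(t, Prim):
        return t.label
    if isinstance(t, ListOf):
        return "list<" + ustr(t.elem) + ">"
    if isinstance(t, MapOf):
        return "map<" + ustr(t.key) + "," + ustr(t.value) + ">"
    if isinstance(t, UnionOf):
        return "union<" + ",".join(ustr(m) for m in t.members) + ">"
    if isinstance(t, Constr):
        cid = "_" if t.constr_id is None else str(t.constr_id)
        fields = ",".join(name + ":" + ustr(ty) for name, ty in t.fields)
        return "cons[" + t.name + "](" + cid + ";" + fields + ")"
    raise TypeError(f"unsupported type description: {t!r}")


# Record A with fields b (record B, id 5, one int field i) and c (int).
INT = Prim("int")
B = Constr("B", (("i", INT),), constr_id=5)
A = Constr("A", (("b", B), ("c", INT)))
```

Here `ustr(A)` evaluates to `cons[A](_;b:cons[B](5;i:int),c:int)`; hashing that string as described in the specification then yields the constructor id of `A`.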
| ## Rationale: how does this CIP achieve its goals? | ||
| <!-- The rationale fleshes out the specification by describing what motivated the design and what led to particular design decisions. It should describe alternate designs considered and related work. The rationale should provide evidence of consensus within the community and discuss significant objections or concerns raised during the discussion. | ||
| 
     | 
||
| It must also explain how the proposal affects the backward compatibility of existing solutions when applicable. If the proposal responds to a CPS, the 'Rationale' section should explain how it addresses the CPS, and answer any questions that the CPS poses for potential solutions. | ||
| --> | ||
| We definitely want the constructor ids (constr_ids) to have a few properties: | ||
| 
     | 
||
| - _small_: ideally the constr_id integer should be as small as possible, as smaller integers are encoded more efficiently in CBOR and save the end user min-UTxO and transaction fees (constr_ids of up to 7 bits are encoded in the CBOR tag; larger ones are encoded as a generic integer) | ||
| 
         Have you run any experiments on whether using your version makes scripts more expensive (including deserialization time)? I'd expect them to become, but not sure about the scale, perhaps not by a lot.
         If anything this proposal seems vanishingly unlikely to generate tags that are small? We're taking the result mod 2^32, so I'd expect to probably get uniform numbers over that range, which are going to be way higher than 2^7. More generally, since this proposal wants global ids, there can only be 7 types globally that get the small ids. So I think this will definitely perform worse on space, but that might not matter.
         Types will definitely perform worse on space, I assume most (i.e. roughly 50%) of tags will have size around 2^32. Small refers to these tags being smaller than i.e. 64 bytes (like a script or datum hash). Likely doing modulo 2^64 would not make a big difference on size/cost either but improve uniqueness, so I am looking into adding this as a change. | 
||
| - _unique_: There should be as little overlap with other values as possible, so that we can group together classes in unions without having to worry about setting/overwriting the constr_id. This is reflected by the unique choice of identifiers in `ustr`. | ||
| - _deterministic_: Datatypes that are defined in libraries may be imported in arbitrary contexts. The constr_id must therefore not depend on, e.g., which other Unions the datatype is used in or which other datatypes are declared in its surroundings. This rules out the Haskell approach and any automatically incrementing global counters. | ||
                
       | 
||
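The space cost behind the _small_ property can be estimated with a short sketch. It assumes the usual CBOR encoding of Plutus Data constructors as I understand it (ids 0 to 6 use tags 121 to 127, ids 7 to 127 use tags 1280 to 1400, larger ids use tag 102 around a two-element array); these tag numbers should be checked against the Plutus specification before relying on them. The function names are illustrative.

```python
def cbor_head_size(argument: int) -> int:
    """Bytes used by a CBOR head carrying an unsigned integer or tag number."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    if argument < 24:
        return 1
    if argument < 2 ** 8:
        return 2
    if argument < 2 ** 16:
        return 3
    if argument < 2 ** 32:
        return 5
    if argument < 2 ** 64:
        return 9
    raise ValueError("argument does not fit into 64 bits")


def constr_id_overhead(constr_id: int) -> int:
    """Bytes spent on encoding the constructor id, excluding the field list."""
    if constr_id < 0:
        raise ValueError("constructor ids are non-negative")
    if constr_id < 7:
        return cbor_head_size(121 + constr_id)       # compact tags 121..127
    if constr_id < 128:
        return cbor_head_size(1280 + constr_id - 7)  # compact tags 1280..1400
    # General form: tag 102 wrapping a 2-element array [constr_id, fields].
    return cbor_head_size(102) + 1 + cbor_head_size(constr_id)
```

Under these assumptions a conventional id such as 0 costs 2 bytes, a typical hashed id below 2^32 costs 8 bytes, and an id below 2^64 costs 12 bytes per constructor occurrence.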
| 
     | 
||
| Note that the implementation first computes a `ustr` in human readable form and then transforms it into an integer. This is intentional, since the alternatives (directly computing a large unique number or similar approaches) are much more difficult to debug. | ||
| 
     | 
||
| To ensure that this does not only take the structural definition but also the intended usage into account, names of records are taken into account for the computation. | ||
                
       | 
||
| 
     | 
||
| There is no issue with backwards compatibility when adopting this implementation as an opt-in choice for users. | ||
| PlutusTx and most other languages allow explicitly setting the constructor id of objects anyway. | ||
| 
         Yes, but I feel like we've always viewed constructor ids as constructor indices. We're discussing the possibility of converting 
         There is nothing that necessitates any particular interpretation of the integers in a  The point about conversion to SOPs is a good one. If we are able to offer a fast conversion from 
         This sounds like it can become a major issue for the compatibility with native SOP. | 
||
| Note that, due to determinism, types defined this way can also be supported in third-party languages by hard-coding the computed constructor id and overriding the default of the implementation language. | ||
| 
     | 
||
| 
     | 
||
| ## Path to Active | ||
| 
     | 
||
| ### Acceptance Criteria | ||
| - Implementation in at least one Smart Contract Language | ||
| 
     | 
||
| ### Implementation Plan | ||
| - Implementation in pycardano / OpShin. See the reference implementation [here](https://github.com/Python-Cardano/pycardano/pull/272). | ||
| 
     | 
||
| ## Copyright | ||
| 
     | 
||
| [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode) | ||
| 
     | 
||
| 
     | 
||