CIP-???? | Deterministic universal almost-unique Plutus Constructors #608

nielstron · 2023-10-20T09:22:40Z

OpShin has recently changed how it uses constructor ids internally to distinguish types. Since the scheme is language-agnostic, we provide the specification here for other languages or applications to adopt as they see fit.

(rendered proposal in branch)

rphair

Thanks @nielstron - seems detailed & well presented enough to be technically considerable.

First, can you please review these guidelines for Plutus proposals, since they have to adhere to some process considerations to facilitate Plutus acceptance & implementation? To begin with, there are a few items missing from Path to Active (edit: confirmed no core changes).

~~Meanwhile~~ maybe @michaelpj @L-as @KtorZ can review the idea itself, plus anyone else they can tag from the Plutus teams (I may not be current on best people to ask; it's been a while since someone submitted a Plutus CIP).

CIP-????/README.md

rphair · 2023-10-24T15:41:50Z

@nielstron I've put this on the agenda for initial review at next CIP meeting: https://hackmd.io/@cip-editors/76

effectfully

This subverts my entire intuition on what a Data object means. That's fine; evidently my intuition wasn't covering some of the ways people use Data objects. I'll bring this CIP to the team and get back to you once we've discussed it internally. Thank you for submitting it!

effectfully · 2023-11-07T01:32:52Z

CIP-????/README.md

+
+ustr(union<X,Y,...,Z>) := "union<" + ustr(X) + "," + ustr(Y) + "," + ... + "," + ustr(Z) + ">"
+
+ustr(constr(name)<id, fields[f1:X,f2:Y,...,fn:Z]>) :=


So ustr is used to encode both types and terms?

No, ustr converts only types into strings, concrete values are not relevant.
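To make the recursion concrete, here is a minimal Python sketch of `ustr` for the rules quoted in this review (`bytes`, `integer`, `PlutusData`, untyped `list`, `list<X>` and `union<...>`). The classes `Prim`, `ListOf` and `UnionOf` are hypothetical stand-ins for a compiler's type representation, not OpShin internals, and the record/constructor rule is deliberately left out because its full definition is not quoted here. Joining the members with `,` places the string of the last union member last.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Prim:
    """Hypothetical primitive type: "bytes", "integer" or "PlutusData" (any data)."""
    kind: str


@dataclass(frozen=True)
class ListOf:
    """Hypothetical list type; elem=None models a list whose element type is unknown."""
    elem: Optional["TypeExpr"] = None


@dataclass(frozen=True)
class UnionOf:
    """Hypothetical union of alternatives, kept in declaration order."""
    members: Tuple["TypeExpr", ...]


TypeExpr = Union[Prim, ListOf, UnionOf]

# Fixed strings taken from the quoted rules
_PRIM_STRINGS = {"bytes": "bytes", "integer": "int", "PlutusData": "any"}


def ustr(t: TypeExpr) -> str:
    """Build the human-readable type string that is later hashed (records omitted)."""
    if isinstance(t, Prim):
        return _PRIM_STRINGS[t.kind]
    if isinstance(t, ListOf):
        return "list" if t.elem is None else "list<" + ustr(t.elem) + ">"
    if isinstance(t, UnionOf):
        return "union<" + ",".join(ustr(m) for m in t.members) + ">"
    raise TypeError(f"unsupported type expression: {t!r}")


print(ustr(UnionOf((Prim("bytes"), ListOf(Prim("integer"))))))
```

Only types go in; concrete values never appear, which matches the answer above.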

effectfully · 2023-11-07T01:34:42Z

CIP-????/README.md

+```
+
+Where `name` and `f1` to `fn` refer to the name of the record and the names of its fields respectively.
+Since the constructor id of a record is not known when computing its constructor id, the constructor id string is set to `_` for this computation.


If it's not known, why include it in there in the first place? The entire concept feels awkwardly circular even though you get out of the infinite recursion with that wildcard.

I agree it's a bit weird... but I do want to distinguish between classes with the same names and fields but different constructor ids, to avoid nasty surprises for the user.

```python
from typing import Union

class A:
    CONSTR_ID = 0

B = A

class A:
    CONSTR_ID = 1

class X:
    x: Union[A, B]
```

If it is pulled out of the ustr for constructors, then we lose modularity of the function 🤔

effectfully · 2023-11-07T01:38:59Z

CIP-????/README.md

+    ")"
+```
+
+Where `name` and `f1` to `fn` refer to the name of the record and the names of its fields respectively.


Why include names? If one group of developers has `data Option a = Some a | None` and another has `data Maybe a = Just a | Nothing`, why not let those two data types be considered the same, if they mean the same thing and are encoded the same way via PlutusData anyway?

Commented later, but: you have to have either the name or the id of the constructor, otherwise you can't distinguish two constructors with the same fields. And the whole point of this proposal is to not set the ids, so it has to be names.


I mean, you can get the hash of a constructor from the structure of the data type (\a -> Either a () for both the example types) and the id of the constructor. You don't need to globally specify the id this way, just for the purpose of computing a hash (and the way hashes are computed can be arbitrary as long as they're almost unique). Or am I misunderstanding it?

effectfully · 2023-11-09T04:12:09Z

CIP-????/README.md

+
+Note that the implementation first computes a `ustr` in human readable form and then transforms it into an integer. This is intentional, since the alternatives (directly computing a large unique number or similar approaches) are much more difficult to debug.
+
+To ensure that this does not only take the structural definition but also the intended usage into account, names of records are taken into account for the computation.


Aha, I see. It is rather strange in that whoever introduces a data type first decides for everybody else how they should name its constructor and fields. I'm also not sure how much safety it adds; names are not type safety. I do normally believe that nominal > structural, but the underlying PlutusData is structural anyway and it seems potentially irritating to enforce the same names on all parties, plus names don't guarantee specific semantics anyway. Dunno.

I think you have to have the names if you're not including an id. Otherwise you can't distinguish the constructors of

`data Foo = A Int Int | B Int Int`

I don't understand the stated reasoning (why does it matter to "take the intended usage into account"?), but I think you do need it.

effectfully · 2023-11-09T04:15:05Z

CIP-????/README.md

+-->
+We definitely want a few properties on the CONSTR_IDs
+
+- _small_: ideally the constr_id integer should be as small as possible, as smaller integers are encoded more efficiently in CBOR and save the end user minutxo and txfees (constr_ids are encoded as the cbor tag up to 7 bit size, after that encoded as generic integer)


Have you run any experiments on whether using your version makes scripts more expensive (including deserialization time)? I'd expect them to be, but I'm not sure about the scale; perhaps not by a lot.

If anything this proposal seems vanishingly unlikely to generate tags that are small? We're taking the result mod 2^32, so I'd expect to probably get uniform numbers over that range, which are going to be way higher than 2^7.

More generally, since this proposal wants global ids, there can only be 7 types globally that get the small ids. So I think this will definitely perform worse on space, but that might not matter.

Types will definitely perform worse on space; I assume roughly half of the tags will be of a magnitude around 2^32. "Small" here means that the tags are smaller than e.g. 64 bytes (like a script or datum hash). Doing the modulo with 2^64 would likely not make a big difference in size/cost either, but would improve uniqueness, so I am looking into adding this as a change.
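The size trade-off can be estimated from how a Plutus Data `Constr` is encoded in CBOR. The sketch below assumes the usual scheme (ids 0-6 as tags 121-127, ids 7-127 as tags 1280-1400, larger ids as tag 102 wrapping a two-element array `[id, fields]`) and counts only the bytes spent on the constructor id itself, not the field list. `cbor_head_len` and `constr_overhead` are hypothetical helper names.

```python
def cbor_head_len(n: int) -> int:
    """Bytes for a CBOR initial byte plus its unsigned argument n (used for tags and uints)."""
    if n < 24:
        return 1
    if n < 2**8:
        return 2
    if n < 2**16:
        return 3
    if n < 2**32:
        return 5
    if n < 2**64:
        return 9
    raise ValueError("ids >= 2**64 would need a bignum encoding")


def constr_overhead(constr_id: int) -> int:
    """Bytes spent on the constructor id of a Plutus Data Constr, excluding the fields."""
    if constr_id < 0:
        raise ValueError("constructor ids are non-negative")
    if constr_id <= 6:
        return cbor_head_len(121 + constr_id)        # compact tags 121..127
    if constr_id <= 127:
        return cbor_head_len(1280 + constr_id - 7)   # compact tags 1280..1400
    # general form: tag 102, a 2-element array header, then the id as a CBOR uint
    return cbor_head_len(102) + 1 + cbor_head_len(constr_id)


for cid in (0, 7, 128, 2**32 - 1, 2**64 - 1):
    print(cid, constr_overhead(cid))
```

Under these assumptions, almost every uniformly distributed 32-bit id costs 8 bytes instead of 2-3 bytes for a small index, and reducing mod 2^64 instead raises that to 12 bytes. That is worse on space, but by a few bytes per `Constr` rather than by a hash-sized amount.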

effectfully · 2023-11-09T06:28:24Z

CIP-????/README.md

+Plutus Constructor IDs are currently heavily focused around their origin in Haskell. They are usually used to distinguish different constructors of a single declared datatype.
+In contrast, one may introduce universally recognized datatypes that are identified by a unique constructor id and can be expected to behave in a specified way (i.e. contain specific fields with specific types).
+For this purpose, we introduce a generic way to compute an almost unique, deterministic and universal constructor id for objects based on their name and field types.
+Note that it is not expected that every language adopts this standard as a default (i.e. for Haskell-like languages there might not be much use of it).


How is Haskell different here?

In Python it is common to declare sum types / unions after declaring the specific types, e.g.

```python
from typing import Union

class A:
    pass

class B:
    pass

class C:
    pass

AB = Union[A, B]
ABC = Union[A, B, C]
```

In Haskell, defining a sum type simultaneously declares its alternatives, so all alternatives are known at the time of declaration (and known to be distinct):

```haskell
data AB = A | B
data ABC = A | B | C -- I guess this would throw an error for redeclaring the constructors A and B?
```

effectfully · 2023-11-09T06:29:44Z

CIP-????/README.md

+<!-- The technical specification should describe the proposed improvement in sufficient technical detail. In particular, it should provide enough information that an implementation can be performed solely on the basis of the design in the CIP. This is necessary to facilitate multiple, interoperable implementations. This must include how the CIP should be versioned. If a proposal defines structure of on-chain data it must include a CDDL schema in its specification.-->
+The deterministic, universal and almost-unique Plutus constructors are computed recursively based on the type definition of a record.
+We first compute a string `ustr(X)` based on the type definition of X. Then we perform a sha256 hash on the UTF8 encoding of this string and interpret the resulting hex digest as a big endian encoded integer.
+The integer is taken modulo 2^32. The resulting integer is the almost-unique, universal, deterministic constructor id of the plutus datum.


Any discussion on what happens in case of collision?

Yes collision seems bad here. I don't think it could lead to an attack, but I'm not 100% sure.

In case of a collision when declaring a sum type, the compiler has to reject compilation and ask the user to declare a constructor id manually for the involved types. I think this is rare enough to have practically no impact. Regarding attacks, I don't think this scheme is more vulnerable than any other scheme: in the current Plutus scheme, constructor id overlaps are practically omnipresent (though never within the sum types that occur in the compiled contract).
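The derivation quoted above (SHA-256 over the UTF-8 encoding of `ustr`, hex digest read as a big-endian integer, reduced mod 2^32) and the proposed compile-time collision check can be sketched in Python as follows. `constr_id_from_ustr`, `assign_union_ids` and `collision_probability` are hypothetical helpers; the last one is the standard birthday-bound approximation for uniformly random ids, which is only a rough estimate since real type strings are not random.

```python
import hashlib
import math
from typing import Dict


def constr_id_from_ustr(type_str: str, bits: int = 32) -> int:
    """SHA-256 over the UTF-8 bytes, hex digest read as a big-endian integer, reduced mod 2**bits."""
    hex_digest = hashlib.sha256(type_str.encode("utf-8")).hexdigest()
    return int(hex_digest, 16) % (2**bits)


def assign_union_ids(members: Dict[str, str], bits: int = 32) -> Dict[int, str]:
    """Derive an id per union member (name -> type string); refuse to compile on a clash."""
    seen: Dict[int, str] = {}
    for name, type_str in members.items():
        cid = constr_id_from_ustr(type_str, bits)
        if cid in seen:
            raise ValueError(
                f"constructor id {cid} shared by {seen[cid]} and {name}; set CONSTR_ID manually"
            )
        seen[cid] = name
    return seen


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday-bound approximation for at least one clash among n uniformly random ids."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2**bits)
```

For example, `collision_probability(1000)` is about 1.2e-4 for a thousand independently defined types sharing the 32-bit space; a single union of a handful of members is far less likely to clash, and the check only has to run when a union is declared.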

effectfully · 2023-11-09T06:31:58Z

CIP-????/README.md

+Note: I will use record / Plutus Data interchangeably throughout the document.
+
+## Abstract
+Plutus Constructor IDs are currently heavily focused around their origin in Haskell. They are usually used to distinguish different constructors of a single declared datatype.


This section lacks an elaborate example, like an actual Data object. It took me a while to figure out what you mean by a constructor ID and I'm a Plutus developer.

effectfully · 2023-11-09T06:47:10Z

CIP-????/README.md

+To ensure that this does not only take the structural definition but also the intended usage into account, names of records are taken into account for the computation.
+
+There is no issue with backwards compatibility when adopting this implementation as an opt-in choice for users.
+PlutusTx and most other languages allow explicitly setting the constructor id of objects anyway.


Yes, but I feel like we've always viewed constructors ids as constructor indices. We're discussing the possibility of converting Data objects to SOPs via a builtin and this can only work if constructor ids are interpreted as indices. I'll ask the team about the perspective that you bring, it's certainly new to me.

There is nothing that necessitates any particular interpretation of the integers in a Constr. We have generally assumed they will be indices: in particular, favouring small numbers is a reflection of that (also see https://www.ietf.org/id/draft-bormann-cbor-notable-tags-09.html#name-enumerated-alternative-data).

The point about conversion to SOPs is a good one. If we are able to offer a fast conversion from Data to SOPs, then you really will want to use indices rather than arbitrary ids (since if you want to case analyze the constructor with tag n, you need to provide alternatives for all of the n-1 previous tags too!).

This sounds like it could become a major issue for compatibility with native SOPs.

michaelpj

This seems fine, if it's what you want. It doesn't seem massively appealing to me, but that's okay if other people want to use it. Naming data types and constructors via their transitive structural hash à la Unison isn't a crazy idea.

I think the proposal could do with discussing what (to me) is the obvious alternative: wrapping. In the Haskell world, if I want to have either a type T or a type U, I write

`data TorU = ItsT T | ItsU U`

This corresponds to adding another layer of constructor tagging to tell me which one I'm looking at, so there's no problem if T and U use the same tags (and indeed, the actual implementation of Haskell very much does use constructor tags in this way). This doesn't seem much worse to me than anonymous unions, and it avoids the problem of tag clashes entirely.

So the usefulness of this proposal is limited to cases where you really want to use values of various overlapping types interchangeably, and wrapping isn't acceptable. The use cases aren't really clear enough for me to say whether or not that's common.
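The `TorU` wrapping alternative can be illustrated at the Data level with a small Python sketch. `Constr`, `its_t`, `its_u` and `unwrap` are hypothetical names modeling Plutus Data constructors, not an existing library API. Both inner values deliberately use tag 0, and the extra outer layer still tells them apart.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Constr:
    """Hypothetical stand-in for a Plutus Data Constr node: a tag plus a field list."""
    tag: int
    fields: Tuple[Any, ...]


def its_t(t: Constr) -> Constr:
    """ItsT: outer tag 0 marks a wrapped T value."""
    return Constr(0, (t,))


def its_u(u: Constr) -> Constr:
    """ItsU: outer tag 1 marks a wrapped U value."""
    return Constr(1, (u,))


def unwrap(v: Constr) -> Tuple[str, Constr]:
    """Dispatch on the outer tag only; the inner tags of T and U may coincide."""
    if v.tag == 0:
        return ("T", v.fields[0])
    if v.tag == 1:
        return ("U", v.fields[0])
    raise ValueError(f"not a TorU value: tag {v.tag}")


t_val = Constr(0, (42,))      # a T value using tag 0
u_val = Constr(0, (b"ab",))   # a U value that also uses tag 0
print(unwrap(its_t(t_val)), unwrap(its_u(u_val)))
```

The price of this design is one extra `Constr` node per value, but its tags are the small indices 0 and 1, so tag clashes between `T` and `U` never need to be resolved.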

michaelpj · 2023-11-14T14:23:06Z

CIP-????/README.md

+
+Note that the implementation first computes a `ustr` in human readable form and then transforms it into an integer. This is intentional, since the alternatives (directly computing a large unique number or similar approaches) are much more difficult to debug.
+
+To ensure that this does not only take the structural definition but also the intended usage into account, names of records are taken into account for the computation.


I think you have to have the names if you're not including an id. Otherwise you can't distinguish the constructors of

`data Foo = A Int Int | B Int Int`

I don't understand the stated reasoning (why does it matter to "take the intended usage into account"?), but I think you do need it.

michaelpj · 2023-11-14T14:27:15Z

CIP-????/README.md

+<!-- The technical specification should describe the proposed improvement in sufficient technical detail. In particular, it should provide enough information that an implementation can be performed solely on the basis of the design in the CIP. This is necessary to facilitate multiple, interoperable implementations. This must include how the CIP should be versioned. If a proposal defines structure of on-chain data it must include a CDDL schema in its specification.-->
+The deterministic, universal and almost-unique Plutus constructors are computed recursively based on the type definition of a record.
+We first compute a string `ustr(X)` based on the type definition of X. Then we perform a sha256 hash on the UTF8 encoding of this string and interpret the resulting hex digest as a big endian encoded integer.
+The integer is taken modulo 2^32. The resulting integer is the almost-unique, universal, deterministic constructor id of the plutus datum.


Yes collision seems bad here. I don't think it could lead to an attack, but I'm not 100% sure.

michaelpj · 2023-11-14T14:28:08Z

CIP-????/README.md

+ustr(bytes) := "bytes"
+ustr(integer) := "int"
+// This covers the case where the structure of the object is not known from the perspective of the class, i.e. when any BuiltinData is allowed
+ustr(PlutusData) := "any"


why not "data"?

Probably an oversight but not too relevant
cf #608 (comment)

michaelpj · 2023-11-14T14:29:01Z

CIP-????/README.md

+// This covers the case where the structure of the object is not known from the perspective of the class, i.e. when any BuiltinData is allowed
+ustr(PlutusData) := "any"
+  // This covers the case where the type of the elements in the list is not known in advance
+ustr(list) := "list"


why isn't that list<data>?

More generally, this CIP is committing to a type-definition language that might not be appropriate for everyone, as witnessed by quirks like this.

Moreover, we already have at least two type-definition languages that we could use:

CDDL (since Data is a subset of CBOR)

CIP-57

Why not use one of those?

why isn't that list<data>?

Hm, I had this comment as well, but either I failed to hit "comment" or GitHub lost it (it's been glitchy lately for me).

I think this is a valid point. I am currently looking into making this compatible with the CIP-57 definitions. However, CIP-57 does not emphasize reproducibility (e.g. the concrete ordering of JSON map elements usually does not matter), whereas here it is essential: a re-definition, or at least a specification, of a "canonical" blueprint from which to hash is unavoidable.
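One way to get such a "canonical" form is to fix the serialization before hashing. The sketch below is an assumption, not part of CIP-57 or this proposal: it sorts object keys, drops insignificant whitespace and keeps array order (so field order still matters), then hashes the result the same way as `ustr`. `canonical_json` and `schema_id` are hypothetical names, and the schema dictionaries are only loosely blueprint-like. A complete canonical form would also have to pin down number formatting and string escaping, as RFC 8785 (JSON Canonicalization Scheme) does.

```python
import hashlib
import json
from typing import Any


def canonical_json(schema: Any) -> str:
    """Serialize with sorted object keys and no insignificant whitespace.

    Array order is kept as-is, so the order of constructor fields still matters.
    """
    return json.dumps(schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def schema_id(schema: Any, bits: int = 32) -> int:
    """Hash the canonical form instead of whatever key order a tool happened to emit."""
    digest = hashlib.sha256(canonical_json(schema).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2**bits)


a = {"title": "A", "dataType": "constructor", "fields": [{"dataType": "integer"}]}
b = {"fields": [{"dataType": "integer"}], "dataType": "constructor", "title": "A"}
print(canonical_json(a) == canonical_json(b), schema_id(a) == schema_id(b))
```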

michaelpj · 2023-11-14T14:31:38Z

CIP-????/README.md

+  // This covers the case where the type of the elements in the list is not known in advance
+ustr(list) := "list"
+
+ustr(list<X>) := "list<" + ustr(X) + ">"


We're not parsing these so I think it's fine, but probably worth clarifying that it's not a problem if e.g. type names contain < or other special characters.

michaelpj · 2023-11-14T14:32:33Z

CIP-????/README.md

+    ")"
+```
+
+Where `name` and `f1` to `fn` refer to the name of the record and the names of its fields respectively.


Commented later, but: you have to have either the name or the id of the constructor, otherwise you can't distinguish two constructors with the same fields. And the whole point of this proposal is to not set the ids, so it has to be names.

michaelpj · 2023-11-14T14:39:24Z

CIP-????/README.md

+-->
+We definitely want a few properties on the CONSTR_IDs
+
+- _small_: ideally the constr_id integer should be as small as possible, as smaller integers are encoded more efficiently in CBOR and save the end user minutxo and txfees (constr_ids are encoded as the cbor tag up to 7 bit size, after that encoded as generic integer)


If anything this proposal seems vanishingly unlikely to generate tags that are small? We're taking the result mod 2^32, so I'd expect to probably get uniform numbers over that range, which are going to be way higher than 2^7.

More generally, since this proposal wants global ids, there can only be 7 types globally that get the small ids. So I think this will definitely perform worse on space, but that might not matter.

michaelpj · 2023-11-14T14:40:10Z

CIP-????/README.md

+
+- _small_: ideally the constr_id integer should be as small as possible, as smaller integers are encoded more efficiently in CBOR and save the end user minutxo and txfees (constr_ids are encoded as the cbor tag up to 7 bit size, after that encoded as generic integer)
+- _unique_: There should be as little overlap with other values as possible, so that we can group together classes in unions without having to worry about setting/overwriting the constr id. This is reflected by the unique choice of identifiers in `ustr`.
+- _deterministic_: Datatypes that are defined in libraries may be imported in arbitrary contexts. The constr_id must therefore not depend on e.g. what other Unions the datatype is being used in or what other datatypes are declared in its surroundings. This rules out the Haskell approach and any automatically incrementing global counters.


I think it's the uniqueness that rules out the Haskell approach, not determinism.

Yes, that's correct.

michaelpj · 2023-11-14T15:09:01Z

CIP-????/README.md

+To ensure that this does not only take the structural definition but also the intended usage into account, names of records are taken into account for the computation.
+
+There is no issue with backwards compatibility when adopting this implementation as an opt-in choice for users.
+PlutusTx and most other languages allow explicitly setting the constructor id of objects anyway.


There is nothing that necessitates any particular interpretation of the integers in a Constr. We have generally assumed they will be indices: in particular, favouring small numbers is a reflection of that (also see https://www.ietf.org/id/draft-bormann-cbor-notable-tags-09.html#name-enumerated-alternative-data).

The point about conversion to SOPs is a good one. If we are able to offer a fast conversion from Data to SOPs, then you really will want to use indices rather than arbitrary ids (since if you want to case analyze the constructor with tag n, you need to provide alternatives for all of the n-1 previous tags too!).

michaelpj · 2023-11-14T16:23:12Z

Generally I'd like to see some more discussion of usecases, and maybe some indication that someone other than OpShin is interested in this.

rphair · 2023-11-15T02:42:16Z

Adding to #608 (comment) from the CIP Editors' meeting today, where it was also brought up: @nielstron, we would be interested in seeing responses to the reviews already presented before we can more fully consider this as a CIP. Until then it does seem like it could instead be a useful "best practice" document with a good idea that might not be compelling for others to adopt.

rphair · 2024-08-20T02:57:31Z

@nielstron as far as I can tell, the last 2 commits 5d7861c & 1723c1b haven't addressed the feedback that I tried to summarise in #608 (comment) 9 months ago.

We are tagging some proposals Abandoned but we never tagged this one Waiting for Author first (I'm trying this week to overhaul our editing process by tagging stale PRs), so I'm applying that tag now. I expect that you can address the review points above (please make some notes in the relevant conversations) or explain why the points have all been addressed.

It's just a matter of explaining your points clearly I believe... once we see that this is done we can put it back on the CIP meeting agenda to look at promoting it to a candidate; if there is no further progress in the next few weeks it will probably be moved onto the Abandoned list & closed soon afterward.

rphair · 2024-09-24T01:57:00Z

@nielstron let's make one last call for updates before closing this as "abandoned" but please if you are interested in pursuing this then respond to the last comment & we'll plug it back into the review process.

nielstron · 2024-09-24T11:31:29Z

Hi @rphair ,
I don't have time to work on this CIP anymore, so feel free to tag it as abandoned.

Add CIP for unique, deterministic CONSTR IDs

d1eb327

nielstron changed the title ~~Add CIP for unique, deterministic constructor IDs~~ Almost-unique, deterministic constructor IDs Oct 20, 2023

nielstron added 2 commits October 20, 2023 11:26

Update README.md

1ef833d

Update README.md

1c17067

nielstron mentioned this pull request Oct 20, 2023

Reconstructable deterministic constrids Python-Cardano/pycardano#272

Merged

rphair changed the title ~~Almost-unique, deterministic constructor IDs~~ CIP-???? | Deterministic universal almost-unique Plutus Constructors Oct 21, 2023

rphair added the Category: Plutus Proposals belonging to the 'Plutus' category. label Oct 21, 2023

rphair reviewed Oct 24, 2023

CIP-????/README.md

effectfully reviewed Nov 9, 2023

michaelpj reviewed Nov 14, 2023

nielstron added 2 commits February 8, 2024 13:32

Update README.md

5d7861c

Update README.md

1723c1b

effectfully mentioned this pull request Feb 15, 2024

A builtin converting the outer Data.Constr to a SOP for fast case? IntersectMBO/plutus#5777

Closed

rphair added the State: Waiting for Author Proposal showing lack of documented progress by authors. label Aug 20, 2024

rphair added State: Likely Abandoned Close if confirmed abandoned (long waiting). and removed State: Waiting for Author Proposal showing lack of documented progress by authors. labels Sep 24, 2024

rphair closed this Sep 24, 2024

effectfully mentioned this pull request Oct 28, 2024

Caseing on values of built-in types IntersectMBO/plutus#6602

Open

