Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script context conversion is expensive #4209

Closed
treeowl opened this issue Nov 12, 2021 · 13 comments
Closed

Script context conversion is expensive #4209

treeowl opened this issue Nov 12, 2021 · 13 comments

Comments

@treeowl
Copy link

treeowl commented Nov 12, 2021

Converting from BuiltinData to ScriptContext is pricy for large transactions. I don't see any obvious reason not to provide the script with the ScriptContext directly. Unlike the datum and redeemer, this has a known, fixed format.

@michaelpj
Copy link
Contributor

Yeah, we know.

The reason we do it this way is that it means that the ledger interface only operates with builtin types, rather than having to leak details of how datatypes are compiled into the interface with the ledger. We didn't expect that it would end up being so significant.

I have a few ideas to improve this, but they mostly involve drastic changes, e.g. reintroducing sums and products into PLC so we can give a more stable format for datatypes that we're willing to commit to.

@kk-hainq
Copy link
Contributor

The conversion is also heavy in terms of script size so there is much motivation to improve.

I have a few ideas to improve this, but they mostly involve drastic changes, e.g. reintroducing sums and products into PLC so we can give a more stable format for datatypes that we're willing to commit to.

This sounds like a lot of work though... I've added a mention to #4174. Hopefully, someone will pick it up and help complete it eventually.

@L-as
Copy link
Contributor

L-as commented Nov 30, 2021

I was asked by @jmchapman to share an idea I and some other folks developed, see https://gitlab.com/fresheyeball/plutus-tx-spooky.

I haven't actually tested the above code, so I don't know if it works, but the idea is to lazily decode the BuiltinData.
The core relevant code is:

newtype Spooky a = Spooky BuiltinData -- constructor is not exported

unSpooky :: UnsafeFromData a => Spooky a -> a
unSpooky (Spooky a) = unsafeFromBuiltinData a

toSpooky :: ToData a => a -> Spooky a
toSpooky = Spooky . toBuiltinData

-- example code
data SpookyScriptPurpose' = SpookyMinting (Spooky CurrencySymbol) | SpookySpending (Spooky TxOutRef) | SpookyRewarding (Spooky StakingCredential) | SpookyCertifying (Spooky DCert)

type SpookyScriptPurpose = Spooky SpookyScriptPurpose'

The arguments passed to the scripts would be "decoded" as a Spooky a, which you would unSpooky when you want to use. The fields of a should also use Spooky, such that the decoding is only one layer deep.

The reason I'm bringing this up, is that I don't see the point of introducing explicit support for sums and products when we already have BuiltinData. The functionality would overlap IMO. If BuiltinData is missing some features to make this approach viable, perhaps you could extend it with such features? NB: I am not familiar with how the generated {To,From}Data instances look.

Perhaps I'm missing something?

@michaelpj
Copy link
Contributor

That's basically the approach I tried in #4235, which is a bit more generic in that it delays any kind of computation, not just deserialization.

The problem with that approach (and I think also with yours) is that it's easy for users to cause the deserialization work to happen multiple times, which is actually worse than today. For example, I used Lazy on all the fields of TxInfo, but many functions take TxInfo directly, and must then force the fields... so if you call that function twice it will force it twice and do the work twice.

I think it could be viable to do things this way but it would require a lot of refactoring of the libraries to pass around the forced fields rather than TxInfo directly.

The reason I'm bringing this up, is that I don't see the point of introducing explicit support for sums and products when we already have BuiltinData.

The point is to not do any conversion at all, but rather pass in the data in exactly the form that the program needs.

@L-as
Copy link
Contributor

L-as commented Nov 30, 2021

I get that, however, what you're describing is slightly different from the optimal state of what I meant to describe.
Lazy stores a function, and has a much higher overhead than what I'm describing.
What I mean, is that rather than using Scott encodings, you'd just use BuiltinData directly. PlutusTx might generate code that rather than transforming the BuiltinData into a Scott encoding, then calling that, interfaces with the BuiltinData directly.

To access txInfoMint, the code might unsafeDataAsConstr the TxInfo, then fetch the 4th field element of the list directly, rather than going through some Scott encoding ritual. You could drop Scott encodings entirely and use BuiltinData for everything TBH, though it wouldn't support lazy fields, nor would it support functions AFAICT. You could add functions to BuiltinData? You could also choose between Scott encoding or this for a data type depending on whether it has a UnsafeFromData instance, since if it has UnsafeFromData it's likely going to be stored on-chain.

@michaelpj
Copy link
Contributor

Lazy stores a function, and has a much higher overhead than what I'm describing.

I don't think that's true. Why do you think it's true? Both of them call unsafeFromData when you look inside them, that's it.

What I mean, is that rather than using Scott encodings, you'd just use BuiltinData directly...

I think what you're suggesting is to basically compile datatypes to Data as a backend. That would work, but be much slower than the current version. Native sums and products is the fast version of that, effectively.

@L-as
Copy link
Contributor

L-as commented Dec 3, 2021

I don't think that's true. Why do you think it's true? Both of them call unsafeFromData when you look inside them, that's it.I don't think that's true. Why do you think it's true? Both of them call unsafeFromData when you look inside them, that's it.

You're right, but Lazy still has some more overhead though I think, since when eliminating, the scott encoding will call the passed argument with N forall a. Lazy a for N fields if I'm not misunderstanding anything, meaning that even the fields you do not use will be wrapped in a lambda abstraction. In the Spooky case, the caller would be responsible for unsafeFromData-ing it. However, I don't think this would make all the benchmarks faster than the baseline.

I think what you're suggesting is to basically compile datatypes to Data as a backend. That would work, but be much slower than the current version. Native sums and products is the fast version of that, effectively.

How would the design of sums and products be? Would you effectively add a statically sized heterogeneous list with O(1) indexing? I personally think that it would be best if the language was kept as simple as possible, which is why I think changing BuiltinData is better than adding new types, since sums and products would overlap with BuiltinData I think. Alternatively, you could drop Data entirely, and make the on-chain format of datums and redeemers fit products and sums. This would likely essentially mean losing the ability to inspect the format/type of the datum and redeemer, and instead assuming it.

@effectfully
Copy link
Contributor

Converting from BuiltinData to ScriptContext is pricy for large transactions.

This is being addressed by the Sums-of-products CIP.

I just added the "status: needs action from the team" label where the "action" is @michaelpj bringing SOP to production.

@effectfully effectfully self-assigned this Jun 13, 2023
@effectfully
Copy link
Contributor

This is being addressed by the cardano-foundation/CIPs#455 CIP.

Unfortunately, that didn't work out. We're looking into other options such as optimizing fromData/unsafeFromData as well as making using Data directly faster and more user-friendly, but it takes time to sort all of this out.

@L-as
Copy link
Contributor

L-as commented Nov 26, 2023

I don't understand why you don't just have a pragma for encoding some data types as Data

@effectfully
Copy link
Contributor

effectfully commented Nov 26, 2023

I don't understand why you don't just have a pragma for encoding some data types as Data

@michaelpj is that what we're converging to? Do there exist any docs on how the AsData thing is supposed to be used?

@michaelpj
Copy link
Contributor

That's more or less what we're getting to, yes. I've got a ticket to write some docs for AsData. We need to do a decent bit of work to get it usable for the ScriptContext, though.

@effectfully
Copy link
Contributor

There's now PlutusLedgerApi.V3.Data.Contexts and similar modules allowing for cheaper extraction of stuff from the ScriptContext.

@zliu41 do we have user-facing documentation on Data-backed `ScriptContext?

That's more or less what we're getting to, yes. I've got a ticket to write some docs for AsData.

Was done long ago: https://plutus.cardano.intersectmbo.org/docs/using-plutus-tx/optimizing-scripts-with-asData

I'm closing this issue, because the original question was answered by Michael:

The reason we do it this way is that it means that the ledger interface only operates with builtin types, rather than having to leak details of how datatypes are compiled into the interface with the ledger. We didn't expect that it would end up being so significant.

If there's still something unresolved here, we should open more focused issues, this one served us well for the general discussion and it's time to close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants