Compiler optimizations for smaller script outputs #4174

Open
kk-hainq opened this issue Oct 31, 2021 · 6 comments

Comments

@kk-hainq
Contributor

kk-hainq commented Oct 31, 2021

Describe the feature you'd like

Many dApp developers want smaller output scripts so that they fit within the blockchain's limits. The short-term goal is to shrink complex "raw" scripts and basic-but-non-trivial state machines enough to pass those limits. The long-term goal is continuous improvement, since smaller scripts are generally a net gain for the whole network.

Here are a few proposals to tackle this problem.

  1. Strip out unused constructors in PIR.
    This was proposed by @michaelpj in Strip out unused constructors in PIR #4148. Compiler: remove unused data constructors #4158 is an attempt at it, with decent discussion for context, but it is close to a dead end. The main challenge is making it useful for the goals above, because real-life scripts are only applied to their arguments at run-time. A "perfect" dependency analysis would therefore treat every script as an unused function and remove it. Persisting all input types, constructors, and their type dependencies might require too much effort for tiny gains. Hopefully, we can find a balance to proceed in this direction.

  2. Avoid retaining datatypes in PIR, which are used only at the type level.
    This was proposed by @michaelpj in Avoid retaining datatypes in PIR which are used only at the type level #4147. We have not attempted anything but this direction might face the same complications as Strip out unused constructors in PIR #4148. The solution might be very natural once Strip out unused constructors in PIR #4148 is solved.

  3. Better liveness analysis.
    This is a general bullet point to improve the liveness analysis that includes the above two proposals. There may be more. For context, it currently over-approximates the dependency among parts of a data type.

  4. Minimize or remove trace messages.
    We should add a flag to remove trace messages during compilation. dApp developers could then build easy-to-debug scripts during development and size-optimized scripts for on-chain deployment from the same code.

  5. More and better simplifiers.
    We currently have a few simplifiers in UnwrapCancel, Beta, and Inline. I wonder if we could push it further in this direction.

  6. Write more documentation on code styles that yield smaller scripts.
    These can be minor choices like using if then else instead of calling || or &&. Others include matching on input redeemers to simplify validation rules (Redeemers in Script Context #3844), requiring more data in redeemers to not re-calculate it from script contexts, avoiding reading and converting datums from script contexts, avoiding converting data on-chain (related to Script context conversion is expensive #4209), and more.

  7. Explore and integrate new optimization techniques.
    We are going to break this bullet point down into more concrete proposals. In general, there are still many dead code techniques to try. Others include truncating input types at compile-time and their input values accordingly at run-time, utilizing more dynamic analysis, superoptimization, peephole optimization, and more.

  8. Document more case studies.
    Personally, I believe that studying real-life scripts is very beneficial for insights and inspirations towards practical solutions. At the current stage, we should favour effectiveness over elegance. We have to document real scripts for security research at Hachi anyway, and will port any relevant findings and ideas here.

  9. We should write much more documentation, tests, and benchmarks.

  10. Support forced builtins as documented in Could untyped plutus core have primitives which don't need initial force #4183. We can then eliminate the "manual" forces at each builtin usage.
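To make the dependency-analysis concern in proposals 1 to 3 concrete, here is a toy reachability pass written in Python. It is only a sketch under stated assumptions: the graph, the binding names, and the `live_bindings` helper are all hypothetical and do not reflect how PIR actually represents dependencies. It shows why the choice of roots matters: with no roots, everything counts as dead.

```python
from typing import Dict, Iterable, Set


def live_bindings(deps: Dict[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """Return every binding reachable from the given roots (hypothetical helper)."""
    live: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        # Follow the things this binding refers to.
        stack.extend(deps.get(name, ()))
    return live


# Hypothetical dependency graph: binding -> bindings it refers to.
deps = {
    "validator": {"checkSig", "Redeemer.Spend"},
    "checkSig": {"PubKeyHash"},
    "Redeemer.Spend": set(),
    "Redeemer.Close": set(),          # constructor that is never matched on
    "debugHelper": {"PubKeyHash"},    # function that is never called
    "PubKeyHash": set(),
}

kept = live_bindings(deps, roots=["validator"])
removed = set(deps) - kept
```

With `validator` as the only root, `Redeemer.Close` and `debugHelper` end up in `removed`. Calling `live_bindings(deps, roots=[])` returns the empty set, which is the toy version of a "perfect" analysis removing the whole script. The sketch ignores the harder part discussed in #4158, namely that dropping a constructor also changes how the remaining datatype is represented.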

Describe alternatives you've considered

Different projects have been experimenting with different solutions to this problem. Many have had to simplify their application logic, which reduces the functionality a dApp can offer. A few have had to write code in unnatural ways, sacrificing readability and risking bugs in exchange for a deployable script. Others have gone as far as writing their own optimizers or even compilers for other source languages.

The first two routes are both unfortunate and not scalable. The last one is exciting but would require too much time to be practical soon. We believe that helping improve the current compilation pipeline makes the most sense.

Additional context / screenshots

We have several people who are willing to help with all these proposals. We are likely to add more or write more on existing proposals with time. We can also write more documentation for interested people to join the work.

Relevant issues:

@michaelpj
Contributor

> Better liveness analysis

I think this is pretty good already. The only issue is the datatype issue you referenced.

> Minimize or remove trace messages.

We thought about this. It's a bit tricky. It's likely that people will want to re-run the script that actually failed on the chain if something does fail, and it would be quite annoying to not have the trace information in that case. And you can't necessarily just swap in an alternative version if things care about hashes...

So it's complicated, which is why we did the stupid thing of just leaving them in.

> Explore and integrate new optimization techniques.

There's lots of optimization we can do, although it's very unclear how much will actually help. At some point you just have to include the code the user asked for, which can be a lot!


There are also much more drastic things that we are considering internally. I'll mention a few of them here.

  1. Compress scripts. Compressing scripts gets us about a 40% saving, even given our reasonably compact binary encoding. Currently the idea is to implement this in the ledger, and I hope it will be in the next HF.
  2. Script references. Sketchy at the moment, and relies on several non-implemented ledger extensions, but we'd like to have a way to post scripts to the chain and then reference them afterwards, rather than having to submit them each time.
  3. Partial script references. Even more sketchy, but it would be nice to just be able to reference a large chunk of code (e.g. the data decoder for ScriptContext) somehow, rather than having to submit it every time.
  4. Pass structured data into scripts differently. Too soon to say much, but we waste a lot of time and space on fromData, it would be a big win to get rid of it.

All of these require ledger changes, so there's a bunch of design work that needs to go on etc.
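As a rough way to measure the kind of saving mentioned in item 1, one could compress a serialized script and compare sizes. The sketch below uses Python's standard `zlib` module purely as a stand-in: the thread does not say which algorithm the ledger would use, `compression_saving` is a hypothetical helper, and the sample bytes are synthetic, so the result says nothing about the 40% figure for real scripts.

```python
import zlib


def compression_saving(script_bytes: bytes, level: int = 9) -> float:
    """Fraction of bytes saved by compressing the input (hypothetical helper)."""
    if not script_bytes:
        raise ValueError("cannot measure saving on an empty script")
    compressed = zlib.compress(script_bytes, level)
    # Sanity check: compression must be lossless.
    assert zlib.decompress(compressed) == script_bytes
    return 1.0 - len(compressed) / len(script_bytes)


# Synthetic, highly repetitive stand-in for a serialized script (not real data).
sample = bytes(range(64)) * 32
saving = compression_saving(sample)
```

Running the same function over real serialized scripts would be the meaningful experiment; already-compact encodings usually compress much less than this repetitive sample.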

@kk-hainq
Contributor Author

kk-hainq commented Nov 1, 2021

> We thought about this. It's a bit tricky. It's likely that people will want to re-run the script that actually failed on the chain if something does fail, and it would be quite annoying to not have the trace information in that case. And you can't necessarily just swap in an alternative version if things care about hashes...
>
> So it's complicated, which is why we did the stupid thing of just leaving them in.

Does the proposal of adding a flag to remove traces make sense to you then? I know many developers would want that in these early days of tight limits. I guess in the long run we can map error codes off-chain or something.

> There's lots of optimization we can do, although it's very unclear how much will actually help. At some point you just have to include the code the user asked for, which can be a lot!

That's why we want to help so we can write more logic without polluting our shared blockchain!

> There are also much more drastic things that we are considering internally. I'll mention a few of them here.
>
> 1. Compress scripts. Compressing scripts gets us about a 40% saving, even given our reasonably compact binary encoding. Currently the idea is to implement this in the ledger, and I hope it will be in the next HF.
> 2. Script references. Sketchy at the moment, and relies on several non-implemented ledger extensions, but we'd like to have a way to post scripts to the chain and then reference them afterwards, rather than having to submit them each time.
> 3. Partial script references. Even more sketchy, but it would be nice to just be able to reference a large chunk of code (e.g. the data decoder for ScriptContext) somehow, rather than having to submit it every time.
> 4. Pass structured data into scripts differently. Too soon to say much, but we waste a lot of time and space on fromData, it would be a big win to get rid of it.
>
> All of these require ledger changes, so there's a bunch of design work that needs to go on etc.

We haven't thought of 1 before, 40% would be very very nice. 2 does make sense. 3 would be a beauty. 4 would be indeed very practical, we realized that too and do tell each other to refrain from converting data on-chain. I'll continue to work on dependency analysis and removing unused data types for now given our earlier suggestions. Just tell me if you need anything more anytime!

@michaelpj
Contributor

> Does the proposal of adding a flag to remove traces make sense to you then?

Sure. It would be very simple: a pass that replaces all string literals with the empty one! With a plugin option to enable it. I'd take a PR for this. I'm somewhat unsure that it's a good idea, but having the option doesn't seem too bad.

@kk-hainq
Contributor Author

kk-hainq commented Nov 2, 2021

> Sure. It would be very simple: a pass that replaces all string literals with the empty one! With a plugin option to enable it. I'd take a PR for this. I'm somewhat unsure that it's a good idea, but having the option doesn't seem too bad.

I think people would love an option to remove all traces for good too. I'll get a few things up by the end of the week!

@michaelpj
Contributor

> I think people would love an option to remove all traces for good too. I'll get a few things up by the end of the week!

Right, so you could do both:

  1. Transform all string literals into the empty string
  2. Replace all occurrences of trace str a with a

Perhaps the latter would be sufficient.
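The two passes can be sketched on a toy term language. This is a minimal illustration in Python, not the Plutus compiler's implementation: the `Str`, `Var`, `Trace`, and `App` node types and the function names `blank_strings` and `strip_traces` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Str:
    value: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Trace:
    message: "Term"   # the string to log
    body: "Term"      # the value that `trace` returns


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Str, Var, Trace, App]


def blank_strings(t: Term) -> Term:
    """Pass 1: replace every string literal with the empty string."""
    if isinstance(t, Str):
        return Str("")
    if isinstance(t, Trace):
        return Trace(blank_strings(t.message), blank_strings(t.body))
    if isinstance(t, App):
        return App(blank_strings(t.fun), blank_strings(t.arg))
    return t


def strip_traces(t: Term) -> Term:
    """Pass 2: replace every `trace str a` with `a`."""
    if isinstance(t, Trace):
        return strip_traces(t.body)
    if isinstance(t, App):
        return App(strip_traces(t.fun), strip_traces(t.arg))
    return t


term = App(Var("check"), Trace(Str("signature missing"), Var("False")))
blanked = blank_strings(term)
stripped = strip_traces(term)
```

In this toy model, pass 1 keeps the `Trace` node but empties its message, while pass 2 removes the node and its message entirely, which is why the second pass alone may be enough. Note that pass 1 as written also blanks string literals that are not trace messages.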

@effectfully
Contributor

We've already done a lot to reduce the size of the compiled scripts, but we do recognize that sizes are still far from being ideal. It is one of our objectives to further reduce script sizes, hence I'm adding the status: objective label.

@effectfully added the Objective and status: triaged labels and removed the status: needs triage label on Jun 16, 2023