This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Runtime APIs don't account for slot number #1327

@rphmeier

The Issue

Polkadot and other Substrate-based blockchains export a number of runtime APIs that allow the node-side code to inspect the state of the chain or acquire information derived from it. These calls take a header hash as a parameter, which determines the state the call is evaluated against.

Many of these calls return session-dependent data: who the validators are, their corresponding assignments, registered parachains, parathreads, etc.

In BABE, session changes can happen logically between blocks. BABE breaks time down into slots, and any particular chain can have at most one block per slot. Sessions are equivalent to epochs in BABE, which are delineated by a constant number of slots. If an epoch ends at slot 4000, and you have a chain of two blocks, one at slot 3998 and one at slot 4002, the validators of the new session need to sign off on the new block.

Likewise, at the beginning of the block at slot 4002, the on_new_session logic would be triggered with a new validator set. This has implications for the logic of constructing the new block at slot 4002.

However, which validator set applies does not depend on the parent hash, but on the slot number of the block we are currently constructing. If we had been building on the same parent at slot 3999, we would have wanted to use the old validator set.
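To make the example concrete, here is a minimal sketch of deriving a session index from a slot number. The names `session_of_slot`, `genesis_slot`, and `epoch_length` are hypothetical and not the actual BABE or Substrate API. It also assumes that an epoch ending "at slot 4000" means slot 4000 is the first slot of the next epoch, with a constant epoch length of 1000 slots.

```rust
// Hypothetical aliases for illustration only; not the real Substrate types.
type Slot = u64;
type SessionIndex = u64;

/// Session (epoch) index that `slot` falls into, assuming epochs of a
/// constant, non-zero `epoch_length` that start counting at `genesis_slot`.
fn session_of_slot(slot: Slot, genesis_slot: Slot, epoch_length: u64) -> SessionIndex {
    // Slots before genesis are treated as belonging to session 0.
    slot.saturating_sub(genesis_slot) / epoch_length
}

/// Session that applies to a block being authored at `authoring_slot`.
/// The parent's slot is accepted but deliberately unused: only the slot we
/// are authoring at decides which validator set applies.
fn session_for_new_block(_parent_slot: Slot, authoring_slot: Slot, epoch_length: u64) -> SessionIndex {
    session_of_slot(authoring_slot, 0, epoch_length)
}
```

With the same parent at slot 3998, `session_for_new_block(3998, 3999, 1000)` and `session_for_new_block(3998, 4002, 1000)` yield different sessions, which is exactly the information a bare parent hash cannot convey.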

It is important to note that once our node has perceived that it is authoring at slot X, it is wrong for it to author at slot X-1, even if it was supposed to. Time only moves forwards. And the change in slot number (and thus session number) applies to all chain heads, not only a subset.

The Fix(es)

Referring to the parent state by its hash alone is not sufficient, yet this is currently the primary way a parent state is referenced in the guide and in the code. There are two alternatives: (Hash, SlotNumber) and (Hash, SessionIndex). Either way, we would want the pulse of the overseer to include information about the new slot number or session index along with information about new leaves and data being dropped.

The (Hash, SlotNumber) pair has the advantage that it doesn't need to move any significant logic into the overseer. The overseer should be able to get a stream of SlotNumbers from BABE somehow and pass that directly on to subsystems.

However, I prefer the (Hash, SessionIndex) approach, because sessions are thousands of slots long and most of the time the session doesn't change. The overseer would then be responsible for these tasks:

  • Listen for new slots on some Stream.
  • Determine if the session has changed between the last slot and the current one.
  • If it has, send a message to all subsystems indicating the session change.
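The three tasks above could look roughly like the following sketch. `SessionTracker` and `SessionChanged` are hypothetical names, not existing overseer types, and the slot stream is reduced to plain calls to `on_new_slot`. The session is again assumed to be the slot number divided by a constant epoch length.

```rust
// Hypothetical aliases for illustration only; not the real Substrate types.
type Slot = u64;
type SessionIndex = u64;

/// Hypothetical message the overseer would broadcast to all subsystems.
#[derive(Debug, PartialEq, Eq)]
struct SessionChanged {
    old: SessionIndex,
    new: SessionIndex,
}

/// Tracks the session of the most recently observed slot.
struct SessionTracker {
    epoch_length: u64,
    last_session: Option<SessionIndex>,
}

impl SessionTracker {
    fn new(epoch_length: u64) -> Self {
        SessionTracker { epoch_length, last_session: None }
    }

    /// Called for every slot received from the (assumed) slot stream.
    /// Returns a message only when the session advanced since the last slot.
    fn on_new_slot(&mut self, slot: Slot) -> Option<SessionChanged> {
        let session = slot / self.epoch_length;
        match self.last_session {
            // Session advanced: remember it and report the change.
            Some(prev) if session > prev => {
                self.last_session = Some(session);
                Some(SessionChanged { old: prev, new: session })
            }
            // Same session, or a stale slot: time only moves forwards.
            Some(_) => None,
            // First slot ever seen: nothing to compare against yet.
            None => {
                self.last_session = Some(session);
                None
            }
        }
    }
}
```

Because a session spans thousands of slots, `on_new_slot` returns `None` almost every time, so subsystems are only notified on the rare slot that crosses a session boundary.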

Once that change is made, all subsystems can track block-based work by (Hash, SessionIndex).

The last piece of the puzzle is the Runtime APIs themselves. Subsystems being aware of (Hash, SessionIndex) pairs is not useful if there is no way to extract session-localized data from the Runtime API. There are three approaches I am currently aware of:

  1. Amend the general runtime API interface to take the slot number along with the block hash.
  2. Transform every runtime API call C into two: first ensure_session(session_index) and then C. ensure_session is some hypothetical API call that we can introduce which forces session change logic to occur within the parachains runtime.
  3. Amend all relevant parachains runtime APIs to accept a session index as an argument. This would be anything that indirectly deals with the current validator set. If the session index is not equal to the current session index of the parent or the following session index (for which the validators are already known), the API call fails.
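As a rough illustration of approach 3, the sketch below models a session-dependent call that takes the session index as an argument and fails unless it names the current or the following session. `SessionState`, `SessionError`, and `validators` are hypothetical stand-ins for runtime storage and a runtime API method; they are not the actual Polkadot runtime API.

```rust
// Hypothetical types for illustration only; not the real runtime types.
type SessionIndex = u32;
type ValidatorId = &'static str;

/// Error returned when the requested session is neither current nor next.
#[derive(Debug, PartialEq, Eq)]
enum SessionError {
    UnknownSession { requested: SessionIndex, current: SessionIndex },
}

/// Stand-in for the session data available in the parent block's state.
struct SessionState {
    current_index: SessionIndex,
    current_validators: Vec<ValidatorId>,
    /// Validators of the following session, which are already known.
    next_validators: Vec<ValidatorId>,
}

impl SessionState {
    /// Session-dependent query in the style of approach 3: the caller must
    /// say which session it is asking about.
    fn validators(&self, session: SessionIndex) -> Result<&[ValidatorId], SessionError> {
        if session == self.current_index {
            Ok(self.current_validators.as_slice())
        } else if Some(session) == self.current_index.checked_add(1) {
            Ok(self.next_validators.as_slice())
        } else {
            Err(SessionError::UnknownSession {
                requested: session,
                current: self.current_index,
            })
        }
    }
}
```

Since the session index is part of the signature, a caller cannot obtain validator data without stating which session it expects, and a mismatch surfaces as an error rather than as silently stale data.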

Approach 1 is difficult for a few reasons. First, runtime APIs are deeply entrenched in Polkadot and Substrate, so changing their general interface would be a large effort. Second, the runtime API interface needs to stay generic, as not all Substrate chains will have slot numbers. Taking this approach would therefore involve some difficulty and trait/type juggling.

Approaches 2 and 3 are roughly equivalent to each other. Approach 2 ensures the session data is correct outside of the runtime API call, and approach 3 ensures the session data is correct inside the runtime API call.

Of these three, I lean towards approach 3, with approach 2 as a close second. Approach 3 is better than approach 2 because it enforces correctness as part of the interface of the function, meaning that the error of forgetting an ensure_session call could not be made.

Summary

  1. Introduce a notification stream for the slot worker in Substrate so we can listen for new slots and session changes. @sorpaas is working on this.
  2. Alter the overseer to watch for session changes and dispatch some kind of update to all subsystems to let them know.
  3. Alter the runtime API logic to be somewhat session-fluid, probably by accepting a session argument for each Polkadot runtime API call that is session-dependent.

Labels: I3-bug (Fails to follow expected behavior.)