Configurable step duration transitions (#124)
| let header = block.header().clone(); |
| let first = header.number() == 0; |
| if let Some((_, dur)) = self.step_duration.range(0..=header.number()).last() { |
Isn't on_prepare_block the wrong place for this? The method is meant to only be called when this node starts to prepare a block. The current implementation calls it more often, but only in nodes with a signing key. With #103, we'd actually only call it when it is our turn.
But even non-validators need to keep track of the step duration, I think? Maybe we should call this in on_close_block, which is called for our own as well as for imported blocks (if I'm not mistaken).
phahulin left a comment:
Mapping doesn't seem to be supported when parsing the spec from a JSON file. For the spec

{
  "name": "testnetpoa",
  "engine": {
    "authorityRound": {
      "params": {
        "stepDuration": {
          "0": 5
        },
        "blockReward": "0xDE0B6B3A7640000",
        "maximumUncleCountTransition": 0,
        "maximumUncleCount": 0,
        "validators": {

I get the error:

Loading config file from node1.toml
Spec json is invalid: invalid type: map, expected a hex encoded or decimal uint at line 7 column 3
Does this work?

"stepDuration": {
  "transitions": {
    "0": 5
  }
},
Same error.
It seems that some syntax checker needs to be corrected for the …
It needs …
Just try without quotes: it works for me.
Force-pushed from e163ed1 to c681342.
Force-pushed from c681342 to c0482f7.
Now it works for me in the … format. However, I did a test with two validator nodes and noticed some strange things. Here is the relevant piece of the log:
and it stalled:
I'm not sure about (1) because in my case blocks are produced at 5-second intervals:
I also find it strange. Do you get correct timestamps after several restarts?
There is a problem with …
I tested other intervals successfully before the first change of duration. Then it always stalls.
I also tried to launch it with …
This one also makes it stall:
Anything with more than one step duration will make it stall. I tried to move the step duration update from …
@phahulin I tried to launch it with … I think the reason is that our modified Parity can't call the …
Also, there are no logs that would indicate the error.
I added logging in the latest commit.
This case works, though :)
It's still the same duration, though.
It might be useful to print the calculated step numbers (if there is such a thing); I still don't have a good overview of how the steps are computed. If there's any other place where we just say something like …
| let expected_seconds = self.load() |
| .checked_add(1) |
| .and_then(|ctr| ctr.checked_mul(self.duration as u64)) |
| .and_then(|ctr| ctr.checked_mul(self.duration.load(AtomicOrdering::SeqCst) as u64)) |
I think this line is the main problem:
With a constant step duration, step number n just ended n * duration seconds after the beginning of 1970. Now, with a variable step duration, we'd have to count the number of steps up to the latest duration change, and then continue counting from there, with the new duration!
It almost looks like it would be easier to change the duration at a specified time (or step number) instead of a block number. Then at least Step wouldn't have to know about a particular block's timestamp, and could make the computation based on the step duration map only.
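Sketching that idea (hypothetical names, not the PR's actual code, and assuming the duration changes are keyed by step number as suggested): the current step can be derived by summing the seconds consumed by each completed segment of the duration table, then counting with the latest duration from there.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: `durations` maps the step at which a new duration
// takes effect to that duration in seconds. Each completed segment
// contributes (steps in segment) * (duration) seconds, so the current
// step follows from the seconds elapsed since the epoch.
fn step_at(durations: &BTreeMap<u64, u64>, now_secs: u64) -> u64 {
    let mut elapsed = 0u64; // seconds consumed by completed segments
    let mut segs = durations.iter().peekable();
    while let Some((&start_step, &dur)) = segs.next() {
        match segs.peek() {
            Some(&(&next_start, _)) => {
                let seg_secs = (next_start - start_step) * dur;
                if elapsed + seg_secs > now_secs {
                    return start_step + (now_secs - elapsed) / dur;
                }
                elapsed += seg_secs;
            }
            // The last entry's duration applies indefinitely.
            None => return start_step + (now_secs - elapsed) / dur,
        }
    }
    0 // empty table: degenerate case
}

fn main() {
    let mut durs = BTreeMap::new();
    durs.insert(0u64, 1u64); // 1-second steps from step 0
    durs.insert(100, 3);     // 3-second steps from step 100
    assert_eq!(step_at(&durs, 50), 50);   // still in the 1s segment
    assert_eq!(step_at(&durs, 100), 100); // exactly at the transition
    assert_eq!(step_at(&durs, 106), 102); // two full 3s steps later
    println!("ok");
}
```

This only needs the table of duration changes, not any block's timestamp.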
We could compute time by adding up all preceding (num_blocks * duration). But that wouldn't work if (or rather when) blocks appear that take longer than duration because of excess load.
Step duration and block time don't necessarily coincide, and it looks like the step number computation is independent of actual blocks (in particular if the network wasn't started in 1970…); we should probably keep it that way and just compute it based on the table of step duration changes.
| /// Change the next step duration if necessary and apply the block reward on finalisation of the block. |
| fn on_close_block(&self, block: &mut ExecutedBlock) -> Result<(), Error> { |
| self.update_step_duration(block.header().number() + 1); |
| self.update_step_duration(block.header().number()); |
I don't really understand this commit, but won't that still compute the step the wrong way in Step::duration_remaining? E.g. let's say we have step duration 1 second and are 100 seconds into 1970, i.e. we're at step 100. Now we change the step duration to 3 seconds and then increment the step. So it will change its step number to 101, and Step::duration_remaining will think that step 101 lasts until the 303rd second of 1970, making the step last more than 200 seconds?
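In numbers, using the convention stated earlier that with a constant duration, step n ends n * duration seconds after the epoch (toy values, not the PR's code):

```rust
fn main() {
    // Naive end-of-step formula: step n ends n * duration seconds
    // after the beginning of 1970.
    let end_of = |n: u64, duration: u64| n * duration;

    // 1-second steps: step 100 ends at second 100, as expected.
    assert_eq!(end_of(100, 1), 100);
    // Switch the duration to 3 seconds and increment to step 101: the
    // same formula now says step 101 ends at second 303, i.e. the
    // current step would last 203 seconds instead of 3.
    assert_eq!(end_of(101, 3), 303);
    println!("ok");
}
```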
I wanted the step duration update to happen between steps, so that the duration of the current step is not changed, as opposed to when the duration is changed directly.
| let next_dur = self.next_duration.load(AtomicOrdering::SeqCst); |
| let cur_dur = self.duration.swap(next_dur, AtomicOrdering::SeqCst); |
| if cur_dur != next_dur { |
| let starting_sec = unix_now().as_secs() as usize; |
I think it would be safer to use the block's timestamp instead of our local time here, so the validators' step counting doesn't go out of sync.
(And I still feel it would be even safer to just configure the step duration changes by time instead of by block.)
I think we can make it preserve the synchrony across step duration changes purely by time and the number of steps since the last step duration change. I don't think getting the block timestamp would be necessary; in any case it would not help to restore synchrony in this case.
| let next_step = ( |
| (unix_now().as_secs() as usize - starting_sec) / |
| (self.duration.load(AtomicOrdering::SeqCst) as usize) |
| ) + starting_step; |
I'm still concerned that this value is going to be different in two validators if one of them handles the block earlier than the other. Won't that cause them to go out of sync?
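A toy illustration of that worry (made-up numbers, not from the PR): if two validators record starting_sec from their own clocks at the moment each of them handles the duration-change block, the formula above yields permanently diverging step numbers.

```rust
// Simplified form of the next_step computation from the diff above.
fn next_step(now: u64, starting_sec: u64, duration: u64, starting_step: u64) -> u64 {
    (now - starting_sec) / duration + starting_step
}

fn main() {
    // Duration changed to 3-second steps at step 100. Validator A
    // handled the triggering block at second 100, validator B at 102.
    let (start_a, start_b) = (100u64, 102u64);
    // Later, at second 200, both compute their current step:
    let step_a = next_step(200, start_a, 3, 100); // 100/3 + 100 = 133
    let step_b = next_step(200, start_b, 3, 100); //  98/3 + 100 = 132
    assert_ne!(step_a, step_b); // the validators are now out of sync
    println!("A: {}, B: {}", step_a, step_b);
}
```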
I think this is possible too, although the likelihood is extremely small. As a remedy I've been considering linking the duration update to a step number, or scheduling the update one block ahead. If the update is already scheduled in a previous on_close_block, there will be no race conditions at the time of the step duration change, because it will be done in Step::increment; but in that case we must forbid changing the step duration without any blocks in between.
> linking the duration update with a step number
Yes, I think that's the right approach. But making it a timestamp might be even more convenient: Then you could just specify the time of the change in the spec.
Would you prefer having timestamps in the spec rather than block numbers? Then we'd have to map timestamps to block numbers.
I don't think we'd ever have to map them at all: The step duration would just change at the steps corresponding to those timestamps, independent of any block numbers.
Sure, but we can just define that they only affect the following step.
I think that's still more convenient than having to calculate and specify a step number.
For the time being I implemented a simpler modification with step numbers. If needed, it can be further extended to timestamps. However, the only difference is that the step numbers no longer need to be calculated when writing the spec. There is another issue with calibrate which needs to be fixed in either case.
By the way, using steps (or block numbers) versus timestamps in the spec seems complementary, because internally we have to compute both in order to synchronise blocks. So if we used timestamps in the spec, there would be internal synchronisation of block numbers similar to the step number routine. If using timestamps only improves user convenience, I'd prefer to stay with block numbers for now and maybe convert to timestamps later if needed.
Yes, I guess so. But if we used block numbers, I think we'd have to use the block's timestamp to change the step number, to guarantee that the nodes don't go out of sync. And that might complicate things more because our internal step counter could already have progressed further in the meantime. And what if there's a reorg that undoes a block that changed the step duration?
As of d170486, step numbers coincide with block numbers. The initial_step relic has been removed.
| StepDuration::Single(u) => { |
| let mut durs: BTreeMap<u64, u16> = BTreeMap::new(); |
| durs.insert(0, map_step_duration(u)); |
| durs |
Maybe simpler:
iter::once((0, map_step_duration(u))).collect()

| duration: u16, |
| inner: AtomicU64, |
| /// Duration of the current step. |
| current_duration: AtomicU16, |
Indentation is off. (Spaces/tabs…)
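As a quick sanity check of the iter::once suggestion above, both constructions yield the same map (map_step_duration below is a stand-in; its real definition is in the PR):

```rust
use std::collections::BTreeMap;
use std::iter;

// Hypothetical stand-in: assumed to convert the single spec value to a
// u16 duration, as in the PR's StepDuration::Single arm.
fn map_step_duration(u: u64) -> u16 {
    u as u16
}

fn main() {
    let u = 5u64;
    // Explicit construction, as in the diff:
    let mut durs: BTreeMap<u64, u16> = BTreeMap::new();
    durs.insert(0, map_step_duration(u));
    // The suggested one-liner:
    let durs2: BTreeMap<u64, u16> = iter::once((0, map_step_duration(u))).collect();
    assert_eq!(durs, durs2);
    println!("ok");
}
```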
| if let Some(&next_dur) = self.durations.get(&next_step) { |
| let prev_dur = *self.durations.range(0 .. next_step).last().expect("step duration map is empty").1; |
| let prev_starting_sec = self.starting_sec.load(AtomicOrdering::SeqCst); |
| let prev_starting_step = self.starting_step.load(AtomicOrdering::SeqCst); |
I wonder whether we need to put the whole Step in a Mutex instead of making its fields atomic individually: Things probably go wrong if some of those fields are mutated while increment is running?
This is the only place where these fields are modified apart from the constructor. I agree though that if we start mutating them elsewhere we need an atomic lock on them.
But I'm not sure it's even guaranteed that no two instances of increment run at the same time. 😬
And even if it is, it would be safer in a single Mutex (or RwLock?), in case future modifications change this.
I think the Atomic stuff was only supposed to help mutating fields of a Step that's passed by an immutable reference. Synchronisation is simulated by calibrate.
Yes, but if the Step itself were passed as an Arc<Mutex<Step>>, that interior mutability wouldn't be necessary.
(I feel like this pattern of making individual fields interiorly mutable is overused in Parity: it creates the illusion of thread-safety.)
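A minimal sketch of that alternative (simplified, hypothetical fields, not the PR's actual struct): with the whole Step behind one Mutex, increment updates all related fields under a single lock, so no reader can observe a half-updated combination of duration and starting step.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-in for the Step struct: plain fields, no atomics.
struct Step {
    inner: u64,         // current step number
    duration: u16,      // current step duration in seconds
    starting_step: u64, // step at the last duration change
}

impl Step {
    // All fields change under the same lock, so a reader can never see
    // the new duration paired with the old starting_step or vice versa.
    fn increment(step: &Arc<Mutex<Step>>, new_duration: Option<u16>) {
        let mut s = step.lock().unwrap();
        s.inner += 1;
        if let Some(dur) = new_duration {
            s.duration = dur;
            s.starting_step = s.inner;
        }
    }
}

fn main() {
    let step = Arc::new(Mutex::new(Step { inner: 100, duration: 1, starting_step: 0 }));
    Step::increment(&step, Some(3)); // switch to 3-second steps at step 101
    let s = step.lock().unwrap();
    assert_eq!((s.inner, s.duration, s.starting_step), (101, 3, 101));
    println!("ok");
}
```

An RwLock would work the same way if reads dominate; the point is one lock guarding the whole invariant rather than per-field atomics.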
| inner: AtomicU64::new(initial_step), |
| calibrate: our_params.start_step.is_none(), |
| duration: our_params.step_duration, |
| current_duration: AtomicU16::new(duration), |

| step: Arc::new(PermissionedStep { |
| inner: Step { |
| inner: AtomicU64::new(initial_step), |
| inner: AtomicU64::new(0), |
Shouldn't we compute the actual current step number on startup?
Actually another point holds: there is a configurable start_step which got thrown out with the bathwater. I have to take it into account in order not to break the spec.
Done in #152.
Fixes #122.
The PR adds a configuration map and at the same time keeps the previous configuration option for backwards compatibility.
I'll rebase it on top of the rebased aura-pos as soon as we update the working branch.