Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomized Fuzzer: cliff validator has not been changed, yet we bonded a new validator #2289

Closed
ValarDragon opened this issue Sep 10, 2018 · 8 comments
Assignees

Comments

@ValarDragon
Copy link
Contributor

ValarDragon commented Sep 10, 2018

Summary

This is achieved by commenting out "// govsim.SimulateMsgDeposit(app.govKeeper, app.stakeKeeper),", and then running:

go test ./cmd/gaia/app -run TestFullGaiaSimulation \
  -SimulationSeed=1 -SimulationEnabled=true \
  -SimulationNumBlocks=400 -SimulationBlockSize=400 \
  -v -timeout 24h

(This was ran on #2285), the error occurs on block 203. The logs for this one are 21mb, so that will be a pain to parse through (They're now saved at least :D)

This one can also be replicated on Seed=4 in case that helps debugging

/cc @cwgoes @rigelrozanski @alexanderbez

@cwgoes
Copy link
Contributor

cwgoes commented Sep 12, 2018

Replicable with go test ./cmd/gaia/app -run TestFullGaiaSimulation -SimulationSeed=3 -SimulationEnabled=true -SimulationNumBlocks=400 -SimulationBlockSize=400 -v -timeout 24h now on develop.

@ValarDragon
Copy link
Contributor Author

This one also fails fairly early on (may reduce amount of logs that have to be dug through):

go test ./cmd/gaia/app -run TestFullGaiaSimulation -SimulationSeed=9573 -SimulationEnabled=true -SimulationNumBlocks=150 -SimulationBlockSize=200 -v -timeout 24h

(Fails on block 42)

@alexanderbez
Copy link
Contributor

Will take a look at this now while I wait for updates on #2305.

@alexanderbez alexanderbez self-assigned this Sep 12, 2018
@alexanderbez
Copy link
Contributor

alexanderbez commented Sep 12, 2018

I believe I've discovered the problem that is causing this panic.

Jotting down initial brain dump here...

In a single block context and given the lowest three validators by power in the set: C (104), B (105), and A (106), a series of txs occur:

  • C (104) gets kicked out by another validator gaining power (111)
  • B (105) gets kicked out by another validator gaining power (135)
  • A (106) gets kicked out by another validator, in this case B gaining power resulting in 106
    • So now B is the new cliff with 106 and A is kicked out, also with 106.
  • Finally, some other non-bonded validator gets updated (irrelevant as it does not enter the set). UpdateBondedValidators finds the correct validator to bond (A), but it incorrectly thinks B is the lowest one -- in other words A and B should be swapped in the iteration. A should be bonded AND A should the last validator in the iteration.

This is causing the following to execute.

if newValidatorBonded && bytes.Equal(oldCliffValidatorAddr, validator.OperatorAddr) { // ...}

Again @cwgoes, setting the bondHeight in Keeper#UpdateValidator strikes again!

Once I resolve this and v0.25 is released, I'd like to focus on #2312.

@rigelrozanski
Copy link
Contributor

fascinating - makes sense

@alexanderbez
Copy link
Contributor

alexanderbez commented Sep 13, 2018

Update: Looks like the issue I described above is not the culprit.

In a given block context, a series of txs occur that cause the bottom two validators in the set to have the same power (B & C). B gets bonded/enters set and A gets removed. However, A should not actually get removed because it's still in the top maxValidators somehow (after inspecting the power store). It getting removed actually causes the panic on the next tx. A change I made in #2332 addresses this, but it's very specific case and I don't feel comfortable with it.

Pastebin: https://pastebin.com/64hMim3b

I've been trying to debug this for quite some time now, so it's very possible I'm missing something super obvious. In any case, I can't see a clear fix here. Maybe we should punt this in favor of #2312?

@ValarDragon @cwgoes @rigelrozanski

@ValarDragon
Copy link
Contributor Author

100% agree, I vote to punt and do #2312

@alexanderbez
Copy link
Contributor

Closing in favor of #2312.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants