Explain Determinism #56

ethanfrey · 2017-01-19T14:05:40Z

This is more a question and a proposal to enhance documentation. Please give feedback here and then we can make some good docs out of it.

I often hear @jaekwon and @ebuchman saying that the hardest part of blockchain programming is the determinism. And I thought it was the p2p networking and bft consensus layer. Anyway, this brings me to the point that we probably need to make this very explicit and document what forms indeterminism can creep into our system.

In general, one assumes arbitrary op codes on a cpu are deterministic. I mean 2+2=4 and push, pop, jump, add, etc are all quite deterministic actions. Where then does the indeterminism come from? (short answer: syscall)

Below is my first attempt to explain sources of indeterminism, please expand, and comment:

Exceptional failures of the hardware

cosmic rays flip bits and the cpu overheated causing a short in the addition registers. This can happen, but should be regarded as so rare and unpredictable as not warrant much more concern.

Relying on some node-dependent state

calls to /dev/urandom or syscall.random should be different on every machine
calls to time, timeofday, etc. will also be different on other machines, even if one tries to sync the clocks well

Relying on unspecified implementation details

Mathematical functions are deterministic in that if f(x) = y, that is true over all time and space
However, many common packages hold a lesser rule. The same code may be deterministic (produce the same result if called 100 times), but there is no guarantee that v1.8.1 and v1.8.2 of the library will produce the same results. Especially, if client/servers use different libraries and/or languages to implement this function.
Race conditions (depending on the timing of concurrent processes)
Floating point numbers can easily introduce indeterminism, which is theoretically possible but tricky to avoid.
Another example is json encoding (there are many valid json encodings for one struct), and a simple changing of the spacing will break determinism
- json decoding, however, should be deterministic (with the exception of floats, see above)
The order of items when iterating through a map/hashtable/dict/whatever, is very dependent on the implementation, and should be considered random.
- However, if you just iterate to get a total count, which is invariant on the ordering of the items, then your code is once again deterministic

Depending on results from external systems

Filesystems can be full unpredictably, this can introduce non-deterministic failures (but generally not false values except for condition 1 above)
Networking calls can also introduce not just non-deterministic failures, but also non-deterministic successes. If I make a REST call on some API, even if there is no networking error, it may return different results each time.

Other categories???

Another way to break these down is:

Sources that may introduce non-deterministic failures

As long as your code handles these non-deterministic errors properly, leading to aborting the operation, and restarting from the beginning of the block (last known deterministic state), you should be able to handle this in a way that a system will remain deadlocked until the failure is resolved, rather than produce inconsistent results.
Careful coding can limit problems from non-deterministic failures, but we should give examples of how to do this safely, and also try to avoid these cases as much as possible

Sources that may introduce non-deterministic successes

These must be avoided at all costs, and should be well documented for all developers
It would be great to build a tool (like go vet) that checks code for these cases
Main danger zones: network calls, time, urandom (duh), floats, json encoding, iteration order over maps, ...

Okay, done for now, would love some feedback here. And maybe we can make a wiki page out of the results from this issue.

The text was updated successfully, but these errors were encountered:

melekes · 2017-01-19T14:55:42Z

Great write up! I think this article http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/#memory_address is worth reading as well.

But should be building a tool (like go vet) or our own language (+10000 hours of work) a priority? To draw the analogy with RethinkDB (post-mortem), isn't it like chasing correctness instead of a usecase? Or it will be our main advantage?

ebuchman · 2017-11-14T16:47:25Z

Let's make sure we have a clear place in our docs for talking about this. Then we can close this issue @zramsay

closes tendermint/abci#56

zramsay · 2018-03-06T07:27:39Z

tendermint/tendermint#1280 provides a place to discuss ;)

ethanfrey added the question label Jan 19, 2017

adrianbrink added discussion docs labels May 8, 2017

adrianbrink removed question labels May 16, 2017

zramsay self-assigned this Aug 24, 2017

zramsay removed their assignment Oct 3, 2017

ebuchman assigned zramsay Dec 5, 2017

zramsay added a commit to tendermint/tendermint that referenced this issue Mar 6, 2018

docs: add document 'On Determinism'

3019761

closes tendermint/abci#56

This was referenced Mar 6, 2018

docs: add 'On Determinism' tendermint/tendermint#1280

Merged

docs: expand the On Determinism document tendermint/tendermint#1279

Closed

zramsay closed this as completed Mar 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explain Determinism #56

Explain Determinism #56

ethanfrey commented Jan 19, 2017 •

edited

Loading

melekes commented Jan 19, 2017

ebuchman commented Nov 14, 2017

zramsay commented Mar 6, 2018 •

edited

Loading

Explain Determinism #56

Explain Determinism #56

Comments

ethanfrey commented Jan 19, 2017 • edited Loading

melekes commented Jan 19, 2017

ebuchman commented Nov 14, 2017

zramsay commented Mar 6, 2018 • edited Loading

ethanfrey commented Jan 19, 2017 •

edited

Loading

zramsay commented Mar 6, 2018 •

edited

Loading