Skip to content
This repository has been archived by the owner on Aug 18, 2020. It is now read-only.

[CO-354] Fix SafeCopy instance for AddrAttributes #3685

Merged
merged 5 commits into from
Oct 2, 2018

Conversation

mhuesch
Copy link
Contributor

@mhuesch mhuesch commented Sep 28, 2018

Description

Until now, the instance was derived, and thus was broken when we
modified `AddrAttributes` in earlier release/1.3.1 commits by adding
the `aaNetworkMagic` field.

In order to make future modifications to the instance easier in the
future, we simply de/serialize a ByteString from the `Bi` instance
for `Attributes AddrAttributes`.

Since there is a very low probability of collisions between the old
SafeCopy binary format and the new CBOR/Bi binary format, we attempt
to decode via `Bi` first, and if that fails (either an invalid length
ByteString, or a CBOR failure), then we do a "legacy" decode, where
we only look for 2 fields (`AddrAttributes` had 2 before we added
`aaNetworkMagic`) and fill in `NMNothing` so that the existing
mainnet/staging databases are correctly read.

Linked issue

https://iohk.myjetbrains.com/youtrack/issue/CO-354

Type of change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • [~] 🛠 New feature (non-breaking change which adds functionality)
  • [~] ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • [~] 🏭 Refactoring that does not change existing functionality but does improve things like code readability, structure etc
  • 🔨 New or improved tests for existing code
  • [~] ⛑ git-flow chore (backport, hotfix, etc)

Developer checklist

  • I have read the style guide document, and my code follows the code style of this project.
  • If my code deals with exceptions, it follows the guidelines.
  • I have updated any documentation accordingly, if needed. Documentation changes can be reflected in opening a PR on cardanodocs.com, amending the inline Haddock comments, any relevant README file or one of the document listed in the docs directory.
  • [~] CHANGELOG entry has been added and is linked to the correct PR on GitHub.

Testing checklist

  • I have added tests to cover my changes.
  • All new and existing tests passed.

QA Steps

$ stack test
$ EXISTING_WALLET_PATH=<path>
$ stack exec cardano-node -- --topology=docs/network/example-topologies/mainnet-staging.yaml --configuration-key mainnet_dryrun_full  --db-path ../staging-db --tlscert ../tls-certs/server.crt --tlskey ../tls-certs/server.key --tlsca ../tls-certs/ca.crt --no-client-auth --wallet-db-path $EXISTING_WALLET_PATH

Until now, the instance was derived, and thus was broken when we
modified `AddrAttributes` in earlier release/1.3.1 commits by adding
the `aaNetworkMagic` field.

In order to make future modifications to the instance easier in the
future, we simply de/serialize a ByteString from the `Bi` instance
for `Attributes AddrAttributes`.

Since there is a very low probability of collisions between the old
SafeCopy binary format and the new CBOR/Bi binary format, we attempt
to decode via `Bi` first, and if that fails (either an invalid length
ByteString, or a CBOR failure), then we do a "legacy" decode, where
we only look for 2 fields (`AddrAttributes` had 2 before we added
`aaNetworkMagic`) and fill in `NMNothing` so that the existing
mainnet/staging databases are correctly read.
@mhuesch mhuesch requested a review from erikd as a code owner September 28, 2018 19:43
@mhuesch mhuesch force-pushed the intricate+mhuesch/CO-354/fix-safecopy branch from 1fc6a43 to 69b0d7c Compare September 28, 2018 20:55
In order to ensure we have backwards compatibility with the old
Address and Address' SafeCopy binary formats, we add some decode-only
tests, which exercise the "migration" part of our new
`SafeCopy AddrAttributes` instance. We also add golden tests which
ensure the new CBOR format is maintained going forwards.

We also add roundtrip testing for Address' (which exercises the new
CBOR code). There was already a roundtrip test for Address, so we
leave that.
@mhuesch mhuesch force-pushed the intricate+mhuesch/CO-354/fix-safecopy branch from 69b0d7c to 44fbef4 Compare September 28, 2018 20:59
rvl added 2 commits October 1, 2018 13:30
We want to work with mainnet-style addresses when testing asset
locking.

This reverts commit a2d569a.
The log file which is checked after running the tests got renamed from
nodeN.json to coreN.json.
adinapoli-iohk
adinapoli-iohk previously approved these changes Oct 1, 2018
Copy link
Contributor

@adinapoli-iohk adinapoli-iohk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- as a matter of personal preference I would have attempted immediately the CBOR decode in getCopy as this will serve us well for the future, but that's marginal and probably debatable. Nothing that should stop this PR from being merged anyway!

I would be curious to hear what Duncan has to say, though 😉

intricate
intricate previously approved these changes Oct 1, 2018
@mhuesch
Copy link
Contributor Author

mhuesch commented Oct 1, 2018

@adinapoli-iohk I think you're preference lines up with my intention when I was writing the code, can you elaborate on the differences between the implementation and what you were imagining?

In getLegacy I define the backwards compatible decode, but don't invoke it. Then we read in bytes without consuming input, so that we do not commit to one parsing path. We check that the CBOR length is valid and that we can decode CBOR before we return the "migrated format" result. So even though it occurs as the bottom-most code block, the CBOR function can be thought of as running first, I think.

@adinapoli-iohk
Copy link
Contributor

@mhuesch As said it's a minor (and probably debatable) alternative implementation, which would have been along the lines of:

getCopy = contain $ do
  resE <- Bi.deserialise' ...
  case resE of
      Left e -> ... getLegacy ..
      Right x -> return x

I.e. attempting the deserialisation first thing, skipping the look-ahead check, and then only if we fail we try the legacy one. But let me repeat that I was just playing the devil's advocate here, I think your implementation is an excellent one. 😉

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SafeCopy instance we need to be a bit more careful with the use of the cereal package primitives.

The uncheckedLookAhead is actually a bit dangerous and rather misleading. It only does what you expect if all the input is supplied in one chunk, since it only looks up to the end of the chunk. If acid-state uses the incremental interface to deserialise in e.g. 4k chunks then this can fail at a chunk boundary.

Instead we can use lookaheadM and ensure, something like:

-- ByteStrings are prefixed with a Int64 length. We cheat here and read the length as
-- thought it were a safePut-encoded Int64, so we know how long the ByteString will be.
--
-- 4[version] + 8[Int64] + <bytesLen>
-- bytesLen == length of AddrAttributes bytestring
tryNonLegacy <- Cereal.lookAheadE $ do
  bytesLen <- safeGet
  bytes    <- Cereal.ensure bytesLen
  pure (Bi.decodeFull bytes)
label $ case tryNonLegacy of
  Left  _             -> getLegacy
  Right attrAddrAttrs -> pure (attrData attrAddrAttrs)

The point is, we use the lookaheadE so that we don't have to mess with uncheckedSkip etc, since it'll automatically go back if it fails, and consume if it succeeds. And then it means we can use ensure rather than uncheckedLookAhead, since if the ensure fails then the surrounding lookAheadE fails. And finally the Bi.decodeFull returns the Either that lookAheadE expects to indicate failure or success.

@dcoutts
Copy link
Contributor

dcoutts commented Oct 1, 2018

Otherwise, looks good.

@mhuesch
Copy link
Contributor Author

mhuesch commented Oct 1, 2018

@dcoutts After digging into your suggestion a bit, my conclusion is that it's better/safer, but ensure is problematic, because it returns the full remaining input. Thus we consume too much and the fields that come "after" the one our SafeCopy instance is parsing get consumed as well, so they fail with a demandInput error. I think we want something like ensureAndTake which ensures there are n bytes available, but only takes those n bytes.

@dcoutts
Copy link
Contributor

dcoutts commented Oct 2, 2018

I think we can actually go simpler still. We don't need to parse the encoding of the bytestring manually. We can just use a normal safeGet inside a lookAheadE to get the bytestring and try to decode that.

getCopy = contain $ do
    -- Try decoding as a BSL.ByteString containing the new format, but if that
    -- fails go for the legacy format.
    tryNonLegacy <- Cereal.lookAheadE (Bi.decodeFull <$> safeGet)
    label $ case tryNonLegacy of
      Left  _             -> getLegacy
      Right attrAddrAttrs -> pure (attrData attrAddrAttrs)
  where
    label     = Cereal.label "Pos.Core.Common.AddrAttributes.AddrAttributes:"
    getLegacy = AddrAttributes <$> safeGet
                               <*> safeGet
                               <*> pure NMNothing

From @dcoutts:
> The uncheckedLookAhead is actually a bit dangerous
> and rather misleading. It only does what you expect
> if all the input is supplied in one chunk, since it
> only looks up to the end of the chunk. If acid-state
> uses the incremental interface to deserialise in e.g.
> 4k chunks then this can fail at a chunk boundary.

I switch to using `Alternative` to run the non-legacy
parser first, and if it fails, to run the legacy parser.
This seems to leverage Cereal's `fail` mechanism the way
it is intended - when the first parser fails, no input is
consumed, and the second parser is run.
@mhuesch mhuesch dismissed stale reviews from intricate and adinapoli-iohk via 1d6d9f7 October 2, 2018 15:01
@mhuesch
Copy link
Contributor Author

mhuesch commented Oct 2, 2018

@dcoutts I'm still mystified by lookAheadE's behavior, as it is described to do (I think) what we want it to do, yet it behaves differently.

I figured out how to use Alternative to run the non-legacy parser first, and if that fails, to then run the legacy parser. This seems fairly clean & safe. Just updated the PR.

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@disassembler disassembler merged commit 3fa92ec into release/1.3.1 Oct 2, 2018
@disassembler disassembler deleted the intricate+mhuesch/CO-354/fix-safecopy branch October 2, 2018 19:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants