Skip to content
This repository has been archived by the owner on Aug 18, 2020. It is now read-only.

[CO-347] Wallet's UTXO histogram view #3402

Conversation

paweljakubas
Copy link
Contributor

@paweljakubas paweljakubas commented Aug 14, 2018

Description

Adding Utxo statistics endpoint to API V1 with new data layer support.
Idea was to deliver solution that can in one pass calculate a number of statistics. Besides histogram I added as an example total number of stakes in a given wallet calculation - which in the last version can go out. The solution relies on Gabriel Gonzalez foldl library.
Alternatively, I was also considering the approach adopted:
http://tech.frontrowed.com/2017/09/22/aggregations/

As an example statistics look like:
data UtxoStatistics = UtxoStatistics { theHistogram :: ![HistogramBar] , theAllStakes :: !Word64 }

I added full functionality for Handler and stub for LegacyHandler as it is about to be removed soon. I added integration test in WalletSpec, and also commented one in TransactionsSpec (to be uncommented and checked when Client.hs usage of LegacyHandler will be replaced with Handler).

Moreover, I intentionally used [(Account, Utxo)] in WalletLayer. Maybe in the future installation one can extend this endpoint and filter using account info, etc.

Linked issue

https://iohk.myjetbrains.com/youtrack/issue/CO-347

Type of change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🛠 New feature (non-breaking change which adds functionality)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🏭 Refactoring that does not change existing functionality but does improve things like code readability, structure etc
  • 🔨 New or improved tests for existing code
  • ⛑ git-flow chore (backport, hotfix, etc)

Developer checklist

  • I have read the style guide document, and my code follows the code style of this project.
  • If my code deals with exceptions, it follows the guidelines.
  • I have updated any documentation accordingly, if needed. Documentation changes can be reflected in opening a PR on cardanodocs.com, amending the inline Haddock comments, any relevant README file or one of the document listed in the docs directory.
  • CHANGELOG entry has been added and is linked to the correct PR on GitHub.

Testing checklist

  • I have added tests to cover my changes.
  • All new and existing tests passed.

QA Steps

To build:
./scripts/build/cardano-sl.sh wallet-new
To test:
stack test cardano-sl-wallet-new
nix-build release.nix -A tests.walletIntegration

Screenshots (if available)

This commit adds stubs to legacy and new-wallet, histogram and bar datatypes are accomodated, one-pass many-aggregates functionality is added
Kernel level implementation of WalletLayer is added, Legacy is not supported. The implementation is used in Handlers of V1 API. Refactoring from Integer to Word64 is done
Added integration test for Wallets and Transactions (here commented until migration to Handler happens). Added ToSchema instance needed to pass swagger integration tests.
Decided to use [(Account, Utxo)] type in WalletLayer as it can be useful if the method is extended for account parameters. WalletLayers quickcheck required to add several instances as a consequence
Copy link
Contributor

@edsko edsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I didn't check the logic of the histogram itself, merely how it has been integrated with the wallet layer and the kernel, which looks good. Would appreciate it though if you changed the style of the implementation of the layer ever so slightly though to bring it in line with the rest (see comment below).

Right accountsIxSet ->
let hdRootId = HD.HdRootId . InDb $ rootAddr
hdAccountId accountIndex = HD.HdAccountId hdRootId (HD.HdAccountIx accountIndex)
in Right ( map (\acc -> (acc, accountUtxo db (hdAccountId $ V1.accIndex acc) ) ) $ IxSet.toList accountsIxSet )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the style of the wallet layer implementation as it was before I refactored it. Take a look at what it is now, the use of ExceptT can make a bit easier to write, and then you can also use from existing combinators such as fromRootId.

@@ -187,3 +187,34 @@ transactionSpecs wRef wc = do
etxn <- postTransaction wc payment

void $ etxn `shouldPrism` _Left
{-- Uncomment when new Client is Handler based rather than LegacyHandler based
it "posted transactions gives rise to nonempty Utxo histogram" $ do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there's a xit to declare pending tests. This way, we can still have GHC to verify the code as the codebase evolve, and as soon as it's ready, we can change the xit to it and have the test executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,51 @@
module Cardano.Wallet.API.V1.Handlers.Internal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than Cardano/Wallet/API/V1/Handlers/Internal, I'd rather have a Cardano/Wallet/Types/UtxoStatistics module with those functions and the type defined within it, then re-exported from V1/Types.hs.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved types and functions to Cardano/Wallet/Types/UtxoStatistics.hs and reexport them in V1/Types.hs


eresp <- getUtxoStatistics wc (walId wallet)
utxoStatistics <- fmap wrData eresp `shouldPrism` _Right
let possibleBuckets = fmap show $ ( map (\toPower -> 10^toPower :: Word64) [(1::Word64)..16] ) ++ [45 * (10^(15::Word64))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not re-use the generateBounds function ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

populateBuckets bounds =
case bounds of
(x:_) -> L.Fold (addCountInBuckets x) (initalizeMap bounds) (fmap (\pair -> HistogramBarCount (T.pack $ show $ fst pair, snd pair) ) . MS.toList)
_ -> error "populateBuckets has to be powered with nonempty bounds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, use a NonEmpty Word64 instead of a [Word 64] :D !

https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List-NonEmpty.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

utxosCoinValuesForAllAccounts pairs =
concatMap (\pair -> map extractValue (M.elems $ snd pair) ) pairs
in do
return $ single (computeUtxoStatistics $ utxosCoinValuesForAllAccounts w)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads better with the do at the upper-most level:

         Right w -> do
             let extractValue :: TxOutAux ->  Word64
                 extractValue = 
                     getCoin . txOutValue . toaOut
                 
             let utxosCoinValuesForAllAccounts :: [(Account, Utxo)] -> [Word64]
                 utxosCoinValuesForAllAccounts pairs =
                     concatMap (\pair -> map extractValue (M.elems $ snd pair) ) pairs
              
             return $ single (computeUtxoStatistics $ utxosCoinValuesForAllAccounts w)

Also, I'd rather move these functions inside the computeUtxoStatistics function. It will also make its signature a bit clearer. Rather than [Word64] -> UtxoStatistics it becomes NonEmpty (Account, Utxo) -> UtxoStatistics which in my opinion gives already a much clearer idea of what the function is doing and how to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied within do . As underscored above I moved everything to Cardano/Wallet/Types/UtxoStatistics.hs which is reexported in V1/Types.hs. I retained lets here as there is Account which means in order to avoid cyclic dependencies one solution would be to move everything related with Account to separate Cardano/Wallet/Types/Account.hs and reexport it in V1.Types.hs. I am not sure if we want this

case floatingOrInteger val of
Left (_ :: Double) -> HistogramBarCount ("0", 0 :: Word64)
Right integer -> HistogramBarCount (key, integer)
constructHistogram _ = HistogramBarCount ("0", 0 :: Word64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark as above, if it's not an positive integer, we probably want to fail explicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

& type_ .~ SwaggerNumber
& minimum_ .~ Just 0
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@@ -94,6 +94,8 @@ bracketPassiveWallet =
, _pwlCreateAddress = pwlCreateAddress
, _pwlGetAddresses = pwlGetAddresses

, _pwlGetUtxos = error "Method not implemented for legacy handler"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this level I am retrieving [(Account,Utxo)] for a given Wallet. Then ad the level of endpoint handler I will turn it into Utxo statistics. Why, maybe we are to extend endpoint and introduce some Account fields filtering, etc.

deriving Eq

instance Show GetUtxosError where
show = formatToString build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure here, but do we actually use show to display our errors :s ?
Aren't we using their Buildable instances? If not, then I'd open a ticket to change that, and wouldn't define any custom instance for Show.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Show instances get used in some places (and are required when throwing these as expections). We do the same in a few places. I think the correct solution is to have Show be auto-derived and then define displayException in terms of the Buildable instances, but typically displayException is not used in top-level exception handlers. Either way, I'd say open a separate ticket about this because this is not just this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edsko Yes, that's exactly what I meant and I am not expecting such refactoring to happen in this PR :/ ... It is rather bad if we indeed rely on show to display exceptions in the top-level handlers.

I think it would be fine to go directly for formatToString build in those handlers (since the actual purpose of the Buildable is to construct a user-friendly readable output) yet I don't mind having that exposed as a clear displayException 👍

genTxIn = oneof
[ TxInUtxo <$> arbitrary <*> arbitrary
, TxInUnknown <$> choose (1, 255) <*> scale (min 150) arbitrary
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this defined here as a separated function 🤔 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was reused in one of my earlier versions, now only in one place -> fixed

@KtorZ KtorZ changed the title Origin/paweljakubas/co 347/utxos statistics [CO-347] Wallet's UTXO histogram view Aug 16, 2018
eresp <- getUtxoStatistics wc (walId wallet)
utxoStatistics <- fmap wrData eresp `shouldPrism` _Right
let possibleBuckets = fmap show $ ( map (\toPower -> 10^toPower :: Word64) [(1::Word64)..16] ) ++ [45 * (10^(15::Word64))]
let histogram = map (\ x -> curry HistogramBarCount x 0) possibleBuckets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: no space between \ and the variable it introduces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -0,0 +1,51 @@
module Cardano.Wallet.API.V1.Handlers.Internal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this

) where


import Prelude
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project uses Universum as the prelude instead of the stock Prelude.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

generateBounds :: BoundType -> [Word64]
generateBounds bType =
case bType of
Log10 -> ( map (\toPower -> 10^toPower :: Word64) [(1::Word64)..16] ) ++ [45 * (10^(15::Word64))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I have a bunch of type signatures I usually like to write a "specialized identity function" to make it a little more natural:

asWord64 :: Word64 -> Word64
asWord64 = id

which makes this look like:

Log10 -> (map (\toPower -> asWord64 (10^toPower)) [1 .. asWord64 16]) ++ [45 * (10 ^ asWord64 15)]

Or, as the ambiguity appears to come from the polymorphic type of (^), you could define a specialized variant that removes the need for signatures:

(^!) :: Word64 -> Word64 -> Word64
(^!) = (^)

(map (\toPower -> 10 ^! toPower) [1..16]) ++ [45 * (10 ^! 15)]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be expressed as:

take 16 (iterate (*10) 10) ++ [45 * (10 ^! 15)]

and since * requires both operands to be the same type, you only need the type annotation once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion applied

import Cardano.Wallet.API.V1.Types

import qualified Control.Foldl as L
import qualified Data.Map.Strict as MS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the codebase had a convention to do it with import ... as Map but it is actually split fairly evenly between as M and as Map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

-- | (b) avg or std of stake in a bucket
-- | (c) topN buckets
-- | to name a few
newtype HistogramBar = HistogramBarCount (Text, Word64) deriving (Show, Eq, Ord, Generic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a newtype over a tuple, I'd prefer to see a data with record fields:

data HistogramBar = HistogramBar
    { bucketName :: !Text
    , bucketUpperBound :: !Word64
    }

(,) as a type is usually lazier than what you want. This also allows you to provide better information on the type for generic serialization/deserialization and schemas, and it is more robust to refactoring/changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

-- | (a) sum of stakes in a bucket
-- | (b) avg or std of stake in a bucket
-- | (c) topN buckets
-- | to name a few
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haskell's doc comments do not use multiple | characters for each line. IIRC this won't render correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed | from everywhere in the comments I introduced

data UtxoStatistics = UtxoStatistics
{ theHistogram :: ![HistogramBar]
, theAllStakes :: !Word64
} deriving (Show, Generic, Ord)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 space indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

, "allStakes" .= allStakes ]

instance FromJSON UtxoStatistics where
parseJSON (Object v) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be derived generically. There's nothing about the type here that requires it to be manually written.

Once the HistogramBar type is a record with proper serde instances, this should be derivable.

Copy link
Contributor Author

@paweljakubas paweljakubas Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to specification of the task endpoint expects to have:

    "allStakes": 0,
    "histogram": {
      "10000000000000000": 0,
      "100000000000000": 0,
      "1000000000000": 0,
      "10000000000": 0,
      "100000000": 0,
      "1000000": 0,
      "10000": 0,
      "100": 0,
      "1000000000": 0,
      "100000000000": 0,
      "10": 0,
      "45000000000000000": 0,
      "1000": 0,
      "10000000000000": 0,
      "1000000000000000": 0,
      "100000": 0,
      "10000000": 0
    }
  }

rather than

    "allStakes": 0,
    "histogram": [
        {
             "bucketName" : "10",
             "bucketUpperBound" : 0
        },
 .....
    ]
  }

& at "10" ?~ (Inline $ mempty
& type_ .~ SwaggerNumber
& minimum_ .~ Just 0
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repetition of the Inline schema should be factored out. You can do this with:

declareNamedSchema _ = do
    wordRef <- declareSchemaRef (Proxy :: Proxy Word64)
    pure $ ...
            mempty & at "10" ?~ wordRef

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also suggests a further condensation, where the hashmap is constructed from a list consisting of the relevant powers paired with the schemaref instead of manually enumerated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@paweljakubas
Copy link
Contributor Author

Thank you all for the reviews and suggestions. I have just made respective commit

@@ -60,10 +61,10 @@ walletSpecs _ wc = do

eresp <- getUtxoStatistics wc (walId wallet)
utxoStatistics <- fmap wrData eresp `shouldPrism` _Right
let possibleBuckets = fmap show $ ( map (\toPower -> 10^toPower :: Word64) [(1::Word64)..16] ) ++ [45 * (10^(15::Word64))]
let histogram = map (\ x -> curry HistogramBarCount x 0) possibleBuckets
let possibleBuckets = fmap show $ (generateBounds Log10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parens or $ are unnecessary here; could be either fmap show (generateBounds Log10) or fmap show $ generateBounds Log10 or even show <$> generateBounds Log10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

arbitrary =
let possibleBuckets = fmap show (generateBounds Log10)
possibleBars = NL.zipWith HistogramBarCount possibleBuckets (NL.fromList [0..])
in elements (NL.toList possibleBars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this does what you want -- this makes it choose a bucket name and a constant number from the [0..] list. Eg, for every HistogramBarCount x y, the x will uniquely determine the y.

I think what you want instead is something like:

arbitrary = do
    name <- elements $ map show (generateBounds Log10)
    bound <- arbitrary
    pure (HistogramBarCount name bound)

This will pick a random upper bound and name from the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

, "allStakes" .= allStakes ]

instance FromJSON UtxoStatistics where
parseJSON (Object v) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use Data.Aeson.withObject :: String -> (Object -> Parser a) -> Parser a rather than pattern matching directly on the constructors. The String label is used for error messages, which is really helpful when debugging a parse failure.

So, replace:

parseJSON (Object v) = ...
parseJSON _ = empty

-- with:
parseJSON = withObject "UtxoStatistics" $ ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored as suggested

instance FromJSON UtxoStatistics where
parseJSON (Object v) =
let histogramListM = case HMS.lookup "histogram" v of
Nothing -> empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty fails without an error message. Try to avoid doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored as suggested


instance FromJSON UtxoStatistics where
parseJSON (Object v) =
let histogramListM = case HMS.lookup "histogram" v of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aeson has an operator (.:) :: FromJSON a => Object -> Text -> Parser a that does the lookup and reports an error message if it isn't present. So this can be:

parseJSON = withObject "UtxoStatistics" $ \o -> do
    histo <- o .: "histogram" :: Parser (HashMap Text Word64)
    let constructHistogram = uncurry HistogramBarCount
    pure $ map constructHistogram $ HMS.toList  histo

Here we specify the type of the parser, but it's unnecessary, because GHC would figure out what it needs to return. By requesting a HashMap Text Word64, we will get the right error messages for invalid values in that type:

λ> eitherDecode "-123" :: Either String Word64
Left "Error in $: Word64 is either floating or will cause over or underflow: -123.0"
λ> eitherDecode "12345678901234567890" :: Either String Word64
Right 12345678901234567890
λ> eitherDecode "123456789012345678901234567890" :: Either String Word64
Left "Error in $: Word64 is either floating or will cause over or underflow: 1.2345678901234567890123456789e29"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored as suggested

Just (Object bars) -> do
let constructHistogram (key, Number val) =
case floatingOrInteger val of
Left (_ :: Double) -> error "UtxoStatistics FromJson not integer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Parser type, prefer fail to error as fail reports the error using Aeson's parsing machinery, while error bypasses all of that and blows up at runtime. It's a difference between decode :: FromJSON a => ByteString -> Maybe a returning a Nothing and blowing up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored as suggested

arbitrary = genTxIn
arbitrary = oneof
[ TxInUtxo <$> arbitrary <*> arbitrary
, TxInUnknown <$> choose (1, 255) <*> scale (min 150) arbitrary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is what you want to do. This will set the size parameter to QuickCheck generation to be the minimum of what it already is or 150. If you want to restrict the actual values, then you'd want suchThat:

arbitrary `suchThat` (<= 150)

Would generate valid values up to 150.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I trusted what Core people used in core/test/Test/Pos/Core/Arbitrary/Txp.hs . Which is of course not a proof that it is correct. I will contact Eric concerning this. Maybe it will give rise to a new ticket and I inform him to affect also this instance

@paweljakubas
Copy link
Contributor Author

paweljakubas commented Aug 18, 2018

Correction commit done.

I had previously exactly these linker errors in a number of checks:
#3432

histogramBinNumCond histo = (length $ HMS.keys histo) <= 0
validateKeys = any (\key -> notElem key $ map show (NL.toList $ generateBounds Log10) )
histogramKeysCond = validateKeys . HMS.keys
validateVals = any (< 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<= 0 ? Are we fine with zero values? Or are we interested to catch any non-positive value ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are fine with zero values as bar represent count

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused with the names at first (bounds vs values) from the constructor. As a matter of fact, we are totally fine with empty buckets 👍

histogramKeysCond = validateKeys . HMS.keys
validateVals = any (< 0)
histogramValsCond = validateVals . HMS.elems
allStakesCond = (< 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting way of doing it. For those, I usually leverage the Monad instance of Either and the when combinator. This way, we avoid the hassle and overhead of defining extra identifiers. I believe it reads a bit better:

validateUtxoStatistics  
    :: HashMap Text Word64
    -> Word64
    -> Either UtxoStatisticsError (HashMap Text Word64, Word64)
validateUtxoStatistics histogram allStakes = do
    let (keys, elems) = (HMS.keys histogram, HMS.elems histogram)

    let acceptedKeys = show <$> (toList $ generateBounds Log10)

    when (length keys <= 0) $ 
        Left ErrHistogramEmpty

    when (any (flip notElem acceptedKeys) keys) $
        Left ErrHistogramNamesInvalid

    when (any (< 0) elems) $
        Left ErrHistogramUpperBoundNegative

    when (allStakes < 0) $
        Left ErrAllStakesNegative

    Right (histogram, allStakes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

instance Arbitrary HistogramBar where
arbitrary = do
possiblenames <- elements $ map show (NL.toList $ generateBounds Log10)
bound <- arbitrary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isn't entirely arbitrary, it is suchThat (\n -> n <= upperBound)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry :s .. I probably induced you to use suchThat here. That's wasn't my best choice of words / example ...

data HistogramBar = HistogramBarCount
{ bucketName :: !Text
, bucketUpperBound :: !Word64
} deriving (Show, Eq, Ord, Generic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a confusion in the field names here 🙃 , took me a short while to figure it out from the rest of the code.
bucketUpperBound actually refers to the value / number of stakes whereas bucketName is the actual upper bound.

Hence the following question: what is the need for the bound (i.e. bucketName here) to be a Text? Why isn't this a Word64, especially when it is generated using Word64 values and converted to Text using show. If it's about using it as a key in the JSON representation, having a Word64 here shouldn't be much of a problem!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I made a change

module Cardano.Wallet.Types.UtxoStatistics
( computeUtxoStatistics
, UtxoStatistics (..)
, HistogramBar (..)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we want to expose the internals to the outside world. I'd be in favor of opaque types with smart-constructors 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

instance ToJSON UtxoStatistics where
toJSON (UtxoStatistics bars allStakes) =
let histogramObject = Object . HMS.fromList . map extractBarKey
extractBarKey (HistogramBarCount bound stake) = bound .= stake
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what lead to the conclusions about the actual meaning of bucketUpperBound and bucketName in the comments above.


data UtxoStatistics = UtxoStatistics
{ theHistogram :: ![HistogramBar]
, theAllStakes :: !Word64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't allStakes be derived from the list of histogram already? I mean, it's a nice-to-have in the API response, that's for sure, but I think here we better have a

allStake :: UtxoStatistics -> Word64

GHC-runtime is rather good at caching these kind of results anyway. And we only need it when serializing to JSON right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, we can't actually compute it from the list of bars indeed, because we only have the buckets there so it'll only be an approximation of what's in the bucket. Though, I believe this is preferable whatsoever to having a allStakes :: Word64 next to the histogram.

let's say:

approxTotalStakes :: UtxoStatistics -> Word64

The argument behind this is that, having allStakes in the data-structure means that we have to maintain and control some sort of invariant since there could be a discrepancy between the list of bars and the total number of stakes. That discrepancy can't realistically come from us but could come from external UtxoStatistics we could parse through JSON, which is also unlikely.

Maybe I am being too picky here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when computing histogram, in the same traverse I compute allStakes. I can add many more other statistics here in the future. And in one traverse it is going to be computed. Usually there is no easy (nonapproximative) way of going from one aggregation to other

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I am just afraid about parsing UtxoStatistics from an external source but I guess this will never happen in the end 👍

histogramKeysCond = validateKeys . HMS.keys
validateVals = any (< 0)
histogramValsCond = validateVals . HMS.elems
allStakesCond = (< 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting way of writing this ^.^
When dealing with Either like this, I usually go for the Monad approach and leverage the when combinator. It avoids all the hassle of defining extra identifiers (making sure variables don't conflicts etc..) and is, to my opinion, a bit more readable:

validateUtxoStatistics  
    :: HashMap Text Word64
    -> Word64
    -> Either UtxoStatisticsError (HashMap Text Word64, Word64)
validateUtxoStatistics histogram allStakes = do
    let (keys, elems) = (HMS.keys histogram, HMS.elems histogram)

    let acceptedKeys = show <$> (toList $ generateBounds Log10)

    when (length keys <= 0) $ 
        Left ErrHistogramEmpty

    when (any (flip notElem acceptedKeys) keys) $
        Left ErrHistogramNamesInvalid

    when (any (< 0) elems) $
        Left ErrHistogramUpperBoundNegative

    when (allStakes < 0) $
        Left ErrAllStakesNegative

    Right (histogram, allStakes)

Also, this function looks like a good candidate for a smart constructor and could return an actual UtxoStatistics rather than an HashMap and a Word64. You can find some inspiration here maybe:

https://github.com/input-output-hk/cardano-sl/blob/Squad1/CO-325/api-v1-improvements/wallet/src/Pos/Util/Mnemonic.hs#L126-L149

https://github.com/input-output-hk/cardano-sl/blob/Squad1/CO-325/api-v1-improvements/wallet/src/Pos/Util/Mnemonic.hs#L350-L363

(NOTE: We might want to expose this eitherToParse function somewhere else ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

addCountInBuckets thefirst acc entry =
case Map.lookupGE entry acc of
Just (k, v) -> Map.insert k (v+1) acc
Nothing -> Map.adjust (+1) thefirst acc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From foldl documentation, we find this:

Fold (x -> a -> x) x (x -> b) -- Fold step initial extract

When there's a clear structure like this, it's usually a good idea to have the implementation follows this structure as it helps understanding (same for bracket for instance).

What about something like:

foldBuckets :: NonEmpty Word64 -> L.Fold Word64 [HistogramBar]
foldBuckets bounds =
    let
        step x a =
            case Map.lookupGE a x of
                Just (k, v) -> Map.insert k (v+1) x
                Nothing     -> Map.adjust (+1) (head bounds) x

        initial =
            Map.fromList $ zip (NL.toList bounds) (repeat 0)

        extract =
            map (uncurry HistogramBar) . Map.toList
    in
        L.Fold step initial extract

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


instance Arbitrary TxOut where
arbitrary = genericArbitrary
shrink = genericShrink
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does all this has to do with the PR 😮 ? Maybe an artifact from a rebase / merge conflict ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unfortunately needed to support Utxo


instance Arbitrary UtxoStatistics where
arbitrary = UtxoStatistics <$> arbitrary
<*> arbitrary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "invariant" I was talking about enters into play here. We could generate an arbitrary statistics here where we have 14000 in the last bucket, but a total take of 42 ADA which doesn't make sense.
Hence the difficulty of having such parameter as part of the structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some code that addresses this but it goes forever ;-/

= ErrHistogramEmpty
| ErrHistogramNamesInvalid
| ErrHistogramUpperBoundsNegative
| ErrAllStakesNegative
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the total stake is lower than the minimum amount that can be represented with all the buckets 😛 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

summarizeUtxoStatistics bounds =
UtxoStatistics
<$> populateBuckets bounds
<*> L.sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit the same as below, minor styling remark but we could avoid the extra summarizeUtxoStatistics and make the implementation a bit more straightforward:

computeUtxoStatistics :: [Word64] ->  UtxoStatistics
computeUtxoStatistics = L.fold $ UtxoStatistics
    <$> foldBuckets (generateBounds Log10)
    <*> L.sum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

histogram <- arbitrary
let (minPossibleValue, maxPossibleValue) = getPossibleBounds histogram
let histoBars = map (uncurry HistogramBarCount) $ Map.toList histogram
allStakes <- arbitrary `suchThat` (\s -> s >= minPossibleValue && s <= maxPossibleValue)
Copy link
Contributor

@KtorZ KtorZ Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the way Quickcheck works usually makes suchThat a poor choice in many case because the condition becomes harder and harder to satisfy as QuickCheck looks for values in a new range. A rule of the thumb is that it's good when you have to make sure some values get discarded, but when you want to pick values in a range, it's better to rely on choose:

do
    histogram <- arbitrary
    allStakes <- choose (getPossibleBounds histogram) 
    ...

possiblenames <- elements $ map show (NL.toList $ generateBounds Log10)
bound <- arbitrary
possiblenames <- elements (NL.toList $ generateBounds Log10)
bound <- arbitrary `suchThat` (>= 0)
Copy link
Contributor

@KtorZ KtorZ Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word64 are unsigned-integers 😛

@KtorZ
Copy link
Contributor

KtorZ commented Aug 20, 2018

NOTE:

We're going to need this (#3432) for the arguments list build issue
I'll rebase the base as soon as we have it.

KtorZ and others added 4 commits August 20, 2018 15:32
This was done using weeder and is for now necessary for the CI to kick in. We are starting to reach a
critical point in terms of dependencies and this kind of solution might not work anymore in the future.

There's a PR opened about doing something against these 'arguments list too long'. :finger_crossed:
- We do not expose type internals and provide one smart constructor for that purpose (computeUtxoStatistics)
- Same for the 'BoundType' which should actually be part of the API if we intend to use it, making it an opaque
  type with exposed constructors allow for easy extension and maintainability
- Reviewed a bit errors to make constructors a bit less specified in favor of constructor with arg
- Added 'BoundType' as 'boundType' to the JSON representation
- Made Aeson & Swagger imports explicit!
This is more semantically correct and type-safe than taking a raw list of 'Word64'. This way, we also get documentation
for free simply by looking at the function signature and also makes calls for callers simpler (provided they have a list
of available utxos, but why would they call the function if they hadn't? :) )
@KtorZ KtorZ force-pushed the origin/paweljakubas/CO-347/utxos-statistics branch from 2b5478b to 166665c Compare August 20, 2018 16:13

-- * Constructing 'BoundType'
, log10
) where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paweljakubas Only exposing opaque types here with smart-constructors.
Note that I have removed the few externals calls to mkUtxoStatistics for they were not really needed and could simply rely on computeUtxoStatistics. This function should be our one and single constructor for the outside world.

We now use mkUtxostatistics internally only (in the FromJSON instance).
Note also, I've removed the Haphazard constructor for the BoundType and here again, to keep the module easy to extend, we hide the implementation to the outside world, only having a "smart-constructor" log10 which does the trick.

= ErrEmptyHistogram
| ErrInvalidBounds !Text
| ErrInvalidTotalStakes !Text
deriving (Eq, Show, Read, Generic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paweljakubas I've factored out a couple of errors as they were semantically referring to the same thing, only with a small difference which can be captured in a message Text. It makes the error type a bit less bloated but still rather expressive.

object
[ "histogram" .= histogramObject bars
, "allStakes" .= allStakes
, "boundType" .= log10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need "boundType" in the JSON representation otherwise we simply force the bound type upon decoding a JSON object to Log10 and there's no point of keeping that flexible.

{-# INLINE log10 #-}

-- | Compute UtxoStatistics from a bunch of UTXOs
computeUtxoStatistics :: BoundType -> [Utxo] -> UtxoStatistics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a change here we've suggested in the past and then forgot about. The API used to take [Word64] only, but that gives us poor type-guarantees and requires the caller to know about how to extract the correct value from an Utxo. Here, we take care of that burden ourselves and even get documentation for free!

--
-- INTERNALS
--

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything below is actually internal stuff and doesn't need to be exported 👍

calculatePossibleBound fromPair =
sum .
concatMap (\pair -> matching fromPair pair $ createBracketPairs acceptedKeys) $
Map.toList histogram
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly haven't verified much of this as my brain isn't much capable of it right now :(

@parsonsmatt parsonsmatt merged commit e65cc2f into Squad1/CO-325/api-v1-improvements Aug 21, 2018
@KtorZ KtorZ deleted the origin/paweljakubas/CO-347/utxos-statistics branch August 24, 2018 12:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants