
get_history_by_date #405

Closed

Conversation

@oxarbitrage
Member

commented Sep 24, 2017

Background:

Some background is needed to better understand how we got here.

The code in this pull request is the result of working on a filter-by-date account history API function for issue #358.

The first attempt was to add the date to the account transaction history object (atho). This was discussed and implemented in #379; the problem is that this solution increases RAM usage significantly, so I moved to a new option suggested by @pc (check the last comments of the above pull). It was very hard and tricky to make a global member function inside the account transaction history index that pulls the date of the block at the moment the history is added and stores it only in the index; this reduces RAM but decreases replay performance. That solution actually works, but it doesn't fully convince me.

The problem is that, to build the time index, we need to fetch the block every time new history is added anywhere in the system. To reduce computation, we found we could remove the block fetching from the global function and simply make the index by_block instead of by_time. Related code can be found at https://github.com/oxarbitrage/bitshares-core/tree/develop_358

The challenge there is that API calls take a time as input while the index is by block. There is no simple way to relate blocks to time: the production chain was stuck at moments, blocks are missing here and there, etc. In a theoretical chain producing a block every 3 seconds since genesis this would be easy, but that is not our case.

As we need precision, @pc suggested making a binary search inside the by_block index to get the blocks closest to the date arguments of the API call.
After a successful attempt at introducing a binary search algorithm adapted to our needs, I realized the by_block index actually has the same order as the by_seq index that is already there. Doing the binary search in the by_seq index therefore makes sense as the most effective way to do it.

The solution

The proposed code, if accepted, is a generic way to solve issues like #61, #243 and some others where the data is already in the blockchain but is hard and inefficient to relate in an API call. The solution does not need any new index, new data in the object, or anything like that; everything is done in api.cpp and api.hpp. This was explored before, but performance was terrible because a linear traversal inside the account history had to be made. The key is the binary search.
It can probably be done in fewer lines, but please note that the built-in binary searches were not usable for our case, so I had to write my own. Still, I could be missing special cases or other issues. There may be a faster algorithm than binary search for sequenced indexes, but I was not able to find one that suits us.

A binary search loop over the by_seq index finds the start date inside the account's history. Then a normal loop walks from that start point until the end date.

Inside the loop, lower_bound is used, which is itself a binary search in the index; even so, the introduced function get_account_history_by_date is fast and precise in all the cases tested. A sketch of the approach follows.
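Here is a minimal, self-contained sketch of the search (illustrative only, not the exact PR code; `time_of_seq` is a hypothetical stand-in for the real lookup, which does a lower_bound on the by_seq index to reach the history entry with that sequence number and then fetches its block header to read the timestamp):

```cpp
#include <cstdint>
#include <functional>

// Sketch: binary-search the [1, max_seq] sequence range of an account's
// by_seq history for the first entry whose block time is >= start.
// time_of_seq(seq) is assumed to return the block timestamp of the history
// entry with the given sequence number.
uint32_t first_seq_at_or_after( uint32_t max_seq, int64_t start,
                                const std::function<int64_t(uint32_t)>& time_of_seq )
{
   uint32_t lo = 1, hi = max_seq;
   uint32_t found = max_seq + 1;          // max_seq + 1 means "no entry at or after start"
   while( lo <= hi )
   {
      uint32_t mid = lo + (hi - lo) / 2;  // ~log2(max_seq) probes in total
      if( time_of_seq( mid ) >= start )
      {
         found = mid;                     // candidate; keep searching to the left
         hi = mid - 1;
      }
      else
         lo = mid + 1;
   }
   return found;
}
```

From the sequence returned here, a plain forward walk collects operations until one falls past the end date.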

In a synchronized chain, for an account with 850 operations like http://bitshares-explorer.io/#/accounts/xeroc, the search loop needs 9-10 iterations to find any date inside it, which is what a binary search should take (log2(850) ≈ 10). I don't have the resources to test on a full node with accounts holding millions of ops, but this will definitely be a lot faster than a linear search or anything else.

alfredo@alfredo-Inspiron-5559 ~/CLionProjects/issue358_5 $ curl --data '{"jsonrpc": "2.0", "params": ["history", "get_account_history_by_date", ["1.2.282", "2015-12-26T00:00:06", "2015-12-26T23:00:00", 100]], "method": "call", "id": 10}' http://localhost:8090/rpc --silent | jq
{
  "id": 10,
  "jsonrpc": "2.0",
  "result": [
    {
      "id": "1.11.732691",
      "op": [
        0,
        {
          "fee": {
            "amount": 3273437,
            "asset_id": "1.3.0"
          },
          "from": "1.2.97259",
          "to": "1.2.282",
          "amount": {
            "amount": 3,
            "asset_id": "1.3.569"
          },
          "memo": {
            "from": "BTS5UJBnRKcaysHN8iXCDUr5zkjN8U393EVbK5PfjgU8P3sjSYmXv",
            "to": "BTS5TPTziKkLexhVKsQKtSpo4bAv5RnB8oXcG4sMHEwCcTf3r7dqE",
            "nonce": "10810090194978835734",
            "message": "3a03b41636f4f8741a15ef501f3f9d0fe5a956f0d70fbd25d7dc8d8a73fa8629ee063370263534e202fe113f8c64b2e92f012e953316a386b14299613c8e33bd"
          },
          "extensions": []
        }
      ],
      "result": [
        0,
        {}
      ],
      "block_num": 2101074,
      "trx_in_block": 5,
      "op_in_trx": 0,
      "virtual_op": 43295
    }
  ]
}
alfredo@alfredo-Inspiron-5559 ~/CLionProjects/issue358_5 $ 

Notes:

  • It does not deal with more than 100 ops in the same block, but I don't think we should handle that here; rather, we should rethink the limits.
  • Filtering by op can be done client side once the client has all the ops in the interval.
  • Filtering by asset was discarded from this call as it is too specific; it only works against a few ops.

@pmconrad
Contributor

I think a good way to deal with the 100 ops in a block problem would be to continue adding ops to the result until the last op of the current block is reached, never mind the limit.
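A rough, self-contained sketch of that idea (hypothetical types and names, not code from this PR):

```cpp
#include <cstddef>
#include <vector>

struct op_entry { unsigned block_num; /* other fields omitted */ };

// Collect ops starting at index `first`, honoring `limit` except that once
// the limit is hit we keep appending while the next op is still in the same
// block, so a block's operations are never split across the limit boundary.
std::vector<op_entry> collect( const std::vector<op_entry>& history,
                               std::size_t first, std::size_t limit )
{
   std::vector<op_entry> result;
   for( std::size_t i = first; i < history.size(); ++i )
   {
      if( result.size() >= limit && !result.empty()
          && history[i].block_num != result.back().block_num )
         break;                           // limit reached and block boundary crossed
      result.push_back( history[i] );
   }
   return result;
}
```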

@oxarbitrage
Member Author

@abitmore, what do you think of this approach for this and other similar calls?

@oxarbitrage
Member Author

I was talking with @abitmore on Telegram about this yesterday and we are unsure about what would be best. There are currently 3 options to work around this (get history by date) and other similar issues that require similar actions:

1- Store the date and other fields in the object even if it is redundant. This is easy to do and works well, but as we already have RAM problems, an increase in memory was discarded early on; it was the first option presented. Another problem is that, as we need more relations to resolve more issues, more data will have to be added to the object, increasing RAM even more.
2- Have a global member function that adds the needed data to the index. The date was added this way and it worked; the con of this option is that replay speed decreases, as more computing power is needed to process the block fetching in the global function and index a date. The pro is that no additional RAM is needed.
3- The last option presented is a binary search over the current by_seq index. The pro is that we don't need to add anything new; everything is done in the API call itself. The con is that it may take too long to search the history of an account with millions of operations.

In order to test option 3 on a big account, @xeroc provided me with a fat server to run a full node. While synchronizing, I was able to make some tests with an account that already had 1.5 million ops, and I was surprised that the binary search was able to find a date in 20 iterations, against the 10 iterations needed in my previous testing with an account of just 850 ops. Finding the date of an op in 20 iterations in an account with 1.5 million ops is very good.

However, this is not confirmed: while I was synchronizing, the server ran out of RAM. This is because there was another full node running on the server, and of course it was not able to handle both of them. I made that mistake and now can't access it, so I sent @xeroc a message to reboot it or something. Very sorry about that; I know you have the testnet and other stuff running there. It is actually the same server I use to get full account history in the explorer, which has also stopped working for now.

Anyway, I hope we can restore it soon and complete that binary search test.

Talking with @xeroc today, we discussed implementing this in a plugin; this way we could use any of the 3 options (I still lean towards number 3 for now) without interfering with the current bitshares implementation.

I think it's a good option, and I may go after it. If I can confirm the binary search is fast enough for huge accounts, the plugin will most probably just expose the API call when loaded.

Any comment/suggestion is always appreciated.

@pmconrad
Contributor

Another question: should we invest more time into this, or close it with a quick but partial solution and instead focus on the long-term solution (i.e. offloading the history stuff into an external database)?

@oxarbitrage
Member Author

I was running some tests on a full node with the code of this pull request.

For an account with 5.5 million operations (http://bitshares-explorer.io/#/accounts/1.2.36449), the API call can find the start date in 23 iterations. Here is the output of some logging I added to the node, showing how it finds the right seq:

989549ms th_a       api.cpp:424                   get_account_history_ ] m 2745389
989549ms th_a       api.cpp:424                   get_account_history_ ] m 4118084
989550ms th_a       api.cpp:424                   get_account_history_ ] m 4804432
989550ms th_a       api.cpp:424                   get_account_history_ ] m 4461258
989550ms th_a       api.cpp:424                   get_account_history_ ] m 4632845
989551ms th_a       api.cpp:424                   get_account_history_ ] m 4547051
989551ms th_a       api.cpp:424                   get_account_history_ ] m 4589948
989551ms th_a       api.cpp:424                   get_account_history_ ] m 4568499
989552ms th_a       api.cpp:424                   get_account_history_ ] m 4557775
989552ms th_a       api.cpp:424                   get_account_history_ ] m 4563137
989552ms th_a       api.cpp:424                   get_account_history_ ] m 4560456
989553ms th_a       api.cpp:424                   get_account_history_ ] m 4559115
989553ms th_a       api.cpp:424                   get_account_history_ ] m 4558445
989553ms th_a       api.cpp:424                   get_account_history_ ] m 4558110
989554ms th_a       api.cpp:424                   get_account_history_ ] m 4558277
989554ms th_a       api.cpp:424                   get_account_history_ ] m 4558193
989554ms th_a       api.cpp:424                   get_account_history_ ] m 4558151
989554ms th_a       api.cpp:424                   get_account_history_ ] m 4558130
989555ms th_a       api.cpp:424                   get_account_history_ ] m 4558120
989555ms th_a       api.cpp:424                   get_account_history_ ] m 4558115
989556ms th_a       api.cpp:424                   get_account_history_ ] m 4558112
989556ms th_a       api.cpp:424                   get_account_history_ ] m 4558113
989556ms th_a       api.cpp:424                   get_account_history_ ] m 4558114

The call used for this output was:

curl --data '{"jsonrpc": "2.0", "params": ["history", "get_account_history_by_date", ["1.2.36449", "2017-09-01T00:00:00", "2017-09-01T23:00:00", 1]], "method": "call", "id": 10}' http://88.99.145.10:9999/rpc  | jq

So, 9-10 iterations for an account with ~1K ops and 23 for one with 5.5M ops looks pretty good to me; it matches the expected ⌈log2(n)⌉ probes (⌈log2(850)⌉ = 10, ⌈log2(5,500,000)⌉ = 23). However, for extra protection I can move the call into a plugin.

@oxarbitrage
Member Author

Guys, what do you think if, instead of a plugin, we add a new API, explorer_api, disabled by default, with the few calls needed for the explorer and other purposes, like get_history_by_date (#358), get_transaction_id, get_transaction_from_id (#373), get_history_by_block (#61), etc.?

Let me know what you think, as this should be simpler to do than a plugin (a lot less code) and would provide these binary search calls only to the witness nodes that want them, until the elasticsearch integration that is now in progress lands. I think it could be a fast partial solution.

@pmconrad
Contributor

pmconrad commented Oct 9, 2017

The problem with adding API calls is that we have no way to track which calls are being used and which aren't. So we have no choice but to support them all, forever. Therefore I'd prefer a solution that gives us a choice to abandon support for it, the idea being that anyone who is interested in keeping the API will have to maintain the code themselves.

One way to go might be to create an "explorer" branch that contains the API, and that is maintained mostly by merging master after each release. Better ideas are welcome, of course.

@oxarbitrage
Member Author

The idea of a separate branch is good and could be an immediate solution. However, I am making progress with the elasticsearch database: to solve all of these issues we just need to add data to the database at replay and continue doing so in real time (just as the current account_history_plugin does, but saving to the database instead of to the object); then clients can pull from the database directly instead of from the node. This is the "easiest" part of what I have planned for removing trading history from RAM, and I should be able to have a prototype of this first step quickly. Let's hold this for this week. Thanks @pmconrad.

@abitmore
Member

Having multiple branches with different features means that if a service provider needs features from 2 or more branches, she has to merge the code herself, which she may or may not be capable of doing. As time goes by, many of the branches may go stale or conflict with master, and we still won't know which branches are being used; there would be additional/wasted code maintenance work if we had to merge into every branch after every release.

@oxarbitrage
Member Author

Good point there, @abitmore; having more branches is not a good idea, it's a maintenance nightmare. As you know, I am open to pretty much any option, but on this I am with you now. Three branches, master, develop and testnet, are already more than enough to maintain, and that should be it.

@pmconrad
Contributor

But that's my point really. :-)
If we keep everything in the core we also keep the responsibility, forcing ourselves to maintain everything, forever.

Those branches would enable us to concentrate on the core. Anyone who is interested in the branch functionality will have to maintain it themselves, or pay a dev to do it. The big advantage here is that functionality that is no longer in use by anyone will automatically die out, without consuming our resources forever.

@oxarbitrage
Member Author

That's an interesting point of view when thinking about scalability and the future, @pmconrad. The problem at present is that if we don't do it, nobody does, and business is lost. Potential clients who want to build on top of bitshares will just focus on building the frontend; they will not do it if they know they also have to code the APIs themselves in the core, maintain a branch and all that.

I think it is our responsibility (or at least mine) to offer solutions to the entities that want to do business with us, and to the ones that are already doing it and want/need to improve. The API is our face, our access point, and the best way to get data from the blockchain.

This does not mean we have to say amen to every request that comes from anybody, but I think we should be flexible on the subject. Please note that get_history_by_date, for example, was requested by bitshares-ui; we can't just tell them to maintain their own branch.

Anyway, if everything goes as I plan, the elasticsearch plugin will fill this gap in all cases, so we will only have to maintain 1 plugin and the API will not grow anymore. Whatever the case, it is good for our project to discuss these things indeed :)

@pmconrad
Contributor

Two more suggestions:

  • We use versioned APIs. That gives us the choice to deprecate and abandon old versions.
  • We support the set of APIs that are being used by "official" clients (at this time, that would be the cli_wallet, the bitshares-ui and your explorer). Anything on top of that will have to be encapsulated in an "unsupported" API that can be changed/abandoned at any time.

In either case, encapsulating APIs is extremely important. Can plugins add APIs to the application without the application knowing about it? (In the sense: does libraries/application need a dependency on libraries/plugins/whatever?) If not, we should add that possibility; that would provide sufficient encapsulation, IMO.

@xeroc
Member

xeroc commented Oct 13, 2017

I tend to agree with @pmconrad, here.

Things might change over time when the development team behind bitshares-core grows bigger, but until we are there, we ensure that we can maintain what is already there.

That's what I like so much about plugins. All we still need is the steem way of integrating plugins into the build process; they use some template metaprogramming magic to do that.

@oxarbitrage
Member Author

Closing this one, as get_history_by_date will be part of the global solution for storing account history data that we are working on. When I get a first version of that, I will reference this.

@oxarbitrage deleted the issue358_7 branch December 19, 2018 20:18