The memory utilization is increasing in Istanbul BFT #481

Closed

hagishun opened this issue Aug 14, 2018 · 14 comments

@hagishun

System information

  • ENV1
    Version: 1.7.2-stable
    Git Commit: 4a77480
    Quorum Version: 2.0.2
    Architecture: amd64
    Network Id: 1
    Go Version: go1.7.3
    Operating System: linux
    GOPATH=
    GOROOT=/usr/local/go

  • ENV2
    Version: 1.7.2-stable
    Git Commit: fd0e3b9
    Quorum Version: 2.0.2
    Architecture: amd64
    Network Id: 1
    Go Version: go1.7.3
    Operating System: linux
    GOPATH=
    GOROOT=/usr/local/go

Expected behaviour

Memory is periodically released, so utilization does not keep growing.

Actual behaviour

Memory utilization keeps increasing.
Only one node periodically releases memory.

Steps to reproduce the behaviour

Evnr

Backtrace

log.zip

[backtrace]
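
A heap profile would narrow down which allocation site is holding on to the memory and would complement the attached logs. Below is a minimal sketch for capturing one, assuming the node exposes Go's standard net/http/pprof endpoints (geth can serve these when launched with its pprof option) on localhost:6060; both that option and the address are assumptions about this deployment, so adjust as needed.

```go
// Sketch: download a heap profile from a running geth/quorum node so it can
// be attached to the issue and inspected with `go tool pprof`.
// Assumes pprof is enabled on the node and listening on localhost:6060.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Assumed pprof endpoint; adjust host/port to match how the node was started.
	const profileURL = "http://localhost:6060/debug/pprof/heap"

	resp, err := http.Get(profileURL)
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch heap profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// File name assumes the binary profile format emitted by newer Go runtimes.
	name := fmt.Sprintf("heap-%s.pb.gz", time.Now().Format("20060102-150405"))
	out, err := os.Create(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, "create file:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "write profile:", err)
		os.Exit(1)
	}
	fmt.Println("wrote", name, "- inspect with: go tool pprof", name)
}
```

Taking two profiles a few hours apart and diffing them (with a recent Go toolchain, `go tool pprof -base first.pb.gz second.pb.gz`) shows which call sites keep accumulating between the snapshots.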
@fixanoid
Contributor

@hagishun could you give me some details of the cluster: what is the configuration like and what hardware was used? Also, are you getting the same issue when building the Quorum client from master?

@tharun-allu

[image: memory utilization graph]
This shows the memory utilization on the nodes I am running; it looks to me like there might be a memory leak.
The graph covers one week of utilization.

@fixanoid
Contributor

@tharun-allu thanks for the metrics. What's the load on the chain, and how far has it advanced block-wise?

@tharun-allu

@fixanoid the above graphs are from a network of nodes with 16 GB of memory each, and the block height is 4 million.

Yesterday I restarted the geth process in a different network with 7 million blocks; attached is the memory graph of the nodes (there are 4 in the network).
[image: memory graph of the 4 nodes]

What I noticed is that the memory of 2 of the nodes shot up; one node has died and the other will soon.

@namtruong
Contributor

Hi @tharun-allu, I've been trying to replicate this issue but have been unable to. Did you see this only after the chain reached 4 million blocks? From the attached graph it seems quite stable for some time before going up - was there any incident observed in the logs?

@tharun-allu

[image: memory graph for the same nodes, this week]
@namtruong The 2nd graph was from my development network. Attached are the graphs for the same nodes for this week. Unfortunately I only implemented block-height monitoring last week; the block height for the second graph is now 7.9 million.

My suspicion is that the more transactions go through the network, the faster the growth is. I currently run 3 sets of networks, and the rate of growth seems to correlate with how busy (transaction-wise) the network is. If you want me to give you any additional data or collect any new metrics, I can do that and post it here.

Also, to reduce confusion, I will post data from only one environment. Let me know your thoughts.
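
To back up (or rule out) the suspected correlation with hard numbers, logging the node's resident memory next to its block height at a fixed interval gives comparable curves across the three networks. A rough sketch, assuming a Linux host (it reads VmRSS from /proc), JSON-RPC enabled on localhost:8545, and a GETH_PID environment variable pointing at the geth process; all of those are placeholders for the actual setup.

```go
// Sketch: every minute, print "timestamp,blockNumber,VmRSS_kB" for a geth
// process so memory growth can be lined up against chain progress.
// Assumes Linux (/proc), JSON-RPC on localhost:8545, and GETH_PID set.
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// blockNumber asks the node for its current block via standard JSON-RPC.
func blockNumber() (string, error) {
	body := []byte(`{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}`)
	resp, err := http.Post("http://localhost:8545", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Result string `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Result, nil // hex string, e.g. "0x3d0900"
}

// rssKB reads the resident set size of the given pid from /proc.
func rssKB(pid string) (string, error) {
	f, err := os.Open("/proc/" + pid + "/status")
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line) // e.g. ["VmRSS:", "123456", "kB"]
			if len(fields) >= 2 {
				return fields[1], nil
			}
		}
	}
	return "", fmt.Errorf("VmRSS not found")
}

func main() {
	pid := os.Getenv("GETH_PID")
	for {
		bn, err1 := blockNumber()
		rss, err2 := rssKB(pid)
		if err1 != nil || err2 != nil {
			fmt.Fprintln(os.Stderr, "sample failed:", err1, err2)
		} else {
			fmt.Printf("%s,%s,%s\n", time.Now().Format(time.RFC3339), bn, rss)
		}
		time.Sleep(time.Minute)
	}
}
```

Plotting the resulting CSV alongside the transaction rate per network would show directly whether busier networks grow faster.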

@namtruong
Contributor

@tharun-allu thank you for the info.

I've put up a change for this - https://github.com/namtruong/quorum/tree/bugfix/istanbul-storeBacklog-memory-leak
Could you please test that branch and let me know whether it fixes the issue?

Many thanks!
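
For readers following along: the branch name suggests the growth comes from Istanbul's backlog of "future" consensus messages. The sketch below is not the code from that branch; it is only an illustrative, self-contained example (names such as maxBacklogPerValidator are made up) of how an uncapped per-validator queue keeps growing if nothing prunes it, and how a cap plus pruning bounds it.

```go
// Illustrative sketch only -- not the actual Quorum patch. It assumes the
// backlog is roughly "a per-validator queue of future consensus messages"
// and shows how capping and pruning that queue prevents unbounded growth.
package main

import "fmt"

// message stands in for an Istanbul consensus message (hypothetical type).
type message struct {
	Sequence uint64
	Payload  []byte
}

const maxBacklogPerValidator = 1024 // assumed cap, not a value used upstream

// backlog holds future messages keyed by validator address.
type backlog struct {
	queues map[string][]message
}

func newBacklog() *backlog {
	return &backlog{queues: make(map[string][]message)}
}

// store keeps at most maxBacklogPerValidator messages per validator,
// evicting the oldest entry when the cap is reached. Without a cap (or
// without pruning messages whose sequences are already final), the queue
// grows for as long as the node runs -- the pattern reported in this issue.
func (b *backlog) store(validator string, m message) {
	q := b.queues[validator]
	if len(q) >= maxBacklogPerValidator {
		q = q[1:] // evict oldest
	}
	b.queues[validator] = append(q, m)
}

// pruneBelow drops every stored message older than the given sequence,
// e.g. once the chain has committed that block.
func (b *backlog) pruneBelow(seq uint64) {
	for v, q := range b.queues {
		kept := q[:0]
		for _, m := range q {
			if m.Sequence >= seq {
				kept = append(kept, m)
			}
		}
		b.queues[v] = kept
	}
}

func main() {
	b := newBacklog()
	for i := uint64(0); i < 5000; i++ {
		b.store("0xvalidator1", message{Sequence: i})
	}
	fmt.Println("stored:", len(b.queues["0xvalidator1"])) // capped at 1024
	b.pruneBelow(4990)
	fmt.Println("after prune:", len(b.queues["0xvalidator1"]))
}
```

An actual fix would need to be more careful than this sketch (it should not drop messages a live consensus round still needs); the sketch only illustrates the shape of the problem: per-validator queues that only ever grow.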

@tharun-allu

[image: latest memory utilization pattern]
This is the latest pattern; I have not tested the fix yet. I notice only one node's memory jumping up and then staying more or less stable afterwards. I am going to restart that node to see whether the other nodes behave differently. I will keep you posted on my observations.

hagishun added a commit to hagishun/quorum that referenced this issue Sep 6, 2018
@tharun-allu

tharun-allu commented Sep 6, 2018

@namtruong I updated my dev network with the code from the branch and will keep you updated on how it goes today.

# ./geth version
Geth
Version: 1.7.2-stable
Git Commit: 891c6c5e5c2a38c2a2982587bc12b282422929a4
Quorum Version: 2.1.0
Architecture: amd64
Go Version: go1.10.4
Operating System: linux
GOPATH=

@namtruong
Contributor

@tharun-allu have you got any update?

@tharun-allu

[image: memory graph after applying the fix]

Looks like this has resolved the issue. I will confirm by downloading 2.1.0 from upstream and seeing whether the issue comes back. I upgraded from 2.0.1 to 2.1.0 from @namtruong's branch.

@namtruong
Contributor

@tharun-allu thanks for your update. FYI, my colleague and I have also been working on a different patch here: https://github.com/jpmorganchase/quorum/compare/master...trung:f-istanbul-backlogs?expand=1 . We're in the process of testing the changes, but they ultimately solve the same issue. Please feel free to test either of the solutions and let us know your feedback.

@fixanoid
Contributor

@tharun-allu the new pull that addresses the issue is here: #521

@tharun-allu

[image: memory graph after applying the PR]
The PR seems to have resolved this issue.

jpmsam closed this as completed in a34b725 on Sep 18, 2018