
TomP2P Performance issues over time #164

Open
iggydv opened this issue May 3, 2021 · 4 comments

Comments

iggydv (Contributor) commented May 3, 2021

I've been running some minor benchmarks on TomP2P and I'm in need of some reference benchmarks. I'm curious whether my results are in line with the original benchmarks of the system - could you point me to any relevant material?

I'm using TomP2P in a distributed storage network built for low latency and high reliability, with TomP2P as a third layer of storage, or rather a persistence layer, and I seem to be running into performance issues that become apparent over time.

My test:

  • Nodes on different hosts [40]
    • Node 1 x16 peers
    • Node 2 x9 peers
    • Node 3 x15 peers
  • RPS: 10
  • Duration: 1800 sec
  • PUT commands only (on 8 selected storage nodes only)

From my Artillery test against a REST endpoint:

All virtual users finished
Summary report @ 21:54:17(+0200) 2021-05-03
  Scenarios launched:  18000
  Scenarios completed: 13342
  Requests completed:  13342
  Mean response/sec: 9.95
  Response time (msec):
    min: 2
    max: 9995
    median: 8
    p95: 5789
    p99: 9162.1
  Scenario counts:
    peer-2: 2286 (12.7%)
    peer-1: 2159 (11.994%)
    peer-12: 2306 (12.811%)
    peer-13: 2217 (12.317%)
    peer-18: 2249 (12.494%)
    peer-4: 2328 (12.933%)
    peer-17: 2174 (12.078%)
    peer-3: 2281 (12.672%)
  Codes:
    200: 1381
    400: 14
    408: 11947
  Errors:
    ETIMEDOUT: 4200
    ECONNRESET: 458

As you can see, I get a lot of timeouts and failed connections. I'm also seeing quite a few timeout warnings, as expected:

WARN  TimeoutFactory - Channel timeout for channel Sender [id: 0xbd747647, L:/0:0:0:0:0:0:0:0:57213].
WARN  TimeoutFactory - Request status is msgid=-1862733718,t=REQUEST_1,c=PING,tcp,s=paddr[0x975eb768918c948a5de0dc3cc419b424bd131363[/192.168.0.150,5491]]/relay(false)/slow(false),r=paddr[0x99830ba9278cafaa6a52bda03c1755733463c0de[/<my-ip, port>]]/relay(false)/slow(false)

I'm using the latest stable version of TomP2P

I'd really appreciate any help/advice
Implementation reference - https://gitlab.com/iggydv12/nomad/-/blob/master/src/main/java/org/nomad/storage/overlay/TomP2POverlayStorage.java

Put example logs
overlay-put-test.log
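
For context, the PUT path boils down to roughly the following (a minimal sketch rather than the exact code from the linked class; it assumes the standard TomP2P 5.x builder API, and the key/value names are placeholders):

// Minimal sketch of how I issue a PUT against TomP2P (5.x DHT API assumed).
import net.tomp2p.dht.FuturePut;
import net.tomp2p.dht.PeerBuilderDHT;
import net.tomp2p.dht.PeerDHT;
import net.tomp2p.p2p.PeerBuilder;
import net.tomp2p.peers.Number160;
import net.tomp2p.storage.Data;

public class PutSketch {
    public static void main(String[] args) throws Exception {
        // Start a DHT-enabled peer on port 4001 with a hash-based peer ID.
        PeerDHT peer = new PeerBuilderDHT(
                new PeerBuilder(Number160.createHash("node-1")).ports(4001).start()).start();

        // Store a serializable value under a content key and wait for the result.
        FuturePut futurePut = peer.put(Number160.createHash("some-key"))
                .data(new Data("some-value"))
                .start();
        futurePut.awaitUninterruptibly();

        System.out.println("put success: " + futurePut.isSuccess());
        peer.shutdown();
    }
}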

iggydv (Contributor, Author) commented May 4, 2021

@tbocek any advice? :)

iggydv (Contributor, Author) commented May 6, 2021

After some digging I found that one of the nodes (the OS X node) was causing the performance issues and making many of the messages time out. Either my current setup is somehow flawed on OS X, or it's a bug - I'm not able to say at this point.

One thing I've noticed is that as the ring grows, the rate of successful requests goes down. It seems like the data gets redistributed for some reason and is no longer available to the other nodes?

For bootstrapping I don't always use the same node - I'm not sure if that is an issue?
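
The bootstrap step is roughly the following (again just a sketch, assuming the usual discover-then-bootstrap futures; the host and port of the existing peer are whichever node happens to be up first):

// Sketch: join the ring via whichever peer is already running.
import java.net.InetAddress;

import net.tomp2p.dht.PeerDHT;
import net.tomp2p.futures.FutureBootstrap;
import net.tomp2p.futures.FutureDiscover;

public class BootstrapSketch {
    static void join(PeerDHT peer, String existingHost, int existingPort) throws Exception {
        InetAddress address = InetAddress.getByName(existingHost);

        // Discover our outside address first (helps behind NAT), then bootstrap.
        FutureDiscover discover = peer.peer().discover()
                .inetAddress(address).ports(existingPort).start();
        discover.awaitUninterruptibly();

        FutureBootstrap bootstrap = peer.peer().bootstrap()
                .inetAddress(address).ports(existingPort).start();
        bootstrap.awaitUninterruptibly();

        System.out.println("bootstrap success: " + bootstrap.isSuccess());
    }
}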

Version 14.0 (14.0)

iggydv (Contributor, Author) commented May 21, 2021

I'm still having some issues with the reliability of the network: puts succeed 100% of the time, but my read success rate is closer to the 50% mark, and it gets worse as the network scales. I guess some of this is expected, but it happens even when using direct replication - indirect replication does not work on the latest beta version :( (see the snippet below for how I'm enabling it).
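
For completeness, this is how I'm turning replication on (a sketch from memory; I'm assuming the replication module's IndirectReplication builder is the right entry point in the beta, and the factor/interval values are only examples):

// Sketch: enable indirect replication for a PeerDHT (tomp2p-replication module).
// The exact builder methods are from memory and may differ slightly in the beta.
import net.tomp2p.dht.PeerDHT;
import net.tomp2p.replication.IndirectReplication;

public class ReplicationSketch {
    static void enable(PeerDHT peer) {
        new IndirectReplication(peer)
                .replicationFactor(3)     // aim to keep 3 copies of each key
                .intervalMillis(10_000)   // re-check responsibility every 10 s
                .start();
    }
}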

Is there any way you could help me make my reads more reliable, @tbocek?

tbocek (Member) commented May 24, 2021

Hi @iggydv, thanks for the detailed report. My advice is to enable as much logging as possible and go through the logs to see where it got stuck. Maybe upgrading the Netty library could help.
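
If you are on logback, something along these lines should raise all net.tomp2p loggers to DEBUG at runtime (just a sketch; a logback.xml entry works equally well, and it assumes logback-classic is your SLF4J backend):

// Sketch: programmatically raise the log level for all net.tomp2p loggers to DEBUG.
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class VerboseTomP2PLogging {
    public static void enable() {
        // Safe cast as long as logback-classic is the bound SLF4J implementation.
        Logger tomp2p = (Logger) LoggerFactory.getLogger("net.tomp2p");
        tomp2p.setLevel(Level.DEBUG);
    }
}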

One problem I never could really solve is TCP with short-lived connections. Netty does a pretty good job for client/server communication, but when it comes to fast-paced, short-lived TCP connections (sometimes torn down before the connection was even established), I ran into issues.

Thus, we are currently looking into a UDP-based protocol to make the connections more suitable for a P2P setting. The current repo is here: https://gitlab.com/p2p-library-in-golang/code
