DHT performance degrades with more values in the ring #128

Open
sgoendoer opened this issue Feb 4, 2016 · 4 comments

Comments

@sgoendoer

Hi,

I stumbled upon an issue with the ring when it's being flooded with values. I noticed that when there are more values in the ring, the response time increases dramatically.

Setup: I have 3 virtual nodes running. Not very performant ones, but oh well... They are running Debian Wheezy. TomP2P runs in version 4.4 (via Maven), and a Jetty server receives REST requests with data to be written to the ring.

Now I wrote a small shell script that pushes a key-value pair to one of the nodes in a loop, i.e. it writes as much and as fast as it can to the ring. The script measures and logs the time each request takes to complete.
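(For anyone who wants to reproduce this without the Jetty/REST layer: below is a minimal Java sketch of the write path plus a timing loop. This is not the code from our setup; class and method names follow the TomP2P 4.x builder API as I remember it, so treat them as approximate.)

```java
import net.tomp2p.futures.FutureDHT;
import net.tomp2p.p2p.Peer;
import net.tomp2p.p2p.PeerMaker;
import net.tomp2p.peers.Number160;
import net.tomp2p.storage.Data;

public class PutTimingSketch {
    public static void main(String[] args) throws Exception {
        // Single local peer for illustration; the real test bootstraps against
        // one of the three Debian nodes instead.
        Peer peer = new PeerMaker(Number160.createHash("node-1"))
                .setPorts(4001)
                .makeAndListen();

        for (int i = 0; i < 50000; i++) {
            long start = System.nanoTime();

            // One put per iteration, comparable to one REST request in the original test.
            FutureDHT put = peer.put(Number160.createHash("key-" + i))
                    .setData(new Data("value-" + i))
                    .start();
            put.awaitUninterruptibly();

            long millis = (System.nanoTime() - start) / 1000000;
            System.out.println(i + "\t" + millis + " ms\t" + (put.isSuccess() ? "ok" : "failed"));
        }

        peer.shutdown();
    }
}
```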

Results: At first, the request times are OK: something like an average of 1.0 seconds, a median of 0.99, and a maximum of 1.3. Interestingly, there is a "recurring outlier": approx. every 50th request takes significantly more time to complete (around 1.5 seconds in the beginning).

Observing this for a few thousand requests, the average and median request times remain close to 1.0 to 1.2 seconds, while the duration of this "recurring outlier" increases linearly! After as few as 5k requests we are already talking about a duration of 3.8 seconds!

Apparently, with an increasing number of values being written to the ring, the performance changes for the worse. Big time! After approx. 33k requests, the outliers take up to 75 seconds (!!!!!) to complete, while the median duration remains close to what it was in the beginning: the median is still at 1.01 seconds (!!!!!), while the average has increased to 1.8 seconds (mainly due to the outliers, I guess).

Is this a known issue?

Graph: [image attachment: request-time graph]

raw data: requesttimes.txt

@sgoendoer
Author

I've been running the same setup for 2 days straight now. Maximum request times go up as far as 190 seconds!

@tbocek
Member

tbocek commented Feb 6, 2016

Thanks for the report. Can you try the latest 5.0 release? It's still beta, but more stable than 4.4. Thanks.

@sgoendoer
Author

You mean beta8? We are currently working on including it. I will post results as soon as we have some.
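(For reference, the builder API was reworked in 5.x, so our write path will look different after the migration; roughly like the sketch below, as far as I understand the beta. Not verified against beta8.)

```java
import net.tomp2p.dht.FuturePut;
import net.tomp2p.dht.PeerBuilderDHT;
import net.tomp2p.dht.PeerDHT;
import net.tomp2p.p2p.Peer;
import net.tomp2p.p2p.PeerBuilder;
import net.tomp2p.peers.Number160;
import net.tomp2p.storage.Data;

public class PutWith5x {
    public static void main(String[] args) throws Exception {
        // In 5.x the base peer and the DHT layer are built separately.
        Peer peer = new PeerBuilder(Number160.createHash("node-1")).ports(4001).start();
        PeerDHT dht = new PeerBuilderDHT(peer).start();

        FuturePut put = dht.put(Number160.createHash("key"))
                .data(new Data("value"))
                .start();
        put.awaitUninterruptibly();
        System.out.println("put " + (put.isSuccess() ? "succeeded" : "failed"));

        peer.shutdown();
    }
}
```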

An update on the issue: after approx. 50k datasets, the DHT was "full", so we stopped the test. "Full" meaning that requests took something like 20 minutes (!!!) regardless of whether we tried to read or write. I figure that was mainly due to RAM limitations, as our nodes all feature just 1 GB of memory; we logged all data we pushed to the ring, and the log had reached approx. 1 GB around the 50k mark, so that might be the explanation. Anyhow, looking into the log files, the following line showed up a lot:

2016-02-08 12:09:53 INFO  Scheduler:99 - slow down, we have a huge backlog!

In the meantime, here is some data from the test I ran. I calculated average, minimum, maximum, and median request times for each block of 1,000 requests:

results: [image attachment: chart of per-1,000-request statistics]

The missing max value was 227.5485981 seconds; I removed it to keep the chart readable. After 50k requests, the data got MUCH worse...

@ChronosXYZ

Have you used disk-based storage?
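(For context: TomP2P keeps stored values on the heap by default, which would fit the ~1 GB observation above. A rough sketch of plugging disk-backed storage into a 5.x peer is below; StorageDisk and its constructor arguments are an assumption from memory, so check the class in the release you actually use.)

```java
import java.io.File;

import net.tomp2p.connection.DSASignatureFactory;
import net.tomp2p.dht.PeerBuilderDHT;
import net.tomp2p.dht.PeerDHT;
import net.tomp2p.dht.StorageDisk;
import net.tomp2p.p2p.Peer;
import net.tomp2p.p2p.PeerBuilder;
import net.tomp2p.peers.Number160;

public class DiskBackedPeer {
    public static void main(String[] args) throws Exception {
        Number160 peerId = Number160.createHash("node-1");
        Peer peer = new PeerBuilder(peerId).ports(4001).start();

        // The default storage keeps every value in memory. StorageDisk persists
        // values to the given directory instead; the constructor arguments here
        // are assumed, not checked against a specific beta.
        StorageDisk storage = new StorageDisk(peerId, new File("/var/lib/tomp2p"),
                new DSASignatureFactory());
        PeerDHT dht = new PeerBuilderDHT(peer).storage(storage).start();

        // dht stays referenced here so the peer keeps serving requests.
        System.out.println("started DHT peer with disk-backed storage");
    }
}
```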
