Fix #51 wrong stream_buffer_byte_size calculation #52

Merged — 12 commits into master — May 3, 2020

Conversation

DaWe35 (Contributor) commented May 2, 2020

I fixed the calculation and it's working, but now the script needs another dependency (pympler).

Test code:

import sys
from pympler import asizeof
import os

d = []

for i in range(1024):
    stri = os.urandom(1024)
    d.append(stri)

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
    
sz = sys.getsizeof(d)
print(sz/1024/1024, 'MB', ' - sys.getsizeof(d)')

sz = get_size(d)
print(sz/1024/1024, 'MB', ' - get_size(d)')

sz = asizeof.asizeof(d)
print(sz/1024/1024, 'MB', ' - asizeof.asizeof(d)')

DaWe35 (Contributor) commented May 2, 2020

I just messed up my branches, sorry for all the commits.

oliver-zehentleitner (Member):

"I just messed up my branches, sorry for all the commits."

Absolutely no problem, I also commit very often to save my steps. No worries!

Thanks for your support!

DaWe35 (Contributor) commented May 2, 2020

Test with 1GB:

[screenshot: benchmark output for the 1 GB test]

Conclusion:

  • Of course, sys.getsizeof() is not good for us. As the Python docs say: "Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to."

  • The get_size() method gives a good estimate, but it's not completely accurate. Still, I think this would be the best one to use; I just don't know where to place this code in the codebase.

  • asizeof() is slower, but with 1 GB, 0.01 sec is not that bad. Another counter-argument: it requires the pympler lib. On the other hand, it gives the most accurate estimate.

Code:

import sys
from pympler import asizeof
import os
import time
d = []

for i in range(1000):
    stri = os.urandom(1024*1024)
    d.append(stri)

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
    


start_time = time.time()

sz = sys.getsizeof(d)
print(sz/1024/1024, 'MB', ' - sys.getsizeof(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))
start_time = time.time()

sz = get_size(d)
print(sz/1024/1024, 'MB', ' - get_size(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))
start_time = time.time()

sz = asizeof.asizeof(d)
print(sz/1024/1024, 'MB', ' - asizeof.asizeof(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))

@oliver-zehentleitner oliver-zehentleitner added this to the 1.13.0 milestone May 2, 2020
@oliver-zehentleitner oliver-zehentleitner linked an issue May 2, 2020 that may be closed by this pull request
oliver-zehentleitner (Member):

I suggest placing def get_size() in the manager class as a static method.

And then switch this to the new method you created.

Please add docstrings.

Nice work!

oliver-zehentleitner (Member):

It can NOT be static because of the recursive call!

DaWe35 (Contributor) commented May 3, 2020

I think I found a better alternative.

Problem:
Every time the buffer gets bigger, get_size() and even asizeof() get slower, because they are O(n) algorithms. I noticed that once the buffer grows beyond ~20 MB, it can no longer shrink: get_size() and asizeof() need more and more resources, data processing gets slower and slower, and the data in the buffer grows without bound.

Solution:
I found a way to make get_stream_buffer_byte_size() O(1):
total_received_bytes / total_receives * stream_buffer_length
This calculation is just an estimate, but it works very well. Here is an example:
[screenshot: estimated stream_buffer size output]
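
A minimal sketch of the idea (standalone; add_to_stream_buffer() is a hypothetical helper for illustration, while total_received_bytes, total_receives and the stream_buffer length are the counters named above):

import os

# running totals, updated once per receive, so the bookkeeping is O(1)
total_received_bytes = 0
total_receives = 0
stream_buffer = []

def add_to_stream_buffer(record):
    """Append a received record and update the running totals."""
    global total_received_bytes, total_receives
    stream_buffer.append(record)
    total_received_bytes += len(record)
    total_receives += 1

def get_stream_buffer_byte_size():
    """Estimate the buffer size in O(1):
    average receive size * number of currently buffered records."""
    if total_receives == 0:
        return 0
    return total_received_bytes / total_receives * len(stream_buffer)

# simulate 1000 receives of ~1 KB each
for _ in range(1000):
    add_to_stream_buffer(os.urandom(1024))

print(get_stream_buffer_byte_size() / 1024, 'KB (estimated)')

Since the counters only grow, the average receive size converges over time, and removing items from the buffer never has to touch them.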

oliver-zehentleitner (Member):

Good analysis and good idea to solve it!

oliver-zehentleitner (Member):

You asked somewhere "And what about this? Why lock needed?" - I got it via email but I cannot find it on GitHub.

The reason is that len(self.stream_buffer) has to walk through self.stream_buffer to count its entries. If you stream everything from Binance, you have to expect ~3000 receives per second. Every receive is added to the stream_buffer and later removed again, so that's ~6000 operations on the stream_buffer per second.

Before len() finishes, the stream buffer has already changed a few times, and this causes a RuntimeError exception.

Even a simple counter is not safe: if two threads each do i += 1 1000 times per second, you can expect the result to be lower than 2000. For example, i can be 10; both threads read the value 10 at the same time and add 1 to it, thread A writes back 11, and thread B does the same, so one increment is lost.
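
A minimal, self-contained demonstration of that lost update (plain Python, not code from this PR):

import threading

counter = 0

def worker(n):
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: not atomic across threads

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# without a lock the final value can come out below 200000,
# because concurrent increments overwrite each other
print(counter)

Depending on the Python version the loss shows up more or less often, but nothing guarantees the correct total without a lock.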

DaWe35 (Contributor) commented May 3, 2020

Maybe on Telegram, or I removed the comment.
Yeah, I understand, thanks. I did not expect that just reading a list could cause a problem like that.

oliver-zehentleitner (Member):

It doesn't modify it, but it will say something like "object changed during iteration".
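
For reference, this is the error Python raises when a dict or set changes while it is being iterated; a minimal reproduction (the real stream_buffer container may behave differently):

d = {i: i for i in range(3)}

for key in d:         # iterating the dict...
    d[key + 100] = 0  # ...while adding keys to it
# -> RuntimeError: dictionary changed size during iteration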

oliver-zehentleitner (Member) left a review comment:

Great! Thanks!

@oliver-zehentleitner oliver-zehentleitner merged commit f96a861 into LUCIT-Systems-and-Development:master May 3, 2020
@DaWe35 DaWe35 deleted the stream_buffer_size branch May 4, 2020 16:55