Fix #51 wrong stream_buffer_byte_size calculation #52

Merged — 12 commits into master — May 3, 2020

Conversation

DaWe35 (Contributor) commented May 2, 2020

I fixed the calculation and it's working, but now the script needs another dependency (pympler).

Test code:

import sys
from pympler import asizeof
import os

d = []

for i in range(1024):
    stri = os.urandom(1024)
    d.append(stri)

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
    
sz = sys.getsizeof(d)
print(sz/1024/1024, 'MB', ' - sys.getsizeof(d)')

sz = get_size(d)
print(sz/1024/1024, 'MB', ' - get_size(d)')

sz = asizeof.asizeof(d)
print(sz/1024/1024, 'MB', ' - asizeof.asizeof(d)')

DaWe35 (Contributor) commented May 2, 2020

I just messed up my branches, sorry for all the commits.

oliver-zehentleitner (Member):

"I just messed up my branches, sorry for all the commits."

Absolutely no problem, I also commit very often to save my steps. No worries!

Thanks for your support!

DaWe35 (Contributor) commented May 2, 2020

Test with 1GB:

[screenshot: benchmark output for the 1 GB test]

Conclusion:

  • Of course, sys.getsizeof() is not good for us. As the Python docs say: "Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to."

  • The get_size() method gives a good estimate, but it's not completely accurate. Still, I think this would be the best one to use; I just don't know where to place this code in the codebase.

  • asizeof() is slower, but with 1 GB, 0.01 sec is not that bad. Another counter-argument: it requires the pympler lib. On the other hand, it gives the most accurate estimate.

Code:

import sys
from pympler import asizeof
import os
import time
d = []

for i in range(1000):
    stri = os.urandom(1024*1024)
    d.append(stri)

def get_size(obj, seen=None):
    """Recursively finds size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
    


start_time = time.time()

sz = sys.getsizeof(d)
print(sz/1024/1024, 'MB', ' - sys.getsizeof(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))
start_time = time.time()

sz = get_size(d)
print(sz/1024/1024, 'MB', ' - get_size(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))
start_time = time.time()

sz = asizeof.asizeof(d)
print(sz/1024/1024, 'MB', ' - asizeof.asizeof(d)')

print("--- %s seconds ---\n" % (time.time() - start_time))

@oliver-zehentleitner oliver-zehentleitner added this to the 1.13.0 milestone May 2, 2020
@oliver-zehentleitner oliver-zehentleitner linked an issue May 2, 2020 that may be closed by this pull request
oliver-zehentleitner (Member):

I suggest placing def get_size() in the manager class as a static method.

And then switch this to the new method you created.

Please add docstrings.

Nice work!

oliver-zehentleitner (Member):

It can NOT be static because of the recursive call!

DaWe35 (Contributor) commented May 3, 2020

I think I found a better alternative.

Problem:
Every time the buffer gets bigger, get_size() and even asizeof() get slower, because they are O(n) algorithms. I noticed that once the buffer grows beyond ~20 MB, it can no longer shrink: get_size() and asizeof() need more and more resources, data processing gets slower and slower, and the data in the buffer grows without bound.

Solution:
I found a way to make get_stream_buffer_byte_size() O(1):
total_received_bytes / total_receives * stream_buffer_length
This calculation is just an estimate, but it works very well. Here is an example:
[screenshot: estimated stream_buffer size output]
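
A minimal sketch of the idea (standalone; add_to_stream_buffer() is a hypothetical helper for illustration, while total_received_bytes, total_receives and the stream_buffer length are the counters named above):

import os

# running totals, updated once per receive, so the bookkeeping is O(1)
total_received_bytes = 0
total_receives = 0
stream_buffer = []

def add_to_stream_buffer(record):
    """Append a received record and update the running totals."""
    global total_received_bytes, total_receives
    stream_buffer.append(record)
    total_received_bytes += len(record)
    total_receives += 1

def get_stream_buffer_byte_size():
    """Estimate the buffer size in O(1):
    average receive size * number of currently buffered records."""
    if total_receives == 0:
        return 0
    return total_received_bytes / total_receives * len(stream_buffer)

# simulate 1000 receives of ~1 KB each
for _ in range(1000):
    add_to_stream_buffer(os.urandom(1024))

print(get_stream_buffer_byte_size() / 1024, 'KB (estimated)')

Since the counters only grow, the average receive size converges over time, and removing items from the buffer never has to touch them.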

oliver-zehentleitner (Member):

Good analysis and good idea to solve it!

oliver-zehentleitner (Member):

You asked somewhere "And what about this? Why lock needed?" - I got it via email but I cannot find it on GitHub.

The reason is that len(self.stream_buffer) has to walk through self.stream_buffer to count its entries. If you stream everything from Binance, you have to expect ~3000 receives per second. Every receive is added to the stream_buffer and later removed again, so that's ~6000 operations on the stream_buffer per second.

Before len() finishes, the stream buffer has already changed a few times, and this causes a RuntimeError exception.

Even a simple counter is not safe: if two threads each do i += 1 1000 times per second, you can expect the result to be lower than 2000. For example, i can be 10; both threads read the value 10 at the same time and add 1 to it, thread A writes back 11, and thread B does the same, so one increment is lost.
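
A minimal, self-contained demonstration of that lost update (plain Python, not code from this PR):

import threading

counter = 0

def worker(n):
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: not atomic across threads

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# without a lock the final value can come out below 200000,
# because concurrent increments overwrite each other
print(counter)

Depending on the Python version the loss shows up more or less often, but nothing guarantees the correct total without a lock.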

DaWe35 (Contributor) commented May 3, 2020

Maybe on Telegram, or I removed the comment.
Yeah, I understand, thanks. I did not expect that just reading a list could cause a problem like that.

oliver-zehentleitner (Member):

It doesn't modify it, but it will say something like "object changed during iteration".
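
For reference, this is the error Python raises when a dict or set changes while it is being iterated; a minimal reproduction (the real stream_buffer container may behave differently):

d = {i: i for i in range(3)}

for key in d:         # iterating the dict...
    d[key + 100] = 0  # ...while adding keys to it
# -> RuntimeError: dictionary changed size during iteration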

oliver-zehentleitner (Member) left a review comment:

Great! Thanks!

@oliver-zehentleitner oliver-zehentleitner merged commit f96a861 into LUCIT-Systems-and-Development:master May 3, 2020
@DaWe35 DaWe35 deleted the stream_buffer_size branch May 4, 2020 16:55