iterate_stream uses deprecated write_object instead of write_entity #33

Closed
stchris opened this issue Feb 27, 2024 · 1 comment
stchris commented Feb 27, 2024

iterate_stream uses the (deprecated) write_object:

def iterate_stream(dataset, file, entity_id=None):

instead of the newer write_entity. The latter uses orjson and is likely to be noticeably faster, so a before/after benchmark would be useful here as well.
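One thing to keep in mind when making the switch: orjson serializes to bytes rather than str, so a writer built on it would need a binary output stream. The sketch below illustrates that difference with two hypothetical stand-in functions (`write_text_line` and `write_bytes_line`); they are not the followthemoney implementations, and the standard-library json module is used only to keep the example dependency-free.

```python
import io
import json


def write_text_line(stream, data):
    # Hypothetical stand-in for a write_object-style writer:
    # serializes to str and writes to a text stream.
    stream.write(json.dumps(data) + "\n")


def write_bytes_line(stream, data):
    # Hypothetical stand-in for a write_entity-style writer:
    # serializes to bytes and writes to a binary stream.
    # (orjson.dumps returns bytes directly; here we encode by hand.)
    stream.write(json.dumps(data).encode("utf-8") + b"\n")


record = {"id": "test", "schema": "Person"}

text_out = io.StringIO()
write_text_line(text_out, record)

binary_out = io.BytesIO()
write_bytes_line(binary_out, record)

# The payload is the same either way; only the stream type differs.
same_payload = text_out.getvalue().encode("utf-8") == binary_out.getvalue()

# A text stream rejects bytes, which is what a caller would run into
# if it kept opening its output in text mode after the switch.
try:
    write_bytes_line(io.StringIO(), record)
    text_stream_rejected = False
except TypeError:
    text_stream_rejected = True

print(same_payload, text_stream_rejected)
```

If this assumption holds for write_entity, any caller of iterate_stream that passes a text-mode file handle would need to be checked as part of the change.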

stchris commented Mar 5, 2024

I ran the following benchmark:

from io import StringIO, BytesIO

from followthemoney import model
from followthemoney.cli.util import write_object, write_entity

import pyperf


ENTITY = {
    "id": "test",
    "schema": "Person",
    "properties": {
        "name": ["Ralph Tester"],
        "birthDate": ["1972-05-01"],
        "idNumber": ["9177171", "8e839023"],
        "topics": ["role.spy"],
    },
}


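# Each benchmark serializes the same entity into a fresh in-memory stream:
# a text stream for write_object, a binary stream for write_entity.
def bench_write_object(obj):
    write_object(StringIO(), obj)


def bench_write_entity(obj):
    write_entity(BytesIO(), obj)


runner = pyperf.Runner()
# Build the entity proxy once, outside the timed functions.
obj = model.get_proxy(ENTITY)
runner.bench_func("write_object", bench_write_object, obj)
runner.bench_func("write_entity", bench_write_entity, obj)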

and it showed write_entity to be roughly three times faster:

write_object: Mean +- std dev: 2.76 us +- 0.02 us
write_entity: Mean +- std dev: 924 ns +- 8 ns
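To put the two means in perspective, here is a small calculation of the ratio and the time saved per entity. The per-million figure is an extrapolation that assumes every entity costs about the same as this small test entity, which real datasets may not match.

```python
# Means reported by pyperf above, both expressed in nanoseconds.
write_object_ns = 2760.0  # 2.76 us
write_entity_ns = 924.0   # 924 ns

speedup = write_object_ns / write_entity_ns
saved_per_entity_ns = write_object_ns - write_entity_ns

# Extrapolation only: assumes a uniform per-entity cost.
saved_per_million_s = saved_per_entity_ns * 1_000_000 / 1e9

print(f"speedup: {speedup:.2f}x")
print(f"saved per entity: {saved_per_entity_ns:.0f} ns")
print(f"saved per million entities: {saved_per_million_s:.2f} s")
```

These numbers cover serialization into an in-memory stream only; disk or network I/O in a real iterate_stream run is not part of the measurement.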
