Limits on locals() logging do not prevent runaway traversal #329

Open

nicku33 opened this issue Feb 18, 2020 · 5 comments

nicku33 commented Feb 18, 2020

Currently the values defined in DEFAULT_LOCALS_SIZES don't prevent traversal of large collections. As far as I can see, the Shortener only applies after the whole object graph has been copied. In our case (data transformation), it is not uncommon to have massive lists and dictionaries with 100k+ objects.

In traverse.py, the line

return list_handler(list(traverse(elem, key=key + (i,), **kw) for i, elem in enumerate(obj)), key=key)

traverses the collection and creates a full copy of it. That can be both time-consuming and memory-intensive, neither of which one wants in a logging library. Memory use in particular should be practically bounded.
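To illustrate the problem in miniature (toy code, not pyrollbar's):

# Toy illustration: list() consumes the whole generator up front, so a
# size limit applied afterwards comes too late to bound time or memory.
def items(n):
    for i in range(n):
        yield {"i": i}

copied = list(items(100_000))  # all 100k dicts are materialized here
shortened = copied[:10]        # any shortening happens only afterwards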

In addition, there is no limit on recursion depth apart from the circular-reference checks. So, for example, a highly imbalanced tree built from dicts that has degraded into a linked list will be traversed in full.
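For instance, a degenerate structure like this (a contrived worst case) would be walked all the way down:

# A "tree" of nested dicts that has degraded into a 100,000-link list.
# With no depth limit, traversal visits every level (and in CPython a
# recursive walk would raise RecursionError long before finishing).
root = node = {}
for _ in range(100_000):
    node["next"] = {}
    node = node["next"]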

I'd like to apply the iteration limits defined in the configuration to the generators above, perhaps by adding a limited_enumerate(iterable, stop_limit) helper to use in place of enumerate.
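A minimal sketch of that helper (the exact truncation behavior is an open question):

def limited_enumerate(iterable, stop_limit):
    # Like enumerate(), but stops after stop_limit items, so a huge
    # collection is never fully iterated or copied.
    for i, elem in enumerate(iterable):
        if i >= stop_limit:
            return
        yield i, elem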

I'd also like to add a recursion counter that respects an optional depth limit.
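Roughly like this, sketched against a simplified traverse rather than the real signature in traverse.py:

# Simplified sketch only, not the real traverse.py signature: thread a
# depth counter through the recursion and stop descending past the limit.
def traverse(obj, depth=0, max_depth=5):
    if depth >= max_depth:
        return '<max depth exceeded>'
    if isinstance(obj, dict):
        return {k: traverse(v, depth + 1, max_depth) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [traverse(elem, depth + 1, max_depth) for elem in obj]
    return obj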

Thoughts? Let me know if I've erred, as all of this is based on code inspection; I haven't written a test.

@nicku33 nicku33 changed the title Limits do not prevent runaway traversal Limits on locals() logging do not prevent runaway traversal Feb 18, 2020

waltjones commented May 6, 2020

If I understand correctly, the idea is to apply the limits from the locals config dictionary during capture of the locals objects rather than later.

"I'd like to add a recursion count that respects an optional limit"

I thought maxlevel might be intended for this, but I don't yet see where it is applied. Maybe a maxdepth option is what's needed here?

Looking at the options defined here:

DEFAULT_LOCALS_SIZES = {
'maxlevel': 5,
'maxdict': 10,
'maxlist': 10,
'maxtuple': 10,
'maxset': 10,
'maxfrozenset': 10,
'maxdeque': 10,
'maxarray': 10,
'maxstring': 100,
'maxlong': 40,
'maxother': 100,
}


nicku33 commented May 7, 2020

"during capture of the locals objects rather than later"

Yes. The limits currently seem to be used to censor objects going out to logs, but the list(traverse... call materializes the entire generator output before that happens, if I read the code correctly. This resulted in hangs of ten minutes for me with large data objects. I'd prefer that a logging utility like pyrollbar have bounded resource usage, since it can be invoked in situations where a system is already stressed.
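For illustration, even something as simple as itertools.islice would bound the work at capture time (a sketch, not a proposed patch):

import itertools

# islice() stops the generator after maxlist items, so at most maxlist
# elements are ever copied, regardless of the size of the input.
def bounded_copy(obj, maxlist=10):
    return list(itertools.islice(obj, maxlist))

print(bounded_copy(range(10**6)))  # first 10 items, in bounded time/memory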

I couldn't see where maxlevel was applied either, though I'd have to refamiliarize myself with the issue. I think that must have been the intention; maxdepth does seem like a more intuitive name for it.

So, questions for you:
a) Do the other Rollbar clients have naming standards for these limit configs that we should adhere to?
b) If not, should we use maxlevel (if it isn't currently used) or change it to maxdepth?
c) Do you generally agree with the suggestions?

wrgoldstein commented

Any update on this?

waltjones commented

Thanks for the nudge.

@nicku33 There are no other naming standards to follow. I prefer maxdepth, assuming maxlevel isn't already being used for this. (When I reviewed, I didn't see it being used for anything.) Overall, I agree with the motivation and the plan.

@danielmorell danielmorell self-assigned this Jun 21, 2024
danielmorell commented

We have been talking about this internally and it is being moved up in priority.

Since the last activity on this issue, we have released v1.1.0-alpha with proper support for maxlevel, and we have moved the shortener so that it is the first transform applied to the data.

Still to be done is limiting the data as it is collected, instead of trying to shorten it after it has all been collected.
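Roughly, the collection step should consult the configured sizes while building the copy. A rough sketch with hypothetical names (the real transform code differs), reusing the DEFAULT_LOCALS_SIZES keys shown earlier:

import itertools

# Sketch only: apply the configured sizes while the copy is being built,
# instead of shortening a full copy afterwards.
def capture(obj, sizes, level=0):
    if level >= sizes['maxlevel']:
        return '...'
    if isinstance(obj, dict):
        limited = itertools.islice(obj.items(), sizes['maxdict'])
        return {k: capture(v, sizes, level + 1) for k, v in limited}
    if isinstance(obj, (list, tuple)):
        return [capture(v, sizes, level + 1)
                for v in itertools.islice(obj, sizes['maxlist'])]
    return obj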
