Performance improvements to counter context.clone slowdown #1009
Conversation
This change abuses mutable references to create a sort of interior mutable cell shared between a context and all of its clones. The idea is that when a node is inferred at the toplevel, it is called with context = None, creating a new InferenceContext and starting a count from zero. However, when a context is cloned we re-use the cell and cause the count in the "parent" context to be incremented when nodes are inferred in the "child" context.
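To illustrate the idea, here is a minimal, hypothetical sketch of such a shared counter cell (simplified; the names below are illustrative rather than the exact ones in this patch):

class InferenceContext:
    """Sketch only: a one-element list acts as the shared mutable cell."""

    def __init__(self, path=None, nodes_inferred=None):
        if nodes_inferred is None:
            # Top-level inference (called with context=None upstream):
            # start a fresh count from zero.
            nodes_inferred = [0]
        self._nodes_inferred_count = nodes_inferred
        self.path = path or set()

    @property
    def nodes_inferred(self):
        return self._nodes_inferred_count[0]

    @nodes_inferred.setter
    def nodes_inferred(self, value):
        self._nodes_inferred_count[0] = value

    def clone(self):
        # Re-use the cell: nodes inferred in the "child" context
        # increment the count seen by the "parent" as well.
        return InferenceContext(self.path, nodes_inferred=self._nodes_inferred_count)

parent = InferenceContext()
child = parent.clone()
child.nodes_inferred += 1
assert parent.nodes_inferred == 1  # the clone shares the parent's counter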
hippo91
left a comment
    e.g. the bound node of object.__new__(cls) is the object node
    """
    return _INFERENCE_CACHE
If I understand correctly, here we will store every inferred node in a global variable that will exist throughout the life of the program.
It will definitely spare CPU time, but what about memory consumption?
Is the limit on the number of inferred nodes sufficient for the speed-up? There are really big code bases using pylint, and having a cache can affect them to the point that it's not possible to run pylint on them at all.
If I understand correctly, here we will store every inferred node in a global variable that will exist throughout the life of the program.
It will definitely spare CPU time, but what about memory consumption?
Your understanding is correct. It could theoretically grow without bound, but I suppose many other parts of astroid (such as astroid.MANAGER.astroid_cache) can as well. I regularly see pylint/astroid runs in the hundreds of MiB, but without a benchmark, I'm not sure there's a meaningful way to answer this question.
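For reference, a rough sketch of what such a process-wide cache amounts to (helper names here are hypothetical; the key layout is the one documented for the old per-context cache):

_INFERENCE_CACHE = {}

def get_inference_cache():
    """Return the module-level cache; it lives for the life of the program."""
    return _INFERENCE_CACHE

def cached_infer(node, context, **kwargs):
    # Same key layout as the old per-context cache:
    # (node, lookupname, callcontext, boundnode) -> tuple of inferred results.
    key = (node, context.lookupname, context.callcontext, context.boundnode)
    if key not in _INFERENCE_CACHE:
        _INFERENCE_CACHE[key] = tuple(node._infer(context, **kwargs))
    yield from _INFERENCE_CACHE[key]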
I tried running pylint (commit 9361fa0630ce920081b57865be7882a39e9da1e3) on itself with and without these astroid changes.
With wip/performance (0d7d798):
26.77user 0.25system 0:27.08elapsed 99%CPU (0avgtext+0avgdata 231516maxresident)k
3312inputs+248outputs (3major+81935minor)pagefaults 0swaps
Without (d3fea8f):
45.15user 0.24system 0:45.43elapsed 99%CPU (0avgtext+0avgdata 235892maxresident)k
0inputs+336outputs (0major+83907minor)pagefaults 0swaps
So memory usage (max RSS) actually went down slightly, from roughly 236 MB to 232 MB.
Is the limit on the number of inferred nodes sufficient for the speed-up? There are really big code bases using pylint, and having a cache can affect them to the point that it's not possible to run pylint on them at all.
They address different problems: the inferred-node limit is there to stop pathological cases, like really large files (see the sketch after the patch below), while the inferred-node cache helps with re-use, for example when scanning a package with a lot of files. The main problem I saw with the old inference cache was that it was rebuilt from scratch for every node, so the hit rate was very low, and cloning it introduced a large number of small copies, which was inefficient time-wise. In fact, it was so inefficient that just removing it altogether produces a small but noticeable (~2%) speed-up on astroid 2.5.6 (middle of 3 runs):
astroid-2.5.6 (36dda3f):
38.57user 0.26system 0:38.86elapsed 99%CPU (0avgtext+0avgdata 231020maxresident)k
0inputs+72outputs (0major+81740minor)pagefaults 0swaps
With no-cache.patch applied:
37.77user 0.25system 0:38.05elapsed 99%CPU (0avgtext+0avgdata 230788maxresident)k
0inputs+72outputs (0major+81551minor)pagefaults 0swaps
no-cache.patch:
diff --git a/astroid/context.py b/astroid/context.py
index 18220ec..73f1f4d 100644
--- a/astroid/context.py
+++ b/astroid/context.py
@@ -27,11 +27,10 @@ class InferenceContext:
         "lookupname",
         "callcontext",
         "boundnode",
-        "inferred",
         "extra_context",
     )

-    def __init__(self, path=None, inferred=None):
+    def __init__(self, path=None):
         self.path = path or set()
         """
         :type: set(tuple(NodeNG, optional(str)))
@@ -64,14 +63,6 @@ class InferenceContext:
         e.g. the bound node of object.__new__(cls) is the object node
         """

-        self.inferred = inferred or {}
-        """
-        :type: dict(seq, seq)
-
-        Inferred node contexts to their mapped results
-        Currently the key is ``(node, lookupname, callcontext, boundnode)``
-        and the value is tuple of the inferred results
-        """
         self.extra_context = {}
         """
         :type: dict(NodeNG, Context)
@@ -102,7 +93,7 @@ class InferenceContext:
         starts with the same context but diverge as each side is inferred
         so the InferenceContext will need be cloned"""
         # XXX copy lookupname/callcontext ?
-        clone = InferenceContext(self.path, inferred=self.inferred)
+        clone = InferenceContext(self.path)
         clone.callcontext = self.callcontext
         clone.boundnode = self.boundnode
         clone.extra_context = self.extra_context
diff --git a/astroid/node_classes.py b/astroid/node_classes.py
index 7faf681..089b579 100644
--- a/astroid/node_classes.py
+++ b/astroid/node_classes.py
@@ -358,11 +358,6 @@ class NodeNG:
             yield from self._infer(context, **kwargs)
             return

-        key = (self, context.lookupname, context.callcontext, context.boundnode)
-        if key in context.inferred:
-            yield from context.inferred[key]
-            return
-
         generator = self._infer(context, **kwargs)
         results = []

@@ -378,7 +373,6 @@ class NodeNG:

         # Cache generated results for subsequent inferences of the
         # same node using the same context
-        context.inferred[key] = tuple(results)
         return

     def _repr_name(self):
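To connect the two halves of the change, here is a hedged sketch of how the inferred-node limit could use the shared counter from the earlier sketch (MAX_INFERRED and the wrapper are illustrative, not necessarily astroid's actual names):

MAX_INFERRED = 100  # illustrative cap, not astroid's actual constant

def limited_infer(node, context):
    # The counter lives in the cell shared by the whole clone family,
    # so deep chains of cloned contexts all draw from one budget.
    if context.nodes_inferred > MAX_INFERRED:
        return  # bail out of pathological cases instead of inferring forever
    for result in node._infer(context):
        context.nodes_inferred += 1
        yield result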
Pierre-Sassoulas
left a comment
👍
I'll take a look at it later today.
cdce8p
left a comment
I can confirm the speed improvements. However, this change causes some StopIteration exceptions when testing against HomeAssistant: https://github.com/cdce8p/ha-core/runs/2753897777?check_suite_focus=true#step:7:55
@cdce8p, I was able to fix the StopIteration exceptions, but these don't seem to be related. I've raised pylint-dev/pylint#4546 since I was able to replicate a false positive with astroid 2.5.6.
@nelfin Thanks for fixing the issue! Indeed, the StopIteration exceptions are gone now.
Pierre-Sassoulas
left a comment
I think this is mergeable, as the caching also got better. Do you agree, @hippo91?
Yes, it seems to be OK.
Thank you a lot for this, @nelfin: great performance improvement, and the information you gave was super detailed and informative 🔥
Yep, thanks a lot @nelfin! Great job!
I can only agree, thanks @nelfin 🚀
Description
This adds a couple of changes to InferenceContext to eliminate copies of the inferred nodes cache and to limit the total number of inferred nodes in a context and all of its clones (preventing deep inference chains).
Type of Changes
Related Issue
Ref #1003