Merged
5 changes: 5 additions & 0 deletions ChangeLog
@@ -16,6 +16,11 @@ Release Date: TBA

* Add lineno and col_offset for ``Keyword`` nodes and Python 3.9+

* Add global inference cache to speed up inference of long statement blocks

* Add a limit to the total number of nodes inferred indirectly as a result
of inferring some node


What's New in astroid 2.5.7?
============================
60 changes: 48 additions & 12 deletions astroid/context.py
@@ -13,7 +13,17 @@
"""Various context related utilities, including inference and call contexts."""
import contextlib
import pprint
from typing import Optional
from typing import TYPE_CHECKING, MutableMapping, Optional, Sequence, Tuple

if TYPE_CHECKING:
from astroid.node_classes import NodeNG


_INFERENCE_CACHE = {}


def _invalidate_cache():
_INFERENCE_CACHE.clear()


class InferenceContext:
@@ -28,11 +38,17 @@ class InferenceContext:
"lookupname",
"callcontext",
"boundnode",
"inferred",
"extra_context",
"_nodes_inferred",
)

def __init__(self, path=None, inferred=None):
max_inferred = 100

def __init__(self, path=None, nodes_inferred=None):
if nodes_inferred is None:
self._nodes_inferred = [0]
else:
self._nodes_inferred = nodes_inferred
self.path = path or set()
"""
:type: set(tuple(NodeNG, optional(str)))
@@ -65,14 +81,6 @@ def __init__(self, path=None, inferred=None):

e.g. the bound node of object.__new__(cls) is the object node
"""
self.inferred = inferred or {}
"""
:type: dict(seq, seq)

Inferred node contexts to their mapped results
Currently the key is ``(node, lookupname, callcontext, boundnode)``
and the value is tuple of the inferred results
"""
self.extra_context = {}
"""
:type: dict(NodeNG, Context)
@@ -81,6 +89,34 @@ def __init__(self, path=None, inferred=None):
for call arguments
"""

@property
def nodes_inferred(self):
"""
Number of nodes inferred in this context and all its clones/descendants.

The inner value is wrapped in a mutable cell so that a shared counter
can be mutated in the presence of ``__slots__``.
"""
return self._nodes_inferred[0]

@nodes_inferred.setter
def nodes_inferred(self, value):
self._nodes_inferred[0] = value

@property
def inferred(
self,
) -> MutableMapping[
Tuple["NodeNG", Optional[str], Optional[str], Optional[str]], Sequence["NodeNG"]
]:
"""
Inferred node contexts to their mapped results

Currently the key is ``(node, lookupname, callcontext, boundnode)``
and the value is a tuple of the inferred results
"""
return _INFERENCE_CACHE
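The `nodes_inferred` property above depends on a mutable-cell trick: `__slots__` forbids adding new attributes and every clone is a distinct object, so a shared one-element list lets all clones see and bump the same counter. A minimal standalone sketch (with a hypothetical `Ctx` class, not astroid's real `InferenceContext`):

```python
class Ctx:
    __slots__ = ("_nodes_inferred",)

    def __init__(self, nodes_inferred=None):
        # Share the cell if given one; otherwise start a fresh counter.
        self._nodes_inferred = nodes_inferred if nodes_inferred is not None else [0]

    @property
    def nodes_inferred(self):
        return self._nodes_inferred[0]

    @nodes_inferred.setter
    def nodes_inferred(self, value):
        self._nodes_inferred[0] = value

    def clone(self):
        # Pass the same list object, not a copy.
        return Ctx(nodes_inferred=self._nodes_inferred)

parent = Ctx()
child = parent.clone()
child.nodes_inferred += 5
print(parent.nodes_inferred)  # 5 -- the parent sees the child's work
```

Because every clone receives the same one-element list, an increment anywhere is visible everywhere, which is what lets the inference budget span a whole tree of cloned contexts.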
Contributor:

If I understand correctly, here we will store every inferred node in a global variable that will exist throughout the life of the program. It will definitely save CPU time, but what about memory consumption?

Member:

Is the inferred-node limit sufficient for the speed-up? There are really big code bases using pylint, and a cache can affect them to the point where it's not possible to run pylint on them at all.

Contributor Author:

> If I understand correctly, here we will store every inferred node in a global variable that will exist throughout the life of the program. It will definitely save CPU time, but what about memory consumption?

Your understanding is correct. It could theoretically grow without bound, but I suppose many other parts of astroid (such as astroid.MANAGER.astroid_cache) can as well. I regularly see pylint/astroid runs in the hundreds of MiB, but without a benchmark, I'm not sure there's a meaningful way to answer this question.

I tried running pylint (commit 9361fa0630ce920081b57865be7882a39e9da1e3) on itself with and without these astroid changes.

With wip/performance (0d7d798):

26.77user 0.25system 0:27.08elapsed 99%CPU (0avgtext+0avgdata 231516maxresident)k
3312inputs+248outputs (3major+81935minor)pagefaults 0swaps

Without (d3fea8f):

45.15user 0.24system 0:45.43elapsed 99%CPU (0avgtext+0avgdata 235892maxresident)k
0inputs+336outputs (0major+83907minor)pagefaults 0swaps

So memory usage (max RSS) went down from 235MB -> 231MB

> Is the inferred-node limit sufficient for the speed-up? There are really big code bases using pylint, and a cache can affect them to the point where it's not possible to run pylint on them at all.

They address different problems: the inferred-node limit stops pathological cases, like really large files, while the inference cache helps with re-use, like when scanning a package with many files. The main problem I saw with the old inference cache was that it was rebuilt from scratch for every node, so the hit rate was very low, and cloning it introduced a large number of small copies, which was inefficient time-wise. In fact, it was so inefficient that just removing it altogether produces a small but noticeable (~2%) speed-up on astroid 2.5.6 (middle of 3 runs):

astroid-2.5.6 (36dda3f):

38.57user 0.26system 0:38.86elapsed 99%CPU (0avgtext+0avgdata 231020maxresident)k
0inputs+72outputs (0major+81740minor)pagefaults 0swaps

With no-cache.patch applied:

37.77user 0.25system 0:38.05elapsed 99%CPU (0avgtext+0avgdata 230788maxresident)k
0inputs+72outputs (0major+81551minor)pagefaults 0swaps

no-cache.patch:

diff --git a/astroid/context.py b/astroid/context.py
index 18220ec..73f1f4d 100644
--- a/astroid/context.py
+++ b/astroid/context.py
@@ -27,11 +27,10 @@ class InferenceContext:
         "lookupname",
         "callcontext",
         "boundnode",
-        "inferred",
         "extra_context",
     )
 
-    def __init__(self, path=None, inferred=None):
+    def __init__(self, path=None):
         self.path = path or set()
         """
         :type: set(tuple(NodeNG, optional(str)))
@@ -64,14 +63,6 @@ class InferenceContext:
 
         e.g. the bound node of object.__new__(cls) is the object node
         """
-        self.inferred = inferred or {}
-        """
-        :type: dict(seq, seq)
-
-        Inferred node contexts to their mapped results
-        Currently the key is ``(node, lookupname, callcontext, boundnode)``
-        and the value is tuple of the inferred results
-        """
         self.extra_context = {}
         """
         :type: dict(NodeNG, Context)
@@ -102,7 +93,7 @@ class InferenceContext:
         starts with the same context but diverge as each side is inferred
         so the InferenceContext will need be cloned"""
         # XXX copy lookupname/callcontext ?
-        clone = InferenceContext(self.path, inferred=self.inferred)
+        clone = InferenceContext(self.path)
         clone.callcontext = self.callcontext
         clone.boundnode = self.boundnode
         clone.extra_context = self.extra_context
diff --git a/astroid/node_classes.py b/astroid/node_classes.py
index 7faf681..089b579 100644
--- a/astroid/node_classes.py
+++ b/astroid/node_classes.py
@@ -358,11 +358,6 @@ class NodeNG:
             yield from self._infer(context, **kwargs)
             return
 
-        key = (self, context.lookupname, context.callcontext, context.boundnode)
-        if key in context.inferred:
-            yield from context.inferred[key]
-            return
-
         generator = self._infer(context, **kwargs)
         results = []
 
@@ -378,7 +373,6 @@ class NodeNG:
 
         # Cache generated results for subsequent inferences of the
         # same node using the same context
-        context.inferred[key] = tuple(results)
         return
 
     def _repr_name(self):


def push(self, node):
"""Push node into inference path

@@ -103,7 +139,7 @@ def clone(self):
starts with the same context but diverge as each side is inferred
so the InferenceContext will need be cloned"""
# XXX copy lookupname/callcontext ?
clone = InferenceContext(self.path.copy(), inferred=self.inferred.copy())
clone = InferenceContext(self.path.copy(), nodes_inferred=self._nodes_inferred)
clone.callcontext = self.callcontext
clone.boundnode = self.boundnode
clone.extra_context = self.extra_context
2 changes: 1 addition & 1 deletion astroid/helpers.py
@@ -150,7 +150,7 @@ def safe_infer(node, context=None):
try:
inferit = node.infer(context=context)
value = next(inferit)
except exceptions.InferenceError:
except (exceptions.InferenceError, StopIteration):
return None
try:
next(inferit)
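The widened `except` clause in `safe_infer` matters because `node.infer()` returns a generator, and calling `next()` on a generator that yields nothing raises `StopIteration` rather than `InferenceError`. A minimal sketch with hypothetical names (not astroid's real inference machinery):

```python
def infer_nothing():
    # A generator that produces no results at all.
    return
    yield  # unreachable; only makes this function a generator

def safe_first(gen):
    """Return the first inferred value, or None if there is none."""
    try:
        return next(gen)
    except StopIteration:  # empty generator: no results to return
        return None

result = safe_first(infer_nothing())
print(result)  # None
```

Without catching `StopIteration`, an empty inference generator would let the exception escape to the caller instead of yielding the "could not infer" result of `None`.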
9 changes: 7 additions & 2 deletions astroid/node_classes.py
@@ -357,12 +357,16 @@ def infer(self, context=None, **kwargs):
# explicit_inference is not bound, give it self explicitly
try:
# pylint: disable=not-callable
yield from self._explicit_inference(self, context, **kwargs)
results = tuple(self._explicit_inference(self, context, **kwargs))
if context is not None:
context.nodes_inferred += len(results)
yield from results
return
except exceptions.UseInferenceDefault:
pass

if not context:
# nodes_inferred?
yield from self._infer(context, **kwargs)
return

@@ -378,11 +382,12 @@ def infer(self, context=None, **kwargs):
# exponentially exploding possible results.
limit = MANAGER.max_inferable_values
for i, result in enumerate(generator):
if i >= limit:
if i >= limit or (context.nodes_inferred > context.max_inferred):
yield util.Uninferable
break
results.append(result)
yield result
context.nodes_inferred += 1

# Cache generated results for subsequent inferences of the
# same node using the same context
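The loop above enforces two caps at once: a per-node limit on yielded results and a global budget on nodes inferred across a context and all its clones. A minimal sketch (hypothetical names; a plain dict stands in for the context and a sentinel for `util.Uninferable`):

```python
Uninferable = object()  # stand-in sentinel for astroid's util.Uninferable

def bounded_infer(generator, context, limit=100):
    """Consume inference results until either cap is hit."""
    results = []
    for i, result in enumerate(generator):
        # Per-node cap (i >= limit) or global budget exhausted.
        if i >= limit or context["nodes_inferred"] > context["max_inferred"]:
            results.append(Uninferable)
            break
        results.append(result)
        context["nodes_inferred"] += 1
    return results

# Global budget nearly spent: only a few more nodes may be inferred.
ctx = {"nodes_inferred": 98, "max_inferred": 100}
out = bounded_infer(iter(range(10)), ctx)
print(out[:3], out[3] is Uninferable)  # [0, 1, 2] True
```

Yielding `Uninferable` rather than raising keeps callers working: they see a degraded answer instead of an error when inference is cut short.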
3 changes: 3 additions & 0 deletions astroid/transforms.py
@@ -10,6 +10,8 @@
import collections
from functools import lru_cache

from astroid import context as contextmod


class TransformVisitor:
"""A visitor for handling transforms.
@@ -42,6 +44,7 @@ def _transform(self, node):
# if the transformation function returns something, it's
# expected to be a replacement for the node
if ret is not None:
contextmod._invalidate_cache()
node = ret
if ret.__class__ != cls:
# Can no longer apply the rest of the transforms.