
Slow performance of Jedi with large libraries (cv2/PIL) imported #1195

Closed
hanspinckaers opened this issue Aug 6, 2018 · 13 comments

@hanspinckaers

Hi all, David asked to put our email conversation on the issue tracker.


Hi David,

Thank you for your work on Jedi, it's amazing.

I wanted it to be slightly faster because, in some long functions in my code with large libraries imported, it takes up to 5 seconds for Jedi to provide completions. So I dug into the code with a profiler and saw that I could trade the type information of completions for a lot of speed. I made a (very hacky) global boolean to keep track of when I do not need to find extra context.

Patches: https://gist.github.com/HansPinckaers/663bb1461de2821d0bcce4ecef1f35c0

This speeds up Jedi and YCM from 5 seconds to 0.2 seconds on my XPS laptop in some cases. Since it's still very hacky, I'm reading the source code now to try to understand it and see if I can fix this more elegantly. If you have time, could you point me in the right direction? Also, maybe this gives you some ideas for a speed option for people with slow computers.

Thanks again,
Hans


David's response:

Hi Hans

I'm currently on vacation and also working on a bit of Open Source
stuff so I might not have time for a very long response :)

What is your library that is really slow? Also: How fast is your computer?

One thing I'm currently working on is typeshed integration. This will
make it possible to have stubfiles to raise performance significantly
for some things (especially the standard library). I also think that
it will improve performance by a lot for many other libs, because they
are using the standard library.

The thing with your hack is that it completely kills name resolution.
So you won't be able to follow names around, which makes completion
effectively much worse for a lot of things. So the question is really:
why is name resolution taking so long there (or, since everything
recurses, which lookup takes long)? :)

@hanspinckaers
Author

My computer is quite fast; it's a Dell XPS 15 9560 laptop with an i7.

This is an example of slower autocompletion:

import time
from jedi import Script
source = r"""
#!/usr/bin/env python

import PIL
import cv2

def __getitem__(self, data):
    data = cv2.cvtColor(data, cv2.COLOR_BGR2RGB)
    pil_im = PIL.Image.fromarray(data)

    if data.shape[0] > data.shape[1]:
        width, height = pil_im.size

    if height:
        width, height = pil_im.size

    pi
""".strip()
lines = source.split('\n')

start = time.time()
# pre-0.16 Jedi API: Script(source, line, column, path)
ret = Script(source, len(lines), len(lines[-1]), None).completions()
print(time.time() - start, len(ret))

This takes about 2.7 seconds on my laptop.

I'm quite inexperienced with the inner workings of Jedi, so it doesn't surprise me that this very hacky return in find() causes some things to break. But so far I didn't see any problems with autocompletion (the hack is disabled when a dot is typed, and I use a separate Jedi install in jedi-vim for documentation/go-to etc.).

@micbou
Contributor

micbou commented Aug 6, 2018

@hanspinckaers Did you update to the latest version of YCM? A lot of performance improvements were included in PR ycm-core/ycmd#1056. Completion is almost instant for me when trying your example.

@micbou
Contributor

micbou commented Aug 6, 2018

Nevermind. I was trying with a Python installation that didn't have opencv installed. With the right Python, it takes ~1.1s the first time then ~0.8s the subsequent times. Seems reasonable to me but it would be great if we could further improve this.

@davidhalter
Owner

I'd suggest waiting for my typeshed merge, and then we'll see what we can do. I hope I can finally find a way to fix the issues, especially with tensorflow (it's probably the worst-performing library for Jedi).

@hanspinckaers
Author

hanspinckaers commented Aug 7, 2018

Let me know if there is something I can do to help. Still reading and trying to understand all the code right now.

This is tensorflow autocomplete with my patches applied:
[GIF "testjediycm": screen recording of tensorflow autocompletion with the patches applied]

@Omegastick

Omegastick commented Sep 11, 2018

I'm on version 0.12.1 of Jedi and the above script takes just over 10 seconds to run the first time, then 1.3 seconds each time after that. This is on a Google Compute Engine n1-highcpu-16. Were the above fixes implemented in 0.12.1?

@davidhalter
Owner

No. I'll try to check performance in the next few days. tensorflow seems to perform even worse. I'm thinking about disabling Jedi for those libraries for now :/

@hanspinckaers
Author

Maybe instead of disabling Jedi entirely, you could disable the type information (.description etc.) of completions for big libraries. That alone already speeds up autocompletion by a lot.
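The cost model behind this suggestion can be sketched without Jedi at all: collecting the raw completion names is cheap, while computing each item's description triggers type inference, which dominates the total time. The `Completion` class and the 10 ms delay below are hypothetical stand-ins (not Jedi's API), just to illustrate why a names-only fast path helps (requires Python 3.8+ for `functools.cached_property`):

```python
import functools
import time

class Completion:
    """Hypothetical stand-in for a completion item."""
    def __init__(self, name):
        self.name = name  # cheap: known as soon as names are collected

    @functools.cached_property
    def description(self):
        # stands in for expensive type inference; only paid on access
        time.sleep(0.01)
        return f"def {self.name}"

completions = [Completion(f"name{i}") for i in range(100)]

start = time.time()
names = [c.name for c in completions]  # fast path: names only
elapsed = time.time() - start

# listing names takes almost no time; touching c.description on every
# item would add roughly 100 * 10 ms = 1 s of simulated inference
print(len(names), elapsed)
```

A client that only renders the completion menu never needs `.description`, so under this model it never pays the inference cost.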

@davidhalter
Owner

That's actually a good idea.

@davidhalter
Owner

Ok. I tried to debug the issues with cv2/PIL/tensorflow. It looks like they are all kind of connected.

PIL

I modified your script a bit and it's still very very slow with this simplified version.

import time
from jedi import Script

source = r"""
import PIL

pil_im = PIL.Image.fromarray()
pil_im.size[0]."""

start = time.time()
# Script defaults to completing at the end of the source
ret = Script(source).completions()
second = time.time()
print(second - start, len(ret))
# the second run benefits from Jedi's caches
ret = Script(source).completions()
print(time.time() - second, len(ret))

When debugging this, I realized that this is mostly a problem of inferring some types. I don't think that we can improve this case by a lot at the moment.

Wait for database indexes: #1059.

cv2

This case got quite a bit slower since all the builtin modules are loaded in a subprocess. I'm not sure there's a good fix for this. One thing that makes it quite a bit faster (about 3 times) is using Script(..., environment=jedi.api.InterpreterEnvironment()). However, this comes with a few disadvantages, like potential segfaults. It's not a good idea, but it's faster. For a long-term solution, we should be using database indexes as well (#1059).
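For reference, the trade-off looks roughly like this. This is a hedged sketch, not the issue's actual code: it uses `json` instead of `cv2` to stay lightweight, is guarded so it degrades gracefully if jedi is not installed, and accounts for the completion method being renamed across Jedi versions:

```python
try:
    from jedi import Script
    from jedi.api.environment import InterpreterEnvironment

    # InterpreterEnvironment runs inference in the current process,
    # skipping subprocess round-trips, at the cost of safety: a
    # misbehaving extension module can crash this process.
    script = Script("import json\njson.",
                    environment=InterpreterEnvironment())

    # .complete() on jedi >= 0.16, .completions() on older releases;
    # both default to completing at the end of the source
    complete = getattr(script, "complete", None) or script.completions
    names = [c.name for c in complete()]
except Exception:  # jedi missing or further API drift: sketch only
    names = None
```

With an external environment (the default), the same call pays for inter-process communication with the subprocess that inspects compiled modules, which is where the roughly 3x difference comes from.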

tensorflow

In this case I followed your idea of avoiding type inference after the initial list of completions is created. This makes it a lot faster. I still think this is not the best long-term solution, but for now it works quite well and I think we can use it. I'll clean up my commit and push it here.

Consider this issue as "solved" for now, because everything depends on #1059.

@davidhalter
Owner

Just as a final note: Tensorflow should in most cases (but not all) be way faster, but may not always be "correct" anymore. This especially affects import tensorflow; tensorflow., which really needed changing. It was sooo slow.

@hanspinckaers
Author

Thanks for the work David!

I have a question regarding this line:

if dotted_name.startswith('tensorflow.'):

A lot of people use import tensorflow as tf, would this also trigger for tf.?

@davidhalter
Owner

Yes, it would. The check is absolutely not tied to the literal name you import it as, because it sits behind a lot of abstractions.
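The claim can be illustrated: Jedi resolves an alias back to the module it names, so completion (and any module-name checks inside Jedi) see the real module rather than the alias text. A hedged sketch using a stdlib alias instead of tensorflow, guarded in the same way as above since it assumes jedi is installed:

```python
try:
    from jedi import Script

    # alias "j" should complete with the members of the json module
    script = Script("import json as j\nj.")
    # .complete() on jedi >= 0.16, .completions() on older releases
    complete = getattr(script, "complete", None) or script.completions
    names = [c.name for c in complete()]
except Exception:  # jedi missing or API drift: sketch only
    names = None
```

If the completions for `j.` match those for `json.`, the alias has been resolved, which is the behavior David is describing.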
