
Slow performance of Jedi with large libraries (cv2/PIL) imported #1195

Closed
hanspinckaers opened this issue Aug 6, 2018 · 13 comments

@hanspinckaers

Hi all, David asked to put our email conversation on the issue tracker.


Hi David,

Thank you for your work on Jedi, it's amazing.

I wanted it to be slightly faster because, in some long functions in my code with large libraries imported, it takes up to 5 seconds for Jedi to provide completions. So I dug into the code with a profiler and saw that I could trade the type information of completions for a lot of speed. I made a (very hacky) global boolean to keep track of when I do not need to find extra context.

Patches: https://gist.github.com/HansPinckaers/663bb1461de2821d0bcce4ecef1f35c0

This speeds up Jedi and YCM from 5 seconds to 0.2 seconds on my XPS laptop in some cases. Since it's still very hacky, I'm reading the source code now to try to understand it and see if I can fix this more elegantly. If you have time, could you point me in the right direction? Also, maybe this gives you some ideas for a speed option for people with slow computers.

Thanks again,
Hans


David's response:

Hi Hans

I'm currently on vacation and also working on a bit of Open Source
stuff so I might not have time for a very long response :)

What is your library that is really slow? Also: How fast is your computer?

One thing I'm currently working on is typeshed integration. This will
make it possible to have stubfiles to raise performance significantly
for some things (especially the standard library). I also think that
it will improve performance by a lot for many other libs, because they
are using the standard library.

The thing with your hack is that it completely kills name resolution.
So you won't be able to follow names around, which makes completion
effectively much worse for a lot of things. So the question is really:
why is name resolution taking so long there (or, since everything
recurses, which lookup takes long)? :)

@hanspinckaers
Author

My computer is quite fast; it's a Dell XPS 15 9560 laptop with an i7.

This is an example of slower autocompletion:

import time
from jedi import Script
source = r"""
#!/usr/bin/env python

import PIL
import cv2

def __getitem__(self, data):
    data = cv2.cvtColor(data, cv2.COLOR_BGR2RGB)
    pil_im = PIL.Image.fromarray(data)

    if data.shape[0] > data.shape[1]:
        width, height = pil_im.size

    if height:
        width, height = pil_im.size

    pi
""".strip()
lines = source.split('\n')

start = time.time()
# pre-0.16 Jedi API: Script(source, line, column, path)
ret = Script(source, len(lines), len(lines[-1]), None).completions()
print(time.time() - start, len(ret))

This takes about 2.7 seconds on my laptop.

I'm quite inexperienced with the inner workings of Jedi, so it doesn't surprise me that this very hacky return in find() causes some things to break. But so far I didn't see any problems with autocompletion (the hack is disabled when a dot is typed, and I use a separate Jedi install in jedi-vim for documentation/go-to etc.).

@micbou
Contributor

micbou commented Aug 6, 2018

@hanspinckaers Did you update to the latest version of YCM? A lot of performance improvements were included in PR ycm-core/ycmd#1056. Completion is almost instant for me when trying your example.

@micbou
Contributor

micbou commented Aug 6, 2018

Nevermind. I was trying with a Python installation that didn't have opencv installed. With the right Python, it takes ~1.1s the first time then ~0.8s the subsequent times. Seems reasonable to me but it would be great if we could further improve this.

@davidhalter
Owner

I'd suggest waiting for my typeshed merge, and then we'll see what we can do. I hope I can finally find a way to fix the issues, especially with tensorflow (it's probably the worst-performing library for Jedi).

@hanspinckaers
Author

hanspinckaers commented Aug 7, 2018

Let me know if there is something I can do to help. Still reading and trying to understand all the code right now.

This is tensorflow autocomplete with my patches applied:
[GIF "testjediycm": screen recording of tensorflow autocompletion with the patches applied]

@Omegastick

Omegastick commented Sep 11, 2018

I'm on version 0.12.1 of Jedi and the above script takes just over 10 seconds to run the first time, then 1.3 seconds each time after that. This is on a Google Compute Engine n1-highcpu-16. Were the above fixes implemented in 0.12.1?

@davidhalter
Owner

No. I'll try to check performance in the next few days. tensorflow seems to perform even worse. I'm thinking about disabling Jedi for those libraries for now :/

@hanspinckaers
Author

Maybe instead of disabling Jedi entirely, you could disable the type information (.description etc.) of completions for big libraries. That alone already speeds up autocompletion by a lot.
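The cost model behind this suggestion can be sketched without Jedi at all: collecting the raw completion names is cheap, while computing each item's description triggers type inference, which dominates the total time. The `Completion` class and the 10 ms delay below are hypothetical stand-ins (not Jedi's API), just to illustrate why a names-only fast path helps (requires Python 3.8+ for `functools.cached_property`):

```python
import functools
import time

class Completion:
    """Hypothetical stand-in for a completion item."""
    def __init__(self, name):
        self.name = name  # cheap: known as soon as names are collected

    @functools.cached_property
    def description(self):
        # stands in for expensive type inference; only paid on access
        time.sleep(0.01)
        return f"def {self.name}"

completions = [Completion(f"name{i}") for i in range(100)]

start = time.time()
names = [c.name for c in completions]  # fast path: names only
elapsed = time.time() - start

# listing names takes almost no time; touching c.description on every
# item would add roughly 100 * 10 ms = 1 s of simulated inference
print(len(names), elapsed)
```

A client that only renders the completion menu never needs `.description`, so under this model it never pays the inference cost.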

@davidhalter
Owner

That's actually a good idea.

@davidhalter
Owner

Ok. I tried to debug the issues with cv2/PIL/tensorflow. It looks like they are all kind of connected.

PIL

I modified your script a bit and it's still very very slow with this simplified version.

import time
from jedi import Script

source = r"""
import PIL

pil_im = PIL.Image.fromarray()
pil_im.size[0]."""

start = time.time()
# Script defaults to completing at the end of the source
ret = Script(source).completions()
second = time.time()
print(second - start, len(ret))
# the second run benefits from Jedi's caches
ret = Script(source).completions()
print(time.time() - second, len(ret))

When debugging this, I realized that this is mostly a problem of inferring some types. I don't think that we can improve this case by a lot at the moment.

Wait for database indexes: #1059.

cv2

This case got quite a bit slower since all the builtin modules are loaded in a subprocess. I'm not sure there's a good fix for this. One thing that makes it quite a bit faster (about 3 times) is using Script(..., environment=jedi.api.InterpreterEnvironment()). However, this comes with a few disadvantages, like potential segfaults. It's not a good idea, but it's faster. For a long-term solution, we should be using database indexes as well (#1059).
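For reference, the trade-off looks roughly like this. This is a hedged sketch, not the issue's actual code: it uses `json` instead of `cv2` to stay lightweight, is guarded so it degrades gracefully if jedi is not installed, and accounts for the completion method being renamed across Jedi versions:

```python
try:
    from jedi import Script
    from jedi.api.environment import InterpreterEnvironment

    # InterpreterEnvironment runs inference in the current process,
    # skipping subprocess round-trips, at the cost of safety: a
    # misbehaving extension module can crash this process.
    script = Script("import json\njson.",
                    environment=InterpreterEnvironment())

    # .complete() on jedi >= 0.16, .completions() on older releases;
    # both default to completing at the end of the source
    complete = getattr(script, "complete", None) or script.completions
    names = [c.name for c in complete()]
except Exception:  # jedi missing or further API drift: sketch only
    names = None
```

With an external environment (the default), the same call pays for inter-process communication with the subprocess that inspects compiled modules, which is where the roughly 3x difference comes from.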

tensorflow

In this case I followed your idea of avoiding type inference after the initial list of completions is created. This makes it a lot faster. I still think this is not the best long-term solution, but for now it works quite well and I think we can use it. I'll clean up my commit and push it here.

Consider this issue as "solved" for now, because everything depends on #1059.

@davidhalter
Owner

Just as a final note: Tensorflow should in most cases (but not all) be way faster, but may not always be "correct" anymore. This especially affects import tensorflow; tensorflow., which really needed changing. It was sooo slow.

@hanspinckaers
Author

Thanks for the work David!

I have a question regarding this line:

if dotted_name.startswith('tensorflow.'):

A lot of people use import tensorflow as tf, would this also trigger for tf.?

@davidhalter
Owner

Yes, it would. The check is absolutely not tied to the literal name you import it as, because it sits behind a lot of abstractions.
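The claim can be illustrated: Jedi resolves an alias back to the module it names, so completion (and any module-name checks inside Jedi) see the real module rather than the alias text. A hedged sketch using a stdlib alias instead of tensorflow, guarded in the same way as above since it assumes jedi is installed:

```python
try:
    from jedi import Script

    # alias "j" should complete with the members of the json module
    script = Script("import json as j\nj.")
    # .complete() on jedi >= 0.16, .completions() on older releases
    complete = getattr(script, "complete", None) or script.completions
    names = [c.name for c in complete()]
except Exception:  # jedi missing or API drift: sketch only
    names = None
```

If the completions for `j.` match those for `json.`, the alias has been resolved, which is the behavior David is describing.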
