-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize CLI startup time #696
Conversation
Optimizes startup time for CLI usage by avoiding import of connexion
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #696 +/- ##
==========================================
+ Coverage 99.63% 99.66% +0.03%
==========================================
Files 89 89
Lines 6222 6257 +35
==========================================
+ Hits 6199 6236 +37
+ Misses 23 21 -2
☔ View full report in Codecov by Sentry. |
The rdflib import could be turned to lazy, but the code is not super clean and the time saving is only ~0.05 s. However I would still add it. For the command completions even such a tiny improvement in the response time is noticeable I think. |
To fix conflict
Seems that upgrading to soon-to-be-released Connexion 3 requires some changes to the way application is created, see #689 (comment). Hopefully the changes can be incorporated to this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks OK, although making more and more local imports makes the code messier IMHO. But it seems worth it in this case, as the startup speed improves noticeably (I also tested this on my laptop and the speedup was around 0.2 seconds). Maybe someday we could switch to real PEP 690 lazy imports, but those are scheduled for Python 3.12, which should be released in a few days, but it will take years until we can drop support for 3.11.
I see that Codecov complains about cli.py line 31 not being covered by tests, which I found puzzling, since you added a test that runs annif run --help
which should execute this line. But then I added some debugging code around that line and found out that the test doesn't actually exercise that line at all. I think that's because when using CliRunner (e.g. runner.invoke(annif.cli.cli, ["run", "--help"])
) that doesn't change sys.argv
at all, so it also won't execute line 31 of cli.py. Is there anything we could do here to make sure that the test also exercises this path?
The line could definitely be run in a test such as def test_run_separate_process():
result = os.popen("python annif/cli.py run --help").read()
assert "Run a local development server." in result But I don't know how to ensure this this really starts Flask with Connexion, and not just Flask... And codecov does not take this kind of test into account in coverage counting (at least locally run). |
The How about using some other information than just For testing Lines 7 to 10 in 3451bd3
The unit test that exercises this code overrides the value using monkeypatching: Lines 887 to 900 in 3451bd3
Something similar could perhaps be done here - define a constant in cli.py which uses sys.argv and then use it in the USE_CONNEXION = len(sys.argv) > 1 and sys.argv[1] == "run"
if USE_CONNEXION:
create_app = annif.create_app # Use Flask with Connexion
else:
# Connexion is not needed for most CLI commands, use plain Flask
create_app = annif.create_flask_app Then you can monkeypatch USE_CONNEXION in the unit test to force it to be |
...except now that I think of it, I'm not sure if that will work, because monkeypatching is probably applied too late for this to have an effect. Anyway, something similar could work here. |
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Actually using the And I made also I think this can be merged. When upgrading to Connexion 3 (#702) this functionality can or may need to be changed again. |
Optimizes CLI startup time by avoiding unnecessary imports for basic commands, mainly to improve the CLI command completion response times in #693.
Changes:
Imports of joblib and numpy are simply moved inside the functions/methods where they are used.
Import of connexion is avoided for all CLI commands but
annif run
andannif routes
by using separate creation scripts for plain-Flask and Connexion-Flask applications. When started viagunicorn "annif:create_app()
the Connexion-Flask app is created as previously.This is a continuation of the optimizations in #514, #544 and #563.
I checked the import time of the pre-release of connexion 3, and it was even slower (~0.35 s) than of the current v2.14.* (0.12 s), so avoiding connexion import is even more valuable in the future.
Improvement compared to current main
The improvement seems to be some 0.3 seconds (~0.1 s coming from each of joblib, numpy, and connexion).
Before (main):
After (PR):
The CLI command completion response times seemed to follow times of
annif --help
, so they are improved similarly.Timing imports for completion responses does not show any easy targets for further optimization. The rdflib is used via multiple modules and in nearly all methods of
corpus.subject.SubjectFileSKOS
, and I don't know how rdflib import could be done only when needed in there in a nice way.Tuna output for the timing of
list-*
completion response time: