-
-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
benchmark representing async-heavy, heterogeneous workload #239
Comments
maybe can you propose a simple yet representative code for such a benchmark ? |
I'm not sure how to go about it, as taxing a tracing JIT would require a lot of code (I think PyPy gives up after a 6000 opcode trace). Some kind of code generation may need to be employed. I'm open to ideas. |
Thanks for the suggestion! A benchmark like this would be valuable. It's mostly a matter of someone with enough knowledge taking the time to do it. It's relatively straight-forward to write a benchmark, (even outside the pyperformance repo), and pyperformance provides plenty of examples. Also, it would be helpful to have both a workload-oriented benchmark and a feature-oriented one (macro vs. micro). There's more info about this on the faster-cpython wiki. |
It may be useful to write the benchmark against the anyio API, so that it could be run against both asyncio and trio. |
I'm thinking about what metrics of my app would most represent the workload. Then create a synthetic benchmark with similar metrics. assuming a single threaded program, not hitting CPU limits, perhaps:
|
here's how my app looks by those metrics:
"unique context switch rate" substantiates that the workload is heterogeneous. Each second, it touches about 80 different coroutine functions ( "step unique lines" substantiates that each coroutine step involves a significant amount of code to trace (avg and median are ~160 lines) From the context switch rate and avg step duration, it's possible to derive approximate CPU load: 119e-6 * 4267 = 51% (This is running on CPython 3.8. The durations are about 25% shorter on Pyston 3.8.) |
What's missing from the metrics, and especially relevant to JIT implementations, is some notion of the number of opcodes run per step. It should probably exclude repeated loop iterations-- so something like "step opcode line count". --> I didn't go as far as counting opcodes, but I've revised previous posts to include unique line count per step |
context: Recently I set about trying to find a faster Python interpreter for my async app, which has a high rate of context switching, many coroutines, and each coroutine running different code. I found that the app runs 2x slower on PyPy, and 20% faster on Pyston-full. One reason may be that a tracing JIT will naively trace across context switches, such that a given trace will never be repeated, due to arbitrary occurrence and ordering of coroutine resumes.
There really seems to be nothing in the benchmark world that represents this kind of async workload-- which, by the way, I expect to become more popular with time. The current async benchmarks in pyperformance have nothing like this-- the coroutines are trivial and homogenous.
It's concerning that the Pyston full fork is being retired, while PyPy blindly continues as if everything's OK, and faster-python proceeds at a furious pace-- all without evaluating a workload that is significant today, and may become more so in the future.
The text was updated successfully, but these errors were encountered: