Solution Execution Time Comparison #959
You can fork a solution (which creates a kumite) and compare the performance of arbitrary code/solutions. Here's an example in Ruby (the output isn't displaying correctly, but you can fork and run the tests yourself).
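For anyone who doesn't want to open the kumite, here's a rough Python equivalent of that kind of comparison (the original example was in Ruby); the two `has_duplicates` implementations are hypothetical stand-ins for whatever solutions you want to race.

```python
# A rough sketch of the kind of comparison you can run in a forked kumite.
# The two implementations below are hypothetical stand-ins; paste in the
# actual solutions you want to compare.
import random
import timeit

def has_duplicates_quadratic(xs):
    # O(n^2): compare every pair
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicates_set(xs):
    # O(n): track seen values in a set
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False

if __name__ == "__main__":
    data = [random.randrange(10**9) for _ in range(1000)]
    for name, fn in [("quadratic", has_duplicates_quadratic),
                     ("set", has_duplicates_set)]:
        # best of 3 repeats of 10 calls each, to damp scheduling noise
        best = min(timeit.repeat(lambda: fn(data), number=10, repeat=3))
        print(f"{name:10s} {best:.4f}s")
```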
@10XL, this would add a new dimension to the Codewars platform.
@esoNais
Of course, the problem then is that you need to run the code many times to get good results, which often takes more than 12 seconds (the kata time limit). (Statistics 101: to get a ±0.1% uncertainty, you need to run the code about 10^6 times.) Also, how is runtime defined? Best case? Average case? Worst case? Mean, median, or quartiles? And how do you define an "average case"? What does a "random input" even mean for some problems?
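To show where that 10^6 figure comes from, here is a back-of-the-envelope sketch, assuming the run-to-run noise (coefficient of variation) is on the order of 100%, which is not unusual for sub-millisecond snippets:

```python
# Back-of-the-envelope: the standard error of a mean shrinks as 1/sqrt(N),
# so the number of runs needed for a target relative uncertainty is roughly
#   N ≈ (cv / target)^2
# Assumption: run-to-run coefficient of variation (cv) of about 100%.
import math

def runs_needed(cv: float, target: float) -> int:
    """Runs needed so that cv / sqrt(N) <= target."""
    return math.ceil((cv / target) ** 2)

for cv in (1.0, 0.1):              # 100% and 10% per-run noise
    for target in (0.01, 0.001):   # ±1% and ±0.1% on the mean
        print(f"cv={cv:.0%} target=±{target:.1%}: {runs_needed(cv, target):,} runs")
```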
@Voileexperiments
I second this idea; I think that "scoring" could be implemented similarly to how the Zachtronics games are scored. In the Zachtronics games there are generally three scores you shoot for: execution time/cycles, SLOC, and "resources" (the games typically have a resource-management system of some kind). Their programming games are great because the scoring system pushes you to compete against yourself, and I've had a few instances where the scores I've reached for certain problems aren't even on the chart because so few people have optimized as heavily, which was exhilarating.
Just want to second this. It's horrible to try to weigh your own solutions against other people's solutions based only on how "beautiful" the code is. In most cases you need to optimize for footprint, memory usage, or speed. We need those statistics on the solutions! Especially when learning a new language, you can't really have a gut feeling for the performance of different ways to solve a problem. BTW, is there any way to check the full tests that run on the problems?
Yes. You can click fork and view the tests, open a new translation and check the tests, or click "view test cases" on the discourse page.
I found this page because I was searching for this feature. @Olangu is exactly correct. Sometimes clever code, a Perl-looking one-liner, might perform best, but it may also perform worst. Using built-in library tools (such as .NET LINQ) may hurt or improve performance as well. Performance (speed) and memory usage ratings should be auto-calculated for each solution, and should be completely separate from users clicking like on "clever" or "best practice". Many users may not even have a solid understanding of what "best practice" or "clever" actually means in real-world scenarios; these two are relative to the audience rather than to actual standards or veterans. Having performance (speed) and memory usage ratings would be a much more accurate way to gauge one's own code and see others that ranked higher, actually giving users a tool for improvement in both coding and understanding. I created my own performance test locally against 10 other solutions, and I was actually surprised at how well some did. I used a sample data set of 25 and ran each set in each iteration. I found that 500 iterations or fewer usually gave the same or similar results; 1,000 to 10,000 produced a similar set; and anything over that, up to 500,000, seemed to produce a third set of results. While the ones that performed great above 1,000 iterations also performed well below that, there were some that outperformed by orders of magnitude but crawled above 1,000. So I would think the median from a similar approach would be reliable, or even just a flat number like 100,000 iterations over different sample data sets. Nobody is expecting physics-equation accuracy, just some kind of benchmark for assessing a fair performance ranking.
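A minimal sketch of that kind of local harness, assuming placeholder solution functions and the 25-sample data set described above, might look like this:

```python
# A sketch of the local harness described above: run each candidate solution
# over the same sample data at several iteration counts and compare medians.
# `solutions` is a placeholder dict; register the real functions to compare.
import statistics
import time

def run_once(fn, samples):
    start = time.perf_counter()
    for s in samples:
        fn(s)
    return time.perf_counter() - start

def benchmark(solutions, samples, iterations=(500, 10_000, 100_000)):
    for n in iterations:
        # repeat the whole sample set enough times to reach ~n calls
        reps = max(1, n // len(samples))
        for name, fn in solutions.items():
            times = [run_once(fn, samples) for _ in range(reps)]
            median_ms = statistics.median(times) * 1000
            print(f"n≈{n:>7,}  {name:12s} median {median_ms:.3f} ms "
                  f"per pass over {len(samples)} samples")

if __name__ == "__main__":
    samples = list(range(25))                                   # 25 sample inputs
    solutions = {"candidate_a": lambda x: sum(range(x)),        # placeholder solutions
                 "candidate_b": lambda x: x * (x - 1) // 2}
    benchmark(solutions, samples)
```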
In 2016, this was Codewars's (@jhoffner) response to a similar request: #651 (comment) (TL;DR: "The result are just too inconsistent to rely on and if its not reliable, there is no point to having it."). I agree with @Philsop and @heribertolugo that performance results don't need to be very accurate to be useful as a way of sorting unrated solutions.
@barakplasma the execution time is not just "inaccurate" or "inconsistent"; it's completely unrepresentative of the real solution efficiency. A more or less accurate way to tell which one is faster is to check how far each can go without timing out, which would mean manually editing the test cases to run the function thousands of times, or to generate huge inputs (whichever approach makes sense for that particular case). Remaining fair when all the inputs for different solutions are different (due to randomness) is impossible. Moreover, sometimes the runtime will jump around like crazy between runs, because a solution can be optimized to handle one sort of data efficiently but lose by a big margin on a different sort of data. Everybody wants "execution time measurement", but nobody realizes what an impossible feat it is to implement in a remotely sensible way.
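A minimal sketch of that "how far can it go before timing out" check, assuming a placeholder solution and input generator and a fixed time budget (12 s being the Codewars wall-clock limit), could look like this:

```python
# Sketch of the "how far can it go before timing out" check: keep doubling the
# input size until a fixed time budget is exhausted and report the largest size
# reached. `solution` and `make_input` are placeholders for a real kata solution
# and its input generator.
import random
import time

def largest_size_within_budget(solution, make_input, budget_s=12.0, start_n=1_000):
    deadline = time.perf_counter() + budget_s
    n, reached = start_n, 0
    while True:
        data = make_input(n)
        t0 = time.perf_counter()
        solution(data)
        elapsed = time.perf_counter() - t0
        # crude heuristic: assume the next (doubled) size takes at least 2x
        if time.perf_counter() + elapsed * 2 > deadline:
            return max(reached, n)
        reached, n = n, n * 2

if __name__ == "__main__":
    solution = sorted                                        # placeholder "solution"
    make_input = lambda n: [random.random() for _ in range(n)]
    # use a 3 s budget for the demo; Codewars itself allows 12 s
    print("largest input handled:",
          largest_size_within_budget(solution, make_input, budget_s=3.0))
```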
Hi, below is a screenshot of what it looks like once you've submitted the solution.
Challenges on Codewars are based on unit tests, unlike most similar platforms, which take inputs from stdin. Like most things, both approaches have strengths and weaknesses. For example, our approach is more natural, but their approach has more control over how the solution is executed.
I don't think so. As far as I know, their inputs are static (please correct me if I'm wrong).
When you submit, your solution is tested against multiple inputs in the same format (they probably have some code that parses stdin, passes it to the user's solution, and checks the output). So on LeetCode, and most of the other similar platforms, testing the solution works like the following:
On Codewars, the following happens:
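Very roughly, and only as a hedged sketch with a hypothetical `add` challenge, the contrast between the two flows looks like this:

```python
# Hedged illustration of the contrast, using a hypothetical `add` challenge.
# LeetCode-style: the platform owns the harness, parses stdin, calls the user
# function, and compares printed output against the expected output.
import sys

def add(a, b):                   # the user's solution
    return a + b

def stdin_harness():
    # platform-controlled loop: one test case per line, e.g. "2 3"
    for line in sys.stdin:
        a, b = map(int, line.split())
        print(add(a, b))

# Codewars-style: the kata author writes unit tests that call the solution
# directly, often with randomly generated inputs.
def unit_tests():
    import random
    assert add(2, 3) == 5                    # fixed example test
    for _ in range(100):                     # random tests
        a, b = random.randint(-10**9, 10**9), random.randint(-10**9, 10**9)
        assert add(a, b) == a + b

if __name__ == "__main__":
    unit_tests()     # Codewars-style run; pipe lines into stdin_harness() for the other style
    print("all unit tests passed")
```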
Even though LeetCode has significantly tighter control over how the solution is executed, you'll see different execution times when you submit the same solution multiple times. And they can differ enough that it sometimes tells you your solution is faster than 99% of submissions and sometimes 65%, for the same solution. On LeetCode this is not a huge deal because it's not too obvious, since you can't directly compare with other users (they only show a sample). But imagine how you'd feel if you saw someone else at 99% while you're at 65% with the same approach. Also, I don't know how they deal with changes in their environment that affect performance (e.g., language version, VM performance, changes in inputs, etc.), but all of the past submissions need to be reevaluated when that happens. The only way to minimize the divergence is to do proper benchmarking, and that can be extremely expensive. I don't think LeetCode does it, and I don't think any other platform does it for their free services. Showing the total test time of the solution at the time of the submission is not difficult, but it's not a really useful number and it can be extremely misleading. So we'll need at least the following:
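As a rough illustration of why "proper benchmarking" is expensive, a minimal sketch (warm-up, many repetitions, robust statistics; `solution` and `payload` are placeholders) would be something like:

```python
# Minimal sketch of what "proper benchmarking" implies and why it costs so
# much per submission: warm-up runs, many repetitions, and robust statistics
# instead of a single measurement. `solution` and `payload` are placeholders.
import statistics
import time

def bench(solution, payload, warmup=10, repeats=200):
    for _ in range(warmup):          # let caches settle before measuring
        solution(payload)
    samples = []
    for _ in range(repeats):         # many repetitions per submission
        t0 = time.perf_counter()
        solution(payload)
        samples.append(time.perf_counter() - t0)
    return {
        "median_ms": statistics.median(samples) * 1000,
        "p90_ms": sorted(samples)[int(0.9 * len(samples))] * 1000,
        "spread_pct": 100 * statistics.pstdev(samples) / statistics.mean(samples),
    }

if __name__ == "__main__":
    payload = list(range(100_000))
    print(bench(sorted, payload))    # placeholder solution: built-in sort
```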
One thing I don't understand in this frequently recurring topic is why lots of the people involved have a clear obsession with performance (especially runtime) while displaying a uniform, indiscriminate disgust for "one-liner, ''''''''clever'''''''' solutions". Ranking everything blindly is easy. Making it meaningful and informative is really, really hard. So far I only see all these people making suggestions about how to do it (as if to imply "look, making a chart is easy, WHY AREN'T YOU LAZY BUMS DOING IT?"), but not why to do it, or whether it should be done:
So until the stances of the above-mentioned people are resolved, I propose we ignore whatever they say, for the sole reason that they have a clear bias and are not aiming to make things helpful compared to something like a "detected time complexity" feature. TBH it's just a nuisance at this point.
Just to add some background information: when a user pays for Codewars Red, they receive a head-to-head comparison feature. The following image is a screenshot from my view of a head-to-head comparison, since I pay for Codewars Red. Look below the solution, where it compares runtime for this kata. So this information is already collected, just not visible to everyone.
I was not aware of the randomness of the tests; that certainly makes it more complex. @Voileexperiments Regarding the "why", I'll explain my experience. I used LeetCode, and other such websites, to train for technical interviews. And there, performance is VERY VERY important, at least in most algorithm questions. Now, you are right that I don't really care about the exact number of milliseconds, or whether I'm faster than some other guy, but time complexity, yes! I just assumed measuring time is easier than detecting the time complexity; maybe that assumption was wrong. I'd be really happy if I could get complexity feedback compared to other solutions, even better than exact timing. I don't think ignoring is the right approach; understanding the users' personas and requirements is the goal, ask your Product Management ;)
@nasht00 Yes, I know all of this comes from LeetCode, and no, it's not commonplace at all, nor is it representative of its actual usage. If you ask people on typical OJ platforms, or some OI/ACM participants, they'll tell you its main usage is to compare very large time differences (e.g., 8 seconds vs sub-second), not the tiny ones due to micro-optimizations.
Not when the same dead horse has been beaten 1000 times in the past with all kinds of loaded requests of "please do this so one-liner, ''''''''clever'''''''' solutions would stop appearing among the solutions". It's extremely obnoxious and unhelpful.
I can see another potential problem with this feature: User1 submits their solution and gets an execution time of 100 ms. User2 submits their identical solution and gets 150 ms. They wonder what's going on, resubmit, and now they get 200 ms. Now I can see legions of confused users raising issues about it.
What I would like, though, is to somehow enforce a particular performance constraint on kata, even the easy ones. Make "Sum of numbers 1...n" fail with anything other than O(1), or "Sum of multiples of 3 or 5" fail with O(n). However, the problem is that this would make a lot of people complain about easy kata being too difficult (see "Is a number prime?"), or it would require creating another version of the kata, effectively creating an almost-duplicate. Tiered kata would be a nice solution to this problem, I think.
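A hedged sketch of how a kata's tests could enforce that constraint without measuring time at all, using plain asserts rather than any particular Codewars test framework, might be:

```python
# Sketch of enforcing the performance constraint mentioned above without
# measuring time explicitly: feed inputs so large that only an O(1) solution
# for "sum of numbers 1..n" finishes inside the time limit. Plain asserts are
# used here; a real kata would use the language's test framework on Codewars.
import random

def sum_to_n(n):
    # reference answer (closed form); an O(n) loop over n ~ 10**18 would
    # never finish within the 12-second limit
    return n * (n + 1) // 2

def run_performance_tests(user_solution):
    for _ in range(50):
        n = random.randint(10**17, 10**18)   # huge n: looping solutions won't survive
        assert user_solution(n) == sum_to_n(n)

if __name__ == "__main__":
    run_performance_tests(sum_to_n)          # the reference passes trivially
```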
Yes, it's recorded at the time of the submission, but like I wrote above, it's not a useful number. I wouldn't have included it in the head-to-head compare either, but I wasn't involved back then.
There are some kata that have performance requirements. Not all kata need them.
It is very easy to show some number (I bet many wouldn't notice even if I just made one up).
Yeah, tiered/sequential kata could be nice to have. At Qualified, we have something like that: a candidate is required to solve a challenge, and the subsequent challenge adds more requirements using the previous solution as a starting point. Another issue with having easy and hard versions is spoilers. Sequential kata would prevent this too, by requiring one to solve the easy one first to proceed and not allowing one to see the solutions until all of the kata in the sequence are completed.
Fair enough about exact timing. Again, I don't care about "please do this so one-liner, ''''''''clever'''''''' solutions would stop appearing among the solutions" - not my case here. Neither is "why am I ranked better than this guy" - I don't care either. I only want to categorize the solutions, to know if my solution meets expectations or if I'm very far off.
That would indeed be useful, I'm terrible at detecting that :) But that's a different issue, right? If you do implement this, however, you can't do it for all the languages, though, right? Just maybe JS and Python?
If you passed the tests in time, then your solution met the expectation. Try more difficult kata that require performance.
I'm just saying that's probably what I'd do if we're going to do something to classify solutions by performance.
Why do you think it's only JS/Python? Anyway, it's not on our roadmap at the moment and it's not going to happen anytime soon.
It depends on what you use for detecting time complexity. If you find some library, do you really think it's going to support 40 languages?
No? I don't know why we'd need a single library that supports 40 languages. That's like saying we can't test different languages because they all have different test frameworks. We don't need to add it to every language for every kata at once. Like I wrote above, I don't know exactly how to do it yet, but each language can have its own way as long as it can collect information and report back. It might be similar to how we handle the tests, except we get some data back instead of the test result. Again, I don't know the details yet, but I don't think we need one library that supports many languages.
I think finding and classifying answers by their big O would be a better pursuit than actual execution times, because of the variability. This could involve scaling the inputs at different points and doing a best-fit curve. It doesn't necessarily need to rank off that (but it could); even just showing the big O under the answer would be informative.
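A minimal sketch of that idea, assuming a placeholder solution and only three candidate growth models, could look like this (real detection would need far more care with noise, warm-up, and constant factors):

```python
# Sketch of the "scale the input and fit a curve" idea: time the solution at
# several sizes, then see which growth model (n, n log n, n^2) fits the
# measurements best by least squares. Treat this as illustrative only.
import math
import time

def measure(fn, sizes):
    points = []
    for n in sizes:
        data = list(range(n, 0, -1))
        t0 = time.perf_counter()
        fn(data)
        points.append((n, time.perf_counter() - t0))
    return points

def best_fit(points):
    models = {
        "O(n)": lambda n: n,
        "O(n log n)": lambda n: n * math.log(n),
        "O(n^2)": lambda n: n * n,
    }
    best_name, best_err = None, float("inf")
    for name, g in models.items():
        # least-squares scale factor c for t ≈ c * g(n)
        c = sum(t * g(n) for n, t in points) / sum(g(n) ** 2 for n, _ in points)
        err = sum((t - c * g(n)) ** 2 for n, t in points)
        if err < best_err:
            best_name, best_err = name, err
    return best_name

if __name__ == "__main__":
    sizes = [20_000, 40_000, 80_000, 160_000]
    print(best_fit(measure(sorted, sizes)))   # expect O(n) or O(n log n) for sorted
```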
I'd also like to see this feature! It is interesting how on LeetCode you can sometimes exceed the time limit if your algorithm is not performant enough. I also agree that the way to go is to classify answers by big O and not execution time. Would be great!
The time limit in CW is the very same thing...
Oh, sorry, I didn't know it also existed; I hadn't come across that yet.
Maybe this is a good use case for the ChatGPT API or another LLM implementation. I tried it in the API playground. I asked:
The answer was:
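For reference, a hedged sketch of that approach using the OpenAI Python SDK; the model name and prompt wording here are assumptions, not what was used above:

```python
# Hedged sketch of the idea: ask an LLM to classify a solution's time
# complexity. The prompt wording and model name are assumptions; requires the
# `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

def estimate_complexity(source_code: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model; any capable model would do
        messages=[{
            "role": "user",
            "content": "Reply with only a big-O expression for the worst-case "
                       "time complexity of this function:\n\n" + source_code,
        }],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    snippet = "def has_dup(xs):\n    return len(xs) != len(set(xs))\n"
    print(estimate_complexity(snippet))   # e.g. "O(n)"
```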
How many languages do we have on Codewars? 43? If we can build this as a separate service, and have it test all problems with all working solutions to produce the real big-O notation for the space and time used to execute, then that's a plan. I don't know how bad it would be on the current architecture, but is doing this as a separate service doable? What do you think?
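Purely as a sketch of what such a service boundary might look like (nothing here reflects Codewars' actual architecture), the request/response contract could be something like:

```python
# Hypothetical request/response contract for the separate analysis service
# suggested above. Everything here is a sketch, not an existing API.
from dataclasses import dataclass, field

@dataclass
class AnalysisRequest:
    language: str                 # e.g. "python", one of the ~43 languages
    solution_code: str            # the submitted solution
    kata_id: str                  # which kata's generated inputs to use
    input_sizes: list = field(default_factory=lambda: [10**3, 10**4, 10**5])

@dataclass
class AnalysisResult:
    estimated_time_big_o: str     # e.g. "O(n log n)"
    estimated_space_big_o: str    # e.g. "O(n)"
    timings_ms: dict = field(default_factory=dict)   # input size -> median runtime

def analyze(req: AnalysisRequest) -> AnalysisResult:
    # Stub: a real service would run the solution in a per-language sandbox,
    # measure it at each input size, and fit a growth curve (see the earlier
    # best-fit sketch in this thread).
    raise NotImplementedError

if __name__ == "__main__":
    req = AnalysisRequest(language="python", solution_code="...", kata_id="sum-of-1-to-n")
    print(req)
```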
OK, after reading the entire thread and going through every nitty-gritty detail, I will share why I need this feature. For some time in the thread I thought this was a sales strategy to get people to subscribe to Codewars Red, yet the main problem remains!!
Can we add a new tab in the kata setup that, if enabled, calculates the space/time complexity of the code and outputs it to the user dashboard? Then we could work through our bank of kata one by one.
I suggest adding a small statistical indicator of each solution's average execution speed in the "Solutions" menu.
And when you click the "Compare with your solution" button, it would compare your solution's execution speed with the other solutions'. I feel this would make room for improvement in people's ideas about optimization.
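A small sketch of what that comparison could compute, assuming a hypothetical store of recorded runtimes per solution:

```python
# Sketch of the suggested statistic: given recorded runtimes per solution
# (the data shape here is hypothetical), show each solution's median runtime
# and how yours compares. This only illustrates the UI idea; it is not data
# Codewars currently exposes to regular users.
import statistics

def solutions_ranking(runtimes_ms: dict, mine: str):
    medians = {name: statistics.median(ts) for name, ts in runtimes_ms.items()}
    baseline = medians[mine]
    for name, med in sorted(medians.items(), key=lambda kv: kv[1]):
        marker = "  <- yours" if name == mine else ""
        print(f"{name:12s} {med:7.1f} ms  ({med / baseline:4.2f}x yours){marker}")

if __name__ == "__main__":
    runtimes_ms = {                      # fake recorded runs, per solution
        "solution_a": [120, 130, 125, 118],
        "solution_b": [80, 95, 85, 90],
        "mine":       [150, 140, 160, 145],
    }
    solutions_ranking(runtimes_ms, mine="mine")
```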