Possibly inaccurate speedup measures #3
Comments
Thanks for testing. Did you take the system's file cache into account? As shown below, the first search is very slow, but the following grep runs are much faster because the files are already cached by the system.
A similar situation can be seen for
My search base: git clone --recursive https://github.com/apache/incubator-mxnet
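To keep the cache effect described above from skewing a comparison, each command can be run once untimed before it is measured. The following is a minimal sketch in Python, not part of ParallelGrep; the helper name `time_runs` and its parameters are assumptions made for illustration, and the placeholder command only exists so the sketch runs anywhere.

```python
import subprocess
import sys
import time


def time_runs(cmd, warmup=1, repeats=3):
    """Run cmd `warmup` times untimed, then return wall-clock seconds
    for each of `repeats` timed runs."""
    for _ in range(warmup):
        # Untimed run: lets the OS pull the searched files into its cache.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command (hypothetical); swap in the command under test,
    # e.g. ["grep", "-rF", "int static", "."].
    print(time_runs([sys.executable, "-c", "pass"]))
```

Timing both tools this way, against the same search base, compares warm-cache runs with warm-cache runs rather than a cold first run against a cached second one.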
Another test, this time on a big file. BTW, you can set the thread count to 4 in the source code to match your computer.
I repeated the same test here (enwiki-latest-abstract.xml), running it twice to take advantage of the cache:
Could you be, perhaps, running an older version of grep? I'm running
PS: I'm running with 8 threads in order to take maximum advantage of my CPU (i7-7700HQ).
You're right. I am using the default one on the machine. Let me update grep and try again :)
Exactly, the new version of
@MatheusBernardino Are you interested in improving the speed? If so, we can figure out where to start.
I like the subject, but unfortunately, I don't have the time now :( But even without reaching the original grep's performance, your code is a good example of work distribution in parallel applications :) Until you figure out how to improve the speed, maybe you could just update the README with the updated grep test and make it clearer that this version uses a fixed string, not a regex. What do you think?
Thanks for your interest in the demo. Closing now; contributions from the community are welcome.
The standard grep treats its pattern as a regex, while ParallelGrep uses a fixed string (substring search), which is much less computationally expensive than regex execution. So if grep's "-F" option wasn't used when measuring the times for the speedup calculations in the README, the results may not be accurate. (Note: "-F" makes grep use a fixed string instead of the default regex.)
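The difference between the two matching modes can be illustrated with a small sketch. This is Python rather than the project's own code, the helper names `fixed_match` and `regex_match` are hypothetical, and Python's `re` syntax is not identical to grep's default regex syntax; it only shows that a fixed-string search treats every character literally while a regex search interprets metacharacters such as ".".

```python
import re


def fixed_match(pattern: str, line: str) -> bool:
    """Substring search: every character is literal (the idea behind grep -F)."""
    return pattern in line


def regex_match(pattern: str, line: str) -> bool:
    """Regex search: metacharacters such as '.' are interpreted."""
    return re.search(pattern, line) is not None


lines = ["int static x;", "a.c", "abc"]

# '.' is a wildcard for the regex matcher but a literal dot for the fixed one.
fixed_hits = [l for l in lines if fixed_match("a.c", l)]
regex_hits = [l for l in lines if regex_match("a.c", l)]

print(fixed_hits)  # ['a.c']
print(regex_hits)  # ['a.c', 'abc']
```

For a pattern with no metacharacters, such as "int static", both modes select the same lines, so a timing comparison on such a pattern measures only the cost of the matching machinery.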
For example, running both versions on my i7-7700HQ (4 cores w/ hyper-threading) in the Linux kernel src directory and searching for "int static", I'm getting this:
So I think the README could say more explicitly that this is a simpler version of grep that works with fixed strings only. Or you could upgrade your version to use <regex.h>. But unfortunately, when I made this change it took 0m17.885s in the same test as before, which means a slowdown against default grep :( Also, if the tests were indeed performed without grep's "-F" option, I think you should re-run them taking that into account and update the README information.
Don't get me wrong, it's nice work :) And it seems really hard to get a speedup in a parallel grep version... (I tried it myself previously and failed.) But I hope you get it in an upcoming version :)