-
Notifications
You must be signed in to change notification settings - Fork 653
Conversation
Thank you for contributing this pull request! Here are a few pointers to make sure your submission will be considered for inclusion. The following commiters were not found in the CLA:
You can fix all these things without opening another issue. Please see CONTRIBUTING.md for more information |
This solution is kind of obvious. But in the past we deliberately didn't go for the QueryPerformanceCounter solution here because it turned out to have a significant performance impact. IIRC this effect was much more apparent when multiple node processes were running at the same time. That said, I tested mostly on what we would now consider "old laptop" - a HP laptop with a 2.8ghz dualcore Intel T9600 processor running windows 7. Maybe it's time for a re-evaluation. An alternative would be to use timeGetTime (v8 does this too), and optionally use timeBeginPeriod to set the clock resolution to 1ms. |
Here are some numbers I obtained on Windows 8.1 x64, with a Phenom II 2.5 GHz processor. |
Thanks for the feedback. QueryPerformanceCounter's latency is indeed dependent on the OS version and the specific system hardware. My latest commit enables the high-precision solution only when a reasonably fast implementation is detected. |
I don't really enjoy this, this solution is timing dependent and people may suffer from expecting higher resolution and getting lower if at process startup system was very busy. @orangemocha is it possible to statically detect this? On which configurations is it faster? |
Thank you for your feedback. Unfortunately there is no reliable static way of determining the type of Qpc implementation. In summary, different OS versions use different strategies and newer ones adapt to the hardware available, even using performance measurements. I don’t think it’s a good idea for us to try to guess that logic, and there is no API to do it. Foreseeably, newer systems will have improved Qpc implementations and the need for the GetTickCount approach will go away over time. So IMHO the performance measurement is the most reliable way available to take advantage of fast high-precision timer implementations when available. I agree it’s not the prettiest thing, but I think it’s either this or sticking to GetTickCount and a 15.6ms resolution. Which approach do you like better? (This is a non-rhetorical question). What are the known scenarios for which we/people care about timer precision in node? A different question: is the impact of even a slow Qpc implementation (think in the order of 1 microsecond) so terrible? We call uv_update_time once per iteration of the event loop, and during timer manipulations (e.g listOnTimeout). I could be wrong but this doesn’t seem like a huge deal. Regarding you comment about busy startup time, I could try to improve the reliability of the initial measurement (e.g. by briefly raising the thread priority). |
Ok, this makes sense. What do you think about adding configuration option for it? Preferably as an API method. If this sounds good to you - I guess we could just remove this hacks and just use QPC, if not configured otherwise. |
I ran the http benchmarks on 2 different machines. The results can be summarized as follows:
The increased average throughput is largely due to the benchmarks in http\chunked.js. For reasons unclear to me, the increased precision yields dramatic improvements in these tests (up to +300%). Based on these results, and assuming the http benchmarks are a good indicator, there doesn’t seem to be a significant performance degradation using QueryPerformanceCounter. I updated the PR with the commit that uses QueryPerformanceCounter unconditionally. |
For the record, I gathered these measurements using the http benchmarks, with this command: |
Minor change to keep last_tick_count for ABI compatibility, in case we ever want to roll this back. |
What is the status of this? Do we land it, discard it? |
There hasn't been much discussion about this AFAIK. My vote would be for landing it. v0.12.0 would be a good time to get some real world testing and get feedback on any negative performance impact. If that was the case I'd be on call for fixing it for the next minor release. What's the decision process? If we decide to land it, I'll update the commit to resolve the merge conflict. |
What commit? |
Well, the decision process is that we trust each other on the respective areas of expertise :-) I can read the code, but clearly you and @piscisaureus know a lot more about it. If you guys think we should land it I'm happy to do so. If anyone else knows about this and has some feedback to share, now would be a good time :-)
The one on this PR: https://github.com/MSOpenTech/libuv/commit/fed2906fed784cbced7de8ab9908be60af70fb2c |
Updated to solve merge conflicts. @piscisaureus are you ok with this? |
@@ -316,7 +316,8 @@ RB_HEAD(uv_timer_tree_s, uv_timer_s); | |||
HANDLE iocp; \ | |||
/* The current time according to the event loop. in msecs. */ \ | |||
uint64_t time; \ | |||
/* GetTickCount() result when the event loop time was last updated. */ \ | |||
/* This is unused, but we are keeping it to maintain ABI compatibility. */ \ | |||
/* TODO: Remove in v0.14. */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we care about ABI compatibility at this point, v0.10 and master diverged so much it's already broken in tens of places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but actually this was intended to maintain ABI compatibility in case we decide to revert this change in v0.12.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Can you please clarify the comment a bit? " ...maintain ABI compatibility in case of revert" or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I am nervous about what our chinese xp bootlegger user base is going to think, but let's give it a spin. But before you land:
|
A quick test on my machine reported |
Some googling found that someone has a system which reports |
Thanks, @retep998. Useful. |
Thanks, @retep998 and @piscisaureus, definitely useful! From MSDN: Regarding switching to integer math, I actually think that floating point math is more reliable here, given that there are not guarantees on the range of magnitude of the performance counter frequency. In fact, to be on the safer side, I think we could make uv_update_time call QueryPerformanceCounter directly, and do the int conversion after converting to the millisecond unit. See Line 485 in a4f0350
In the worst case scenario, counter.QuadPart wraps every 100 years. We are converting to nanoseconds, and on a system with 1 microsecond resolution, this means multiplying by 1000. The conversion to uint64_t would then overflow every 100years/1000 = Then yes to all your other points. Regarding uv__time_forward, do you know what scenario required this trick? I wonder if we can still have a problem because of rounding to the nearest millisecond. |
If nothing unexpected happens I plan to release libuv 1.0.0 by the end of next week. Can you guys spend some cycles on this? @piscisaureus @orangemocha |
Sorry, I have been swamped. Yes. I'll get to it by early next week. |
Thanks Alexis! |
Cc @janjongboon |
Er, @janjongboom |
Updated to address @piscisaureus ' feedback. |
I have a question related to improving timer precision, though it shouldn't block this PR. Shouldn't uv_timer_start call uv_update_time before computing the timer's due time? |
/* This is unused, but we are keeping it to maintain ABI compatibility */ \ | ||
/* in case we need to revert uv_update_time to using GetTickCount */ \ | ||
/* in v0.12.x. */ \ | ||
/* TODO: Remove in v0.14. */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have some reserved fields for backwards ABI emergencies, we can remove this I'd say.
Not sure. On Unix we don't. It's done once per loop iteration, so unless some code stalled the loop we should be OK I think. |
Thanks @saghul ! Updated. |
Thanks Alexis! LGTM. @piscisaureus any further comment? |
@saghul, sorry, yes.
|
@piscisaureus I am confused. uv__time_forward was removed. Maybe you are looking at an outdated commit? https://github.com/joyent/libuv/pull/1165/files I replaced the invocations of uv__time_forward() with uv_update_time(). If I just removed the invocations, I would get a couple of unit test failures (timer_from_check and timer_run_once). The point of uv__time_forward() seemed to be that we wanted the time elapsed to be reflected in the loop time, and we need to at least update the time for that to be the case. |
I had a look at this and I believe @orangemocha is right and we need to update the time after we blocked for I/O, otherwise any timer started after polling for I/O will get the time wrong, as the timer_from_check test shows. FWIW we also do it for Unix: https://github.com/joyent/libuv/blob/master/src/unix/linux-core.c#L203-L207 Now, one can also say that the call to |
@orangemocha: Moreover, I think we should always update the time after GetQueuedCompletionStatus, because if we stopped polling due to an I/O operation and schedule a timer in a check callback, we'd get the time wrong, because it was never updated. |
Updated to call uv_update_time() unconditionally after GetQueuedCompletionStatus. I also made a change in test/test-timer.c, because now we are precise/fast enough that we were sometimes missing that 1 millisecond timer in the test. |
@piscisaureus ' concern of updating the time too often is not unjustified though... now we update it multiple times and it might not even be needed. What do you guys think about updating it lazily, as in https://github.com/MSOpenTech/libuv/commit/b09abd7e849d9820c101489051bfb4aabb30a34b ? |
Uh, how so? When running with UV_RUN_ONCE the loop will block for 1ms (in this case) and then, after I/O it will run timers again: https://github.com/joyent/libuv/blob/master/src/win/core.c#L405-L415 How can we miss it? |
While that might be an improvement, can we do it as a follow-up to this PR? Just so that we don't loose track of things :-) Now, about the patch: I think it makes the functions look a bit backwards: The idea of updating the time lazily sounds good though. One idea that crossed my mind would be to somehow "mark" the state of the loop, so we know we are in the section after polling for I/O, and only update the time if you start a timer or run |
Definitely ok to do the lazy update as a follow up. |
Regarding the test timer_run_once, I see what's going on. We are polling with a timeout equal to timer->due_time - loop->time, however when GetQueuedCompletedStatus returns and we update the time, the new loop->time is equal to timer->due_time -1. That is, GetQueuedCompletedStatus occasionally times out a bit early (I am guessing this could be due to rounding errors, and it never seems higher than 1 ms). I can think of the following solutions: IMO, c) is the best option. Rather than letting timers fire always late or always early, distort time a little bit when the wait ends a bit early. |
@orangemocha yeah, c) looks like the best approach IMHO. |
History repeating itself. So whatever, c) seems best to me too. |
Now the Unix and Windows code does look consistent... sans the time forward thing. For some reason Unix doesn't seem to suffer from such issue (returning a bit earlier than expected, that is). PS: Once Alexis addresses this last thing I plan to land it and tag rc1. |
Improving timing precision by using QueryPerformanceCounter. This is part of the fix for Node.js' test-timers-first-fire.js.
Updated. |
Thanks Alexis! I'll merge it later today. |
Landed in ✨ 6ced8c2 ✨. Thanks Alexis! |
Yay, thanks! |
Improving timing precision by using QueryPerformanceCounter.
This is part of the fix for Node.js' test-timers-first-fire.js.