-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework winapi's QueryPerformance* functions to match XDK #663
base: master
Are you sure you want to change the base?
Changes from 1 commit
ab3a9f5
64f6505
3faa575
235dde3
9a603f7
8ea50ce
eb4385b
ef0e000
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,28 +10,19 @@ | |
#include <stddef.h> | ||
#include <xboxkrnl/xboxkrnl.h> | ||
|
||
BOOL QueryPerformanceCounter (LARGE_INTEGER *lpPerformanceCount) | ||
{ | ||
assert(lpPerformanceCount != NULL); | ||
|
||
lpPerformanceCount->QuadPart = __rdtsc(); | ||
return TRUE; | ||
} | ||
|
||
BOOL QueryPerformanceFrequency (LARGE_INTEGER *lpFrequency) | ||
{ | ||
assert(lpFrequency != NULL); | ||
|
||
#ifdef USE_RDTSC_FOR_FREQ | ||
#define AVG_SET 10 | ||
ULARGE_INTEGER f_rdtsc, avg; | ||
ULONG f_ticks = 0; | ||
static LARGE_INTEGER frequency = {0, 0}; | ||
static void __attribute__((constructor)) PrimeQueryPerformanceFrequency () | ||
{ | ||
#define AVG_SET 2 | ||
ULARGE_INTEGER f_rdtsc, avg = {0, 0}, s_rdtsc; | ||
JayFoxRox marked this conversation as resolved.
Show resolved
Hide resolved
|
||
ULONG f_ticks = 0, s_ticks = 0; | ||
JayFoxRox marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
avg.QuadPart = 0; | ||
Sleep(500); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Values generated soon after the system is booted seem to give less precise results, as to why; I'm not sure yet. Runs with a 700ms initial wait and 200ms loop wait give us There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd say that's good enough? What kind of precision do you expect? With your multiplication by 1000 at the end, this means that there's a single tick difference (1.36425648 nanoseconds). That's a 0.00013636369% error.. how is that not negligible? What are you using this function for that you require such precision? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't need such precision. It's an attempt to placate you and other maintainers. I don't want a hard value, you do. So I've made it so it's as close as I think is reasonable (possible?). What percentage ± would you accept, how long of a wait? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd say a sleep of less than a millisecond would be good in constructors. Maybe 10ms if necessary. Not sure about the percentage. But I'd expect the integer to be exact (that's the "correct" solution after all) on a stock Xbox. If the integer is off by 1 I wouldn't care too much, although I could already see problems with different functions making assumptions about the fixed frequency (and there you'd likely use the hardcoded value). Maybe using the int -> float32 error would be a good measure. But in this instance, there's a perfect solution which has no wait. The only time it doesn't work is with CPU mods / under- or overclocking. We could support the more common case of overclocking the CPU, but not if it has major drawbacks for most users which run a stock Xbox (which is the majority). |
||
|
||
for (int i = 0; i < AVG_SET; i++) { | ||
ULARGE_INTEGER s_rdtsc; | ||
ULONG s_ticks; | ||
// If we call rdtsc too fast we'll end up with div by 0 | ||
Sleep(200); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Almost a full second startup delay is excessive. I'd say 100ms is the upper bound of anything that runs as a constructor, and I imagine it's ample time when removing that counter-productive loop and using a higher precision timer like what I mentioned above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ofc we should prefer not having to wait at all, which should be possible when reading the CPU clockspeed from wherever it is stored. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We can't guarantee this is accurate. The old method of changing the FSB on CPU replaced Xbox' were to mess with the clock generator output, not the hardware register used for that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd argue "CPU replaced Xbox" is no longer an Xbox and out of scope for nxdk. It's impossible for us to support all of these variations in a useful way. nxdk is open-source, so if people want to support non-stock clocks, then they can keep around a patched nxdk to make special builts. There's no good solution for supporting Xbox-like platforms in upstream nxdk. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Whether or not it's possible to support all variations is moot. I'm attempting to support this specific thing, and I hate how much I have to try to justify my changes to you and others. Is there such a risk providing an accurate to now clock rate instead of a hard coded value?
These have been done while you've been gone. Replacing the GPU from 1.6 to say 1.0, etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's still the same GPU. I'm talking about plugging some Voodoo 3dfx chip inside an Xbox.
No. But:
... and all of this for a niche use case that's not obtainable with a "stock" Xbox.. regardless of how badly it's damaged. All of this matters a lot. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
100 bytes? Are we on an 8086? If this were a org standard we'd have inline assembly all over the place.
To quote you...
Application shouldn't assume about the hardware. We're providing a tool (local box variable timing) to avoid, bad assumptions.
I don't disagree. How long are you willing to Sleep on startup? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I got this impression it's also relevant to you here: #662 and XboxDev/nxdk-pdclib#56 It's not just 100 bytes. The constant propagation can bite you badly. This could be the difference between evaluating a bunch of code at runtime vs doing it at build-time. Also, 100 bytes can be relevant to me when working on an DXT. Some commercial games are very short on memory and if any of those 100 bytes reach into another page, then that is not just 100 bytes, but likely a multiple of pages (4096 bytes).
Your error is at least 100x worse than what we would be getting for free by fixing the math.
It's a gaming console with a fixed spec. That's one of the main benefits of gaming consoles (and other embedded devices) over PCs. It's possible to approximate these constants by measuring. However you'll be adding measuring and rounding errors (also see my other comment about measuring an untrusted clock by using another untrusted clock).
See my other comment about this. But this wait is avoidable (by better measuring code / math) or by hardcoding the frequency. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why sleep at the start of the loop? To me, it looks like this should only do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I guess I'm confused about this. Is there a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If the caps are so much out of tolerance that you are that far away from 733MHz that it even drifts a second per minute then I'd consider your Xbox broken. You don't use Hence, we can safely assume the reference clock speed. There shouldn't be a need to read the FSB or cpuid - we know what CPU the Xbox platform has. If it doesn't, then it's not an Xbox and you should probably keep these things in local patches to nxdk. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
As the Xbox continues to age this will become more and more relevant. I recently had a box which would report a 729MHz CPU frequency as the filter capacitors had blown, now, games still worked, and most wouldn't notice such an issue. Why should be not account for this? Why should we continue to make bad assumptions about the platform. No patch I've ever introduced has attempted to move away from Xbox as a platform, to support different AGP cards, add-in PCI cards, FooBar SMBus device, I've only attempted to remove these hard coded values and addresses in an attempt to support more Xboxes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Because this will likely affect the entire system. So even if you calibrate against another clock, the result will still be very close to 733.333MHz measured. This is only a problem if the clock on a single component varies. However, most clocks are affected by a huge number of components, so drift on any of them can't cause the result to deviate much (likely there's an opposing force too.. and clock generators are also build to combat this). But even if there is a problem with a single clock, then you don't know which one, you can't calibrate against another clock source because that might also have deteriorated. At best, you can average out the error of all clock sources. You are also introducing a very high cost of synchronizing / measuring clocks and then continue to have a high cost due to a bunch of optimizations you can't do anymore due to lack of known clock frequencies at compile time (which also means you run more code to get the same result = the errors in clock affect you even more). And even if you managed to get accurate measurements, you couldn't account for drift at runtime which would likely exist (in the case of electronic components due to external factors like heat). You can't compare a clock in a computer system to a wall-clock (unless specifically designed to be that.. which are RTCs.. which usually run at lower precision). It's not important that the frequency is perfect. But you still want it to be a very very good average if you don't know how people will use it. Especially on a console like the Xbox, there'll be plenty of application code (out of our control) that will assume If you hardcode the expected frequency you get around all of these issues (and a couple of other benefits), because if one counter runs slower, they likely all will. What is important is that everyone agrees on what these clocks should be running at. All that said, even a bad and drifting counter that's not an RTC or misconfigured is still perfectly valid for relative measurements or measurements which a human couldn't possible notice:
And that's a noble goal. The issue is that it's hard to get away from constant clock frequencies in the code for the reason mentioned above: they can't be measured (without depending on another clock.. which can also have errors). I'm all for removing stupid checks or getting instead of setting clocks (= cases where we depend on hw setup by some XDK code instead of being able to do it ourselves). What we should attempt is to mimic how the XDK does it. The goal should be: if XDK code works, then equivalent nxdk code should work. If the XDK hardcodes a value, then we should do that too. |
||
|
||
s_rdtsc.QuadPart = __rdtsc(); | ||
s_ticks = KeTickCount; | ||
|
@@ -45,11 +36,25 @@ BOOL QueryPerformanceFrequency (LARGE_INTEGER *lpFrequency) | |
// Skip the first result as invalid | ||
if (i) | ||
avg.QuadPart += s_rdtsc.QuadPart; | ||
|
||
// If we call rdtsc too fast we'll end up with div by 0 | ||
Sleep(10); | ||
} | ||
lpFrequency->QuadPart = (avg.QuadPart / (AVG_SET - 1)) * 1000; | ||
frequency.QuadPart = avg.QuadPart / (AVG_SET - 1) * 1000LL; | ||
} | ||
#endif | ||
|
||
BOOL QueryPerformanceCounter (LARGE_INTEGER *lpPerformanceCount) | ||
{ | ||
assert(lpPerformanceCount != NULL); | ||
|
||
lpPerformanceCount->QuadPart = __rdtsc(); | ||
return TRUE; | ||
} | ||
|
||
BOOL QueryPerformanceFrequency (LARGE_INTEGER *lpFrequency) | ||
{ | ||
assert(lpFrequency != NULL); | ||
|
||
#ifdef USE_RDTSC_FOR_FREQ | ||
lpFrequency->QuadPart = frequency.QuadPart; | ||
#else | ||
lpFrequency->QuadPart = 733333333; | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't hide correct behavior behind a compile-time flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XDK only replies with 733MHz, so that would be "correct". As stated, I don't like hard numbers, but don't want to force wasted time on developers if they don't require it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct though? I'd even consider rejecting this with "move it to a fork":
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's assumptions like this that stop nxdk with pbkit running from functioning on my Xbox. There are applications now due to MIT licensing that I can't readily modify the source of, which means I need to hack the xbe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is code in the winapi - so it's probably not mission critical.
If your Xbox drifts too far from 733.333MHz to crash applications then it's probably broken - it's not a fault of nxdk.
If you changed the clocks or the CPU (which I suspect), then it's no longer an Xbox and breaks many assumptions that both, MS XDK and the nxdk have (also the kernel probably.. which you probably had to patch then to even get your Xbox to boot).
... which make the Xbox a gaming console instead of a generic PC.
nxdk winapi isn't a generic winapi and it's not a PC OS / environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's entirely nxdk's fault for killing pbkit if the frequency is not default. https://github.com/XboxDev/nxdk/blob/master/lib/pbkit/pbkit.c#L2812
Again bad assumptions about the box we're running on. I've tried multiple times to introduce similar things (NV base address being one of them) and at every turn I seem to hit this review bomb or ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: That particular check is bad and inprecise - it can easily cause issues and is totally avoidable.
And yes.. and that should be addressed.. but that's not what this PR is doing.
But modifying the NV base address is also not really reasonable. I also needed that for some of my projects.
However, I can see how it's not practical to do such invasive changes in nxdk.
I think one of the things we could do is to expose constants in some header (probably as macro, so they can be changed dynamically, potentially with a source marker so you know where the constant gets used from).
People could just reconfigure their nxdk if necessary then - however, that doesn't help with closed source apps.
However, I don't think it's practical to keep some of these hacks for niche system configurations or even niche software applications (like my attempt to emulate a virtual NIC from a DXT / xbe-loader) in the upstream nxdk codebase.
A lot of these changes quickly become incompatible with much more important (and required) changes.
Testing such niche applications is also a lot more difficult and therefore we'd be constantly introducing new bugs for those niche cases.
We'd also lose the major benefit of a gaming console (at least in the sense up to the 7th console generation or so): fixed hardware which allows massive simplifications and assumptions (which brings performance gains).
I think this is also a huge draw for some of the developers - Xbox being a fixed target makes it an interesting coding challenge and having a fixed spec brings some comfort.
xbox-linux and some of OpenXDK tried to turn Xbox into a PC.. that's also a noble goal, but I think that's a non-goal for nxdk. And even then, the xbox-linux Xbox platform likely has to make a bunch of assumptions.
I feel like this PR has gotten a lot more attention than it deserves.
There aren't many active maintainers (or outside reviewers) in XboxDev, but good changes still gets merged eventually.. occasionally I see merges in my GH notifications.
I'd prefer if some things got merged sooner than later, but this is how it is.
How many PRs did you review today? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should, and I tried. Both an frequency accurate timer version, and a dumb "remove the kill code" version. Both were left stale and nitpicked.
We're all just giving our free time to see the change in the world that we envision. If you feel you're giving it more attention than you should, then I'm confused why you are.
None. I don't do software development for work. Hopefully you had a good day at work today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an assumption of good faith: One would expect to gain more than to lose on doing a review.
Even if there's a bad PR we still have to take the time to review and close it.
That's the cost of doing FOSS.
In turn, we hope that the author improves their code so that it might be eventually mergeable (hopefully giving us a return of invest). Or we hope that some author shows up with perfectly mergeable code (which is an instant win).
This PR in particular looked to become mergeable eventually:
However, then it stagnated quickly, leading to many discussions, rather than progress on the code.
The initial arguments faded into the background as some weird arguments came up like "bad caps" or "unstable clocks at XBE launch" which made little sense.. unless you aren't running a stock Xbox.
Both, @thrimbor and I tried to assist or nudge you in the right direction, but you don't seem to care for our guidance (regarding our comments about numerical stability for example, starting at surface level and then even taking time to explain it with examples - still in good faith.. either that this PR is merged, or that others learn from it).
By now, I've lost faith in this PR. I'll do a last pass responding to the remaining open discussions.
It might still be merged eventually, but I don't think it's going to happen because of comments like "Wont fix".