Rework winapi's QueryPerformance* functions to match XDK #663

Open
wants to merge 8 commits into
base: master
from
83 changes: 81 additions & 2 deletions lib/winapi/profiling.c
@@ -3,22 +3,101 @@
// SPDX-FileCopyrightText: 2019 Stefan Schmidt

#include <profileapi.h>
#ifdef USE_RDTSC_FOR_FREQ
#include <synchapi.h>
#endif
#include <assert.h>
#include <stddef.h>
#include <xboxkrnl/xboxkrnl.h>

#ifdef USE_RDTSC_FOR_FREQ
Member

We shouldn't hide correct behavior behind a compile-time flag.

Contributor Author

XDK only replies with 733MHz, so that would be "correct". As stated, I don't like hard numbers, but don't want to force wasted time on developers if they don't require it.

Member

@JayFoxRox, Oct 17, 2023

We shouldn't hide correct behavior behind a compile-time flag.

Is it correct though? I'd even consider rejecting this with "move it to a fork":

  • Added complexity
  • We can safely assume every single official Xbox will be running at 733MHz (and even if they don't, then it's up to the person who over-/underclocked to fix applications, because none of the MS code will work anymore either)

XDK only replies with 733MHz, so that would be "correct".

Agreed.

Contributor Author

We can safely assume every single official Xbox will be running at 733MHz

It's assumptions like this that stop nxdk with pbkit running from functioning on my Xbox. Because of MIT licensing, there are now applications whose source I can't readily modify, which means I need to hack the xbe.

Member

@JayFoxRox, Oct 18, 2023

It's assumptions like this that stop nxdk with pbkit running from functioning on my Xbox

Note that this is code in the winapi - so it's probably not mission critical.

If your Xbox drifts far enough from 733.333MHz to crash applications, then it's probably broken - it's not a fault of nxdk.
If you changed the clocks or the CPU (which I suspect), then it's no longer an Xbox, and that breaks many assumptions that both the MS XDK and nxdk make (probably the kernel too, which you likely had to patch just to get your Xbox to boot).

It's assumptions like this ...

... which make the Xbox a gaming console instead of a generic PC.
nxdk winapi isn't a generic winapi and it's not a PC OS / environment.

Contributor Author

it's not a fault of nxdk

It's entirely nxdk's fault for killing pbkit if the frequency is not default. https://github.com/XboxDev/nxdk/blob/master/lib/pbkit/pbkit.c#L2812

Again bad assumptions about the box we're running on. I've tried multiple times to introduce similar things (NV base address being one of them) and at every turn I seem to hit this review bomb or ignore.

Member

It's entirely nxdk's fault for killing pbkit if the frequency is not default

Yes: That particular check is bad and imprecise - it can easily cause issues and is totally avoidable.
And yes.. and that should be addressed.. but that's not what this PR is doing.

I've tried multiple times to introduce similar things (NV base address being one of them)

But modifying the NV base address is also not really reasonable. I also needed that for some of my projects.
However, I can see how it's not practical to do such invasive changes in nxdk.

I think one of the things we could do is to expose constants in some header (probably as macros, so they can be changed, potentially with a source marker so you know where each constant gets used from).
People could just reconfigure their nxdk if necessary then - however, that doesn't help with closed source apps.
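
A minimal sketch of what such an overridable constant could look like. The macro name `NXDK_CPU_FREQUENCY_HZ` and the helper `ticks_to_us` are hypothetical and used only for illustration; nxdk does not define them. The helper divides before multiplying so the intermediate value cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical override point (not an existing nxdk macro):
 * build with -DNXDK_CPU_FREQUENCY_HZ=<value> to replace the default. */
#ifndef NXDK_CPU_FREQUENCY_HZ
#define NXDK_CPU_FREQUENCY_HZ 733333333ULL
#endif

static_assert(NXDK_CPU_FREQUENCY_HZ > 0, "CPU frequency must be non-zero");

/* Convert a TSC tick count to microseconds (rounded down).
 * Whole seconds and the sub-second remainder are converted separately,
 * so the multiplication by 1000000 never overflows uint64_t. */
static uint64_t ticks_to_us(uint64_t ticks)
{
    const uint64_t hz = NXDK_CPU_FREQUENCY_HZ;
    uint64_t whole_seconds = ticks / hz;
    uint64_t remainder = ticks % hz;

    return whole_seconds * 1000000ULL + remainder * 1000000ULL / hz;
}
```

With the default value, `ticks_to_us(733333333ULL)` yields 1000000. A reconfigured build only has to change the one definition.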

However, I don't think it's practical to keep some of these hacks for niche system configurations or even niche software applications (like my attempt to emulate a virtual NIC from a DXT / xbe-loader) in the upstream nxdk codebase.
A lot of these changes quickly become incompatible with much more important (and required) changes.
Testing such niche applications is also a lot more difficult and therefore we'd be constantly introducing new bugs for those niche cases.

We'd also lose the major benefit of a gaming console (at least in the sense up to the 7th console generation or so): fixed hardware which allows massive simplifications and assumptions (which brings performance gains).
I think this is also a huge draw for some of the developers - Xbox being a fixed target makes it an interesting coding challenge and having a fixed spec brings some comfort.

xbox-linux and some of OpenXDK tried to turn Xbox into a PC.. that's also a noble goal, but I think that's a non-goal for nxdk. And even then, the xbox-linux Xbox platform likely has to make a bunch of assumptions.

and at every turn I seem to hit this review bomb or ignore.

I feel like this PR has gotten a lot more attention than it deserves.

There aren't many active maintainers (or outside reviewers) in XboxDev, but good changes still get merged eventually.. occasionally I see merges in my GH notifications.

I'd prefer if some things got merged sooner rather than later, but this is how it is.
How many PRs did you review today? 🤔

Contributor Author

And yes.. and that should be addressed..

It should, and I tried. Both a frequency-accurate timer version and a dumb "remove the kill code" version. Both were left stale and nitpicked.

I feel like this PR has gotten a lot more attention than it deserves.

We're all just giving our free time to see the change in the world that we envision. If you feel you're giving it more attention than you should, then I'm confused why you are.

How many PRs did you review today?

None. I don't do software development for work. Hopefully you had a good day at work today.

Member

We're all just giving our free time to see the change in the world that we envision. If you feel you're giving it more attention than you should, then I'm confused why you are.

There's an assumption of good faith: One would expect to gain more than to lose on doing a review.

Even if there's a bad PR we still have to take the time to review and close it.
That's the cost of doing FOSS.

In turn, we hope that the author improves their code so that it might eventually be mergeable (hopefully giving us a return on investment). Or we hope that some author shows up with perfectly mergeable code (which is an instant win).

This PR in particular looked to become mergeable eventually:

  • it started out with good arguments
  • it had fatal design flaws, but it looked like you were willing to fix them (and learn)

However, then it stagnated quickly, leading to many discussions, rather than progress on the code.
The initial arguments faded into the background as some weird arguments came up like "bad caps" or "unstable clocks at XBE launch" which made little sense.. unless you aren't running a stock Xbox.

Both @thrimbor and I tried to assist or nudge you in the right direction, but you don't seem to care for our guidance (for example, our comments about numerical stability: we started at surface level and then even took the time to explain it with examples, still in good faith that either this PR gets merged or others learn from it).

By now, I've lost faith in this PR. I'll do a last pass responding to the remaining open discussions.
It might still be merged eventually, but I don't think it's going to happen because of comments like "Wont fix".

static LARGE_INTEGER frequency = {{0, 0}};
static void __attribute__((constructor)) PrimeQueryPerformanceFrequency ()
{
    #define BASE_CLOCK_FLOAT 16.666667f
    #define NV_PRAMDAC_PLL_COEFF *(volatile ULONG*)0xFD680500
    #define NV_PTIMER_NUM *(volatile ULONG*)0xFD009200
    #define NV_PTIMER_DEN *(volatile ULONG*)0xFD009210
    #define NV_PTIMER_COUNT *(volatile ULONG*)0xFD009400
    #define KE_STALL 10

    ULARGE_INTEGER rdtsc_count_1 = {{0, 0}}, rdtsc_count_2 = {{0, 0}};
    DWORD ptimer_count_1 = 0, ptimer_count_2 = 0;

    // Precalculate NVCLK & PTIMER freq
    double nv_clock = BASE_CLOCK_FLOAT * ((NV_PRAMDAC_PLL_COEFF & 0xFF00) >> 8);
    nv_clock /= 1 << ((NV_PRAMDAC_PLL_COEFF & 0x70000) >> 16);
    nv_clock /= NV_PRAMDAC_PLL_COEFF & 0xFF;

    double ptimer_frequency = (nv_clock / NV_PTIMER_NUM) * NV_PTIMER_DEN;

    KeEnterCriticalRegion();

    // Turn off caches
    __asm
    {
        cli
        sfence
        mov eax, cr0
        or eax, 1 << 30 // Set CD bit
        mov cr0, eax
        wbinvd
    }

    // Reset the counter
    NV_PTIMER_COUNT &= ~(0xFFFFFFE0); // First 5 bits are not used

    rdtsc_count_1.QuadPart = __rdtsc();
    ptimer_count_1 = NV_PTIMER_COUNT;

    KeStallExecutionProcessor(KE_STALL);

    rdtsc_count_2.QuadPart = __rdtsc();
    ptimer_count_2 = NV_PTIMER_COUNT;

    __asm
    {
        sfence
        mov eax, cr0
        and eax, ~(1 << 30) // Clear CD bit
        mov cr0, eax
        wbinvd
        sti
    }

    KeLeaveCriticalRegion();

    double ptimer_diff = (ptimer_count_2 >> 5) - (ptimer_count_1 >> 5);
    double rdtsc_diff = rdtsc_count_2.QuadPart - rdtsc_count_1.QuadPart;

    double ptimer_scale = ptimer_diff / ptimer_frequency;
    double cpu_freq_float = rdtsc_diff / ptimer_scale;

    if (!cpu_freq_float) {
        frequency.QuadPart = 733333333;
    } else {
        frequency.QuadPart = (ULONG)(cpu_freq_float * 1000 * 1000);
    }
}
#endif

BOOL QueryPerformanceCounter (LARGE_INTEGER *lpPerformanceCount)
{
    assert(lpPerformanceCount != NULL);

-   lpPerformanceCount->QuadPart = KeQueryPerformanceCounter();
+   lpPerformanceCount->QuadPart = __rdtsc();
    return TRUE;
}

BOOL QueryPerformanceFrequency (LARGE_INTEGER *lpFrequency)
{
    assert(lpFrequency != NULL);

-   lpFrequency->QuadPart = KeQueryPerformanceFrequency();
+#ifdef USE_RDTSC_FOR_FREQ
+   lpFrequency->QuadPart = frequency.QuadPart;
+#else
+   lpFrequency->QuadPart = 733333333;
+#endif
    return TRUE;
}
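
The calibration arithmetic in `PrimeQueryPerformanceFrequency` can be checked off-hardware. The sketch below is not part of the PR: it restates the same formulas as pure functions with hypothetical names (`nvclk_mhz_from_pll`, `cpu_hz_from_deltas`), takes the register and counter values as arguments instead of reading MMIO, and adds a fallback for a zero PTIMER delta, which the diff does not guard against. The sample inputs used to exercise it are made up, not measured values.

```c
#include <assert.h>
#include <stdint.h>

/* Reference clock in MHz, the same value the diff names BASE_CLOCK_FLOAT. */
#define BASE_CLOCK_MHZ 16.666667

/* Decode a PLL coefficient word using the bit layout from the diff:
 * bits 0-7 divider (M), bits 8-15 multiplier (N), bits 16-18 shift (P). */
static double nvclk_mhz_from_pll(uint32_t coeff)
{
    uint32_t m = coeff & 0xFF;
    uint32_t n = (coeff >> 8) & 0xFF;
    uint32_t p = (coeff >> 16) & 0x7;

    if (m == 0) {
        return 0.0; /* invalid divider: report "unknown" instead of dividing by zero */
    }
    return BASE_CLOCK_MHZ * n / (double)(1u << p) / m;
}

/* Derive the CPU clock in Hz from two counter deltas taken over the same
 * interval. ptimer_mhz is the PTIMER tick rate in MHz and must be positive.
 * Returns fallback_hz when the PTIMER did not advance. */
static uint64_t cpu_hz_from_deltas(uint64_t tsc_delta, uint32_t ptimer_delta,
                                   double ptimer_mhz, uint64_t fallback_hz)
{
    assert(ptimer_mhz > 0.0);

    if (ptimer_delta == 0) {
        return fallback_hz;
    }

    double elapsed_us = ptimer_delta / ptimer_mhz; /* ticks / MHz = microseconds */
    double cpu_mhz = tsc_delta / elapsed_us;       /* cycles / microsecond = MHz */

    return (uint64_t)(cpu_mhz * 1000.0 * 1000.0 + 0.5); /* round to nearest Hz */
}
```

For example, a coefficient of `0x0E01` (N = 14, M = 1, P = 0) decodes to roughly 233.33 MHz, and a TSC delta of 73333 over 3125 PTIMER ticks at 31.25 MHz (100 µs) gives 733330000 Hz. Over an interval that short, each TSC cycle of error is worth about 10 kHz in the result.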