add specialization for 24MHz QueryPerformanceFrequency #3832

Merged Jul 14, 2023 (9 commits). The diff below shows changes from 3 commits.
45 changes: 41 additions & 4 deletions stl/inc/__msvc_chrono.hpp
Contributor:

Do we need to do the push_macro/undef/pop_macro magic incantation here for likely and unlikely?

Member:

likely and unlikely are commonly defined as function-like macros, but not object-like macros. <xkeycheck.h> avoids rejecting them for that reason.

Unlike msvc etc., users are technically not supposed to define likely and unlikely as object-like macros, so we technically don't need to defend against that. We could, but I don't think it's necessary at the moment.
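
To make the exchange above concrete, here is a minimal sketch of both points: a function-like macro named likely does not interfere with the [[likely]] attribute, and the push_macro/undef/pop_macro guard temporarily hides a macro and then restores it. The macro definition and the classify* function names are hypothetical and are not part of the PR. The sketch assumes a compiler that accepts [[likely]] and the push_macro/pop_macro pragmas (MSVC, GCC, and Clang do).

```cpp
#include <cassert>

// Hypothetical user code: a function-like macro named `likely`.
#define likely(x) (x)

// Inside [[likely]] the token `likely` is not followed by '(',
// so the function-like macro is not expanded there.
inline int classify(int value) {
    if (likely(value >= 0)) [[likely]] {
        return 1;
    }
    return 0;
}

// The guard idiom from the question: save the macro, remove it,
// use the plain identifier, then restore the saved definition.
#pragma push_macro("likely")
#undef likely
inline int classify_guarded(int value) {
    if (value >= 0) [[likely]] {
        return 1;
    }
    return 0;
}
#pragma pop_macro("likely")

// After pop_macro the user's macro is visible again.
inline int classify_after_guard(int value) {
    return likely(value >= 0) ? 1 : 0;
}

// Small self-check that exercises all three functions.
inline void demo() {
    assert(classify(5) == 1);
    assert(classify_guarded(-5) == 0);
    assert(classify_after_guard(5) == 1);
}
```

The guard only matters for an object-like macro, which would rewrite the attribute token itself. That is the case the reply argues the header does not need to defend against.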

@@ -666,18 +666,53 @@ namespace chrono {
using time_point = _CHRONO time_point<steady_clock>;
static constexpr bool is_steady = true;

#if defined(_M_ARM) || defined(_M_ARM64) // vvv ARM/ARM64 arch vvv

#if _HAS_CXX20
#define _LIKELY_ARM [[likely]]
#elif defined(__clang__)
#define _LIKELY_ARM [[__likely__]]
#else
#define _LIKELY_ARM
#endif
#define _LIKELY_X86

#elif defined(_M_IX86) || defined(_M_X64) // ^^^ ARM/ARM64 arch / X86/X64 arch vvv

#if _HAS_CXX20
#define _LIKELY_X86 [[likely]]
#elif defined(__clang__)
#define _LIKELY_X86 [[__likely__]]
#else
#define _LIKELY_X86
#endif
#define _LIKELY_ARM

#else // ^^^ X86/X64 arch / other arch vvv
#define _LIKELY_ARM
#define _LIKELY_X86
#endif // ^^^ other arch ^^^
_NODISCARD static time_point now() noexcept { // get current time
const long long _Freq = _Query_perf_frequency(); // doesn't change after system boot
const long long _Ctr = _Query_perf_counter();
static_assert(period::num == 1, "This assumes period::num == 1.");
// 10 MHz is a very common QPC frequency on modern PCs. Optimizing for
// 10 MHz is a very common QPC frequency on modern X86 PCs. Optimizing for
// this specific frequency can double the performance of this function by
// avoiding the expensive frequency conversion path.
constexpr long long _TenMHz = 10'000'000;
if (_Freq == _TenMHz) {
constexpr long long _TwentyFourMHz = 24'000'000;
constexpr long long _TenMHz = 10'000'000;
// clang-format off
if (_Freq == _TenMHz) _LIKELY_X86 {
static_assert(period::den % _TenMHz == 0, "It should never fail.");
constexpr long long _Multiplier = period::den / _TenMHz;
return time_point(duration(_Ctr * _Multiplier));
} else if (_Freq == _TwentyFourMHz) _LIKELY_ARM {
// The compiler recognizes the constants for frequency and time period and uses shifts and
// multiplies instead of divides to calculate the nanosecond value. This frequency is common on
// ARM64 (Windows devices, and Apple Silicon Macs using Parallels Desktop)
const long long _Whole = (_Ctr / _TwentyFourMHz) * period::den;
const long long _Part = (_Ctr % _TwentyFourMHz) * period::den / _TwentyFourMHz;
return time_point(duration(_Whole + _Part));
} else {
// Instead of just having "(_Ctr * period::den) / _Freq",
// the algorithm below prevents overflow when _Ctr is sufficiently large.
@@ -688,9 +723,11 @@ namespace chrono {
const long long _Part = (_Ctr % _Freq) * period::den / _Freq;
return time_point(duration(_Whole + _Part));
}
// clang-format on
}
};

#undef _LIKELY_ARM
#undef _LIKELY_X86
_EXPORT_STD using high_resolution_clock = steady_clock;
} // namespace chrono
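
The conversion in now() can be sketched outside the STL as follows. This is only an illustration under stated assumptions: ticks_to_ns is a hypothetical helper, and ns_per_sec stands in for period::den assuming a nanosecond period. It keeps the three branches from the diff (10 MHz, 24 MHz, and the general fallback) and the split into whole seconds plus a remainder, which keeps the intermediate products small enough to avoid the overflow that "(_Ctr * period::den) / _Freq" would hit for large counter values.

```cpp
#include <cassert>

// Hypothetical helper, not part of the STL: converts a tick count from a
// counter running at `freq` Hz into nanoseconds.
inline long long ticks_to_ns(long long ctr, long long freq) {
    constexpr long long ns_per_sec      = 1'000'000'000; // assumed period::den
    constexpr long long ten_mhz         = 10'000'000;
    constexpr long long twenty_four_mhz = 24'000'000;

    if (freq == ten_mhz) {
        // ns_per_sec is an exact multiple of 10 MHz, so one multiply is enough.
        static_assert(ns_per_sec % ten_mhz == 0, "10 MHz must divide ns_per_sec");
        return ctr * (ns_per_sec / ten_mhz);
    } else if (freq == twenty_four_mhz) {
        // Same split as the fallback, but with a compile-time divisor.
        const long long whole = (ctr / twenty_four_mhz) * ns_per_sec;
        const long long part  = (ctr % twenty_four_mhz) * ns_per_sec / twenty_four_mhz;
        return whole + part;
    } else {
        // General path: whole seconds first, then the sub-second remainder.
        const long long whole = (ctr / freq) * ns_per_sec;
        const long long part  = (ctr % freq) * ns_per_sec / freq;
        return whole + part;
    }
}

// Small self-check: one second of ticks maps to one second of nanoseconds.
inline void demo_ticks() {
    assert(ticks_to_ns(10'000'000, 10'000'000) == 1'000'000'000);
    assert(ticks_to_ns(24'000'000, 24'000'000) == 1'000'000'000);
}
```

For example, a counter value of 1LL << 40 at 24 MHz would overflow a long long if multiplied by 1'000'000'000 directly, while the split form stays in range.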
