add specialization for 24MHz QueryPerformanceFrequency #3832

Merged: 9 commits, Jul 14, 2023
Changes from 7 commits
34 changes: 28 additions & 6 deletions stl/inc/__msvc_chrono.hpp
Contributor

Do we need to do the push_macro/undef/pop_macro magic incantation here for likely and unlikely?

Member

likely and unlikely are commonly defined as function-like macros, but not object-like macros. <xkeycheck.h> avoids rejecting them for that reason.

Unlike msvc and similar identifiers, likely and unlikely are names that users are technically not allowed to define as object-like macros, so we don't strictly need to defend against that. We could, but I don't think it's necessary at the moment.
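
For context, the push_macro/undef/pop_macro pattern asked about above can be sketched as follows. This is a minimal illustration and not code from this PR: the macro value and the function names `guarded_region` and `after_guard` are made up, and `msvc` is used as the colliding identifier only because the reply mentions it.

```cpp
// Hypothetical user code: an object-like macro that collides with an
// identifier a library header wants to spell literally.
#define msvc 123

// Guarded region, roughly as a library header might write it.
#pragma push_macro("msvc") // save the user's definition
#undef msvc                // hide it inside the header
constexpr int guarded_region() {
    int msvc = 7; // compiles only because the macro is undefined here
    return msvc;
}
#pragma pop_macro("msvc") // restore the user's definition

// After pop_macro, the user's macro is visible again.
constexpr int after_guard() {
    return msvc; // expands to 123
}
```

The same guard would not help against a function-like `likely(x)` macro being needed by user code, which is one reason the reply treats the defense as optional here.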

@@ -666,18 +666,38 @@ namespace chrono {
using time_point = _CHRONO time_point<steady_clock>;
static constexpr bool is_steady = true;

#if defined(_M_ARM) || defined(_M_ARM64) // vvv ARM or ARM64 arch vvv
#define _LIKELY_ARM _LIKELY
#define _LIKELY_X86
#elif defined(_M_IX86) || defined(_M_X64) // ^^^ ARM or ARM64 arch / x86 or x64 arch vvv
#define _LIKELY_ARM
#define _LIKELY_X86 _LIKELY
#else // ^^^ x86 or x64 arch / other arch vvv
#define _LIKELY_ARM
#define _LIKELY_X86
#endif // ^^^ other arch ^^^
_NODISCARD static time_point now() noexcept { // get current time
const long long _Freq = _Query_perf_frequency(); // doesn't change after system boot
const long long _Ctr = _Query_perf_counter();
static_assert(period::num == 1, "This assumes period::num == 1.");
// 10 MHz is a very common QPC frequency on modern PCs. Optimizing for
// this specific frequency can double the performance of this function by
// avoiding the expensive frequency conversion path.
constexpr long long _TenMHz = 10'000'000;
if (_Freq == _TenMHz) {
// The compiler recognizes the constants for frequency and time period and uses shifts and
// multiplies instead of divides to calculate the nanosecond value.
constexpr long long _TwentyFourMHz = 24'000'000;
constexpr long long _TenMHz = 10'000'000;
// clang-format off
if (_Freq == _TenMHz) _LIKELY_X86 {
// 10 MHz is a very common QPC frequency on modern x86 PCs. Optimizing for
// this specific frequency can double the performance of this function by
// avoiding the expensive frequency conversion path.
static_assert(period::den % _TenMHz == 0, "It should never fail.");
constexpr long long _Multiplier = period::den / _TenMHz;
return time_point(duration(_Ctr * _Multiplier));
} else if (_Freq == _TwentyFourMHz) _LIKELY_ARM {
// 24 MHz frequency is a common frequency on ARM64, including cases where it emulates x86
// (Windows devices, and Apple Silicon Macs using Parallels Desktop)
const long long _Whole = (_Ctr / _TwentyFourMHz) * period::den;
const long long _Part = (_Ctr % _TwentyFourMHz) * period::den / _TwentyFourMHz;
return time_point(duration(_Whole + _Part));
} else {
// Instead of just having "(_Ctr * period::den) / _Freq",
// the algorithm below prevents overflow when _Ctr is sufficiently large.
@@ -688,9 +708,11 @@ namespace chrono {
const long long _Part = (_Ctr % _Freq) * period::den / _Freq;
return time_point(duration(_Whole + _Part));
}
// clang-format on
}
};

#undef _LIKELY_ARM
#undef _LIKELY_X86
_EXPORT_STD using high_resolution_clock = steady_clock;
} // namespace chrono
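
The tick-to-nanosecond arithmetic in the hunk above can be checked in isolation. The sketch below is a standalone approximation, not the STL's code: `kNanosPerSecond` stands in for `period::den`, and the function names `ticks_to_ns` and `ticks_to_ns_10mhz` are hypothetical. It shows why the general path splits the counter into whole seconds and a remainder (the naive `ctr * den / freq` would overflow a 64-bit integer for large counters) and that the 10 MHz fast path agrees with it.

```cpp
#include <cstdint>

// Assumption: nanosecond resolution, i.e. period::den == 1'000'000'000.
constexpr std::int64_t kNanosPerSecond = 1'000'000'000;

// General path: convert whole seconds and the sub-second remainder
// separately so that neither product exceeds the 64-bit range.
constexpr std::int64_t ticks_to_ns(std::int64_t ctr, std::int64_t freq) {
    const std::int64_t whole = (ctr / freq) * kNanosPerSecond;
    const std::int64_t part  = (ctr % freq) * kNanosPerSecond / freq;
    return whole + part;
}

// 10 MHz fast path: the frequency divides kNanosPerSecond evenly,
// so the conversion is a single multiplication by a constant.
constexpr std::int64_t ticks_to_ns_10mhz(std::int64_t ctr) {
    constexpr std::int64_t ten_mhz = 10'000'000;
    static_assert(kNanosPerSecond % ten_mhz == 0, "multiplier must be exact");
    return ctr * (kNanosPerSecond / ten_mhz);
}

// 36'000'000 ticks at 24 MHz is 1.5 seconds.
static_assert(ticks_to_ns(36'000'000, 24'000'000) == 1'500'000'000, "24 MHz");
// Both paths agree for a 10 MHz counter.
static_assert(ticks_to_ns(12'345'678, 10'000'000) == ticks_to_ns_10mhz(12'345'678),
              "10 MHz fast path matches general path");
// 10^12 ticks at 24 MHz: the naive product 10^12 * 10^9 would not fit in
// 64 bits, but the split form stays in range.
static_assert(ticks_to_ns(1'000'000'000'000, 24'000'000) == 41'666'666'666'666,
              "large counter");
```

With a compile-time-constant frequency such as 24 MHz, the compiler is free to replace the divisions with multiplies and shifts, which is the point of the specialization; the sketch only demonstrates the arithmetic, not the generated code.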

14 changes: 14 additions & 0 deletions stl/inc/yvals_core.h
@@ -525,6 +525,20 @@
#define _FALLTHROUGH
#endif

#ifndef __has_cpp_attribute // vvv no attributes vvv
#define _LIKELY
#define _UNLIKELY
#elif __has_cpp_attribute(likely) >= 201803L && __has_cpp_attribute(unlikely) >= 201803L // ^^^ no attr/C++20 attr vvv
#define _LIKELY [[likely]]
#define _UNLIKELY [[unlikely]]
#elif defined(__clang__) // ^^^ C++20 attributes / clang attributes and C++17 or C++14 vvv
#define _LIKELY [[__likely__]]
#define _UNLIKELY [[__unlikely__]]
#else // ^^^ clang attributes and C++17 or C++14 / C1XX attributes and C++17 or C++14 vvv
#define _LIKELY
#define _UNLIKELY
#endif // ^^^ C1XX attributes and C++17 or C++14 ^^^

// _HAS_NODISCARD (in vcruntime.h) controls:
// [[nodiscard]] attributes on STL functions
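
How the attribute detection and the per-architecture hints fit together can be sketched outside the STL. The example below is an approximation under stated assumptions: all `DEMO_*` macro names, the `Path` enum, and `pick_path` are hypothetical, the clang-specific `[[__likely__]]` fallback from the hunk is omitted, and the GCC/Clang architecture macros (`__aarch64__`, `__x86_64__`, and so on) are added only so the sketch is not limited to MSVC.

```cpp
// Use [[likely]] only when the compiler reports C++20-level support;
// otherwise the macro expands to nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(likely) >= 201803L
#    define DEMO_LIKELY [[likely]]
#  endif
#endif
#ifndef DEMO_LIKELY
#  define DEMO_LIKELY
#endif

// Attach the hint to a different branch depending on the target architecture.
#if defined(_M_ARM) || defined(_M_ARM64) || defined(__arm__) || defined(__aarch64__)
#  define DEMO_LIKELY_ARM DEMO_LIKELY
#  define DEMO_LIKELY_X86
#elif defined(_M_IX86) || defined(_M_X64) || defined(__i386__) || defined(__x86_64__)
#  define DEMO_LIKELY_ARM
#  define DEMO_LIKELY_X86 DEMO_LIKELY
#else
#  define DEMO_LIKELY_ARM
#  define DEMO_LIKELY_X86
#endif

enum class Path { ten_mhz, twenty_four_mhz, general };

// The hints only influence code layout; the result is the same on every target.
constexpr Path pick_path(long long freq) {
    if (freq == 10'000'000) DEMO_LIKELY_X86 {
        return Path::ten_mhz;
    } else if (freq == 24'000'000) DEMO_LIKELY_ARM {
        return Path::twenty_four_mhz;
    } else {
        return Path::general;
    }
}

// Keep the helper macros local to this region.
#undef DEMO_LIKELY_ARM
#undef DEMO_LIKELY_X86
#undef DEMO_LIKELY
```

Because at most one of the two per-architecture macros expands to an attribute on a given target, only one branch is ever marked as likely.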
