-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement lzcnt and tzcnt #145
Conversation
Performance changes are: - For i32.clz, compile-time gets 40% faster and runtime gets 5x faster - For i32.ctz, compile-time gets 35-40% faster and runtime gets 5-6x faster (The difference between results probably comes from variance in measurement)
fn arch_has_xzcnt(&self) -> bool { | ||
std::is_x86_feature_detected!("bmi1") // tzcnt | ||
&& std::is_x86_feature_detected!("lzcnt") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot/should not do this – we don’t always run the code we generate on the same machine. Instead should consult with compiler::target::CpuFeature
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally tried to use it, but I can’t figure out how to access CpuFeature
from this impl, which is on top of VecAssembler<X64Relocation>
.
I can add a CpuFeature
argument to arch_has_xzcnt
, and add such a field to compiler_singlepass::config::Singlepass
so that we add support for it; but then it’d just be CpuFeature::for_host()
everywhere anyway. Maybe it’s better than using ambient authority like is_x86_feature_detected!
I guess
However, I’m curious, when do we ever share artifacts between machines? Overall it would seem like a quite bad idea, especially as aarch64 is picking up steam
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I’m curious, when do we ever share artifacts between machines? Overall it would seem like a quite bad idea, especially as aarch64 is picking up steam
We store compiled artifacts into the RocksDB store. This database can be moved or shared, I believe. Interestingly our cache key does not depend on the CPU features, but I’d imagine we have enough foresight to change the seed here when we change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A most straightforward example of a database move would be a GCP instance restarting with a different CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for accessing CpuFeature
from within VecAssembler
, I imagine it should be sufficient to thread Target
up to the FuncGen
which could then check for the features stored within Target
before deciding whether to use the fallback or the simple implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm do we actually use no_cpu_compatibility_check
in production? I’d expect us to actually use CPU compatibility checks.
I guess my main issue is that if people compile with lzcnt/tzcnt enabled and then migrate to a machine without that (which seems unlikely but still), then it’ll result in mis-execution, as lzcnt is rep bsr
and that’d execute as bsr
there, thus giving out wrong answers if the value under count was 0. And this could lead to slashing.
I guess one solution would be to assert when creating wasmer2_runner, in new_for_target
, that bmi1/lzcnt are enabled; and add bmi1/lzcnt as required features in our docs. This way we could make sure that no misuse happens. That said, I’m not sure we should support use cases involving moving a rocksdb store to an older CPU… going to start a zulip thread on it I guess!
Either way, this change makes sense, I’ve just pushed it :)
Overall, it seems reasonable to me for us to emit |
3302: Some Refactor of Singlepass compiler to have better error and cpu features handling r=ptitSeb a=ptitSeb # Description * Some refactor of Singlepass compiler to have a better handling of Target CPU Feature and Error propagation * Added support for TZCNT and LZCNT opcode on x86_64 backend (based on near/wasmer#145) For #3305 Co-authored-by: ptitSeb <[email protected]>
Performance changes are:
(The difference between results probably comes from variance in measurement)
We should add BMI1 and LZCNT to at least our recommended feature list on https://near-nodes.io/validator/hardware ; and probably our required feature list.