Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement lzcnt and tzcnt #145

Merged
merged 2 commits into from
Oct 14, 2022
Merged

Conversation

Ekleog-NEAR
Copy link
Contributor

@Ekleog-NEAR Ekleog-NEAR commented Oct 6, 2022

Performance changes are:

  • For i64.clz, compile-time gets 40% faster and runtime gets 5x faster
  • For i64.ctz, compile-time gets 35-40% faster and runtime gets 5-6x faster

(The difference between results probably comes from variance in measurement)

We should add BMI1 and LZCNT to at least our recommended feature list on https://near-nodes.io/validator/hardware ; and probably our required feature list.

Performance changes are:
- For i32.clz, compile-time gets 40% faster and runtime gets 5x faster
- For i32.ctz, compile-time gets 35-40% faster and runtime gets 5-6x faster

(The difference between results probably comes from variance in measurement)
@Ekleog-NEAR Ekleog-NEAR requested review from matklad and nagisa October 6, 2022 14:20
Comment on lines 1422 to 1425
fn arch_has_xzcnt(&self) -> bool {
std::is_x86_feature_detected!("bmi1") // tzcnt
&& std::is_x86_feature_detected!("lzcnt")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot/should not do this – we don’t always run the code we generate on the same machine. Instead should consult with compiler::target::CpuFeatures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally tried to use it, but I can’t figure out how to access CpuFeature from this impl, which is on top of VecAssembler<X64Relocation>.

I can add a CpuFeature argument to arch_has_xzcnt, and add such a field to compiler_singlepass::config::Singlepass so that we add support for it; but then it’d just be CpuFeature::for_host() everywhere anyway. Maybe it’s better than using ambient authority like is_x86_feature_detected! I guess

However, I’m curious, when do we ever share artifacts between machines? Overall it would seem like a quite bad idea, especially as aarch64 is picking up steam

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I’m curious, when do we ever share artifacts between machines? Overall it would seem like a quite bad idea, especially as aarch64 is picking up steam

We store compiled artifacts into the RocksDB store. This database can be moved or shared, I believe. Interestingly our cache key does not depend on the CPU features, but I’d imagine we have enough foresight to change the seed here when we change this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A most straightforward example of a database move would be a GCP instance restarting with a different CPU.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for accessing CpuFeature from within VecAssembler, I imagine it should be sufficient to thread Target up to the FuncGen which could then check for the features stored within Target before deciding whether to use the fallback or the simple implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm do we actually use no_cpu_compatibility_check in production? I’d expect us to actually use CPU compatibility checks.

I guess my main issue is that if people compile with lzcnt/tzcnt enabled and then migrate to a machine without that (which seems unlikely but still), then it’ll result in mis-execution, as lzcnt is rep bsr and that’d execute as bsr there, thus giving out wrong answers if the value under count was 0. And this could lead to slashing.

I guess one solution would be to assert when creating wasmer2_runner, in new_for_target, that bmi1/lzcnt are enabled; and add bmi1/lzcnt as required features in our docs. This way we could make sure that no misuse happens. That said, I’m not sure we should support use cases involving moving a rocksdb store to an older CPU… going to start a zulip thread on it I guess!

Either way, this change makes sense, I’ve just pushed it :)

@nagisa
Copy link
Collaborator

nagisa commented Oct 10, 2022

Overall, it seems reasonable to me for us to emit lzcnt/tzcnt. These instructions are quite commonplace nowadays.

@Ekleog-NEAR Ekleog-NEAR merged commit bd153a6 into near:near-main Oct 14, 2022
bors bot added a commit to wasmerio/wasmer that referenced this pull request Nov 22, 2022
3302: Some Refactor of Singlepass compiler to have better error and cpu features handling r=ptitSeb a=ptitSeb

# Description

* Some refactor of Singlepass compiler to have a better handling of Target CPU Feature and Error propagation
* Added support for TZCNT and LZCNT opcode on x86_64 backend (based on near/wasmer#145)
For #3305 

Co-authored-by: ptitSeb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants