This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Generalise i8x16.any_true so as to support vectorised C strlen, strcmp, strstr, et al. #169

Closed
julian-seward1 opened this issue Jan 7, 2020 · 2 comments · Fixed by #201

Comments

@julian-seward1

(w/ apologies in advance if in fact this has already been considered, and I missed it)

It would be nice if the MVP could support basic vectorised operations for C-style strings, in particular strlen and strcmp. The spec already contains one of the key building blocks, i8x16.eq, which makes it possible to find bytes that are "interesting" (string-end zeroes, or non-equal bytes in strcmp).

But it appears to lack the other key operation, which is to find the index of the lowest (or highest, depending on endianness) zero byte lane in a v128. This is necessary at least for strlen.

i8x16.any_true is almost good enough, but not quite: it doesn't produce the actual lane index, which is necessary to correctly calculate a string length.

Could i8x16.any_true be generalised to, or replaced by, i8x16.highest_true (and also i8x16.lowest_true, if necessary for the opposite-endian case)? These would produce the index of the lowest or highest true lane, which after an i8x16.eq against zero is the lowest or highest zero byte.

At least on Intel, this can be efficiently implemented by using PMOVMSKB followed by an integer-ALU count-trailing/leading-zeroes operation, or their older equivalents, BSF/BSR. I would be surprised if ARM didn't offer some equivalent mechanism.
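To make the idea concrete, here is a rough sketch in portable C of how a block-at-a-time strlen could use such an operation. It only emulates the vector steps with scalar loops; the names zero_lane_mask, lowest_true and strlen_blocks are hypothetical, chosen for illustration, and are not part of the spec or of any proposal text. It also assumes the buffer can be read in whole 16-byte blocks up to and including the block holding the terminator, which a real implementation would have to guarantee by alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for i8x16.eq against zero followed by PMOVMSKB:
   bit i of the result is set when byte lane i of the 16-byte block is zero. */
static uint32_t zero_lane_mask(const unsigned char *block)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 16; lane++) {
        if (block[lane] == 0)
            mask |= (uint32_t)1 << lane;
    }
    return mask;
}

/* Hypothetical stand-in for the proposed lowest_true: index of the lowest
   set bit, i.e. a count-trailing-zeroes. The caller must pass a nonzero mask. */
static int lowest_true(uint32_t mask)
{
    int index = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Block-wise strlen. Assumption: buf is readable in whole 16-byte blocks
   up to and including the block that contains the terminating zero. */
static size_t strlen_blocks(const unsigned char *buf)
{
    size_t offset = 0;
    for (;;) {
        uint32_t mask = zero_lane_mask(buf + offset);
        if (mask != 0)
            return offset + (size_t)lowest_true(mask);
        offset += 16;
    }
}
```

With only any_true, the loop could detect which block contains the terminator, but the final offset within that block would still need a scalar byte-by-byte scan.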

Adding the 16-bit-lane equivalents (PMOVMSKW .. does that exist?) would make it possible to handle wchar versions of strlen/strcmp.

@julian-seward1
Author

julian-seward1 commented Jan 7, 2020

In fact, the obvious way to implement i8x16.any_true on Intel is exactly using PMOVMSKB followed by an integer-reg comparison against zero. If that's so, it would be trivial to implement i8x16.{lowest,highest}_true merely by replacing the integer comparison by a count leading/trailing zeroes operation.
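As an illustration of how close the three operations are, the following C sketch derives all of them from the same 16-bit lane mask (the value PMOVMSKB would produce). The function names are hypothetical, and returning -1 when no lane is true is an assumed convention; this issue does not say what the result should be in that case.

```c
#include <stdint.h>

/* any_true: an integer comparison of the lane mask against zero. */
static int any_true_from_mask(uint32_t mask)
{
    return mask != 0;
}

/* lowest_true: index of the lowest set bit (count trailing zeroes).
   Assumed convention: returns -1 when no lane is true. */
static int lowest_true_from_mask(uint32_t mask)
{
    for (int lane = 0; lane < 16; lane++) {
        if (mask & ((uint32_t)1 << lane))
            return lane;
    }
    return -1;
}

/* highest_true: index of the highest set bit (derived from count leading
   zeroes). Assumed convention: returns -1 when no lane is true. */
static int highest_true_from_mask(uint32_t mask)
{
    for (int lane = 15; lane >= 0; lane--) {
        if (mask & ((uint32_t)1 << lane))
            return lane;
    }
    return -1;
}
```

The loops here only stand in for single count-trailing/leading-zeroes instructions; the point is that only the final integer step differs between the three.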

@zeux
Contributor

zeux commented Jan 9, 2020

See also #131. I think the challenge is that on some architectures, there's no straightforward mapping to efficient vector instructions for concepts like this.
