Add exhaustive tests for RegExp Unicode property escapes #971
Initial observations:
const buildString = ({ loneCodePoints, ranges }) => {
  let result = String.fromCodePoint(...loneCodePoints);
  for (const [start, end] of ranges) {
    let codePoints = [];
    for (let i = 0, codePoint = start; codePoint <= end; codePoint++) {
      codePoints[i++] = codePoint;
      if (i === 10000) {
        result += String.fromCodePoint(...codePoints);
        codePoints.length = i = 0;
      }
    }
    result += String.fromCodePoint(...codePoints);
  }
  return result;
};
a v8-debug build created the string for ASCII_Hex_Digit in 750ms compared to 9750ms with the current
@anba Thanks for taking a look!
Did you mean here? That’s in
I’ll address your other feedback (excellent points/suggestions!) tomorrow, if you don’t beat me to it with a PR, that is.
    }
  }
  return result;
};
If buildString is the same across all tests, we should add a new JS file in the /harness/ folder and pull it in through the includes tag, i.e.:
esid: sec-static-semantics-unicodematchproperty-p
features: [regexp-unicode-property-escapes]
includes: [buildString.js]
---*/
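A minimal sketch of what such a shared harness file could contain, assuming the file name buildString.js from the frontmatter above and the chunked helper posted earlier in this thread. The CHUNK_SIZE constant and the matchSymbols variable are illustrative names, not necessarily what the generated tests use:

```javascript
// Hypothetical harness/buildString.js (file name taken from the `includes` tag above).
// Builds a string from lone code points plus inclusive [start, end] ranges,
// flushing in chunks so String.fromCodePoint never receives too many arguments.
function buildString({ loneCodePoints, ranges }) {
  const CHUNK_SIZE = 10000; // same chunk size as the snippet earlier in the thread
  let result = String.fromCodePoint(...loneCodePoints);
  for (const [start, end] of ranges) {
    const codePoints = [];
    for (let length = 0, codePoint = start; codePoint <= end; codePoint++) {
      codePoints[length++] = codePoint;
      if (length === CHUNK_SIZE) {
        result += String.fromCodePoint(...codePoints);
        codePoints.length = length = 0;
      }
    }
    // Flush whatever is left over after the last full chunk.
    result += String.fromCodePoint(...codePoints);
  }
  return result;
}

// How a test body might then use the helper (ASCII_Hex_Digit is 0-9, A-F, a-f).
const matchSymbols = buildString({
  loneCodePoints: [],
  ranges: [[0x30, 0x39], [0x41, 0x46], [0x61, 0x66]],
});
```

With the helper in the harness, each generated test only needs to carry its own code point data.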
Done.
PropList.txt has:
So U+10FFFE and U+10FFFF should match
These tests are great, and they seem to test the feature in a pretty thorough way. A couple thoughts:
I'm responsible for this message. We avoid using these messages as failure messages, though; they actually state what we expect, as a description of the assertion. We do want to improve "matches all proper symbols" so that each assertion gets a better description. We also agree that each part should have a different message, so it helps locate the point of failure.
Thanks to @anba for the suggestion! tc39/test262#971 (comment)
Thanks to @anba for spotting this! tc39/test262#971 (comment)
@anba You’re absolutely right! Fixed.
@littledan I wonder if they’re any faster now that
@littledan & @leobalter The string currently isn’t checked in parts — it’s all in one go.
I’m tempted to leave it as-is: let’s optimize for the common case, where engines run these tests at build time long after implementing support for this feature. Nicer, more specific error messages only help engineers who are currently implementing the feature, but they come at a performance cost for everyone.
@mathiasbynens They took 1m32s with the new version on my machine. A huge improvement, but that's still 10% as much time as the rest of test262. I'd say this is a bit borderline, and I'd be interested in input from others.
Just a thought: for code-point-specific error messages, you could search through the individual code points only when one of the bigger strings turns up a mismatch.
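The fallback idea above could be sketched roughly as follows. This assumes the tests use an anchored pattern of the form `^\p{…}+$` without the `g` flag; findFirstMismatch and formatCodePoint are hypothetical helper names, not part of test262 or the generator script:

```javascript
// Fast path: test the whole string in one go. Only when that fails do we
// walk the string symbol by symbol to find the offending code point.
function findFirstMismatch(regExp, string) {
  if (regExp.test(string)) {
    return null; // everything matched, no per-code-point work needed
  }
  // for...of iterates by code point, so astral symbols stay intact.
  for (const symbol of string) {
    if (!regExp.test(symbol)) {
      return symbol.codePointAt(0);
    }
  }
  // Reached only if every symbol matches individually (e.g. an empty string).
  return null;
}

// Formats a code point as U+XXXX for use in an error message.
function formatCodePoint(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// Example: 'G' is not a hex digit, so it is reported as the mismatch.
const mismatch = findFirstMismatch(/^\p{ASCII_Hex_Digit}+$/u, '0123abcG');
const message = mismatch === null
  ? 'all symbols matched'
  : 'unexpected mismatch at ' + formatCodePoint(mismatch);
```

The slow loop only runs on failure, so passing runs keep the cost of a single `test` call per string.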
ICU 59 doesn't seem to include these properties either.
Yes, I can help with that in the source as a follow up.
I believe we can fix it just by having less generic strings. I'll follow up with a PR in the source as well.
Proposal: https://github.com/tc39/proposal-regexp-unicode-property-escapes These tests have been generated by the script at https://github.com/mathiasbynens/unicode-property-escapes-tests. They check all the properties and values that should be supported by implementations against the symbols they’re supposed to match. False positives are detected as well. Ref. #950. Ref. tc39/proposal-regexp-unicode-property-escapes#4.
Running on V8:
I'm compiling SpiderMonkey to check these tests, and their running time, as well.
Assigned.js, General_Category_-_Other.js, and General_Category_-_Surrogate.js:
SpiderMonkey doesn't yet support Unicode property escapes.
Thanks for letting me know. I needed to build a new version of SM anyway. :)
Looking into the unexpected failures in V8. (This is a problem with the tests, not V8’s implementation!)
Fix for those three broken tests: #974