-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance and document performance tips #193
Comments
Hmmm, immediately after submitting this I got an idea that the label inference might be the culprit and it surely is:
function validateWithOw(objects) {
const predicate = ow.object.exactShape({
foo: ow.string,
one: ow.number,
obj: ow.object,
})
objects.forEach(obj => {
ow(obj, 'x', predicate) // <==== added some label
})
} |
Another potential improvement lies in the execution of Example of our validation: const predicate = ow.object.exactShape({
url: ow.string.url,
id: ow.optional.string,
loadedUrl: ow.optional.string.url,
uniqueKey: ow.optional.string,
method: ow.optional.string,
payload: ow.optional.any(ow.string, ow.buffer),
noRetry: ow.optional.boolean,
retryCount: ow.optional.number,
errorMessages: ow.optional.array.ofType(ow.string),
headers: ow.optional.object,
userData: ow.optional.object,
handledAt: ow.optional.any(ow.string.date, ow.date),
keepUrlFragment: ow.optional.boolean,
useExtendedUniqueKey: ow.optional.boolean,
});
ow(options, 'RequestOptions', predicate); By refactoring to the custom validation loop below, we got a 3-4 times speed increase. const requestOptionalPredicates = {
id: ow.optional.string,
loadedUrl: ow.optional.string.url,
uniqueKey: ow.optional.string,
method: ow.optional.string,
payload: ow.optional.any(ow.string, ow.buffer),
noRetry: ow.optional.boolean,
retryCount: ow.optional.number,
errorMessages: ow.optional.array.ofType(ow.string),
headers: ow.optional.object,
userData: ow.optional.object,
handledAt: ow.optional.any(ow.string.date, ow.date),
keepUrlFragment: ow.optional.boolean,
useExtendedUniqueKey: ow.optional.boolean,
};
ow(options, 'RequestOptions', ow.object);
ow(options.url, 'RequestOptions.url', ow.string.url);
Object.keys(options).forEach((prop) => {
const predicate = requestOptionalPredicates[prop];
const value = options[prop];
if (predicate) {
ow(value, `RequestOptions.${prop}`, predicate);
// 'url' is checked above because it's not optional
} else if (prop !== 'url') {
const msg = `Did not expect property \`${prop}\` to exist, got \`${value}\` in object \`RequestOptions\``;
throw new Error(msg);
}
}); Would it be possible for |
Yeah, it's slow because it needs to read the file from disk and parse it. Label inference is only called when something fails though, which should not be that often for normal function parameter validations calls, which is what Ow was originally intended for. I guess we could add a |
I think this is just an oversight that could be fixed. I don't see how it matter if they're getters/setter. |
The reason it loops over all the properties of the shape instead of all the properties of the input object is because it expects all properties of the shape to be present in the input object. If you turn it around, you can't detect that property |
@sindresorhus In the provided example, nothing fails, yet the speed of validation improves 10-fold without inference. Maybe a bug somewhere and the inference is always called? And yes, we use
Only
@SamVerschueren Yes, that's right. But for I understand this may be an edge case in your opinion, but I think it's quite typical for functions to accept options-objects or "named parameters", where one or two are mandatory and the rest is optional. |
Hi @sindresorhus @SamVerschueren just wanted to bring to your attention that the validations above are about 15 times slower in I might be able to work on this, at some point this quarter so if it's not critical for you, let me handle this in February or so. We'll keep using |
// @vladfrangu FYI |
@mnmkng That would be really great. 🙏 |
That's... odd, that shouldn't be the case, unless generating the error message causes this slowdown (which would probably make sense if your schema is very complex and there's a LOT of errors)... Any benchmarks or performance reports you could share? 🙏 Reproduction samples that I could benchmark? 😓 |
@vladfrangu See the thread above for examples. In real life, it's this test: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/test/request.test.js#L123 On 0.22 it runs in 450+ms for me locally, whereas with 0.21 it runs in ~30ms. |
Will try to shoot an eye later this week, could you also link me the schemas you have, please? 🙏 |
Sure, the predicates are here: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/src/request.js#L19 And the validation code is here: https://github.com/apify/apify-js/blob/11ff217a2af12cbb900b6de827de710dad030786/src/request.js#L107 We're using the workarounds that I explained in the above comments to speed up the validation. |
@mnmkng so.. I'm in a bit of a predicament right now, as this issue completely amazes me. Since you mentioned there's a possible performance regression in 0.22.0 (ala the "report all errors in the message" version), I reset my local master branch of ow to upstream, cloned apify, nuked all tests, except the one you mentioned, and ran it. And ran it again, and again... you get the gist. I've tinkered with ow's code, and have shaved off some ms and duplicated iterations, I've tried insane things like moving from maps to objects (and back)... So, then I decided to time it DIRECTLY in code (console.time/timeEnd, Date.now subtraction, etc). ow takes... literally between 0 and 7 ms per class constructed to validate that options is an object and options.url is a string. First run was ~7ms, subsequent were very VERY fast (like, between 0 and 2ms)... So then I don't understand why Jest is reporting it's taking almost a full second to create 1k classes and push them to one array... In fact, if I add up all the results from it. I get around Modification in `src/request.js` const start = Date.now();
ow(options, 'RequestOptions', ow.object);
ow(options.url, 'RequestOptions.url', ow.string.url);
// 'ow' validation is slow, because it checks all predicates
// even if the validated object has only 1 property.
// This custom validation loop iterates only over existing
// properties and speeds up the validation cca 3-fold.
// See https://github.com/sindresorhus/ow/issues/193
Object.keys(options).forEach((prop) => {
const predicate = requestOptionalPredicates[prop];
const value = options[prop];
if (predicate) {
ow(value, `RequestOptions.${prop}`, predicate);
// 'url' is checked above because it's not optional
} else if (prop !== 'url') {
const msg = `Did not expect property \`${prop}\` to exist, got \`${value}\` in object \`RequestOptions\``;
throw new ArgumentError({ message: msg, context: this.constructor });
}
});
console.log(`Ow finish: ${Date.now() - start}`); So then I decided to implement the same similar "benchmark" code as provided in the main issue "new" benchmarktest('should have acceptable request creation time', () => {
const requests = [];
const millis = [];
const start = Date.now();
for (let i = 0; i < 1000; i++) {
requests.push(new Apify.Request({ url: `https://example.com/${i}` }));
millis.push(Date.now() - start);
}
const total = millis.reduce((sum, number) => sum + number);
const durationMillis = total / millis.length;
console.log(durationMillis);
console.log(durationMillis);
console.log(durationMillis);
// Under normal load, the Requests are created in ~25-30ms
// In tests the load is high, so we only check if it doesn't
// overshoot some crazy range like 500ms.
expect(durationMillis).toBeLessThan(500);
}); This yielded some more consistent results, aaand... honestly, I'm lost. After removing the console.logs which caused slowdown... Results are...
At this point, I don't even know if it IS a regression or just Jest and benchmarking not being meant for each other... I've tried creating the classes themselves within the benchmark script you provided above, changing the 50 iterations to 1k, making v8 warmup by running the benchmark function 100k times before actually benchmarking it and...
If you have any other ideas of what I should try, or.. want to experiment privately, you can contact me on Discord (if you use that, Vladdy#0002) or here! But as it stands... I'm lost.. |
@vladfrangu Thanks for all the work you clearly dedicated to this, but I'm not sure how Jest is relevant. Your tables clearly show that there's a ten-fold or larger slowdown between 0.21.0 and 0.22.0 (or 23). Yes, the Jest numbers are higher, because Jest instruments code and does whatever magic, but the issue is exactly the same without Jest, as your own numbers prove. I.e. from 23ms to 271ms in one test, or from 5ms to 65ms in the other. That's a pretty big slowdown. Please correct me if I'm missing something here. |
Oh..my..god. Found the culprit. Turns out, generating the stack trace when it's not needed is QUITE AN EXPENSIVE OPERATION!! (aka I dropped from 52.73ms to 3.239ms by just making the function return null) Expect a PR soon! |
Are there some low hanging fruits to improve performance?
ow
seems to be about 15 times slower in validation than pure JS. I understand that some slowness is obviously expected, but 15 times slower makes me think this could be a bug rather than a design limitation.I'm using
[email protected]
Repro code:
It is a problem for us because we use
ow
to validate user provided objects (constructor options) and the number of those objects can be hundreds of thousands in one batch.ow
is by far the slowest part of that process.I'd gladly submit a PR if you can point me in some general direction.
Thanks.
The text was updated successfully, but these errors were encountered: