(WIP) Improve Binary Encode Performance #964

Open
wants to merge 4 commits into
base: main

@Ekrekr Ekrekr commented Aug 23, 2024

Work in progress - I haven't yet come up with a way to efficiently fork a new message directly within the contiguous memory space for length-delimited records.

This PR aims to make binary encoding more efficient by avoiding copies of array buffers: it writes directly to an expanding array buffer backed by dynamically sized contiguous memory.

TODO: add performance tests and flamegraphs once working.

@Ekrekr Ekrekr changed the title Improve Binary Encode Performance (WIP) Improve Binary Encode Performance Aug 23, 2024

@timostamm timostamm left a comment

Thanks for the PR, Elias!

Unfortunately, resizable ArrayBuffers are relatively new. We'll have to wait until they are more widely available before we can use them.

I left a couple of informative comments.

Comment on lines +195 to +197
// NodeJS strings are by default UTF-8, so we can assume the byte length as the length of
// the string.
const valueBytesLength = value.length;

Unfortunately this won't work. JavaScript strings are UTF-16, and length returns the number of UTF-16 code units, not the number of UTF-8 bytes.
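To illustrate the difference, here is a minimal sketch that compares `String.length` with the UTF-8 byte length obtained from the standard `TextEncoder`. The helper name `utf8ByteLength` is made up for this example and is not part of the PR or the library.

```typescript
// Compare UTF-16 code units (String.length) with the UTF-8 encoded size.
const sampleEncoder = new TextEncoder();

// Hypothetical helper: number of bytes the string occupies once UTF-8 encoded.
function utf8ByteLength(value: string): number {
  return sampleEncoder.encode(value).length;
}

// "abc", e-acute (U+00E9), euro sign (U+20AC), and an emoji (U+1F600,
// written as a surrogate pair).
const samples = ["abc", "\u00e9", "\u20ac", "\ud83d\ude00"];
for (const s of samples) {
  console.log(JSON.stringify(s), "length:", s.length, "utf8 bytes:", utf8ByteLength(s));
}
```

Only pure ASCII input has the same value for both measures, so using `value.length` as the byte length would under-allocate for any other text.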

@@ -80,7 +80,7 @@ function writeFields(
   }
   if (opts.writeUnknownFields) {
     for (const { no, wireType, data } of msg.getUnknown() ?? []) {
-      writer.tag(no, wireType).raw(data);
+      writer.tag(no, wireType).bytes(data);

The bytes method prefixes the length. The change will corrupt data, and needs to be reverted.
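A rough sketch of why the extra prefix matters, using a hypothetical `SketchWriter` (not the library's `BinaryWriter`) and assuming the stored unknown-field data is already the encoded field value, here the varint 150:

```typescript
// Hypothetical minimal writer for illustration; not the library's BinaryWriter.
class SketchWriter {
  private out: number[] = [];

  // Write an unsigned 32-bit value as a varint.
  uint32(value: number): this {
    while (value > 0x7f) {
      this.out.push((value & 0x7f) | 0x80);
      value = value >>> 7;
    }
    this.out.push(value);
    return this;
  }

  // Write a field tag: field number and wire type packed into one varint.
  tag(fieldNo: number, wireType: number): this {
    return this.uint32(((fieldNo << 3) | wireType) >>> 0);
  }

  // Append the bytes unchanged.
  raw(data: Uint8Array): this {
    for (let i = 0; i < data.length; i++) this.out.push(data[i]);
    return this;
  }

  // Append a length prefix, then the bytes.
  bytes(data: Uint8Array): this {
    return this.uint32(data.length).raw(data);
  }

  finish(): number[] {
    return this.out.slice();
  }
}

// Assumed unknown field: number 1, wire type 0 (varint), value 150.
const unknownData = new Uint8Array([0x96, 0x01]);
const viaRaw = new SketchWriter().tag(1, 0).raw(unknownData).finish();
const viaBytes = new SketchWriter().tag(1, 0).bytes(unknownData).finish();
console.log("raw:  ", viaRaw);   // tag, then the original bytes
console.log("bytes:", viaBytes); // tag, an extra length byte, then the bytes
```

In this sketch the `bytes` output contains an extra `0x02` after the tag, so a reader would decode field 1 as the varint 2 and then misinterpret the remaining bytes.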

for (const item of list) {
writeScalarValue(writer, scalarType, item as ScalarValue);
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);

fork() and join() will take everything between, and write it length-prefixed. The tag needs to come first, then the length prefix, then the data.
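The ordering can be sketched with a hypothetical `ForkingWriter` that mimics the fork/join behaviour described above; it is an illustration under that assumption, not the library's implementation. The example encodes a packed field number 4 holding the values 1, 2, 3.

```typescript
// Hypothetical writer with fork()/join(); illustration only.
class ForkingWriter {
  private chunk: number[] = [];
  private stack: number[][] = [];

  uint32(value: number): this {
    while (value > 0x7f) {
      this.chunk.push((value & 0x7f) | 0x80);
      value = value >>> 7;
    }
    this.chunk.push(value);
    return this;
  }

  tag(fieldNo: number, wireType: number): this {
    return this.uint32(((fieldNo << 3) | wireType) >>> 0);
  }

  // Start collecting bytes into a fresh chunk.
  fork(): this {
    this.stack.push(this.chunk);
    this.chunk = [];
    return this;
  }

  // Append everything written since fork() to the parent, length-prefixed.
  join(): this {
    const body = this.chunk;
    const parent = this.stack.pop();
    if (!parent) throw new Error("join() called without fork()");
    this.chunk = parent;
    this.uint32(body.length);
    for (const b of body) this.chunk.push(b);
    return this;
  }

  finish(): number[] {
    return this.chunk.slice();
  }
}

const LENGTH_DELIMITED = 2;

// Tag first, then the forked (length-prefixed) payload.
const tagFirst = new ForkingWriter().tag(4, LENGTH_DELIMITED).fork();
for (const v of [1, 2, 3]) tagFirst.uint32(v);
const correctOrder = tagFirst.join().finish();

// Payload first, tag afterwards: the order the review comment flags.
const tagLast = new ForkingWriter().fork();
for (const v of [1, 2, 3]) tagLast.uint32(v);
const wrongOrder = tagLast.join().tag(4, LENGTH_DELIMITED).finish();

console.log("tag first:", correctOrder);
console.log("tag last: ", wrongOrder);
```

With the tag written last, the output starts with the length prefix, which a reader would try to parse as a field tag.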

break;
}
writer.join();
writer.tag(field.number, WireType.LengthDelimited);

Same as L178.

Comment on lines +144 to +146
// TODO(ekrekr): this is really slow, because it has to allocate a whole new array buffer.
// Instead we should be writing the message directly to the original arraybuffer, then inserting
// the length beforehand.

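Yes. Unfortunately, the length prefix is a varint, so its size depends on the length of the message, which isn't known until the message has been written. That makes this less than straightforward.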
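One possible way to deal with this, sketched below, is to reserve a single byte for the prefix, write the body in place, and shift the body when the prefix turns out to need more room. All function names here are hypothetical, and the sketch assumes the buffer already has enough capacity; it is not what the PR or the library does.

```typescript
// Number of bytes an unsigned 32-bit value occupies as a varint.
function varintSize(value: number): number {
  let size = 1;
  while (value > 0x7f) {
    value = value >>> 7;
    size++;
  }
  return size;
}

// Write a varint at pos and return the position after it.
function writeVarint(buf: Uint8Array, pos: number, value: number): number {
  while (value > 0x7f) {
    buf[pos++] = (value & 0x7f) | 0x80;
    value = value >>> 7;
  }
  buf[pos++] = value;
  return pos;
}

// Write a length-delimited body directly into buf, starting at pos.
// writeBody writes at the given position and returns the end position.
function writeDelimitedInPlace(
  buf: Uint8Array,
  pos: number,
  writeBody: (buf: Uint8Array, pos: number) => number,
): number {
  const reserved = 1; // optimistic guess: the length fits in one byte
  const bodyStart = pos + reserved;
  const bodyEnd = writeBody(buf, bodyStart);
  const bodyLength = bodyEnd - bodyStart;
  const needed = varintSize(bodyLength);
  if (needed !== reserved) {
    // The guess was wrong: move the body to make room for the longer prefix.
    buf.copyWithin(pos + needed, bodyStart, bodyEnd);
  }
  writeVarint(buf, pos, bodyLength);
  return pos + needed + bodyLength;
}

// A 3-byte body fits the guess; a 128-byte body needs a 2-byte prefix.
const smallBuf = new Uint8Array(16);
const smallEnd = writeDelimitedInPlace(smallBuf, 0, (b, p) => {
  b.fill(7, p, p + 3);
  return p + 3;
});
const largeBuf = new Uint8Array(200);
const largeEnd = writeDelimitedInPlace(largeBuf, 0, (b, p) => {
  b.fill(7, p, p + 128);
  return p + 128;
});
console.log(smallEnd, largeEnd);
```

The shift is still a copy, but it only happens for bodies of 128 bytes or more and stays within the same buffer. Whether that is a net win would need the performance tests mentioned in the PR description.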

/**
* Encode UTF-8 text to an existing binary.
*/
encodeInto: (

It's a breaking change to add a mandatory property to the interface, so we'll have to find other means.
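One option that avoids the break, sketched here purely as an assumption, is to make the new member optional and fall back to the existing method when it is absent. The interface name `TextCodingSketch`, the helper `encodeTextInto`, and the exact signatures are hypothetical, not the library's API.

```typescript
// Hypothetical interface: the new member is optional, so existing
// implementations that only provide encode() still satisfy it.
interface TextCodingSketch {
  encode(text: string): Uint8Array;
  // Optional: write UTF-8 bytes into target and return the number written.
  encodeInto?(text: string, target: Uint8Array): number;
}

// Use encodeInto when available, otherwise encode and copy.
function encodeTextInto(
  coding: TextCodingSketch,
  text: string,
  target: Uint8Array,
): number {
  if (coding.encodeInto) {
    return coding.encodeInto(text, target);
  }
  const bytes = coding.encode(text);
  target.set(bytes);
  return bytes.length;
}

// An existing implementation without the new member keeps compiling.
const legacyCoding: TextCodingSketch = {
  encode: (text) => new TextEncoder().encode(text),
};

// A newer implementation can opt in via the standard TextEncoder.encodeInto.
const modernCoding: TextCodingSketch = {
  encode: (text) => new TextEncoder().encode(text),
  encodeInto: (text, target) => new TextEncoder().encodeInto(text, target).written ?? 0,
};

const legacyTarget = new Uint8Array(8);
const modernTarget = new Uint8Array(8);
const legacyWritten = encodeTextInto(legacyCoding, "hi", legacyTarget);
const modernWritten = encodeTextInto(modernCoding, "hi", modernTarget);
console.log(legacyWritten, modernWritten);
```

The fallback path still allocates, so only implementations that opt in would see a performance benefit.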
