Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method for marshaling directly into a user-provided buffer. #521

Merged
merged 3 commits into from
May 12, 2024

Conversation

benluddy
Copy link
Contributor

@benluddy benluddy commented Apr 17, 2024

Description

This is an implementation of the proposal in #520.

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests
  • Pass all lint checks in CI (goimports, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
    Last line of each commit message should be in this format:
    Signed-off-by: Firstname Lastname [email protected]
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@benluddy
Copy link
Contributor Author

As part of this, I moved the scratch byte array from a pooled buffer field to the stack as-needed.

                                                            │ master.txt  │   user-provided-buffer-inline.txt    │
                                                            │   sec/op    │   sec/op     vs base                 │
Marshal/Go_bool_to_CBOR_bool                                  45.33n ± 0%   45.41n ± 0%  +0.19% (p=0.033 n=10)
Marshal/Go_uint64_to_CBOR_positive_int                        63.97n ± 0%   64.06n ± 0%       ~ (p=0.209 n=10)
Marshal/Go_int64_to_CBOR_negative_int                         50.56n ± 1%   50.61n ± 1%       ~ (p=0.195 n=10)
Marshal/Go_float64_to_CBOR_float                              60.98n ± 0%   61.42n ± 0%  +0.72% (p=0.001 n=10)
Marshal/Go_[]uint8_to_CBOR_bytes                              77.63n ± 0%   74.29n ± 0%  -4.30% (p=0.000 n=10)
Marshal/Go_string_to_CBOR_text                                77.50n ± 1%   76.89n ± 0%  -0.79% (p=0.000 n=10)
Marshal/Go_[]int_to_CBOR_array                                286.9n ± 0%   288.9n ± 0%  +0.71% (p=0.000 n=10)
Marshal/Go_map[string]string_to_CBOR_map                      914.8n ± 1%   876.3n ± 1%  -4.20% (p=0.000 n=10)
Marshal/Go_map[string]interface{}_to_CBOR_map                 2.233µ ± 0%   2.260µ ± 0%  +1.21% (p=0.000 n=10)
Marshal/Go_struct_to_CBOR_map                                 1.391µ ± 1%   1.395µ ± 1%       ~ (p=0.839 n=10)
Marshal/Go_map[int]interface{}_to_CBOR_map                    2.155µ ± 0%   2.188µ ± 0%  +1.51% (p=0.000 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                        1.382µ ± 1%   1.371µ ± 1%  -0.80% (p=0.000 n=10)
Marshal/Go_[]interface{}_to_CBOR_map                          1.617µ ± 0%   1.671µ ± 0%  +3.37% (p=0.000 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                       1.332µ ± 1%   1.329µ ± 1%       ~ (p=0.401 n=10)

I also tried updating the implementation of Marshal to delegated to MarshalToBuffer, which worked fine, but the benchmarks were marginally worse. I think this may just be the overhead of an extra function call being super visible in benchmark cases that are already very fast, or noise:

Marshal/Go_bool_to_CBOR_bool                                  45.33n ± 0%   46.59n ± 0%  +2.78% (p=0.000 n=10)
Marshal/Go_uint64_to_CBOR_positive_int                        63.97n ± 0%   64.98n ± 0%  +1.56% (p=0.000 n=10)
Marshal/Go_int64_to_CBOR_negative_int                         50.56n ± 1%   53.09n ± 1%  +4.98% (p=0.000 n=10)
Marshal/Go_float64_to_CBOR_float                              60.98n ± 0%   63.23n ± 0%  +3.69% (p=0.000 n=10)
Marshal/Go_[]uint8_to_CBOR_bytes                              77.63n ± 0%   74.98n ± 0%  -3.41% (p=0.000 n=10)
Marshal/Go_string_to_CBOR_text                                77.50n ± 1%   78.20n ± 0%  +0.89% (p=0.000 n=10)
Marshal/Go_[]int_to_CBOR_array                                286.9n ± 0%   286.9n ± 0%       ~ (p=0.364 n=10)
Marshal/Go_map[string]string_to_CBOR_map                      914.8n ± 1%   889.5n ± 1%  -2.77% (p=0.000 n=10)
Marshal/Go_map[string]interface{}_to_CBOR_map                 2.233µ ± 0%   2.204µ ± 1%  -1.30% (p=0.000 n=10)
Marshal/Go_struct_to_CBOR_map                                 1.391µ ± 1%   1.400µ ± 1%       ~ (p=0.108 n=10)
Marshal/Go_map[int]interface{}_to_CBOR_map                    2.155µ ± 0%   2.151µ ± 1%       ~ (p=0.342 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                        1.382µ ± 1%   1.336µ ± 0%  -3.33% (p=0.000 n=10)
Marshal/Go_[]interface{}_to_CBOR_map                          1.617µ ± 0%   1.601µ ± 0%  -1.02% (p=0.000 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                       1.332µ ± 1%   1.343µ ± 1%  +0.79% (p=0.022 n=10)

I would still rather see Marshal call MarshalToBuffer so that they both benefit from all of the existing test coverage, unless the benchmark diff is a concern.

@benluddy benluddy force-pushed the user-provided-buffer branch from 7505de0 to 176bf87 Compare April 29, 2024 16:58
@benluddy benluddy marked this pull request as ready for review April 29, 2024 16:58
benluddy added 3 commits May 10, 2024 14:25
There doesn't seem to be any performance advantage to the [16]byte scratch space on each
encoderBuffer versus stack-allocated local array variables. If anything, there may be a spatial
locality advantage to using the stack. B/op and allocs/op are unchanged, sec/op is somewhere between
a wash and very marginally better:

                                                            │ scratch-withbuffer.txt │         scratch-stack.txt          │
                                                            │         sec/op         │   sec/op     vs base               │
Marshal/Go_bool_to_CBOR_bool                                             46.77n ± 1%   47.19n ± 0%  +0.90% (p=0.000 n=10)
Marshal/Go_uint64_to_CBOR_positive_int                                   66.03n ± 1%   66.15n ± 0%       ~ (p=0.393 n=10)
Marshal/Go_int64_to_CBOR_negative_int                                    52.34n ± 1%   52.44n ± 4%       ~ (p=0.306 n=10)
Marshal/Go_float64_to_CBOR_float                                         62.95n ± 1%   63.70n ± 1%  +1.19% (p=0.000 n=10)
Marshal/Go_[]uint8_to_CBOR_bytes                                         76.89n ± 1%   76.22n ± 0%  -0.86% (p=0.000 n=10)
Marshal/Go_string_to_CBOR_text                                           79.52n ± 1%   78.86n ± 1%  -0.82% (p=0.000 n=10)
Marshal/Go_[]int_to_CBOR_array                                           294.6n ± 0%   292.8n ± 0%  -0.64% (p=0.000 n=10)
Marshal/Go_map[string]string_to_CBOR_map                                 950.4n ± 1%   923.9n ± 1%  -2.79% (p=0.000 n=10)
Marshal/Go_map[string]interface{}_to_CBOR_map                            2.300µ ± 1%   2.310µ ± 0%       ~ (p=0.137 n=10)
Marshal/Go_struct_to_CBOR_map                                            1.422µ ± 1%   1.409µ ± 1%  -0.88% (p=0.006 n=10)
Marshal/Go_map[int]interface{}_to_CBOR_map                               2.207µ ± 0%   2.196µ ± 1%  -0.50% (p=0.004 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                                   1.394µ ± 0%   1.397µ ± 0%       ~ (p=0.050 n=10)
Marshal/Go_[]interface{}_to_CBOR_map                                     1.676µ ± 0%   1.632µ ± 1%  -2.63% (p=0.000 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                                  1.370µ ± 1%   1.357µ ± 1%  -0.99% (p=0.000 n=10)
MarshalCanonical/Go_map[string]string_to_CBOR_map                        947.9n ± 1%   918.3n ± 1%  -3.12% (p=0.000 n=10)
MarshalCanonical/Go_map[string]string_to_CBOR_map_canonical              1.533µ ± 1%   1.503µ ± 2%  -1.96% (p=0.002 n=10)
MarshalCanonical/Go_struct_to_CBOR_map                                   307.9n ± 0%   309.0n ± 1%  +0.34% (p=0.014 n=10)
MarshalCanonical/Go_struct_to_CBOR_map_canonical                         309.9n ± 0%   309.4n ± 1%       ~ (p=1.000 n=10)
MarshalCanonical/Go_map[int]int_to_CBOR_map                              196.7n ± 0%   198.2n ± 0%  +0.76% (p=0.000 n=10)
MarshalCanonical/Go_map[int]int_to_CBOR_map_canonical                    197.6n ± 1%   198.6n ± 1%  +0.51% (p=0.005 n=10)
MarshalCOSE/128-Bit_Symmetric_Key                                        340.3n ± 0%   341.6n ± 0%  +0.38% (p=0.001 n=10)
MarshalCOSE/256-Bit_Symmetric_Key                                        344.0n ± 0%   344.1n ± 1%       ~ (p=0.540 n=10)
MarshalCOSE/ECDSA_P256_256-Bit_Key                                       433.2n ± 0%   438.1n ± 1%  +1.12% (p=0.000 n=10)
MarshalCWTClaims                                                         296.5n ± 0%   298.9n ± 1%  +0.83% (p=0.003 n=10)
MarshalSenML                                                             1.522µ ± 0%   1.524µ ± 1%       ~ (p=0.697 n=10)
MarshalSenMLShortestFloat16                                              1.534µ ± 1%   1.557µ ± 1%  +1.53% (p=0.001 n=10)
MarshalWebAuthn                                                          406.5n ± 0%   409.4n ± 1%  +0.71% (p=0.007 n=10)
MarshalCOSEMAC                                                           319.1n ± 0%   319.6n ± 1%       ~ (p=0.446 n=10)
MarshalCOSEMACWithTag                                                    399.8n ± 0%   397.9n ± 0%  -0.49% (p=0.007 n=10)
geomean                                                                  404.3n        403.5n       -0.22%

Signed-off-by: Ben Luddy <[email protected]>
With its only remaining field being an embedded bytes.Buffer, there's no reason to retain the
encoderBuffer type.

Signed-off-by: Ben Luddy <[email protected]>
The application calling Marshal has more information about the object it wants to marshal than this
library does (e.g. what type is it, what is the 95th-percentile serialized size of similar objects,
etc.). It can use this information to increase the utilization of encode buffers beyond what is
possible with a shared buffer pool for all calls to Marshal.

To avoid the backwards-incompatible change of expanding the method set of the EncMode interface,
invoking this method from an external package will require a type assertion, and there is currently
no exported "optional interface" declaration to make that more convenient.

Signed-off-by: Ben Luddy <[email protected]>
@benluddy benluddy force-pushed the user-provided-buffer branch from 176bf87 to 045f722 Compare May 10, 2024 18:45
Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benluddy Thanks for opening this PR! It addresses a valid concern of inefficient use of cached buffer when there are large size and frequency disparities. The solution in this PR is simple and non-intrusive. 👍

I would still rather see Marshal call MarshalToBuffer so that they both benefit from all of the existing test coverage, unless the benchmark diff is a concern.

The benchmark diff is a concern, and I think you are right that diff is caused by overhead of an extra function call.

I can add tests later to call MarshalToBuffer explicitly to increase test coverage.

A new interface interface{ MarshalToBuffer(interface{}, *bytes.Buffer) error } could be declared for convenience. The interface might initially be documented as subject to change.

Yes, it would be great to create a new interface for the new function.

I can open a PR to add the new interface, along with tests I mentioned earlier. For the v2.7.0 release, some new functions or interfaces can be marked as "preview" and "subject to change".

Thanks again for your contribution!

@fxamacker fxamacker merged commit 742c956 into fxamacker:master May 12, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants