Fast base58 codec #4327

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

seelabs
Collaborator

@seelabs seelabs commented Oct 19, 2022

This algorithm is about an order of magnitude faster than the existing algorithm (about 10x faster for encoding and about 15x faster for decoding - including the double hash for the checksum). The algorithms use gcc's int128 (fast MS version will have to wait, in the meantime MS falls back to the slow code).

@HowardHinnant
Contributor

Can a test be made to run on !_MSC_VER that ensures b58_fast::encodeBase58Token gives the same results as b58_ref::encodeBase58Token (et al.)?

@seelabs
Collaborator Author

seelabs commented Nov 16, 2022

@HowardHinnant Added tests in 0d7ee11

Collaborator

@thejohnfreeman thejohnfreeman left a comment

The algorithms use gcc's int128 (fast MS version will have to wait, in the meantime MS falls back to the slow code).

Can we add this explanation in a comment somewhere in the code, perhaps near one of the #ifndef _MSC_VER?

I'm submitting this partial review before I switch gears to another. I will return to finish later.

Collaborator

@scottschurr scottschurr left a comment

This is a very impressive piece of engineering and the speedup is noteworthy.

My biggest concern is that there may only be one person in the world who can maintain this code: @seelabs. I can make my way through the individual pieces and make sense of each part locally. But all the pieces work together to build a machine which, in its current form, I find incomprehensible.

I think the maintainability problem can be fixed by the addition of some paragraph comments describing the approach. Those paragraphs should also include pointers to background information regarding where you got the inspiration for this approach (further reading).

Most of my comments are nits which can be taken or not. The essential change is the addition of the paragraph comments describing how the whole machine works.

As an aside, I know there has been discussion of being able to remove caching once this code is in the code base. I'm hesitating on that, since removing the caching would have a negative impact on people running on Windows. Just a future consideration.

@intelliot intelliot added this to the 1.10.1 milestone Mar 4, 2023
@scottschurr
Collaborator

Huh. Just FYI, I ran across a couple of MSVC intrinsics today: _udiv128 and _umul128. They are both available in MSVC 2017, which is probably the oldest MS compiler we might support. And, yeah, they are intrinsics, not data types, so the mapping would be a bit awkward.
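
To make the "awkward mapping" concrete, here is a minimal sketch of how the two intrinsics could sit behind the same interface as gcc's `unsigned __int128`. The helper names `mul_wide` and `div_wide` are hypothetical and do not appear in the PR. The MSVC branch assumes an x64 target and a compiler version that ships both intrinsics, and it has not been exercised here.

```cpp
#include <cstdint>
#include <utility>

#ifdef _MSC_VER
#include <immintrin.h>  // _udiv128 (x64 only)
#include <intrin.h>     // _umul128 (x64 only)
#endif

// Hypothetical helper: full 64x64 -> 128-bit product, returned as {low, high}.
inline std::pair<std::uint64_t, std::uint64_t>
mul_wide(std::uint64_t a, std::uint64_t b)
{
#ifdef _MSC_VER
    unsigned __int64 high = 0;
    unsigned __int64 const low = _umul128(a, b, &high);
    return {low, high};
#else
    unsigned __int128 const p = static_cast<unsigned __int128>(a) * b;
    return {
        static_cast<std::uint64_t>(p), static_cast<std::uint64_t>(p >> 64)};
#endif
}

// Hypothetical helper: divide the 128-bit value (high:low) by divisor and
// return {quotient, remainder}.
// Precondition: high < divisor, so the quotient fits in 64 bits.
inline std::pair<std::uint64_t, std::uint64_t>
div_wide(std::uint64_t high, std::uint64_t low, std::uint64_t divisor)
{
#ifdef _MSC_VER
    unsigned __int64 rem = 0;
    unsigned __int64 const quot = _udiv128(high, low, divisor, &rem);
    return {quot, rem};
#else
    unsigned __int128 const n =
        (static_cast<unsigned __int128>(high) << 64) | low;
    return {
        static_cast<std::uint64_t>(n / divisor),
        static_cast<std::uint64_t>(n % divisor)};
#endif
}
```

In a multi-limb long division the running remainder is always smaller than the divisor, so the `high < divisor` precondition would hold naturally when `div_wide` is fed the previous remainder as `high`.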

If you can find a way to make this code work on all three platforms then it becomes much more likely we can remove the caching. Just a thought.

@seelabs
Collaborator Author

seelabs commented Mar 23, 2023

I rebased onto the latest develop and pushed a patch addressing comments.

@scottschurr FYI, you say:
If you can find a way to make this code work on all three platforms then it becomes much more likely we can remove the caching. Just a thought.

If Windows was the only thing holding us back from removing caching I'd be very comfortable removing the caching. We explicitly say Windows is not meant for production.

Collaborator

@scottschurr scottschurr left a comment

Thanks for the changes @seelabs.

Unfortunately the most important change was overlooked. We still need a big paragraph(s) comment describing how and why the algorithm works. Without that there isn't anyone in the world who can maintain the code besides you.

Regarding removing caching once this is in... I'm a little worried about leaving the Windows developers out in the cold. Using the old algorithm and not caching the results might make it so some Windows boxes no longer sync. Yeah, Windows is not a supported platform. But if the box can't sync then code developed on the platform can't be tested.

I'm not dead set either way. I just want to raise that as a concern.

@intelliot
Collaborator

@seelabs would you be able to satisfy Scott S's request (specifically, the need for "big paragraph(s) comment describing how and why the algorithm works")?

@seelabs
Collaborator Author

seelabs commented May 1, 2023

@intelliot Yes, I'll write something up.

@intelliot intelliot requested a review from sophiax851 January 8, 2024 22:47
@intelliot intelliot added Perf Attn Needed Attention needed from RippleX Performance Team and removed Performance/Resource Improvement labels Jan 8, 2024
@intelliot
Collaborator

Internal tracker: RPFC-78

@HowardHinnant HowardHinnant self-requested a review January 12, 2024 17:35
@seelabs
Collaborator Author

seelabs commented Jan 19, 2024

Squashed and rebased onto the latest develop branch

@seelabs seelabs added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jan 19, 2024
@intelliot
Collaborator

intelliot commented Feb 9, 2024

Perf team report:

A test case was set up with 25K accounts, 50 TPS, and run for 500 seconds.

Test 1: Run comparison tests for 2000 seconds with 50 TPS for 1 client handler (CH) server of API load for both encoders using 100K accounts, compare the results.

Test 2: Run test immediately after the 2000 seconds on the same accounts to see the cache effectiveness.

The pass criterion is Test 1: RPC response times must not regress when the encoder is changed to fast base58.

Without the cache, the average account_info response time improves from 0.81 ms on rippled 2.0.1 to 0.73 ms with this PR, a reduction of roughly 10%.

Comparing non-cached with cached runs: for rippled 2.0.1 the response time is 0.81 ms vs 0.72 ms; for the fast base58 PR it is 0.73 ms vs 0.66 ms.

@seelabs seelabs added the Perf SignedOff RippleX Performance Team has approved label Feb 26, 2024
@codecov-commenter

codecov-commenter commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 78.26087% with 60 lines in your changes missing coverage. Please review.

Project coverage is 61.65%. Comparing base (62dae3c) to head (ffc2be6).
Report is 139 commits behind head on develop.

Files with missing lines                   Patch %   Lines
src/ripple/protocol/impl/token_errors.h    13.79%    24 Missing and 1 partial ⚠️
src/ripple/protocol/impl/tokens.cpp        89.44%    9 Missing and 10 partials ⚠️
src/ripple/protocol/impl/b58_utils.h       76.11%    6 Missing and 10 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4327      +/-   ##
===========================================
+ Coverage    61.59%   61.65%   +0.06%     
===========================================
  Files          804      806       +2     
  Lines        70686    70963     +277     
  Branches     36580    36686     +106     
===========================================
+ Hits         43537    43753     +216     
- Misses       19814    19854      +40     
- Partials      7335     7356      +21     


@ximinez ximinez removed the Perf Attn Needed Attention needed from RippleX Performance Team label Feb 29, 2024
seelabs added a commit to seelabs/rippled that referenced this pull request Mar 4, 2024
Comment on lines +82 to +92
    std::uint64_t carry;
    std::tie(a[0], carry) = carrying_add(a[0], b);

    for (auto& v : a.subspan(1))
    {
        if (!carry)
        {
            return;
        }
        std::tie(v, carry) = carrying_add(v, 1);
    }
Collaborator

Just for my own enjoyment, is that the same as this?

std::uint64_t carry = b;
for (auto& v : a)
{
    if (!carry)
    {
        return;
    }
    std::tie(v, carry) = carrying_add(v, carry);
}

Collaborator Author

Yes, that looks equivalent. I don't object to that formulation, but I also don't think it's worth having the other reviewers go back and confirm the change if I did change it. The current code is also clear.

Comment on lines +150 to +159
    std::uint64_t prev_rem = 0;
    int const last_index = numerator.size() - 1;
    std::tie(numerator[last_index], prev_rem) =
        div_rem(numerator[last_index], divisor);
    for (int i = last_index - 1; i >= 0; --i)
    {
        unsigned __int128 const cur_num = to_u128(prev_rem, numerator[i]);
        std::tie(numerator[i], prev_rem) = div_rem_64(cur_num, divisor);
    }
    return prev_rem;
Collaborator

Same as this?

std::uint64_t carry = 0;
for (auto i = numerator.rbegin(); i != numerator.rend(); ++i) {
    unsigned __int128 const num = to_u128(carry, *i);
    std::tie(*i, carry) = div_rem_64(num, divisor);
}
return carry;

Collaborator Author

Yes, that looks equivalent - and I like that code (although I'd probably rename carry to rem). Although I don't think I'll change it at this point.

```

For example, in base 10, the number 437 represents the integer 4*10^2 + 3*10^1 +
7*10^0. In base 16, 437 is the same as 4*16^2 + 3*16^1 7*16^0.
Collaborator

Suggested change
7*10^0. In base 16, 437 is the same as 4*16^2 + 3*16^1 7*16^0.
7*10^0. In base 16, 437 is the same as 4*16^2 + 3*16^1 + 7*16^0.

Collaborator Author

Fixed
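
The positional-notation example from the quoted documentation can be checked with a few lines of code. This is an illustrative sketch, and `digits_to_value` is a hypothetical name that is not part of the PR.

```cpp
#include <cstdint>
#include <initializer_list>

// Evaluate a digit string (most significant digit first) in the given base
// using Horner's rule: ((d0 * base) + d1) * base + d2 ...
constexpr std::uint64_t
digits_to_value(std::initializer_list<unsigned> digits, std::uint64_t base)
{
    std::uint64_t value = 0;
    for (unsigned d : digits)
        value = value * base + d;
    return value;
}

// The digits 4, 3, 7 read in base 10, base 16, and base 58.
static_assert(digits_to_value({4, 3, 7}, 10) == 437);    // 400 + 30 + 7
static_assert(digits_to_value({4, 3, 7}, 16) == 1079);   // 1024 + 48 + 7
static_assert(digits_to_value({4, 3, 7}, 58) == 13637);  // 13456 + 174 + 7
```

A base58 codec performs the same conversion between base 256 (bytes) and base 58 (characters), only on numbers too large for a single machine word.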

namespace b58_fast {
namespace detail {
B58Result<std::span<std::uint8_t>>
b256_to_b58(std::span<std::uint8_t const> input, std::span<std::uint8_t> out)
Collaborator

I'd like to see base and endianness info in either the type name (my preference) or variable name, e.g.

b256be_to_b58le(b256be const input, b58le out)

Collaborator Author

Yes, agreed - although in this function both the input and output are big endian. I changed the name to b256_to_b58_be and added a comment. Will push the patch shortly.

// The largest object encoded as base58 is 33 bytes; This will be encoded in
// at most ceil(log(2^256,58)) bytes, or 46 bytes. 64 is plenty (and there's
// not real benefit making it smaller)
sr.resize(128);
Collaborator

Did you want to resize(64) here?

Collaborator Author

I rewrote the comment. I think I do want 128 there (a single byte can be encoded in more than one base 58 char). Will push the patch shortly and you can take a look.
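
A rough way to size the buffer: each input byte carries 8 bits and each base58 character carries log2(58) ≈ 5.86 bits, so n bytes need at most ceil(n * log(256) / log(58)) characters. The sketch below computes that bound. `max_base58_length` is a hypothetical helper, and the 38-byte case assumes a token is a 33-byte payload plus one type byte and a four-byte checksum.

```cpp
#include <cmath>
#include <cstddef>

// Upper bound on the number of base58 characters needed for num_bytes of
// input. Floating point is adequate for the small sizes discussed here.
inline std::size_t
max_base58_length(std::size_t num_bytes)
{
    double const chars_per_byte = std::log(256.0) / std::log(58.0);  // ~1.3657
    return static_cast<std::size_t>(
        std::ceil(static_cast<double>(num_bytes) * chars_per_byte));
}
```

Under these assumptions, `max_base58_length(33)` is 46 and `max_base58_length(38)` is 52, both comfortably below 128.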

    auto r = b58_fast::encodeBase58Token(type, inSp, outSp);
    if (!r)
        return {};
    sr.resize(r.value().size());
Collaborator

This isn't guaranteed to reallocate, is it? If it is, it might be worth finding a way to allocate once, even at the cost of a few wasted bytes.

Collaborator Author

sr is initially over-allocated and this is reducing the size. I don't know if the standard guarantees that reducing the size won't reallocate, but in practice I'd bet it will never allocate a smaller buffer and then move the data (that's what shrink_to_fit is for - the existence of that gives me even more confidence that this won't allocate).
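
The "in practice" part of this claim is easy to observe. The sketch below assumes `sr` is a `std::string` (the snippet does not show its type) and uses the hypothetical names `ShrinkObservation` and `observe_shrink`. It records what a given standard library does and is not a statement about what the standard guarantees.

```cpp
#include <cstddef>
#include <string>

// What was observed when a string was grown and then shrunk with resize().
struct ShrinkObservation
{
    bool same_buffer;    // data() pointer unchanged by the shrinking resize
    bool capacity_kept;  // capacity() did not decrease
};

inline ShrinkObservation
observe_shrink(std::size_t initial_size, std::size_t final_size)
{
    std::string s;
    s.resize(initial_size);  // over-allocate, as the encoder does
    char const* const before = s.data();
    std::size_t const capacity_before = s.capacity();
    s.resize(final_size);  // trim to the length actually written
    return {s.data() == before, s.capacity() >= capacity_before};
}
```

With the sizes from this thread, `observe_shrink(128, 34)` would be expected to report both flags as true on the common standard library implementations; the initial size of 128 is chosen to be well past typical small-string buffers so that the string is heap-allocated.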

Collaborator

@seelabs Seems we've been making some changes? Do you think we need to redo the perf testing?

Collaborator

I don't think these changes have any impact on performance.

@seelabs
Collaborator Author

seelabs commented Mar 5, 2024

Squashed and rebased. Will merge after CI runs

@seelabs seelabs merged commit cce09b7 into XRPLF:develop Mar 5, 2024
17 checks passed
@ximinez ximinez mentioned this pull request Mar 6, 2024
1 task
@ximinez ximinez mentioned this pull request Apr 4, 2024
1 task
legleux added a commit to legleux/rippled that referenced this pull request Apr 12, 2024
* Price Oracle (XLS-47d): (XRPLF#4789)

Implement native support for Price Oracles.

 A Price Oracle is used to bring real-world data, such as market prices,
 onto the blockchain, enabling dApps to access and utilize information
 that resides outside the blockchain.

 Add Price Oracle functionality:
 - OracleSet: create or update the Oracle object
 - OracleDelete: delete the Oracle object

 To support this functionality add:
 - New RPC method, `get_aggregate_price`, to calculate aggregate price for a token pair of the specified oracles
 - `ltOracle` object

 The `ltOracle` object maintains:
 - Oracle Owner's account
 - Oracle's metadata
 - Up to ten token pairs with the scaled price
 - The last update time the token pairs were updated

 Add Oracle unit-tests

* fix compile error on gcc 13: (XRPLF#4932)

The compilation fails due to an issue in the initializer list
of an optional argument, which holds a vector of pairs.
The code compiles correctly on earlier gcc versions, but fails on gcc 13.

* Set version to 2.2.0-b1

* Remove default ctors from SecretKey and PublicKey: (XRPLF#4607)

* It is now an invariant that all constructed Public Keys are valid,
  non-empty and contain 33 bytes of data.
* Additionally, the memory footprint of the PublicKey class is reduced.
  The size_ data member is declared as static.
* Distinguish and identify the PublisherList retrieved from the local
  config file, versus the ones obtained from other validators.
* Fixes XRPLF#2942

* Fast base58 codec: (XRPLF#4327)

This algorithm is about an order of magnitude faster than the existing
algorithm (about 10x faster for encoding and about 15x faster for
decoding - including the double hash for the checksum). The algorithms
use gcc's int128 (fast MS version will have to wait, in the meantime MS
falls back to the slow code).

* feat: add user version of `feature` RPC (XRPLF#4781)

* uses same formatting as admin RPC
* hides potentially sensitive data

* build: add STCurrency.h to xrpl_core to fix clio build (XRPLF#4939)

* Embed patched recipe for RocksDB 6.29.5 (XRPLF#4947)

* fix: order book update variable swap: (XRPLF#4890)

This is likely the result of a typo when the code was simplified.

* Fix workflows (XRPLF#4948)

The problem was `CONAN_USERNAME` environment variable, which Conan 1.x uses as the default user in package references.

* Upgrade to xxhash 0.8.2 as a Conan requirement, enable SIMD hashing (XRPLF#4893)

We are currently using old version 0.6.2 of `xxhash`, as a verbatim copy and paste of its header file `xxhash.h`. Switch to the more recent version 0.8.2. Since this version is in Conan Center (and properly protects its ABI by keeping the state object incomplete), add it as a Conan requirement. Switch to the SIMD instructions (in the new `XXH3` family) supported by the new version.

* Update remaining actions (XRPLF#4949)

Downgrade {upload,download}-artifact action to v3 because of unreliability with v4.

* Install more public headers (XRPLF#4940)

Fixes some mistakes in XRPLF#4885

* test: Env unit test RPC errors return a unique result: (XRPLF#4877)

* telENV_RPC_FAILED is a new code, reserved exclusively
  for unit tests when RPC fails. This will
  make those types of errors distinct and easier to test
  for when expected and/or diagnose when not.
* Output RPC command result when result is not expected.

* Fix workflows (XRPLF#4951)

- Update container for Doxygen workflow. Matches Linux workflow, with newer GLIBC version required by newer actions.
- Fixes macOS workflow to install and configure Conan correctly. Still fails on tests, but that does not seem attributable to the workflow.

* perf: improve `account_tx` SQL query: (XRPLF#4955)

The witness server makes heavy use of the `account_tx` RPC command. Perf
testing showed that the SQL query used by `account_tx` became unacceptably slow
when the DB was large and there was a `marker` parameter. The plan for the query
showed only indexed reads. This appears to be an issue with the internal SQLite
optimizer. This patch rewrote the query to use `UNION` instead of `OR` and
significantly improves performance. See RXI-896 and RIPD-1847 for more details.

* `fixEmptyDID`: fix amendment to handle empty DID edge case: (XRPLF#4950)

This amendment fixes an edge case where an empty DID object can be
created. It adds an additional check to ensure that DIDs are
non-empty when created, and returns a `tecEMPTY_DID` error if the DID
would be empty.

* Enforce no duplicate slots from incoming connections: (XRPLF#4944)

We do not currently enforce that incoming peer connection does not have
remote_endpoint which is already used (either by incoming or outgoing
connection), hence already stored in slots_. If we happen to receive a
connection from such a duplicate remote_endpoint, it will eventually result in a
crash (when disconnecting) or weird behavior (when updating slot state), as a
result of an apparently matching remote_endpoint in slots_ being used by a
different connection.

* Remove zaphod.alloy.ee hub from default server list: (XRPLF#4903)

Remove the zaphod.alloy.ee hubs from the bootstrap and default configuration after 5 years. It has been an honor to run these servers, but it is now time for another entity to step into this role.

The zaphod servers will be taken offline in a phased manner keeping all those who have peering arrangements informed.

These would be the preferred attributes of a bootstrap set of hubs:

    1. Commitment to run the hubs for a minimum of 2 years
    2. Highly available
    3. Geographically dispersed
    4. Secure and up to date
    5. Committed to ensure that peering information is kept private

* Write improved `forAllApiVersions` used in NetworkOPs (XRPLF#4833)

* Don't reach consensus as quickly if no other proposals seen: (XRPLF#4763)

This fixes a case where a peer can desync under a certain timing
circumstance--if it reaches a certain point in consensus before it receives
proposals. 

This was noticed under high transaction volumes. Namely, when we arrive at the
point of deciding whether consensus is reached after minimum establish phase
duration but before having received any proposals. This could be caused by
finishing the previous round slightly faster and/or having some delay in
receiving proposals. Existing behavior arrives at consensus immediately after
the minimum establish duration with no proposals. This causes us to desync
because we then close a non-validated ledger. The change in this PR causes us to
wait for a configured threshold before making the decision to arrive at
consensus with no proposals. This allows validators to catch up and for brief
delays in receiving proposals to be absorbed. There should be no drawback since,
with no proposals coming in, we needn't be in a huge rush to jump ahead.

* fixXChainRewardRounding: round reward shares down: (XRPLF#4933)

When calculating reward shares, the amount should always be rounded
down. If the `fixUniversalNumber` amendment is not active, this works
correctly. If it is active, then the amount is incorrectly rounded
up. This patch introduces an amendment so it will be rounded down.

* Remove unused files

* Remove packaging scripts

* Consolidate external libraries

* Simplify protobuf generation

* Rename .hpp to .h

* Format formerly .hpp files

* Rewrite includes

$ find src/ripple/ src/test/ -type f -exec sed -i 's:include\s*["<]ripple/\(.*\)\.h\(pp\)\?[">]:include <ripple/\1.h>:' {} +

* Fix source lists

* Add markers around source lists

* fix: improper handling of large synthetic AMM offers:

A large synthetic offer was not handled correctly in the payment engine.
This patch fixes that issue and introduces a new invariant check while
processing synthetic offers.

* Set version to 2.1.1

* chore: change Github Action triggers for build/test jobs (XRPLF#4956)

Github Actions for the build/test jobs (nix.yml, mac.yml, windows.yml) will only run on branches that build packages (develop, release, master), and branches with names starting with "ci/". This is intended as a compromise between disabling CI jobs on personal forks entirely, and having the jobs run as a free-for-all. Note that it will not affect PR jobs at all.

* Address compiler warnings

* Fix search for protoc

* chore: Default validator-keys-tool to master branch: (XRPLF#4943)

* master is the default branch for that project. There's no point in
  using develop.

* Remove unused lambdas from MultiApiJson_test

* fix Conan component reference typo

* Set version to 2.2.0-b2

* bump version

* 2.2.3

* 2.2.4

* 2.2.5

---------

Co-authored-by: Gregory Tsipenyuk <[email protected]>
Co-authored-by: seelabs <[email protected]>
Co-authored-by: Chenna Keshava B S <[email protected]>
Co-authored-by: Mayukha Vadari <[email protected]>
Co-authored-by: John Freeman <[email protected]>
Co-authored-by: Bronek Kozicki <[email protected]>
Co-authored-by: Ed Hennis <[email protected]>
Co-authored-by: Olek <[email protected]>
Co-authored-by: Alloy Networks <[email protected]>
Co-authored-by: Mark Travis <[email protected]>
Co-authored-by: Gregory Tsipenyuk <[email protected]>
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. Perf SignedOff RippleX Performance Team has approved Testable
Projects
Status: 🚢 Released in 2.2.0

10 participants