Skip to content

Change randomized prefix to contain only hex characters#13799

Merged
arhimondr merged 1 commit intotrinodb:masterfrom
arhimondr:reduce-partitioning-exchange-key-space
Aug 23, 2022
Merged

Change randomized prefix to contain only hex characters#13799
arhimondr merged 1 commit intotrinodb:masterfrom
arhimondr:reduce-partitioning-exchange-key-space

Conversation

@arhimondr
Copy link
Copy Markdown
Contributor

@arhimondr arhimondr commented Aug 23, 2022

Some storage systems prefer the prefix to be hexadecimal characters

No release notes entries required

@arhimondr arhimondr requested a review from losipiuk August 23, 2022 17:00
@cla-bot cla-bot bot added the cla-signed label Aug 23, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why this exact alphabet is important.

Copy link
Copy Markdown
Contributor Author

@arhimondr arhimondr Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certain storage systems prefer the prefix to be hexadecimal. Discussed offline.

Copy link
Copy Markdown
Member

@electrum electrum Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to increase this now? This takes us from 2176782336 combinations to only 16777216, or 0.0077% of the previous, making the chance of collisions much higher. What are the implications of a collision?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 enforces request limits per prefix (https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/). The request limits are 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests. The space of 16777216 distinct prefixes should be sufficient.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these assumptions be persisted as a code comment, documenting why 6 is still sufficient prefix length?

@electrum
Copy link
Copy Markdown
Member

Since we're enforcing hex, how about simplifying the generation function:

private static final int RANDOMIZED_PREFIX_BYTES = 3;

private static String generateRandomizedPrefix()
{
    byte[] bytes = new byte[RANDOMIZED_PREFIX_BYTES];
    ThreadLocalRandom.current().nextBytes(bytes);
    return BaseEncoding.base16().encode(bytes);
}

Some storage systems prefer the prefix to be hexadecimal characters
@arhimondr arhimondr force-pushed the reduce-partitioning-exchange-key-space branch from e137c10 to 2c988a5 Compare August 23, 2022 18:15
@arhimondr arhimondr changed the title Reduce randomized prefix key space in FileSystemExchange Change randomized prefix to contain only hex characters Aug 23, 2022
@arhimondr
Copy link
Copy Markdown
Contributor Author

@electrum Having the custom function avoids unnecessary allocations and reduces uncertainty whether a lower case or upper case characters are used

@arhimondr arhimondr merged commit 624fedb into trinodb:master Aug 23, 2022
@github-actions github-actions bot added this to the 394 milestone Aug 23, 2022
@arhimondr arhimondr deleted the reduce-partitioning-exchange-key-space branch August 24, 2022 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants