@sleeepyjack
Collaborator

This PR is part 1 of 5 of the Bloom filter optimization project. The PRs in this series must be merged in order.

This PR introduces two changes that drastically reduce noise in the benchmark measurements:

  • Generate the input keys on the fly rather than loading them from global memory.
  • Increase the number of input keys so the kernels run longer (previous runtimes were in the single-digit millisecond range, which was too noisy).
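
The first change can be illustrated with a minimal host-side C++ sketch. This is not the code from this PR: `make_key` and `for_each_generated_key` are hypothetical names, and the SplitMix64 mixing function is an assumed stand-in for whatever generator the benchmark actually uses. The point is only that each key is computed from its index when it is needed, so no key buffer has to be allocated or read.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical key generator (assumption, not the PR's code): maps an index
// to a pseudo-random 64-bit key using the SplitMix64 mixing steps, so no key
// array has to be stored or loaded.
constexpr std::uint64_t make_key(std::uint64_t i) noexcept
{
  std::uint64_t z = i + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Stand-in for the benchmarked operation: feeds n generated keys to `consume`.
// Each key is derived from its index at the point of use instead of being
// read from a pre-filled buffer.
template <class Consume>
void for_each_generated_key(std::size_t n, Consume consume)
{
  for (std::size_t i = 0; i < n; ++i) { consume(make_key(i)); }
}
```

In a GPU benchmark the same idea would presumably apply per thread, with the thread's global index taking the place of `i`. That mapping is an assumption here rather than something stated in the PR.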

@sleeepyjack added the "type: improvement" and "topic: bloom_filter" labels on Feb 12, 2025
@sleeepyjack self-assigned this on Feb 12, 2025
  using BF_WORD = nvbench::uint32_t;

- static constexpr auto BF_N = 400'000'000;
+ static constexpr auto BF_N = 1'000'000'000;
Member

The copyright years need to be updated; otherwise looks good.

@sleeepyjack merged commit ac4ba6b into NVIDIA:dev on Feb 13, 2025
18 checks passed
@sleeepyjack deleted the bf-bench branch on February 13, 2025 00:01

3 participants