Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

cub::DeviceHistogram::HistogramEven: Incorrect result with integer levels and samples #489

Closed
Melirius opened this issue May 21, 2022 · 1 comment · Fixed by #487
Closed
Assignees
Labels
P1: should have Necessary, but not critical. type: bug: functional Does not work as intended.
Milestone

Comments

@Melirius
Copy link

The problem appears when upper_level - lower_level is not exactly divisible by number of bins. Appears in 1.16 and in #487 too.

Reproduction: the program as is provides [1, 0, 4, 1, 0, 0, 2] as a histogram, but correct one (obtainable by switching to using T = float;) is [1, 5, 0, 2, 1, 0, 0].

#include <cstdlib>
#include <vector>

#include <cub/device/device_histogram.cuh>//<cub/cub.cuh>   // or equivalently <cub/device/device_histogram.cuh>
#include <cuda/std/type_traits>

//using T = float;
using T = int;

// Declare, allocate, and initialize device-accessible pointers for input samples and
// output histogram
int      num_samples = 10;//32 << 20;    // e.g., 10
T*       d_samples;      // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 999.5]
int*     d_histogram;    // e.g., [ -, -, -, -, -, -]
int      num_levels = 8;     // e.g., 7       (seven level boundaries for six bins)
T        lower_level = 0.0;    // e.g., 0.0     (lower sample value boundary of lowest bin)
T        upper_level = 12.0;    // e.g., 12.0    (upper sample value boundary of upper bin)

int main()
{
    std::vector<T> s {2, 6, 7, 2, 3, 0, 2, 2, 6, 999};
    std::vector<int> h(num_levels - 1);

    s.resize(num_samples);
    cudaMalloc(&d_samples, num_samples * sizeof(T));
    cudaMalloc(&d_histogram, (num_levels - 1) * sizeof(int));
    cudaMemcpy(d_samples, s.data(), num_samples * sizeof(T), cudaMemcpyHostToDevice);

    // Determine temporary device storage requirements
    void*    d_temp_storage = NULL;
    size_t   temp_storage_bytes = 0;
    cub::DeviceHistogram::HistogramEven(d_temp_storage, temp_storage_bytes,
        d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples);

    // Allocate temporary storage
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Compute histograms
    cub::DeviceHistogram::HistogramEven(d_temp_storage, temp_storage_bytes,
        d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples);

    // d_histogram   <-- [1, 5, 0, 2, 1, 0, 0];

    cudaMemcpy(h.data(), d_histogram, (num_levels - 1) * sizeof(int), cudaMemcpyDeviceToHost);

    for(int a : h)
        std::cout << a << ", ";

    std::cout << std::endl;

    return 0;
}
@Melirius
Copy link
Author

Melirius commented May 21, 2022

Instead of bin = static_cast<int>((common_sample - min) / scale); it should be bin = static_cast<int>((common_sample - min) * bins / (max - min)); as the scale is of integral type and is therefore wrong. The problem that I can foresee is an overflow in multiplication, that can be avoided by extending multiplication arithmetics to 96 bits. Division should also be long, however, it is not so much difficult as result cannot be larger then 32 bits.

@alliepiper alliepiper self-assigned this Aug 4, 2022
@alliepiper alliepiper added type: bug: functional Does not work as intended. P1: should have Necessary, but not critical. labels Aug 4, 2022
@alliepiper alliepiper added this to the 2.0.0 milestone Aug 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P1: should have Necessary, but not critical. type: bug: functional Does not work as intended.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants