This repository was archived by the owner on Mar 21, 2024. It is now read-only.
thrust::reduce_by_key error for 2^31 elements #1609
Closed
Description
Tested environment
GA102 (48 GB GPU memory)
CUDA 11.4
Thrust 1.15.0
The code below fails inside the 'reduce_by_key' call with a memory allocation failure: it tries to allocate 18446744073694639872 bytes (about 16 EB), which cannot succeed because my system has only 48 GB. This looks like a bug caused by 32-bit signed integer overflow.
cudaDeviceSynchronize();
std::cout << "test thrust reduce_by_key START" << std::endl;
// 2^31 elements: one more than the largest value a 32-bit signed integer can hold.
constexpr size_t num_elements = std::size_t{1} << 31;
constexpr int32_t max_key = 8;
// Fill the input with the repeating pattern 0, 1, ..., max_key - 1.
thrust::device_vector<int32_t> int_values(num_elements);
thrust::tabulate(thrust::device, int_values.begin(), int_values.end(), [max_key]__device__(auto i) {
    return static_cast<int32_t>(i % max_key);
});
// Sort so that equal keys are contiguous, as reduce_by_key requires.
thrust::sort(thrust::device, int_values.begin(), int_values.end());
// Count the occurrences of each key by summing a constant 1 per element.
thrust::device_vector<int32_t> keys(max_key);
thrust::device_vector<size_t> values(max_key);
thrust::reduce_by_key(thrust::device, int_values.begin(), int_values.end(), thrust::make_constant_iterator(size_t{1}), keys.data(), values.data());
cudaDeviceSynchronize();
std::cout << "test thrust reduce_by_key END" << std::endl;
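Here is a minimal host-only sketch of why a request of roughly 16 EB is consistent with the suspected 32-bit signed overflow. It is not the actual Thrust code path. The helpers `fits_in_int32` and `wrapped_byte_count` are hypothetical names introduced for illustration, and the sketch assumes that an element count is first narrowed to a signed 32-bit integer and then widened to a 64-bit unsigned byte count.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if an element count is representable as a
// signed 32-bit integer.
constexpr bool fits_in_int32(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(
                    std::numeric_limits<std::int32_t>::max());
}

// Hypothetical helper: models what happens when a count is narrowed to a
// signed 32-bit value and then used in a 64-bit unsigned size computation.
// This is an assumed failure mode, not Thrust's real implementation.
constexpr std::uint64_t wrapped_byte_count(std::uint64_t n,
                                           std::uint64_t bytes_per_item) {
    // Keep only the low 32 bits (well-defined modulo 2^32).
    const std::uint32_t low = static_cast<std::uint32_t>(n);
    // Reinterpret those bits as a two's-complement signed value.
    const std::int64_t as_signed =
        low >= 0x80000000u
            ? static_cast<std::int64_t>(low) - (std::int64_t{1} << 32)
            : static_cast<std::int64_t>(low);
    // A negative value converted to unsigned wraps modulo 2^64, which
    // yields a byte count just below 2^64.
    return static_cast<std::uint64_t>(as_signed) * bytes_per_item;
}

// 2^31 is the first count that no longer fits in a signed 32-bit integer.
static_assert(!fits_in_int32(std::uint64_t{1} << 31), "2^31 overflows int32");
static_assert(fits_in_int32((std::uint64_t{1} << 31) - 1), "2^31 - 1 fits");
```

Under these assumptions, `wrapped_byte_count(std::uint64_t{1} << 31, 1)` evaluates to 18446744071562067968, which is 2^64 - 2^31. That is not the exact value from the error message, since the real computation presumably involves other terms, but it is of the same order of magnitude, just under 2^64 bytes.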