Description
Although the GPU of the openmp-offload-cuda-project and openmp-offload-cuda-runtime buildbots has 4 GiB of memory, the default device heap size as returned by cuCtxGetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, ...) is 8388608 bytes (8 MiB), while offloading/malloc.c requires ~55 MB and offloading/malloc_parallel.c even more.
I don't know how CUDA determines the default heap size limit, but I assume it is constant and inherited from the earliest days of CUDA.
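The sizes quoted above can be sanity-checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and "~55MB" is read here as decimal megabytes, which is an assumption.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def mib_to_bytes(n_mib: int) -> int:
    """Convert a size in MiB to bytes (hypothetical helper)."""
    return n_mib * MIB


def fits_in_heap(required_bytes: int, heap_limit_bytes: int) -> bool:
    """True if an allocation total fits within the device heap limit (hypothetical helper)."""
    return required_bytes <= heap_limit_bytes


# Value reported by cuCtxGetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, ...) in this report.
default_limit = 8388608
# "~55MB" from the report, interpreted as 55 * 10^6 bytes (assumption).
malloc_c_needs = 55 * 1000 * 1000

print(default_limit == mib_to_bytes(8))                # True
print(fits_in_heap(malloc_c_needs, default_limit))     # False
print(fits_in_heap(malloc_c_needs, mib_to_bytes(64)))  # True
```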
To fix this, we can either
- limit the amount of heap allocated by any test to 8 MiB (e.g. by reducing the number of teams in parallel.c to 48), or
- set LIBOMPTARGET_HEAP_SIZE to at least the maximum heap size allocated by any test.
The following patch takes the second approach and fixes the two malloc tests, reducing the number of failed tests to 30:
diff --git a/openmp/libomptarget/test/lit.cfg b/openmp/libomptarget/test/lit.cfg
index 6dab31bd35a9..e288827c50f6 100644
--- a/openmp/libomptarget/test/lit.cfg
+++ b/openmp/libomptarget/test/lit.cfg
@@ -31,6 +31,8 @@ if 'LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS' in os.environ:
 if 'OMP_TARGET_OFFLOAD' in os.environ:
     config.environment['OMP_TARGET_OFFLOAD'] = os.environ['OMP_TARGET_OFFLOAD']
 
+config.environment['LIBOMPTARGET_HEAP_SIZE'] = '134217728' # 128 MiB
+
 # set default environment variables for test
 if 'CHECK_OPENMP_ENV' in os.environ:
     test_env = os.environ['CHECK_OPENMP_ENV'].split()
A 64 MiB heap is sufficient for the malloc.c test, but not for malloc_parallel.c.
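The lit.cfg change hardcodes 128 MiB. A possible variant, following the pattern lit.cfg already uses for OMP_TARGET_OFFLOAD, would let a value from the host environment take precedence and fall back to 128 MiB otherwise. The sketch below is not part of the patch: heap_size_setting is a hypothetical helper, and a plain dict stands in for lit's config.environment so the snippet runs on its own.

```python
DEFAULT_HEAP_BYTES = 128 * 1024 * 1024  # 128 MiB, the value used in the patch


def heap_size_setting(host_env, default_bytes=DEFAULT_HEAP_BYTES):
    """Return the string to export as LIBOMPTARGET_HEAP_SIZE (hypothetical helper).

    A value already set in the host environment wins; otherwise the
    default is used.
    """
    value = host_env.get('LIBOMPTARGET_HEAP_SIZE')
    if value is None:
        return str(default_bytes)
    return value


# Stand-in for lit's config.environment; in lit.cfg this would be
# config.environment[...] = heap_size_setting(os.environ)
test_environment = {}
test_environment['LIBOMPTARGET_HEAP_SIZE'] = heap_size_setting({})
print(test_environment['LIBOMPTARGET_HEAP_SIZE'])  # 134217728
```

With this shape, a buildbot whose tests need a larger heap could export LIBOMPTARGET_HEAP_SIZE itself without touching lit.cfg.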