We've observed that running the OpenMP tests on s390x is sometimes extremely slow. For example, this run took more than 8 hours:
Slowest Tests:
--------------------------------------------------------------------------
30238.12s: libomp :: worksharing/for/omp_for_collapse.c
25670.15s: libomp :: env/kmp_set_dispatch_buf.c
14940.87s: libomp :: worksharing/single/omp_single.c
14356.12s: libomp :: tasking/omp_taskloop_num_tasks.c
13150.09s: libomp :: worksharing/for/omp_for_schedule_runtime.c
8455.10s: libomp :: worksharing/for/kmp_set_dispatch_buf.c
8144.37s: libomp :: tasking/omp_taskloop_grainsize.c
6818.04s: libomp :: threadprivate/omp_threadprivate.c
5520.38s: libomp :: worksharing/for/omp_doacross.c
3407.81s: libomp :: tasking/omp_task_priority3.c
2711.39s: libomp :: worksharing/for/omp_collapse_many_int.c
2667.53s: libomp :: tasking/task_teams_stress_test.cpp
2137.94s: libomp :: parallel/omp_parallel_num_threads.c
2049.00s: libomp :: worksharing/for/omp_for_reduction.c
1983.83s: libomp :: worksharing/for/omp_for_ordered.c
1845.35s: libomp :: tasking/issue-87307.c
1842.58s: libomp :: atomic/omp_atomic.c
1697.35s: libomp :: worksharing/for/omp_parallel_for_ordered.c
1672.86s: libomp :: worksharing/sections/omp_sections_reduction.c
1604.87s: libomp :: parallel/omp_parallel_reduction.c
Tests Times:
--------------------------------------------------------------------------
[ Range ] :: [ Percentage ] :: [ Count ]
--------------------------------------------------------------------------
[30000s,32000s) :: [ ] :: [ 1/389]
[28000s,30000s) :: [ ] :: [ 0/389]
[26000s,28000s) :: [ ] :: [ 1/389]
[24000s,26000s) :: [ ] :: [ 0/389]
[22000s,24000s) :: [ ] :: [ 0/389]
[20000s,22000s) :: [ ] :: [ 0/389]
[18000s,20000s) :: [ ] :: [ 0/389]
[16000s,18000s) :: [ ] :: [ 0/389]
[14000s,16000s) :: [ ] :: [ 2/389]
[12000s,14000s) :: [ ] :: [ 1/389]
[10000s,12000s) :: [ ] :: [ 0/389]
[ 8000s,10000s) :: [ ] :: [ 2/389]
[ 6000s, 8000s) :: [ ] :: [ 1/389]
[ 4000s, 6000s) :: [ ] :: [ 1/389]
[ 2000s, 4000s) :: [ ] :: [ 6/389]
[ 0s, 2000s) :: [************************************** ] :: [374/389]
--------------------------------------------------------------------------
Testing Time: 30668.21s
Total Discovered Tests: 399
Excluded : 10 (2.51%)
Unsupported : 11 (2.76%)
Passed : 377 (94.49%)
Expectedly Failed: 1 (0.25%)
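To put the timings above in perspective, here is a minimal sketch that parses lit's "Slowest Tests" lines and converts them to hours. The helper names (`parse_slowest`, `to_hours`) are hypothetical and only for illustration; the sample input is copied from the report above.

```python
import re

# Matches lit "Slowest Tests" lines such as
# "30238.12s: libomp :: worksharing/for/omp_for_collapse.c"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s:\s+(\S+)\s+::\s+(\S+)\s*$")


def parse_slowest(text):
    """Return (seconds, test_name) pairs from a slowest-tests listing."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results.append((float(m.group(1)), m.group(3)))
    return results


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


# Sample lines taken verbatim from the report above.
sample = """\
30238.12s: libomp :: worksharing/for/omp_for_collapse.c
25670.15s: libomp :: env/kmp_set_dispatch_buf.c
14940.87s: libomp :: worksharing/single/omp_single.c
"""

for secs, name in parse_slowest(sample):
    print(f"{to_hours(secs):6.2f} h  {name}")
```

Read this way, the slowest single test (30238.12s) accounts for almost all of the 30668.21s total testing time, so the wall-clock duration of the run is dominated by that one test rather than by the sum of all tests.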
We've only observed this issue on 32-core configurations. Here is the hardware information for one of them:
CPU info:
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s) per book: 1
Book(s) per drawer: 1
Drawer(s): 32
NUMA node(s): 1
Vendor ID: IBM/S390
Machine type: 8561
CPU dynamic MHz: 5200
CPU static MHz: 5200
BogoMIPS: 3241.00
Hypervisor: z/VM 7.2.0
Hypervisor vendor: IBM
Virtualization type: full
Dispatching mode: horizontal
L1d cache: 128K
L1i cache: 128K
L2d cache: 4096K
L2i cache: 4096K
L3 cache: 262144K
L4 cache: 983040K
NUMA node0 CPU(s): 0-31
Flags: esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx vxd vxe gs vxe2 vxp sort dflt sie
Memory:
              total        used        free      shared  buff/cache   available
Mem:      104721188     1315028    90630200     4161828    12775960    98299752
Swap:       4194300       83168     4111132