libomp tests on s390x sometimes extremely slow #116215

@nikic

We've observed that running the OpenMP tests on s390x is sometimes extremely slow. For example, they ran for more than 8 hours here:

Slowest Tests:
--------------------------------------------------------------------------
30238.12s: libomp :: worksharing/for/omp_for_collapse.c
25670.15s: libomp :: env/kmp_set_dispatch_buf.c
14940.87s: libomp :: worksharing/single/omp_single.c
14356.12s: libomp :: tasking/omp_taskloop_num_tasks.c
13150.09s: libomp :: worksharing/for/omp_for_schedule_runtime.c
8455.10s: libomp :: worksharing/for/kmp_set_dispatch_buf.c
8144.37s: libomp :: tasking/omp_taskloop_grainsize.c
6818.04s: libomp :: threadprivate/omp_threadprivate.c
5520.38s: libomp :: worksharing/for/omp_doacross.c
3407.81s: libomp :: tasking/omp_task_priority3.c
2711.39s: libomp :: worksharing/for/omp_collapse_many_int.c
2667.53s: libomp :: tasking/task_teams_stress_test.cpp
2137.94s: libomp :: parallel/omp_parallel_num_threads.c
2049.00s: libomp :: worksharing/for/omp_for_reduction.c
1983.83s: libomp :: worksharing/for/omp_for_ordered.c
1845.35s: libomp :: tasking/issue-87307.c
1842.58s: libomp :: atomic/omp_atomic.c
1697.35s: libomp :: worksharing/for/omp_parallel_for_ordered.c
1672.86s: libomp :: worksharing/sections/omp_sections_reduction.c
1604.87s: libomp :: parallel/omp_parallel_reduction.c
Tests Times:
--------------------------------------------------------------------------
[     Range     ] :: [               Percentage               ] :: [ Count ]
--------------------------------------------------------------------------
[30000s,32000s) :: [                                        ] :: [  1/389]
[28000s,30000s) :: [                                        ] :: [  0/389]
[26000s,28000s) :: [                                        ] :: [  1/389]
[24000s,26000s) :: [                                        ] :: [  0/389]
[22000s,24000s) :: [                                        ] :: [  0/389]
[20000s,22000s) :: [                                        ] :: [  0/389]
[18000s,20000s) :: [                                        ] :: [  0/389]
[16000s,18000s) :: [                                        ] :: [  0/389]
[14000s,16000s) :: [                                        ] :: [  2/389]
[12000s,14000s) :: [                                        ] :: [  1/389]
[10000s,12000s) :: [                                        ] :: [  0/389]
[ 8000s,10000s) :: [                                        ] :: [  2/389]
[ 6000s, 8000s) :: [                                        ] :: [  1/389]
[ 4000s, 6000s) :: [                                        ] :: [  1/389]
[ 2000s, 4000s) :: [                                        ] :: [  6/389]
[    0s, 2000s) :: [**************************************  ] :: [374/389]
--------------------------------------------------------------------------
Testing Time: 30668.21s
Total Discovered Tests: 399
  Excluded         :  10 (2.51%)
  Unsupported      :  11 (2.76%)
  Passed           : 377 (94.49%)
  Expectedly Failed:   1 (0.25%)
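
As a rough aid for reading the timings above, here is a minimal Python sketch that parses lines in the "Slowest Tests" format and converts the durations to hours. The helper names `parse_slowest` and `to_hours` are hypothetical and not part of lit; the regular expression assumes the `<seconds>s: <suite> :: <test path>` layout shown in the log.

```python
import re

# Matches lines such as
# "30238.12s: libomp :: worksharing/for/omp_for_collapse.c"
SLOW_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s:\s+(\S+)\s+::\s+(\S+)\s*$")


def parse_slowest(text):
    """Return (seconds, test_path) pairs for every matching line in text."""
    results = []
    for line in text.splitlines():
        match = SLOW_LINE.match(line)
        if match:
            results.append((float(match.group(1)), match.group(3)))
    return results


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


# A few lines copied from the log above.
sample = """\
30238.12s: libomp :: worksharing/for/omp_for_collapse.c
25670.15s: libomp :: env/kmp_set_dispatch_buf.c
14940.87s: libomp :: worksharing/single/omp_single.c
"""

for secs, name in parse_slowest(sample):
    print(f"{to_hours(secs):6.2f} h  {name}")
```

Applied to the reported total, `to_hours(30668.21)` comes out at roughly 8.5 hours, which matches the "more than 8 hours" figure, and the slowest single test accounts for nearly all of that wall-clock time.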

We've only observed this issue on 32-core configurations. Here is the hardware information for one of them:

CPU info:
Architecture:        s390x
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s) per book:  1
Book(s) per drawer:  1
Drawer(s):           32
NUMA node(s):        1
Vendor ID:           IBM/S390
Machine type:        8561
CPU dynamic MHz:     5200
CPU static MHz:      5200
BogoMIPS:            3241.00
Hypervisor:          z/VM 7.2.0
Hypervisor vendor:   IBM
Virtualization type: full
Dispatching mode:    horizontal
L1d cache:           128K
L1i cache:           128K
L2d cache:           4096K
L2i cache:           4096K
L3 cache:            262144K
L4 cache:            983040K
NUMA node0 CPU(s):   0-31
Flags:               esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx vxd vxe gs vxe2 vxp sort dflt sie


Memory:
              total        used        free      shared  buff/cache   available
Mem:      104721188     1315028    90630200     4161828    12775960    98299752
Swap:       4194300       83168     4111132
