Closed
Here is a fairly minimal reproducer:
In [1]: import mkl, numpy as np
...: import mkl_fft
...: from mkl_fft._scipy_fft_backend import fft as scipy_fft
...:
...: x = np.random.rand(100, 100, 100).astype(np.cdouble)
...: mkl.set_num_threads(1)
...: %timeit scipy_fft(x, workers=8)
...:
...: mkl.set_num_threads(8)
...: %timeit mkl_fft.fft(x)
320 µs ± 420 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.36 ms ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I would expect the `mkl_fft.fft` call to use 8 threads and so be as fast as `scipy_fft` with `workers=8`. Instead it is roughly ten times slower (3.36 ms versus 320 µs). What actually happens is that `scipy_fft` sets the FFT domain thread count to 1, and the domain setting has higher precedence than the global thread setting.
mkl_fft/mkl_fft/_scipy_fft_backend.py
Lines 162 to 163 in 4d8cc2a
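
To make the precedence problem concrete, here is a small pure-Python model of the behaviour described above. It does not call MKL or `mkl_fft`. `ThreadSettings` and `call_with_workers` are hypothetical names invented for this sketch, and the restore step is only an assumption about what the backend does, modelled on the outcome reported in this issue (the FFT domain left at 1 thread). Treating `0` as "clear the domain override" is also an assumption of the model.

```python
class ThreadSettings:
    """Toy model: one global thread count plus optional per-domain overrides."""

    def __init__(self, global_threads):
        self.global_threads = global_threads
        self.domain_threads = {}  # domain name -> override value

    def set_num_threads(self, n):
        # Changes only the global value; existing domain overrides stay in place.
        self.global_threads = n

    def domain_set_num_threads(self, n, domain):
        # Model assumption: 0 clears the override, so the domain follows
        # the global value again.
        if n == 0:
            self.domain_threads.pop(domain, None)
        else:
            self.domain_threads[domain] = n

    def effective_threads(self, domain):
        # A domain override, when present, wins over the global value.
        return self.domain_threads.get(domain, self.global_threads)


def call_with_workers(settings, workers, domain="fft"):
    """Hypothetical backend call: use `workers` threads for one transform,
    then write the previously effective count back into the domain."""
    previous = settings.effective_threads(domain)
    settings.domain_set_num_threads(workers, domain)
    used = settings.effective_threads(domain)  # threads the transform would use
    # Assumed bug: this "restore" leaves a domain override behind.
    settings.domain_set_num_threads(previous, domain)
    return used


settings = ThreadSettings(global_threads=8)

settings.set_num_threads(1)                       # like mkl.set_num_threads(1)
used_by_backend = call_with_workers(settings, workers=8)

settings.set_num_threads(8)                       # like mkl.set_num_threads(8)
used_after = settings.effective_threads("fft")    # stale override still applies

settings.domain_set_num_threads(0, "fft")         # clear the override
used_after_reset = settings.effective_threads("fft")

print(used_by_backend, used_after, used_after_reset)
```

In this model the later global change to 8 has no effect on the FFT domain until the leftover override is cleared, which matches the slowdown seen in the reproducer.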