Improve performance of dpnp.nanmedian #2240

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

vtavana
Collaborator

@vtavana vtavana commented Dec 18, 2024

This PR updates the implementation of dpnp.nanmedian to improve performance.
The improvement is significant when axis is not None, as shown in the tables below.
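
For readers unfamiliar with why the per-axis case can be made fast, here is a minimal NumPy sketch of one vectorized approach. It is an illustration only, not necessarily what this PR implements: the helper name `nanmedian_along_axis` is hypothetical, and the idea (sort once so NaNs move to the end, count valid entries per slice, then gather the middle elements) is an assumption about how per-slice loops can be avoided.

```python
import numpy as np


def nanmedian_along_axis(a, axis):
    """NaN-aware median along one axis without a per-slice Python loop.

    Hypothetical helper for illustration; not the dpnp implementation.
    """
    a = np.asarray(a, dtype=np.float64)
    # Move the reduction axis to the end so each trailing row is one reduction.
    a = np.moveaxis(a, axis, -1)
    # np.sort places NaNs at the end of each row.
    s = np.sort(a, axis=-1)
    # Number of valid (non-NaN) entries in each row.
    n = np.count_nonzero(~np.isnan(s), axis=-1)
    # Positions of the two middle elements among the valid entries.
    # For an all-NaN row (n == 0) both clamp to 0, which selects a NaN.
    lo = np.maximum((n - 1) // 2, 0)
    hi = np.maximum(n // 2, 0)
    lo_val = np.take_along_axis(s, np.expand_dims(lo, -1), axis=-1)[..., 0]
    hi_val = np.take_along_axis(s, np.expand_dims(hi, -1), axis=-1)[..., 0]
    # Odd counts pick the same element twice; even counts average two.
    return 0.5 * (lo_val + hi_val)
```

The single sort and the gather operate on the whole array at once, which is the property that tends to matter on a GPU; the result can be checked against `numpy.nanmedian` on a small array with a few NaNs.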

Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1100 12.60.7 [1.6.31294+9]
Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8480+ OpenCL 3.0 (Build 0) [2024.18.9.0.28_160000]
import dpnp, numpy
a = numpy.ones((16*8192, 32))
axis = 1
# Randomly set some elements to NaN:
w = numpy.random.random((len(a.shape), 200)) * numpy.array(a.shape)[:, None]
w = w.astype(numpy.intp)
a[tuple(w)] = numpy.nan
%timeit numpy.nanmedian(a, axis=axis)

a_cpu = dpnp.array(a, device="cpu")
%timeit dpnp.nanmedian(a_cpu, axis=axis); a_cpu.sycl_queue.wait();

a_gpu = dpnp.array(a, device="gpu")
%timeit dpnp.nanmedian(a_gpu, axis=axis); a_gpu.sycl_queue.wait();

Old implementation:

| Shape, axis | NumPy | CPU-Xeon | GPU-PVC |
|---|---|---|---|
| (2048, 2048), axis=None | 19.3 ms ± 191 μs | 27.1 ms ± 7.1 ms | 4.38 ms ± 5.39 μs |
| (2048, 2048), axis=0 | 74.7 ms ± 66.7 μs | 3.24 s ± 63.7 ms | 1.23 s ± 5.14 ms |
| (2048, 2048), axis=1 | 34.1 ms ± 56.1 μs | 3.21 s ± 58.1 ms | 1.25 s ± 9.01 ms |
| (16*8192, 32), axis=None | 18.7 ms ± 41.1 μs | 25.5 ms ± 4.5 ms | 4.41 ms ± 16.1 |
| (16*8192, 32), axis=0 | 51.8 ms ± 83 μs | 72.5 ms ± 4.56 ms | 20.1 ms ± 215 μs |
| (16*8192, 32), axis=1 | 89.2 ms ± 265 μs | 3min 31s ± 1.53 s | 1min 19s ± 1.28 s |

New implementation:

| Shape, axis | NumPy | CPU-Xeon | GPU-PVC |
|---|---|---|---|
| (2048, 2048), axis=None | 19.3 ms ± 67.7 μs | 24.8 ms ± 3.61 ms | 4.97 ms ± 15.6 μs |
| (2048, 2048), axis=0 | 74.2 ms ± 50.3 μs | 29.2 ms ± 2.93 ms | 3.99 ms ± 217 μs |
| (2048, 2048), axis=1 | 34.1 ms ± 64.3 μs | 20.3 ms ± 4.25 ms | 3.45 ms ± 81.7 μs |
| (16*8192, 32), axis=None | 20 ms ± 221 μs | 24.8 ms ± 4.34 ms | 5.03 ms ± 12 μs |
| (16*8192, 32), axis=0 | 52.2 ms ± 77.6 μs | 24.3 ms ± 1.97 ms | 4.57 ms ± 194 μs |
| (16*8192, 32), axis=1 | 90.4 ms ± 305 μs | 31.7 ms ± 8.73 ms | 4.67 ms ± 78 μs |

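
To put the largest change in perspective, the short calculation below derives speedup factors for the (16*8192, 32), axis=1 case. It uses only the mean timings reported in the two tables above and ignores the ± spread; the variable names are illustrative.

```python
# Mean timings in seconds for shape (16*8192, 32), axis=1,
# copied from the "Old implementation" and "New implementation" tables.
old = {"CPU-Xeon": 3 * 60 + 31, "GPU-PVC": 60 + 19}
new = {"CPU-Xeon": 31.7e-3, "GPU-PVC": 4.67e-3}

# Speedup factor = old time / new time, per device.
speedup = {device: old[device] / new[device] for device in old}

for device, factor in speedup.items():
    print(f"{device}: about {factor:,.0f}x faster")
```

Both devices go from minutes to milliseconds for this shape, which is a gain of three to four orders of magnitude.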
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@vtavana vtavana self-assigned this Dec 18, 2024
Contributor

View rendered docs @ https://intelpython.github.io/dpnp/pull/2240/index.html

@coveralls
Collaborator

Coverage Status

coverage: 65.074% (-0.008%) from 65.082%
when pulling a2ab7ca on update-nanmedian
into ea718e3 on master.

@vtavana vtavana marked this pull request as ready for review December 18, 2024 13:49
Contributor

@antonwolfy antonwolfy left a comment

This brings great performance improvements. Thank you @vtavana, LGTM

@vtavana vtavana merged commit cabc0d7 into master Dec 18, 2024
52 of 54 checks passed
@vtavana vtavana deleted the update-nanmedian branch December 18, 2024 15:29
github-actions bot added a commit that referenced this pull request Dec 18, 2024
In this PR, implementation of `dpnp.nanmedian` is update to improve
performance. cabc0d7

3 participants