Description
Creating a new issue to track a PR I'm working on to fix the issue in the title. Related issue from May 2021: #97
I was attempting to track the memory usage of command benchmarks on Windows, but got the following errors when doing so:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 769, in <module>
main()
File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
func()
File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
runner.bench_command(name, command)
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
return self._main(task)
^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 460, in _main
bench = self._worker(task)
^^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 434, in _worker
run = task.create_run()
^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_worker.py", line 299, in create_run
self.compute()
File "D:\my_project\.venv\Lib\site-packages\pyperf\_command.py", line 70, in compute
raise RuntimeError("failed to get the process RSS")
RuntimeError: failed to get the process RSS
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "\.venv\Scripts\pyperf.exe\__main__.py", line 10, in <module>
File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
func()
File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
runner.bench_command(name, command)
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
return self._main(task)
^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 465, in _main
bench = self._manager()
^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 678, in _manager
bench = Manager(self).create_bench()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 243, in create_bench
worker_bench, run = self.create_worker_bench()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 142, in create_worker_bench
suite = self.create_suite()
^^^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 132, in create_suite
suite = self.spawn_worker(self.calibrate_loops, 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 118, in spawn_worker
raise RuntimeError("%s failed with exit code %s"
RuntimeError: D:\my_project\.venv\Scripts\python.exe failed with exit code 1
I debugged my way through the code and found the root cause, which is located here:
pyperf/pyperf/_process_time.py
Lines 25 to 42 in e0610c2
In short, this function gets the current process resident set size by using the resource module, but this module is only available on Unix-like systems (Linux, macOS), not on Windows. When run on Windows, this function simply returns 0, which downstream callers treat as an error, so the benchmark fails entirely.
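To illustrate the failure mode, here is a minimal sketch of the pattern described above. It is not the pyperf source: the name get_max_rss_sketch is hypothetical, and using RUSAGE_SELF is an assumption made for the example.

```python
import sys

try:
    import resource  # standard library, but only available on Unix-like systems
except ImportError:
    resource = None  # e.g. on Windows


def get_max_rss_sketch():
    """Return the peak RSS of this process in bytes, or 0 if unavailable.

    Hypothetical helper for illustration only.
    """
    if resource is None:
        # Nothing can be measured here. A caller that treats 0 as
        # "failed to get the process RSS" will raise at this point.
        return 0
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return usage.ru_maxrss
    return usage.ru_maxrss * 1024


if __name__ == "__main__":
    print(get_max_rss_sketch())
```

On Windows the import fails, the helper returns 0, and any caller that rejects 0 raises the RuntimeError shown in the first traceback.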
I began working on a fork where I instead use psutil to get the current process' RSS, but I noticed that psutil.Process().memory_info().rss returns higher values than the measurements from the resource library. I'm seeing roughly 25% - 35% higher RSS size with psutil, so that leads to a dilemma in terms of accuracy across operating systems. We have a few options:
- psutil works cross-platform, but its rss values do not match what the resource module reports. We can opt to only use psutil moving forward, but that would invalidate all existing command benchmark results until they are re-run.
- We can use psutil only for Windows systems, but this leads to a memory usage discrepancy between operating systems. On my Mac Mini, the resource and psutil RSS sizes differed by a wide margin, so Windows systems would falsely appear to have higher memory usage than Mac systems (and presumably Linux ones as well).
- We can use some other data point, such as the Unique Set Size (USS) from psutil through psutil.Process().memory_full_info().uss. USS is closer to what the resource module gets for RSS, but USS is about 15% smaller than RSS from the resource module. USS is supposed to be the closest representation of the process memory usage, which would make it preferable to RSS or peak RSS.
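For anyone who wants to reproduce the comparison, a rough sketch along these lines prints the three readings side by side. It assumes psutil is installed (it is a third-party package), measures the current process rather than a benchmarked command, and uses the hypothetical name collect_memory_readings.

```python
import sys

try:
    import psutil  # third-party; assumed installed for the psutil readings
except ImportError:
    psutil = None

try:
    import resource  # Unix-only standard library module
except ImportError:
    resource = None


def collect_memory_readings():
    """Return whichever memory readings (in bytes) this platform supports."""
    readings = {}
    if psutil is not None:
        proc = psutil.Process()
        # Point-in-time resident set size.
        readings["psutil_rss"] = proc.memory_info().rss
        try:
            # USS is exposed on Linux, macOS and Windows.
            uss = getattr(proc.memory_full_info(), "uss", None)
        except psutil.Error:
            uss = None
        if uss is not None:
            readings["psutil_uss"] = uss
    if resource is not None:
        # ru_maxrss is a peak value: bytes on macOS, kilobytes on Linux.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        readings["resource_maxrss"] = peak if sys.platform == "darwin" else peak * 1024
    return readings


if __name__ == "__main__":
    for name, value in collect_memory_readings().items():
        print(f"{name}: {value / (1024 * 1024):.1f} MiB")
```

Note that ru_maxrss is a peak value while psutil's rss is a snapshot at the time of the call, so the two are not expected to agree exactly.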
I'm not aware of any other ways to get the memory usage of a process without writing some C bindings to do so. What's more confusing is that there is also the _win_memory.py file that uses Windows-native functionality to track memory usage, but from my testing it is not used correctly: if it were, I wouldn't be getting the above error.
I see in both _runner.py and _worker.py that we choose which method to use based on which OS is running. If we go with psutil to unify the memory tracking of command benchmarks, should we do the same for regular benchmarks?
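As a rough illustration of per-OS selection, the Windows-only psutil option could be expressed like this. The function name and the returned labels are hypothetical and do not correspond to anything in pyperf.

```python
import sys


def pick_memory_backend(platform=None, have_psutil=False):
    """Return a label naming the memory backend to use (labels are made up)."""
    if platform is None:
        platform = sys.platform
    if platform == "win32":
        # The resource module does not exist on Windows.
        return "psutil" if have_psutil else "unsupported"
    # Linux, macOS and other Unix-like systems keep the existing behaviour.
    return "resource"
```

Unifying on psutil everywhere would collapse this to a single branch, at the cost of invalidating existing results as noted above.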