Windows bench_command memory tracking fails

Creating a new issue to track a PR I'm working on to fix the issue in the title. Related issue from May 2021: #97 

I was attempting to track the memory usage of command benchmarks on Windows, but got the following errors when doing so:
```py
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 769, in <module>
    main()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
    func()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
    runner.bench_command(name, command)
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
    return self._main(task)
           ^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 460, in _main
    bench = self._worker(task)
            ^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 434, in _worker
    run = task.create_run()
          ^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_worker.py", line 299, in create_run
    self.compute()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_command.py", line 70, in compute
    raise RuntimeError("failed to get the process RSS")
RuntimeError: failed to get the process RSS
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "\.venv\Scripts\pyperf.exe\__main__.py", line 10, in <module>
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
    func()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
    runner.bench_command(name, command)
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
    return self._main(task)
           ^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 465, in _main
    bench = self._manager()
            ^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 678, in _manager
    bench = Manager(self).create_bench()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 243, in create_bench
    worker_bench, run = self.create_worker_bench()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 142, in create_worker_bench
    suite = self.create_suite()
            ^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 132, in create_suite
    suite = self.spawn_worker(self.calibrate_loops, 0)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 118, in spawn_worker
    raise RuntimeError("%s failed with exit code %s"
RuntimeError: D:\my_project\.venv\Scripts\python.exe failed with exit code 1
```


I debugged my way through the code and traced the root cause to this function: https://github.com/psf/pyperf/blob/e0610c2f263c300de870e7bc8e494d46e0fd71be/pyperf/_process_time.py#L25-L42

In short, this function gets the peak resident set size (RSS) of the process through the standard library's `resource` module, but that module is only available on Unix-like systems (Linux, macOS). On Windows the import fails and the function simply returns 0, which the downstream callers treat as an error, so the benchmark fails entirely.
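To make the failure path concrete, here is a minimal sketch of what seems to happen. `max_rss_or_zero` and `require_rss` are hypothetical names for illustration, not pyperf functions; only the error message is taken from the traceback above.

```python
try:
    import resource  # standard library, but Unix-only
except ImportError:
    resource = None  # this is what happens on Windows


def max_rss_or_zero():
    """Return ru_maxrss as reported by the OS, or 0 when `resource` is missing."""
    if resource is None:
        return 0
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def require_rss(max_rss):
    """Hypothetical stand-in for the caller's check: a falsy RSS is an error."""
    if not max_rss:
        raise RuntimeError("failed to get the process RSS")
    return max_rss


if __name__ == "__main__":
    # On Windows this raises RuntimeError; on Linux/macOS it prints a number.
    print(require_rss(max_rss_or_zero()))
```

So the 0 is not a measurement, just a "not supported" sentinel that the caller cannot distinguish from a real failure.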


I began working on a fork where I instead use `psutil` to get the current process's RSS, but I noticed that `psutil.Process().memory_info().rss` returns higher values than the measurements from the `resource` module. I'm seeing roughly 25% to 35% higher RSS with `psutil`, which creates a dilemma for accuracy across operating systems. We have a few options:

1. `psutil` works cross-platform, but its `rss` values do not match what the `resource` module reports. We can opt to use only `psutil` moving forward, but that would invalidate all existing command benchmark results until they are re-run.
2. We can use `psutil` only on Windows, but this leads to a memory usage discrepancy between operating systems. On my Mac Mini, the `resource` and `psutil` RSS sizes differed by a wide margin, so Windows systems would falsely appear to use more memory than Mac systems (and presumably Linux ones as well).
3. We can use some other data point, such as the Unique Set Size (USS) from `psutil` through `psutil.Process().memory_full_info().uss`. USS is closer to what the `resource` module reports for RSS, but it comes out about 15% smaller. USS is supposed to be the closest representation of a process's actual memory usage, which would make it preferable to RSS or peak RSS.
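For anyone who wants to compare the two `psutil` numbers on their own machine, a small sketch along these lines should work. `read_rss_and_uss` is a hypothetical helper name; `psutil` is a third-party package, so the import is guarded and the function returns `None` when it is not installed.

```python
try:
    import psutil  # third-party: pip install psutil
except ImportError:
    psutil = None


def read_rss_and_uss():
    """Return the current process's RSS and USS in bytes, or None without psutil."""
    if psutil is None:
        return None
    proc = psutil.Process()  # no PID argument means the current process
    rss = proc.memory_info().rss        # current resident set size
    uss = proc.memory_full_info().uss   # memory unique to this process
    return {"rss": rss, "uss": uss}


if __name__ == "__main__":
    print(read_rss_and_uss())
```

Note that both values are point-in-time readings, whereas `ru_maxrss` from `resource` is a peak value, so some difference between the sources is expected regardless of platform.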

I'm not aware of any other way to get the memory usage of a process without writing some C bindings. What's more confusing is that there is also the [_win_memory.py](https://github.com/psf/pyperf/blob/e0610c2f263c300de870e7bc8e494d46e0fd71be/pyperf/_win_memory.py) file that uses Windows-native functionality to track memory usage, but from my testing it is not used correctly for command benchmarks; if it were, I wouldn't be getting the above error.
I see in both [_runner.py](https://github.com/psf/pyperf/blob/e0610c2f263c300de870e7bc8e494d46e0fd71be/pyperf/_runner.py#L346-L356) and [_worker.py](https://github.com/psf/pyperf/blob/e0610c2f263c300de870e7bc8e494d46e0fd71be/pyperf/_worker.py#L327-L337) that we choose which method to use based on the OS that is running. If we go with `psutil` to unify the memory tracking of command benchmarks, should we do the same for regular benchmarks?

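For reference, the `get_max_rss` function from `_process_time.py` linked above:

	import sys

	try:
	    import resource
	except ImportError:
	    # The resource module does not exist on Windows.
	    resource = None


	def get_max_rss(*, children):
	    if resource is not None:
	        if children:
	            resource_type = resource.RUSAGE_CHILDREN
	        else:
	            resource_type = resource.RUSAGE_SELF
	        usage = resource.getrusage(resource_type)
	        # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
	        if sys.platform == 'darwin':
	            return usage.ru_maxrss
	        return usage.ru_maxrss * 1024
	    else:
	        # No resource module: 0 is returned and later treated as a failure.
	        return 0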
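As a rough sketch of what option 2 could look like: keep `resource` where it exists and fall back to `psutil` elsewhere. `get_rss_with_source` is a hypothetical name, not part of pyperf, and it only covers the current process. Returning the source alongside the value is my own assumption about how to keep results from the two backends from being compared as if they were the same metric.

```python
import sys

try:
    import resource  # Unix-only standard library module
except ImportError:
    resource = None

try:
    import psutil  # third-party, optional
except ImportError:
    psutil = None


def get_rss_with_source():
    """Return (bytes, source) for the current process.

    source is "resource" (peak RSS), "psutil" (current RSS), or "none".
    """
    if resource is not None:
        usage = resource.getrusage(resource.RUSAGE_SELF)
        if sys.platform == 'darwin':
            return usage.ru_maxrss, "resource"   # already in bytes
        return usage.ru_maxrss * 1024, "resource"  # kilobytes -> bytes
    if psutil is not None:
        # Point-in-time RSS, not a peak value like ru_maxrss.
        return psutil.Process().memory_info().rss, "psutil"
    return 0, "none"


if __name__ == "__main__":
    print(get_rss_with_source())
```

This does not handle the `children=True` case. As far as I know `psutil` has no direct counterpart to `RUSAGE_CHILDREN`, so command benchmarks would likely need to sample the child process while it is still running.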
