Feat/max error rate #171


Open · wants to merge 40 commits into main
Conversation


@markVaykhansky markVaykhansky commented May 21, 2025

Addressing this issue:
#105

A --max-error-rate flag was added.
It is only applicable in one of the following scenarios:

  1. Fixed length dataset
  2. max_requests is set
  3. rate_type is constant and max_duration is set
  • If the error rate limit is not applicable (i.e. none of the above scenarios), it is simply ignored (throwing an error would break sweep, for example).
  • Implemented using a multiprocessing.Event, which is used to trigger a shutdown once max_error_rate is hit.
  • max_error_rate is added to the report.
  • error_rate in the final report is calculated as total errored / total requests finalized, i.e. incomplete requests are excluded.
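The reported calculation above can be sketched as follows (the function name is illustrative, not the actual code in the PR):

```python
def error_rate(errored: int, completed: int) -> float:
    """Errored requests over all finalized requests.

    Finalized = errored + completed; incomplete (still in-flight)
    requests are excluded from the denominator.
    """
    finalized = errored + completed
    return errored / finalized if finalized else 0.0
```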

@markVaykhansky markVaykhansky self-assigned this May 21, 2025

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15167220831/artifacts/3170182036.
They will be retained for up to 30 days.


@markVaykhansky markVaykhansky marked this pull request as ready for review May 22, 2025 10:58
@markVaykhansky markVaykhansky requested a review from markurtz May 22, 2025 11:02



@markurtz markurtz left a comment


I left a few minor comments for the less intrusive changes. Also, please be sure to run tox -e style and tox -e types locally and confirm they pass -- there are a bunch of styling pieces, such as where parentheses are placed and how new lines are escaped, that are inconsistent with the ruff styling setup. Running tox -e style locally applied several fixes and raised some errors that couldn't be resolved without either modifying the code or ignoring them where appropriate.

For implementation, I think defaulting always to use the shutdown event is a good pattern rather than relying on two signals: either None being placed on the queue or the shutdown event being set if the maximum error rate is reached. I'd like to incorporate that into this to reduce complexity.

Also, I think a single parameter, max_errors, would be a great extension: it could be either a max_error_rate or an absolute number of errors that may occur, allowing us to support all benchmarking pathways. Along these lines, for runs where we don't know how many requests will be made (or maybe in general), a simple sample of the last X requests, sized for the precision needed, could ensure a rate can still be enforced.

Additionally, with the above, as well as some combination logic in asyncio, we should be able to simplify the changes, so we don't need to make significant changes to the resolve requests pathways and can cancel for any type. For example, we should be able to replace the root process runner to call into async loops and handle the shutdown event globally in a central place with something like the following, where ideally shutdown_poll_interval would grab from the global settings so users can adjust it:

    def run_process(
        self,
        type_: Literal["synchronous", "asynchronous"],
        requests_queue: multiprocessing.Queue,
        results_queue: multiprocessing.Queue,
        shutdown_event: multiprocessing.Event,
        shutdown_poll_interval: float,
        process_id: int,
        max_concurrency: int,
    ):
        async def _process_runner():
            if type_ == "synchronous":
                loop_task = asyncio.create_task(self.synchronous_loop(...))
            elif type_ == "asynchronous":
                loop_task = asyncio.create_task(self.asynchronous_loop(...))
            else:
                raise ValueError(f"Invalid process type: {type_}")
            
            shutdown_task = asyncio.create_task(
                self.wait_for_shutdown(shutdown_event, shutdown_poll_interval)
            )
            
            done, pending = await asyncio.wait(
                [
                    loop_task,
                    shutdown_task,
                ],
                return_when=asyncio.FIRST_EXCEPTION,
            )

            for task in pending:
                task.cancel()
                try:
                    await task
                except asyncio.CancelledError:
                    pass

            for task in done:
                # surface real errors, but ignore cancellations
                if not task.cancelled() and task.exception() is not None:
                    raise task.exception()
        
        try:
            asyncio.run(_process_runner())
        except Exception as exc:  # noqa: BLE001
            logger.error(
                f"Error in worker process {process_id}: {exc}",
                exc_info=True,
                stack_info=True,
            )
        finally:
            shutdown_event.set()  # ensure shutdown event is set to stop other processes

    async def wait_for_shutdown(
        self, shutdown_event: multiprocessing.Event, shutdown_poll_interval: float
    ):
        while not shutdown_event.is_set():
            await asyncio.sleep(shutdown_poll_interval)

        raise asyncio.CancelledError("Shutdown event set, cancelling process loop.")
    
    async def synchronous_loop(self, ...):
        while True:
            process_request = await self.get_request(requests_queue)
            ...

    async def asynchronous_loop(self, ...):
        while True:
            process_request = await self.get_request(requests_queue)
            ...

"The number of errored requests divided by the number "
"of finalized requests. This can be higher than max_error_rate "
"(if applicable) because it does not take into "
"account incomplete requests."
Member

I'm not following this point, the error rate we calculate and compare to should never include incomplete, right?

Collaborator Author

Correct

Member

Isn't max_error also not looking at incomplete requests, though? How would it be higher for error_rate vs max_error since those should be based off the same calculations, right?

@@ -159,6 +166,17 @@ async def run(
run_info,
)
if iter_result is not None:
if iter_result.request_info.errored \
Member

NIT: should use parentheses rather than line escapes for multi-line if statements; tox -e style should fix this.

Collaborator Author

Done


@markVaykhansky
Collaborator Author

I've changed the logic as suggested to use a single termination mechanism, the shutdown event.
However, this did not go as smoothly as expected:

After implementing the code as you suggested above, I ran into an issue with get_request. Once a shutdown was initiated and the _wait_for_shutdown task raised an exception, the code moved on to cancel the pending task, i.e. the request-sending task. However, it hung indefinitely because the logic there uses asyncio.to_thread(requests_queue.get), which blocks. To resolve this, I re-introduced the sleep_intermittently logic, which checks the shutdown_event periodically inside get_request. Then I hit another issue: after the shutdown event is sent, there's a race condition between the asyncio.wait and the error check in get_request; to resolve it, I created a separate internal_shutdown_event which is set after the task.cancel().

Another odd thing: raising asyncio.CancelledError specifically inside _wait_for_shutdown did not trigger the return of asyncio.wait with FIRST_EXCEPTION, and it hung forever. Presumably this is because a task that raises CancelledError is marked as cancelled rather than as having finished with an exception, so FIRST_EXCEPTION never fires for it. I ended up throwing a custom error instead.

Also, I added the last-X-requests window as suggested, but I feel that for the scenario where we do know the exact request count, i.e. constant rate + fixed duration, it is best to look at the entire run.

@@ -4,40 +4,40 @@ repos:
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://github.com/astral-sh/ruff-pre-commit
Member

We'll need these re-enabled. If you ever need to push and skip pre-commit, you can pass the --no-verify flag with your git commit command.

Member

Also @sjmonson just landed a diff that might clear up some of the issues you were seeing

@@ -147,6 +147,8 @@ The `guidellm benchmark` command is used to run benchmarks against a generative

- `--max-requests`: Sets the maximum number of requests for each benchmark run. If not provided, the benchmark will run until `--max-seconds` is reached or the dataset is exhausted.

- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite deterministic scenarios i.e `rate_type` is `constant` and `--max-seconds` exists OR `--max-requests` exists OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, benchmarks will continue regardless of error rate.
Member

Suggested change
- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite deterministic scenarios i.e `rate_type` is `constant` and `--max-seconds` exists OR `--max-requests` exists OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, benchmarks will continue regardless of error rate.
- `--max-error`: The maximum error after which a benchmark will stop. Can be either a rate (0 < rate < 1) or a constant number. If a rate is given and rate_type is 'constant' and 'max_seconds' exists, then the rate is computed against the total expected request count, i.e. rate * duration. If a rate is given and the number of requests is not pre-determined, then a context window of the most recent requests is examined; the window size is configurable via GUIDELLM__ERROR_CHECK_WINDOW_SIZE. If a number above 1 is given, we simply count the total number of errors and check whether it is above the threshold.
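The suggested semantics might be sketched like this (the class name and the window-size constant are illustrative; the real implementation would read the window size from settings):

```python
from collections import deque

ERROR_CHECK_WINDOW_SIZE = 50  # stand-in for GUIDELLM__ERROR_CHECK_WINDOW_SIZE


class MaxErrorChecker:
    """Decide when to stop, given max_error as a rate (0-1) or a count (>1)."""

    def __init__(self, max_error, expected_total=None):
        self.max_error = max_error
        self.expected_total = expected_total  # known for constant rate + max_seconds
        self.total_errors = 0
        self.window = deque(maxlen=ERROR_CHECK_WINDOW_SIZE)

    def record(self, errored: bool) -> bool:
        """Record one finalized request; return True if the run should stop."""
        self.total_errors += int(errored)
        self.window.append(errored)
        if self.max_error > 1:  # a number above 1: absolute error count
            return self.total_errors >= self.max_error
        if self.expected_total:  # rate against the expected total (rate * duration)
            return self.total_errors / self.expected_total > self.max_error
        # unknown total: rate over the most recent finalized requests only
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) > self.max_error
```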

@@ -46,12 +50,15 @@ class SchedulerRunInfo(StandardBaseModel):
end_number: float
processes: int
strategy: SchedulingStrategy
last_requests_statuses: deque[RequestStatus]
Member

I think we can simplify the logic here so we don't need to add this, especially since this list can become very long and it is not JSON serializable, so it would break any pydantic serialization if someone wants to save that state. See my note further down where that logic lives.

"RequestLoader",
"RequestLoaderDescription",
]


class GetInfiniteDatasetLengthError(Exception):
Member

I think this exception and its logic are no longer needed: if we can get the length, we go through a specific logic route, and if we can't, we fall back on the window.

if max_duration is not None and max_duration < 0:
raise ValueError(f"Invalid max_duration: {max_duration}")
self._validate_scheduler_params(
scheduling_strategy, max_duration, max_error, max_number
Member

NIT: could we keep the arg order the same as the surrounding function? It makes things a bit easier to track, especially if additions come in later.

)
asyncio.create_task(self.send_result(results_queue, result))
asyncio.create_task(self.send_result(results_queue, request_scheduled_result))
Member

Good catch on the potential threading issues with these. Rather than polluting the local namespace with a bunch of vars that aren't used later, though, could we build the WorkerProcessResult and pass it directly to this call?

process_request := await self.get_request(requests_queue)
) is not None:
dequeued_time = time.time()
# We are using a separate internal event
Member

See my note on fixes for get_request so we don't hit this race condition ideally and can rely on the shutdown event here

process_id: int,
max_concurrency: Optional[int] = None,
Member

Can we change this so it isn't optional? I think we had a change either pushed or queued for that, and that way we don't need to validate it for async. I believe the changes are in so that for sync it still returns a max_concurrency as well.

internal_shutdown_event.set()
try: # noqa: SIM105
await task
except asyncio.CancelledError:
Member

Since we are raising a different error in the shutdown task, there could be a race condition here: if an error occurs and shutdown is set before we call await, we'd get an uncaught exception here.

while not shutdown_event.is_set(): # noqa: ASYNC110
await asyncio.sleep(shutdown_poll_interval)

# Raising asyncio.CancelledError instead would
Member

I think this is the correct approach: we raise a specific error signaling the shutdown, which we can track and catch. Can we remove the comments here, since it's pretty self-explanatory based on the exception's name?
