Description
Confirm this is a feature request for the Python library and not the underlying OpenAI API.
- This is a feature request for the Python library
Describe the feature or improvement you're requesting
Feature request: add support for the millisecond-precision retry-after-ms variant of the standard retry-after response header. When present, its value would be used as the higher-resolution first choice, falling back to the lower-resolution standard header when it is not present.
openai-python's retry header handling is cleanly done in _base_client.py and parses the standard retry-after header, which provides second-resolution guidance on how long a client should wait before initiating a retry.
Some services, including Azure OpenAI (particularly for provisioned customers), can provide a retry-after-ms header in addition to retry-after. This millisecond-resolution variant is most valuable when retry behavior is used to control traffic efficiently across service-to-service calls, where the appropriate delay is often well under one second.
For comparison, Azure's SDKs check three retry headers in a precedence order, for example in the azure-sdk-for-js core logic:
- If the retry-after-ms header key is present, use its value as the number of milliseconds to delay
- Else, if the x-ms-retry-after-ms header key is present, use its value as the number of milliseconds to delay
- Else, if the retry-after header key is present, use its value as the number of whole seconds to delay
- Else, fall back to the standard heuristics for calculating a retry delay
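
The precedence order above could be sketched in Python roughly as follows. This is only an illustration, not the Azure SDK's or openai-python's actual code: RETRY_HEADERS and retry_delay_seconds are hypothetical names, and skipping a malformed value to try the next header is an assumption of this sketch rather than documented behavior.

```python
import math
from typing import Mapping, Optional

# (header name, divisor that converts the header's unit to seconds).
# The order mirrors the precedence list; names are lowercase for lookup.
# Hypothetical table, not taken from any SDK.
RETRY_HEADERS = (
    ("retry-after-ms", 1000.0),
    ("x-ms-retry-after-ms", 1000.0),
    ("retry-after", 1.0),
)


def retry_delay_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return the delay in seconds from the first usable retry header, else None."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    for name, divisor in RETRY_HEADERS:
        raw = lowered.get(name)
        if raw is None:
            continue
        try:
            delay = float(raw) / divisor
        except ValueError:
            # Non-numeric value (for example an HTTP-date): try the next header.
            continue
        if math.isfinite(delay) and delay >= 0:
            return delay
    # No usable header: the caller applies its own backoff heuristics.
    return None
```

Returning None rather than a default delay keeps the final "fall back to heuristics" step in the caller's hands.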
openai-python already uses a float value from retry-after as the input into time.sleep(), so this superficially looks like a fairly straightforward addition:
retry_after = float(retry_header)
Conceptually, this would just be a float(retry_ms_header) / 1000 style of thing.
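
Spelled out a little further, and only as a sketch: parse_retry_after below is a hypothetical helper, not the function in _base_client.py, and it assumes both header values arrive as strings or None. It ignores the HTTP-date form that the standard retry-after header may also carry.

```python
from typing import Optional


def parse_retry_after(
    retry_ms_header: Optional[str], retry_header: Optional[str]
) -> Optional[float]:
    """Hypothetical helper: prefer retry-after-ms, else retry-after, in seconds."""
    if retry_ms_header is not None:
        try:
            # Milliseconds to seconds, the unit time.sleep() expects.
            return float(retry_ms_header) / 1000
        except ValueError:
            pass  # malformed value: fall through to the standard header
    if retry_header is not None:
        try:
            return float(retry_header)
        except ValueError:
            pass  # not a plain number of seconds
    return None  # no usable header; the caller keeps its existing fallback
```

A caller would pass the result to time.sleep() only when it is not None.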
Thank you!
Additional context
No response