Feature: support for "retry-after-ms" HTTP header variant

### Confirm this is a feature request for the Python library and not the underlying OpenAI API.

- [X] This is a feature request for the Python library

### Describe the feature or improvement you're requesting

**Feature request**: add support for the millisecond-precision `retry-after-ms` variant of the standard `retry-after` response header. When `retry-after-ms` is present, its value should be used as the higher-resolution first choice; when it is absent, the client should fall back to the lower-resolution standard `retry-after` header.

openai-python's retry header handling is cleanly done in [_base_client.py](https://github.com/openai/openai-python/blob/e36956673d9049713c91bca6ce7aebe58638f483/src/openai/_base_client.py#L621), where it parses the standard `retry-after` header. That header provides second-resolution guidance on how long a client should wait before initiating a retry.

Some services, including Azure OpenAI (particularly for provisioned customers), can provide a `retry-after-ms` header in addition to `retry-after`. This millisecond-resolution variant is primarily valuable when retry behavior is used to efficiently control traffic in service-to-service calls, where the appropriate delays are often well under one second.

As a reference/comparison, Azure's SDKs use a precedence order of three retry headers, e.g. [as per here in the azure-sdk-for-js core logic](https://github.com/Azure/azure-sdk-for-js/blob/17de1a2b7f3ad61f34ff62876eced7d077c10d4b/sdk/core/core-rest-pipeline/src/retryStrategies/throttlingRetryStrategy.ts#L35):

- If the `retry-after-ms` header key is present, use its value as the number of milliseconds to delay
- Else, if the `x-ms-retry-after-ms` header key is present, instead use its value as the number of milliseconds to delay
- Else, if the `retry-after` header key is present, use its value as the number of whole seconds to delay
- Else, fall back to the default heuristics for calculating a retry delay
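
For illustration, the precedence order above could be sketched in Python roughly as follows. This is not the Azure SDK's or openai-python's actual code: `delay_from_headers` and `_RETRY_HEADERS` are hypothetical names, the headers are assumed to arrive as a plain string mapping, and skipping to the next header when a value is not numeric (for example when `retry-after` carries an HTTP-date) is an assumption of this sketch.

```python
from typing import Mapping, Optional

# (header name, divisor that converts the header's unit to seconds).
# Hypothetical table for this sketch; order encodes the precedence.
_RETRY_HEADERS = (
    ("retry-after-ms", 1000.0),
    ("x-ms-retry-after-ms", 1000.0),
    ("retry-after", 1.0),
)


def delay_from_headers(headers: Mapping[str, str]) -> Optional[float]:
    """Return the server-suggested delay in seconds, or None if no usable header exists."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name, divisor in _RETRY_HEADERS:
        raw = lowered.get(name)
        if raw is None:
            continue
        try:
            return float(raw) / divisor
        except ValueError:
            # Assumption: a non-numeric value is skipped rather than treated as an error.
            continue
    # No usable header: the caller applies its own default backoff heuristics.
    return None
```

A `None` result corresponds to the last bullet, where the client computes the delay itself.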

`openai-python` already uses a float value from `retry-after` as the input into `time.sleep()`, so this superficially looks like a fairly straightforward addition:

```python
retry_after = float(retry_header)
```

Conceptually, this would just be a `float(retry_ms_header) / 1000` style of thing.
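
A slightly more defensive version of that conversion might look like the sketch below. `seconds_from_retry_after_ms` is a hypothetical helper, not an existing openai-python function, and returning `None` for missing, malformed, negative, or non-finite values (so the caller can fall back to `retry-after`) is an assumption about the desired behavior.

```python
import math
from typing import Optional


def seconds_from_retry_after_ms(value: Optional[str]) -> Optional[float]:
    """Convert a `retry-after-ms` header value (milliseconds) to seconds."""
    if value is None:
        # Header not present: let the caller fall back to `retry-after`.
        return None
    try:
        seconds = float(value) / 1000
    except ValueError:
        # Malformed value: treat it the same as a missing header.
        return None
    # time.sleep() rejects negative delays, and a non-finite delay is meaningless.
    if not math.isfinite(seconds) or seconds < 0:
        return None
    return seconds
```

The result is already in seconds, so it could be passed to `time.sleep()` in the same way as the existing `retry_after` value.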

Thank you!

### Additional context

_No response_
