Rate Limit
Request throttling and usage visibility
Extension URN: urn:vnd:ext:rate-limit
Overview
The rate limit extension provides standardized mechanisms for request throttling, following draft-ietf-httpapi-ratelimit-headers (RateLimit Header Fields for HTTP):
- Proactive visibility — Clients see usage before hitting limits
- Clear errors — Structured error when limits are exceeded
- Recovery guidance — Explicit retry timing
This extension differs from Quota, which tracks resource consumption limits (API calls per month, storage used). Rate limiting focuses on protecting request throughput.
When to Use
Rate limiting SHOULD be used for:
- Public-facing APIs
- Services requiring overload protection
- Multi-tenant systems with fairness requirements
- Functions with expensive operations
Rate limiting may be unnecessary for:
- Internal service-to-service calls with trusted callers
- Queue-based async processing
- Services with other throttling mechanisms (transport layer, load balancer)
Options (Request)
The rate limit extension requires no request options; clients include the extension to receive rate limit metadata. One optional field is available:
| Field | Type | Required | Description |
|---|---|---|---|
| `scope` | string | No | Request information for a specific scope (global, service, function, user) |
Data (Response)
| Field | Type | Description |
|---|---|---|
| `limit` | integer | Maximum requests allowed in the window |
| `used` | integer | Requests used in current window |
| `remaining` | integer | Requests remaining in current window |
| `window` | object | Time window duration (value/unit) |
| `resets_in` | object | Time until window resets (value/unit) |
| `scope` | string | Scope this limit applies to |
| `warning` | string | Optional warning when approaching limit |
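As an illustration of how a client might consume these fields, the sketch below parses the extension data into a typed record and converts the value/unit objects to seconds. The names `RateLimitStatus`, `parse_rate_limit`, and `duration_seconds` are hypothetical, and the unit table is an assumption: only `second` and `minute` appear in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed unit names: only "second" and "minute" appear in this document;
# "hour" and "day" are hypothetical additions.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def duration_seconds(duration: dict) -> int:
    """Convert a {"value": ..., "unit": ...} object to seconds."""
    return duration["value"] * _UNIT_SECONDS[duration["unit"]]


@dataclass(frozen=True)
class RateLimitStatus:
    limit: int
    used: int
    remaining: int
    window_seconds: int
    resets_in_seconds: int
    scope: Optional[str] = None
    warning: Optional[str] = None


def parse_rate_limit(data: dict) -> RateLimitStatus:
    """Build a RateLimitStatus from single-scope extension data."""
    return RateLimitStatus(
        limit=data["limit"],
        used=data["used"],
        remaining=data["remaining"],
        window_seconds=duration_seconds(data["window"]),
        resets_in_seconds=duration_seconds(data["resets_in"]),
        scope=data.get("scope"),      # optional in the table above
        warning=data.get("warning"),  # present only when near the limit
    )
```

Normalizing durations to seconds once, at the parsing boundary, keeps later throttling logic free of unit handling.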
Behavior
Standard Response
When the rate limit extension is included:
- Server MUST include rate limit data in the extension response
- Server MUST accurately report the `remaining` count
- Server SHOULD include `warning` when approaching the limit (e.g., < 10% remaining)
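The rules above could be implemented on the server roughly as follows. This is a minimal sketch, not a reference implementation: `build_rate_limit_data` and `warn_fraction` are hypothetical names, the 10% threshold is taken from the example in the list, and the warning text is borrowed from the "Approaching Limit" example.

```python
def build_rate_limit_data(limit, used, window, resets_in, scope, warn_fraction=0.10):
    """Assemble the extension `data` object for a single scope."""
    remaining = max(limit - used, 0)  # never report a negative count
    data = {
        "limit": limit,
        "used": used,
        "remaining": remaining,
        "window": window,
        "resets_in": resets_in,
        "scope": scope,
    }
    # Assumed threshold: warn when fewer than 10% of requests remain.
    if remaining < limit * warn_fraction:
        data["warning"] = "Rate limit nearly exhausted"
    return data
```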
Rate Limit Exceeded
When a client exceeds the rate limit:
- Server MUST return a `RATE_LIMITED` error
- Server MUST set `retryable: true` on the error
- Server MUST include `retry_after` in error details
- Server MUST still include extension data with current status
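A server-side helper that satisfies all four requirements might look like the sketch below. The function name `rate_limited_response` is hypothetical, and the code assumes `retry_after` can simply reuse the scope's `resets_in` value, which matches the "Rate Limited Error" example in this document but may not hold for every algorithm.

```python
def rate_limited_response(request_id, function, status):
    """Build a RATE_LIMITED response from the exceeded scope's status.

    `status` is the extension data for the exceeded scope (limit, used,
    remaining, window, resets_in, scope).
    """
    error = {
        "code": "RATE_LIMITED",
        "message": f"Rate limit exceeded for {function}",
        "retryable": True,
        "details": {
            "limit": status["limit"],
            "used": status["used"],
            "window": status["window"],
            # Assumption: the client may retry once the window resets.
            "retry_after": status["resets_in"],
            "scope": status["scope"],
        },
    }
    return {
        "protocol": {"name": "vend", "version": "0.1.0"},
        "id": request_id,
        "result": None,
        "errors": [error],
        # Extension data is still returned alongside the error.
        "extensions": [{"urn": "urn:vnd:ext:rate-limit", "data": status}],
    }
```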
Multiple Scopes
When multiple rate limit scopes apply, servers SHOULD return all applicable limits:

    {
      "urn": "urn:vnd:ext:rate-limit",
      "data": {
        "scopes": {
          "global": {
            "limit": 10000,
            "used": 4523,
            "remaining": 5477,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 32, "unit": "second" }
          },
          "service": {
            "limit": 1000,
            "used": 847,
            "remaining": 153,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 32, "unit": "second" }
          }
        }
      }
    }

Rate Limit Scopes
| Scope | Description |
|---|---|
| `global` | Across all clients (system protection) |
| `service` | Per calling service (identified by `context.caller`) |
| `function` | Per function (different limits per operation) |
| `user` | Per authenticated user |
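When several of these scopes are reported at once, a client usually cares about whichever one will run out first. The sketch below picks the scope with the fewest requests remaining; `tightest_scope` is a hypothetical helper, and comparing by absolute `remaining` (rather than by fraction of the limit) is a design assumption.

```python
def tightest_scope(scopes):
    """Return (name, status) for the scope with the fewest requests left.

    `scopes` is the `data.scopes` mapping from a multi-scope response.
    """
    return min(scopes.items(), key=lambda item: item[1]["remaining"])
```

Absolute `remaining` is used because any single scope reaching zero is enough to trigger `RATE_LIMITED`, regardless of how much headroom the other scopes have.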
Examples
Request with Extension

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_123",
      "call": {
        "function": "orders.create",
        "version": "1",
        "arguments": {
          "customer_id": 42,
          "items": [{ "sku": "WIDGET-01", "quantity": 1 }]
        }
      },
      "extensions": [
        { "urn": "urn:vnd:ext:rate-limit", "options": {} }
      ]
    }

Normal Response

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_123",
      "result": { "order_id": 456, "status": "created" },
      "extensions": [
        {
          "urn": "urn:vnd:ext:rate-limit",
          "data": {
            "limit": 1000,
            "used": 42,
            "remaining": 958,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 47, "unit": "second" },
            "scope": "service"
          }
        }
      ]
    }

Approaching Limit

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_456",
      "result": { "success": true },
      "extensions": [
        {
          "urn": "urn:vnd:ext:rate-limit",
          "data": {
            "limit": 1000,
            "used": 985,
            "remaining": 15,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 12, "unit": "second" },
            "scope": "service",
            "warning": "Rate limit nearly exhausted"
          }
        }
      ]
    }

Rate Limited Error

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_789",
      "result": null,
      "errors": [{
        "code": "RATE_LIMITED",
        "message": "Rate limit exceeded for orders.create",
        "retryable": true,
        "details": {
          "limit": 100,
          "used": 100,
          "window": { "value": 1, "unit": "minute" },
          "retry_after": { "value": 23, "unit": "second" },
          "scope": "function",
          "function": "orders.create"
        }
      }],
      "extensions": [
        {
          "urn": "urn:vnd:ext:rate-limit",
          "data": {
            "limit": 100,
            "used": 100,
            "remaining": 0,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 23, "unit": "second" },
            "scope": "function"
          }
        }
      ]
    }

Multiple Scopes

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_multi",
      "result": { "success": true },
      "extensions": [
        {
          "urn": "urn:vnd:ext:rate-limit",
          "data": {
            "scopes": {
              "global": {
                "limit": 10000,
                "used": 4523,
                "remaining": 5477,
                "window": { "value": 1, "unit": "minute" },
                "resets_in": { "value": 32, "unit": "second" }
              },
              "service": {
                "limit": 1000,
                "used": 153,
                "remaining": 847,
                "window": { "value": 1, "unit": "minute" },
                "resets_in": { "value": 32, "unit": "second" }
              },
              "function": {
                "limit": 100,
                "used": 45,
                "remaining": 55,
                "window": { "value": 1, "unit": "minute" },
                "resets_in": { "value": 32, "unit": "second" }
              }
            }
          }
        }
      ]
    }

HTTP Mapping
Rate limit errors MUST map to HTTP 429 (Too Many Requests), which is defined in RFC 6585 Section 4:
| Error Code | HTTP Status | Headers |
|---|---|---|
| `RATE_LIMITED` | 429 Too Many Requests | Retry-After, RateLimit-* |
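The mapping in the table could be sketched as a helper that turns the error `details` into a status code and headers. `http_rate_limit_headers` is a hypothetical name; the unit table covers only the units used in this document, and setting `RateLimit-Reset` to the same value as `Retry-After` is an assumption based on the HTTP example in this section.

```python
import math

# Assumed unit names; only "second" and "minute" appear in this document.
_UNIT_SECONDS = {"second": 1, "minute": 60}


def http_rate_limit_headers(details):
    """Map RATE_LIMITED error details to an HTTP status and headers."""
    retry_after = details["retry_after"]
    # Round up so the client never retries before the window resets.
    seconds = math.ceil(retry_after["value"] * _UNIT_SECONDS[retry_after["unit"]])
    remaining = max(details["limit"] - details["used"], 0)
    headers = {
        "Retry-After": str(seconds),
        "RateLimit-Limit": str(details["limit"]),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(seconds),
    }
    return 429, headers
```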
Retry-After Header
Servers MUST include the Retry-After HTTP header per RFC 9110 Section 10.2.3:

    HTTP/1.1 429 Too Many Requests
    Content-Type: application/json
    Retry-After: 23
    RateLimit-Limit: 100
    RateLimit-Remaining: 0
    RateLimit-Reset: 23

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_789",
      "errors": [{ ... }]
    }

The Retry-After value MUST be in seconds and MUST match the `retry_after.value` in error details (when the unit is `second`).
Client Behavior
Proactive Throttling
Clients SHOULD monitor `remaining` and throttle requests before hitting limits:

    if (remaining < threshold) {
      delay = calculate_backoff(remaining, resets_in)
      wait(delay)
    }

Handling RATE_LIMITED
When receiving a `RATE_LIMITED` error:
- Extract `retry_after` from error details
- Wait the specified duration
- Retry with exponential backoff if still limited
- Cap the number of retry attempts
Clients MUST NOT retry immediately without waiting.
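The steps above can be sketched as a small retry wrapper. Everything here is illustrative: `call_with_retry` and the `send` callable (which is assumed to return the parsed response as a dict) are hypothetical, the unit table covers only the units seen in this document, and the default of 5 attempts is an arbitrary choice.

```python
import time

# Assumed unit names; only "second" and "minute" appear in this document.
UNIT_SECONDS = {"second": 1, "minute": 60}


def call_with_retry(send, request, max_attempts=5, max_wait=300.0, sleep=time.sleep):
    """Send `request`, waiting and retrying while the server is rate limiting.

    `send` is a hypothetical callable returning the parsed response dict.
    """
    for attempt in range(max_attempts):
        response = send(request)
        limited = [
            e for e in (response.get("errors") or [])
            if e.get("code") == "RATE_LIMITED"
        ]
        if not limited:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        retry_after = limited[0]["details"]["retry_after"]
        base = retry_after["value"] * UNIT_SECONDS[retry_after["unit"]]
        # Never retry immediately: wait retry_after, doubling on each attempt.
        sleep(min(base * 2 ** attempt, max_wait))
    raise RuntimeError("rate limited: retry attempts exhausted")
```

Passing `sleep` as a parameter is a testing convenience; production callers would leave it at the default.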
Backoff Strategy
Recommended exponential backoff:

    wait_time = min(retry_after * (2 ^ attempt), max_wait)

Where:
- `retry_after`: from error details
- `attempt`: retry attempt number (0, 1, 2, …)
- `max_wait`: maximum wait time (e.g., 5 minutes)
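As a worked instance of the formula, the sketch below computes the wait for successive attempts. Note that `^` in the formula denotes exponentiation, which is `**` in Python (where `^` is bitwise XOR). The function name `backoff_wait` is hypothetical, and all values are assumed to be in seconds.

```python
def backoff_wait(retry_after, attempt, max_wait=300.0):
    """Exponential backoff in seconds, capped at max_wait (5 minutes here)."""
    return min(retry_after * (2 ** attempt), max_wait)


# Waits for attempts 0 through 4 with retry_after = 23 seconds.
schedule = [backoff_wait(23, attempt) for attempt in range(5)]
```

With `retry_after` of 23 seconds, the waits are 23, 46, 92, and 184 seconds, after which the 5-minute cap applies (the uncapped fifth value would be 368 seconds).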
Server Implementation
Rate Limit Algorithms
Common algorithms:
| Algorithm | Description |
|---|---|
| Fixed Window | Reset counter at fixed intervals |
| Sliding Window | Rolling time window |
| Token Bucket | Tokens replenish over time |
| Leaky Bucket | Requests drain at constant rate |
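To make the first row concrete, here is a minimal single-process sketch of a fixed-window counter that also reports its state in the shape of the extension data. The class name `FixedWindowLimiter` and the injectable `clock` are assumptions for illustration; a real deployment would typically need shared storage and locking, which are omitted here.

```python
import math
import time


class FixedWindowLimiter:
    """Fixed-window counter: at most `limit` requests per `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._used = 0

    def _roll(self, now):
        # Start a fresh window, aligned to the original boundary, once the
        # current one has fully elapsed.
        elapsed = now - self._window_start
        if elapsed >= self.window_seconds:
            self._window_start += (elapsed // self.window_seconds) * self.window_seconds
            self._used = 0

    def allow(self):
        """Count the request and return True, or return False if over limit."""
        self._roll(self._clock())
        if self._used >= self.limit:
            return False
        self._used += 1
        return True

    def status(self):
        """Report the current window in the shape of the extension data."""
        now = self._clock()
        self._roll(now)
        resets_in = self.window_seconds - (now - self._window_start)
        return {
            "limit": self.limit,
            "used": self._used,
            "remaining": self.limit - self._used,
            "window": {"value": self.window_seconds, "unit": "second"},
            "resets_in": {"value": math.ceil(resets_in), "unit": "second"},
        }
```

Fixed windows are the simplest option and map directly onto `limit`, `used`, `remaining`, and `resets_in`, but they allow bursts of up to twice the limit across a window boundary, which is what the sliding-window and bucket variants smooth out.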
Requirements
Servers implementing rate limiting MUST:
- Return a `RATE_LIMITED` error when the limit is exceeded
- Include `retry_after` in error details
- Set `retryable: true` on the error
- Return extension data on every response (when the extension is requested)
Servers SHOULD:
- Use consistent window boundaries across requests
- Document rate limit policies
- Include `warning` when approaching the limit
Discovery
Clients can discover rate limit policies via `vend.capabilities`:

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_caps",
      "call": {
        "function": "vend.capabilities",
        "version": "1",
        "arguments": {}
      }
    }

Response:

    {
      "protocol": { "name": "vend", "version": "0.1.0" },
      "id": "req_caps",
      "result": {
        "service": "orders-api",
        "extensions": [
          {
            "urn": "urn:vnd:ext:rate-limit",
            "documentation": "https://docs.example.com/rate-limits"
          }
        ],
        "rate_limits": [
          {
            "scope": "service",
            "limit": 1000,
            "window": { "value": 1, "unit": "minute" }
          },
          {
            "scope": "function",
            "function": "orders.create",
            "limit": 100,
            "window": { "value": 1, "unit": "minute" }
          }
        ]
      }
    }

Rate Limit vs Quota
| Aspect | Rate Limit | Quota |
|---|---|---|
| Purpose | Throughput protection | Resource consumption tracking |
| Time scale | Short (seconds/minutes) | Long (hours/days/months) |
| Reset | Automatic window reset | Manual or billing cycle |
| Example | 1000 req/minute | 10,000 API calls/month |
| Error code | RATE_LIMITED | QUOTA_EXCEEDED |
Use rate limiting for burst protection, quota for usage-based limits.