
Rate Limit

Request throttling and usage visibility

Extension URN: urn:vnd:ext:rate-limit


The rate limit extension provides standardized mechanisms for request throttling, following draft-ietf-httpapi-ratelimit-headers (RateLimit Header Fields for HTTP):

  1. Proactive visibility — Clients see usage before hitting limits
  2. Clear errors — Structured error when limits are exceeded
  3. Recovery guidance — Explicit retry timing

This extension differs from Quota, which tracks resource consumption limits (API calls per month, storage used). Rate limiting protects request throughput.


Rate limiting SHOULD be used for:

  • Public-facing APIs
  • Services requiring overload protection
  • Multi-tenant systems with fairness requirements
  • Functions with expensive operations

Rate limiting may be unnecessary for:

  • Internal service-to-service calls with trusted callers
  • Queue-based async processing
  • Services with other throttling mechanisms (transport layer, load balancer)

The rate limit extension needs no request options. Clients include the extension to receive rate limit metadata.

Request options (all optional):

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| scope | string | No | Request information for a specific scope (global, service, function, or user) |

Response data fields:

| Field | Type | Description |
|-------|------|-------------|
| limit | integer | Maximum requests allowed in the window |
| used | integer | Requests used in the current window |
| remaining | integer | Requests remaining in the current window |
| window | object | Time window duration (value/unit) |
| resets_in | object | Time until the window resets (value/unit) |
| scope | string | Scope this limit applies to |
| warning | string | Optional warning when approaching the limit |

When the rate limit extension is included:

  1. Server MUST include rate limit data in extension response
  2. Server MUST accurately report remaining count
  3. Server SHOULD include warning when approaching limit (e.g., < 10% remaining)
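The following helper illustrates these rules by assembling the extension's `data` object from a limit and a usage count. The function name, its parameters, and the warning string are hypothetical. The 10% threshold reuses the example figure above.

```python
from typing import Any

# Assumption: "approaching the limit" means under 10% of the limit remains,
# matching the example threshold; the warning text is illustrative.
WARNING_FRACTION = 0.10


def build_rate_limit_data(
    limit: int,
    used: int,
    window: dict[str, Any],
    resets_in_seconds: int,
    scope: str,
) -> dict[str, Any]:
    """Build the `data` object of the urn:vnd:ext:rate-limit extension."""
    remaining = max(limit - used, 0)  # never report a negative count
    data: dict[str, Any] = {
        "limit": limit,
        "used": used,
        "remaining": remaining,
        "window": window,  # e.g. {"value": 1, "unit": "minute"}
        "resets_in": {"value": resets_in_seconds, "unit": "second"},
        "scope": scope,
    }
    if remaining < limit * WARNING_FRACTION:
        data["warning"] = "Rate limit nearly exhausted"
    return data
```

With limit=1000 and used=985, this yields remaining=15 and includes the warning.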

When a client exceeds the rate limit:

  1. Server MUST return RATE_LIMITED error
  2. Server MUST set retryable: true on the error
  3. Server MUST include retry_after in error details
  4. Server MUST still include extension data with current status
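The four requirements can be combined into a single response builder. This is a minimal sketch. `rate_limited_response` is a hypothetical name, and the builder assumes `resets_in` equals `retry_after`. That holds for a fixed window, as in the rate-limited example, but need not hold in general.

```python
from typing import Any


def rate_limited_response(
    request_id: str,
    function: str,
    limit: int,
    used: int,
    window: dict[str, Any],
    retry_after_seconds: int,
    scope: str,
) -> dict[str, Any]:
    """Build a vend error response for a request that exceeded its limit."""
    retry_after = {"value": retry_after_seconds, "unit": "second"}
    return {
        "protocol": {"name": "vend", "version": "0.1.0"},
        "id": request_id,
        "result": None,
        "errors": [{
            "code": "RATE_LIMITED",          # rule 1: RATE_LIMITED error
            "message": f"Rate limit exceeded for {function}",
            "retryable": True,               # rule 2: retryable
            "details": {
                "limit": limit,
                "used": used,
                "window": window,
                "retry_after": retry_after,  # rule 3: retry timing
                "scope": scope,
                "function": function,
            },
        }],
        # Rule 4: the extension still reports current status.
        "extensions": [{
            "urn": "urn:vnd:ext:rate-limit",
            "data": {
                "limit": limit,
                "used": used,
                "remaining": max(limit - used, 0),
                "window": window,
                "resets_in": dict(retry_after),  # assumption: fixed window
                "scope": scope,
            },
        }],
    }
```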

When multiple rate limit scopes apply, servers SHOULD return all applicable limits:

{
  "urn": "urn:vnd:ext:rate-limit",
  "data": {
    "scopes": {
      "global": {
        "limit": 10000,
        "used": 4523,
        "remaining": 5477,
        "window": { "value": 1, "unit": "minute" },
        "resets_in": { "value": 32, "unit": "second" }
      },
      "service": {
        "limit": 1000,
        "used": 847,
        "remaining": 153,
        "window": { "value": 1, "unit": "minute" },
        "resets_in": { "value": 32, "unit": "second" }
      }
    }
  }
}

| Scope | Description |
|-------|-------------|
| global | Across all clients (system protection) |
| service | Per calling service (identified by context.caller) |
| function | Per function (different limits per operation) |
| user | Per authenticated user |
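A payload can use either the flat single-scope form or the `scopes` map, so a client looking for the most constraining limit has to handle both. The sketch below assumes the scope with the fewest remaining requests is the one to respect. Both helper names are hypothetical.

```python
from typing import Any


def scopes_of(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Normalize single-scope and multi-scope payloads to {scope: limit_info}."""
    if "scopes" in data:
        return data["scopes"]
    # Single-scope form: the limit fields sit directly in `data`.
    return {data.get("scope", "unspecified"): data}


def binding_scope(data: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return the (scope, info) pair with the fewest remaining requests."""
    return min(scopes_of(data).items(), key=lambda item: item[1]["remaining"])
```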

Example request with the extension enabled:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_123",
  "call": {
    "function": "orders.create",
    "version": "1",
    "arguments": {
      "customer_id": 42,
      "items": [{ "sku": "WIDGET-01", "quantity": 1 }]
    }
  },
  "extensions": [
    {
      "urn": "urn:vnd:ext:rate-limit",
      "options": {}
    }
  ]
}

Response:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_123",
  "result": {
    "order_id": 456,
    "status": "created"
  },
  "extensions": [
    {
      "urn": "urn:vnd:ext:rate-limit",
      "data": {
        "limit": 1000,
        "used": 42,
        "remaining": 958,
        "window": { "value": 1, "unit": "minute" },
        "resets_in": { "value": 47, "unit": "second" },
        "scope": "service"
      }
    }
  ]
}

Response when approaching the limit:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_456",
  "result": { "success": true },
  "extensions": [
    {
      "urn": "urn:vnd:ext:rate-limit",
      "data": {
        "limit": 1000,
        "used": 985,
        "remaining": 15,
        "window": { "value": 1, "unit": "minute" },
        "resets_in": { "value": 12, "unit": "second" },
        "scope": "service",
        "warning": "Rate limit nearly exhausted"
      }
    }
  ]
}

Rate-limited response:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_789",
  "result": null,
  "errors": [{
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded for orders.create",
    "retryable": true,
    "details": {
      "limit": 100,
      "used": 100,
      "window": { "value": 1, "unit": "minute" },
      "retry_after": { "value": 23, "unit": "second" },
      "scope": "function",
      "function": "orders.create"
    }
  }],
  "extensions": [
    {
      "urn": "urn:vnd:ext:rate-limit",
      "data": {
        "limit": 100,
        "used": 100,
        "remaining": 0,
        "window": { "value": 1, "unit": "minute" },
        "resets_in": { "value": 23, "unit": "second" },
        "scope": "function"
      }
    }
  ]
}

Response reporting multiple scopes:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_multi",
  "result": { "success": true },
  "extensions": [
    {
      "urn": "urn:vnd:ext:rate-limit",
      "data": {
        "scopes": {
          "global": {
            "limit": 10000,
            "used": 4523,
            "remaining": 5477,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 32, "unit": "second" }
          },
          "service": {
            "limit": 1000,
            "used": 153,
            "remaining": 847,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 32, "unit": "second" }
          },
          "function": {
            "limit": 100,
            "used": 45,
            "remaining": 55,
            "window": { "value": 1, "unit": "minute" },
            "resets_in": { "value": 32, "unit": "second" }
          }
        }
      }
    }
  ]
}

Rate limit errors MUST map to HTTP 429 Too Many Requests, defined in RFC 6585 Section 4:

| Error Code | HTTP Status | Headers |
|------------|-------------|---------|
| RATE_LIMITED | 429 Too Many Requests | Retry-After, RateLimit-* |

Servers MUST include the Retry-After HTTP header per RFC 9110 Section 10.2.3:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 23
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 23

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_789",
  "errors": [{ ... }]
}

The Retry-After value MUST be given in seconds (the delay-seconds form, not an HTTP-date). When retry_after.unit is "second", the header value MUST equal retry_after.value from the error details.
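The sketch below shows one way to derive the HTTP status and headers from the error and the extension data. The helper names are hypothetical, and the header set mirrors the example above. Converting non-second units (such as `minute`) to seconds is an assumption, because the text only constrains the `second` case.

```python
from typing import Any

# Assumed unit names; the examples only show "second" and "minute".
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def to_seconds(duration: dict[str, Any]) -> int:
    """Convert a {"value", "unit"} duration object to whole seconds."""
    return int(duration["value"]) * SECONDS_PER_UNIT[duration["unit"]]


def http_429(error: dict[str, Any], data: dict[str, Any]) -> tuple[int, dict[str, str]]:
    """Map a RATE_LIMITED error and the extension data to an HTTP status and headers."""
    if error["code"] != "RATE_LIMITED":
        raise ValueError("only RATE_LIMITED maps to 429")
    headers = {
        # Delay-seconds form; equals retry_after.value when the unit is "second".
        "Retry-After": str(to_seconds(error["details"]["retry_after"])),
        "RateLimit-Limit": str(data["limit"]),
        "RateLimit-Remaining": str(data["remaining"]),
        "RateLimit-Reset": str(to_seconds(data["resets_in"])),
    }
    return 429, headers
```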


Clients SHOULD monitor remaining and throttle requests before hitting limits:

if (remaining < threshold) {
    delay = calculate_backoff(remaining, resets_in)
    wait(delay)
}
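The pseudocode leaves `calculate_backoff` undefined. One plausible policy, assumed here and not prescribed by the extension, spreads the remaining budget evenly over the time left in the window. `throttle` and the unit table are hypothetical helpers.

```python
import time
from typing import Any, Callable

# Assumed unit names; the examples only show "second" and "minute".
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def calculate_backoff(remaining: int, resets_in: dict[str, Any]) -> float:
    """Delay between calls that spreads `remaining` requests over the reset time."""
    seconds_left = resets_in["value"] * SECONDS_PER_UNIT[resets_in["unit"]]
    if remaining <= 0:
        return float(seconds_left)  # nothing left: wait for the window to reset
    return seconds_left / remaining


def throttle(
    data: dict[str, Any],
    threshold: int,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep before the next call if `remaining` is below `threshold`; return the delay."""
    if data["remaining"] >= threshold:
        return 0.0
    delay = calculate_backoff(data["remaining"], data["resets_in"])
    sleep(delay)
    return delay
```

With the warning example's values (15 remaining, 12 seconds to reset), this policy paces calls about 0.8 seconds apart.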

When a client receives a RATE_LIMITED error, it should:

  1. Extract retry_after from the error details
  2. Wait the specified duration
  3. If still limited, retry with exponential backoff
  4. Cap retries at a maximum number of attempts

Clients MUST NOT retry immediately without waiting.

Recommended exponential backoff:

wait_time = min(retry_after * (2 ^ attempt), max_wait)

Where:

  • retry_after — From error details
  • attempt — Retry attempt number (0, 1, 2, …)
  • max_wait — Maximum wait time (e.g., 5 minutes)
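The retry steps and the backoff formula fit into a small loop. In this sketch, `send` is a hypothetical callable that returns a parsed vend response. The defaults (5 attempts, a 300-second cap) are illustrative. Only the formula itself comes from the text.

```python
import time
from typing import Any, Callable

# Assumed unit names; the examples only show "second" and "minute".
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def backoff_wait(retry_after_s: float, attempt: int, max_wait_s: float = 300.0) -> float:
    """wait_time = min(retry_after * 2**attempt, max_wait)."""
    return min(retry_after_s * (2 ** attempt), max_wait_s)


def call_with_retry(
    send: Callable[[], dict[str, Any]],
    max_attempts: int = 5,
    max_wait_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call `send`, waiting and retrying while the response is RATE_LIMITED."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        limited = [e for e in response.get("errors") or []
                   if e.get("code") == "RATE_LIMITED"]
        if not limited or attempt == max_attempts - 1:
            return response  # success, another error, or out of attempts
        ra = limited[0]["details"]["retry_after"]
        retry_after_s = ra["value"] * SECONDS_PER_UNIT[ra["unit"]]
        # Never retry immediately: attempt 0 waits exactly retry_after.
        sleep(backoff_wait(retry_after_s, attempt, max_wait_s))
    raise RuntimeError("unreachable")
```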

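Common server-side algorithms:

| Algorithm | Description |
|-----------|-------------|
| Fixed Window | Counter resets at fixed interval boundaries |
| Sliding Window | Limit is evaluated over a rolling time window |
| Token Bucket | Tokens replenish at a steady rate; each request consumes one |
| Leaky Bucket | Requests drain at a constant rate |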
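The following single-process sketch shows how a fixed window produces the `used`, `remaining`, and `resets_in` values that the extension reports. The class and method names are hypothetical, and a real deployment would typically keep counters in shared storage. Aligning windows to multiples of the window length keeps boundaries consistent across requests.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Fixed-window counter: the count resets at aligned interval boundaries."""

    def __init__(self, limit: int, window_s: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.window_start = 0.0
        self.used = 0

    def try_acquire(self) -> tuple[bool, int, int, float]:
        """Return (allowed, used, remaining, resets_in_seconds)."""
        now = self.clock()
        start = now - (now % self.window_s)  # align to a window boundary
        if start != self.window_start:       # a new window has begun
            self.window_start, self.used = start, 0
        allowed = self.used < self.limit
        if allowed:
            self.used += 1
        resets_in = self.window_start + self.window_s - now
        return allowed, self.used, self.limit - self.used, resets_in
```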

Servers implementing rate limiting MUST:

  1. Return RATE_LIMITED error when limit exceeded
  2. Include retry_after in error details
  3. Set retryable: true on the error
  4. Return extension data on every response (when extension requested)

Servers SHOULD:

  1. Use consistent window boundaries across requests
  2. Document rate limit policies
  3. Include warning when approaching limit

Clients can discover rate limit policies via vend.capabilities:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_caps",
  "call": {
    "function": "vend.capabilities",
    "version": "1",
    "arguments": {}
  }
}

Response:

{
  "protocol": { "name": "vend", "version": "0.1.0" },
  "id": "req_caps",
  "result": {
    "service": "orders-api",
    "extensions": [
      {
        "urn": "urn:vnd:ext:rate-limit",
        "documentation": "https://docs.example.com/rate-limits"
      }
    ],
    "rate_limits": [
      {
        "scope": "service",
        "limit": 1000,
        "window": { "value": 1, "unit": "minute" }
      },
      {
        "scope": "function",
        "function": "orders.create",
        "limit": 100,
        "window": { "value": 1, "unit": "minute" }
      }
    ]
  }
}
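A client can filter the advertised `rate_limits` down to the entries that apply to one function. This sketch assumes, as the example suggests, that function-scoped entries carry a `function` key and that entries for other scopes apply to every call. `policies_for` is a hypothetical name.

```python
from typing import Any


def policies_for(capabilities: dict[str, Any], function: str) -> list[dict[str, Any]]:
    """Return the advertised rate limit policies that apply to `function`."""
    result = capabilities.get("result", {})
    return [
        p for p in result.get("rate_limits", [])
        # Non-function scopes are assumed to apply to every function.
        if p.get("scope") != "function" or p.get("function") == function
    ]
```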

| Aspect | Rate Limit | Quota |
|--------|------------|-------|
| Purpose | Throughput protection | Resource consumption tracking |
| Time scale | Short (seconds/minutes) | Long (hours/days/months) |
| Reset | Automatic window reset | Manual or billing cycle |
| Example | 1000 req/minute | 10,000 API calls/month |
| Error code | RATE_LIMITED | QUOTA_EXCEEDED |

Use rate limiting for burst protection, quota for usage-based limits.