# Rate Limiting
AI APIs often consume resources based on token usage, and excessive requests can lead to increased costs, throttling, or service disruptions. To manage this effectively, WSO2 API Manager provides rate limiting capabilities, allowing you to control the number of requests and tokens sent to an AI service within a given timeframe.
By enforcing rate limits, you can:
- Prevent unexpected cost spikes from excessive AI API usage.
- Optimize performance by ensuring fair resource distribution.
- Protect AI backends from overuse and service degradation.
WSO2 API Manager enables you to define rate limits at two levels, subscription and backend, giving you flexibility in managing API usage.
## Subscription-Level Rate Limiting
Subscription-level rate limiting applies different quotas based on business plans, allowing API providers to enforce request-based or token-based limits on API subscribers.
### Quotas Available for Subscription Policies
- Request Count - Limits the total number of requests an application can make. Once the limit is reached, further requests are throttled until the quota resets.
- Total Token Count - Defines the maximum number of tokens consumed by an application across all interactions with an AI API. When the limit is exceeded, throttling occurs.
- Prompt Token Count - Controls the number of tokens used specifically for AI prompt processing. This prevents excessive prompts from consuming API resources.
- Completion Token Count - Restricts the number of response tokens generated by an AI API. When exceeded, further completions are throttled.
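To make the four quota types concrete, the following Python sketch models how a policy window could track them for a single application. This illustrates the enforcement semantics only, not WSO2 API Manager's internal implementation; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AIQuota:
    """Hypothetical per-application counters for one policy window."""
    max_requests: int            # Request Count quota
    max_total_tokens: int        # Total Token Count quota
    max_prompt_tokens: int       # Prompt Token Count quota
    max_completion_tokens: int   # Completion Token Count quota
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def allow(self) -> bool:
        """True only if every quota still has headroom for the next request."""
        total = self.prompt_tokens + self.completion_tokens
        return (self.requests < self.max_requests
                and total < self.max_total_tokens
                and self.prompt_tokens < self.max_prompt_tokens
                and self.completion_tokens < self.max_completion_tokens)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Charge one completed request against the counters."""
        self.requests += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

# Example: a plan allowing 100 requests and 50k total tokens per window.
quota = AIQuota(max_requests=100, max_total_tokens=50_000,
                max_prompt_tokens=30_000, max_completion_tokens=20_000)
if quota.allow():
    quota.record(prompt_tokens=120, completion_tokens=350)
```

Note that token counts are only known after the backend responds, so the request that crosses a limit still completes; throttling applies from the next request onward, matching the "when the limit is exceeded, throttling occurs" behavior described above.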
Steps to add subscription throttling policies:
- Log in to the Admin Portal (https://<hostname>:9443/admin).
- Navigate to Rate Limiting Policies and click Subscription Policies.
- Click Add Policy, then click Add AI Policy to define a new AI policy.
- Configure quotas, including request limits and token-based limits.
- Click Save to apply the new policy.
For more details, see Configuring AI API Subscription Policies.
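If you prefer scripting over the portal UI, subscription policies can also be created through the Admin REST API. The sketch below assumes the APIM 4.x endpoint `/api/am/admin/v4/throttling/policies/subscription`; the `AIAPIQUOTALIMIT` type name and the `aiApiQuota` payload shape are assumptions inferred from the quota names above, so verify both against the Admin API reference for your APIM version.

```python
import requests

ADMIN_API = "https://localhost:9443/api/am/admin/v4"  # hostname assumed
TOKEN = "<access-token-with-admin-scope>"             # placeholder, not a real token

# Assumed payload shape for an AI subscription policy; field names are
# illustrative and must be checked against your version's Admin API schema.
policy = {
    "policyName": "AIGoldPlan",
    "displayName": "AI Gold Plan",
    "description": "100 requests and 50k total tokens per minute",
    "defaultLimit": {
        "type": "AIAPIQUOTALIMIT",  # assumed limit type for AI quotas
        "aiApiQuota": {
            "timeUnit": "min",
            "unitTime": 1,
            "requestCount": 100,
            "totalTokenCount": 50_000,
            "promptTokenCount": 30_000,
            "completionTokenCount": 20_000,
        },
    },
}

resp = requests.post(
    f"{ADMIN_API}/throttling/policies/subscription",
    json=policy,
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # default APIM certs are self-signed; use a CA bundle in production
)
resp.raise_for_status()
print(resp.json())
```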
## Backend Rate Limiting
Backend rate limiting ensures that AI APIs do not overload backend AI services by controlling token usage and request counts.
### Quotas Available for Backend Rate Limiting
- Request Count - The maximum number of requests the backend AI service can handle.
- Total Token Count - Limits the overall token consumption to prevent resource exhaustion.
- Prompt Token Count - Controls the number of input tokens sent to the AI model.
- Completion Token Count - Limits the number of tokens generated by the AI model as a response.
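Token quotas work because AI providers report usage in each response. As a rough illustration (not WSO2's actual code), the sketch below pulls the counts from an OpenAI-style `usage` object; this is the kind of extraction a gateway must perform before it can charge a request against backend token limits.

```python
def extract_usage(response_body: dict) -> tuple[int, int, int]:
    """Return (prompt, completion, total) token counts from a chat response.

    Field names follow the OpenAI API schema; other providers may differ.
    """
    usage = response_body.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)
    return prompt, completion, total

# Example response body trimmed to the relevant part.
body = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 120, "completion_tokens": 350, "total_tokens": 470},
}
print(extract_usage(body))  # (120, 350, 470)
```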
Steps to configure backend throughput limits for AI APIs:
- Log in to the Publisher Portal (https://<hostname>:9443/publisher).
- Select the AI API for which you want to set the maximum backend throughput.
- Navigate to API Configurations and click Runtime.
- Under Backend Rate Limiting, choose Specify and set limits for:
    - Total token count
    - Prompt token count
    - Completion token count
    - Maximum request count
- Define separate limits for Production and Sandbox.
- Click Save to apply the changes.
For a step-by-step guide, refer to Backend Rate Limiting for AI APIs.
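Once any of these limits is exceeded, the gateway rejects further calls with an HTTP 429 response until the window resets. Below is a minimal client-side sketch of handling that, assuming a Retry-After header may or may not be present; the function and parameter names are illustrative.

```python
import time
import requests

def call_ai_api(url: str, payload: dict, token: str, max_retries: int = 3) -> dict:
    """POST to an AI API through the gateway, backing off on 429 throttles."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload,
                             headers={"Authorization": f"Bearer {token}"})
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honor Retry-After if the gateway sends it, else back off exponentially.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Request still throttled after retries")
```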
Implementing rate limiting ensures cost control, backend protection, and fair API usage. By configuring subscription policies and backend throughput limits, you can optimize AI API performance while preventing overuse of AI resources.