Failover
Failover routing enhances reliability by automatically switching to an alternate AI model if the primary model becomes unresponsive or encounters an error. This strategy ensures continuous service availability without manual intervention.
Tip
You can configure more than one Failover policy.
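The routing behaviour described above can be pictured with a minimal Python sketch. It is not the gateway's implementation: `call_with_failover` and the two stub model functions are hypothetical names, used only to illustrate trying the primary model first and moving on to an alternate when it errors out.

```python
from typing import Callable, Sequence


def call_with_failover(prompt: str, targets: Sequence[Callable[[str], str]]) -> str:
    """Try each target in order and return the first successful response.

    `targets` is a hypothetical list of callables: the primary model first,
    followed by the fallback models. Any exception counts as a failure.
    """
    last_error = None
    for target in targets:
        try:
            return target(prompt)
        except Exception as err:  # assumption: every error triggers failover
            last_error = err
    raise RuntimeError("all models failed") from last_error


# Stub models standing in for real AI endpoints (illustrative only).
def primary_model(prompt: str) -> str:
    raise TimeoutError("primary model did not respond")


def fallback_model(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


response = call_with_failover("Hello", [primary_model, fallback_model])
print(response)
```

The caller sees a single response and does not need to know which model produced it, which is the point of handling failover at the gateway rather than in each client.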
Configure Failover
You can configure failover for your AI API by attaching the Model Failover policy. Here are the steps that you need to follow:
- Log in to the Publisher Portal (https://<hostname>:9443/publisher).
- Select the AI API for which you want to configure failover.
- Navigate to API Configurations, and click Policies.
- Look for the policy named Model Failover under the Common Policies section of the policy list. Drag and drop the Model Failover policy onto the Request flow of the /chat/completions POST operation.
- Fill in the requested details and click Save.

  | Section | Description |
  |---|---|
  | Production/Sandbox | Select the target model and target endpoint. If a request made to this combination fails, fallback is triggered. Add any number of fallback models by clicking the Add Fallback Model button. For each fallback model, select a model and an endpoint from the respective dropdowns. |
  | Request Timeout | Request timeout, in seconds. |
  | Suspend Duration | Suspend duration, in seconds. This is used to suspend any failed model-endpoint pairs. If not configured, information about failed invocations is not persisted. |

- Finally, scroll to the bottom of the page and click Save and deploy.
- For more information on how to work with API Policies, refer to the API Policies section.
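To make the Suspend Duration setting more concrete, here is a rough Python sketch of how suspending failed model-endpoint pairs could work. It rests on assumptions and does not describe the gateway's internals: `FailoverRouter`, the pair labels, and the injected clock are all hypothetical, and the sketch reads "not persisted" as meaning that every new request starts again from the primary pair.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (model name, endpoint name); hypothetical labels


class FailoverRouter:
    """Illustrative router that skips pairs which failed recently."""

    def __init__(self, pairs: List[Pair], suspend_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pairs = pairs                      # primary first, then fallbacks
        self.suspend_seconds = suspend_seconds  # None: failures are not remembered
        self.clock = clock
        self.suspended_until: Dict[Pair, float] = {}

    def candidates(self) -> List[Pair]:
        """Return the pairs that are not currently suspended, in priority order."""
        now = self.clock()
        return [p for p in self.pairs
                if self.suspended_until.get(p, float("-inf")) <= now]

    def send(self, invoke: Callable[[Pair], str]) -> str:
        """Call `invoke` on each eligible pair until one succeeds."""
        for pair in self.candidates():
            try:
                return invoke(pair)
            except Exception:
                if self.suspend_seconds is not None:
                    # Remember the failure so later requests skip this pair.
                    self.suspended_until[pair] = self.clock() + self.suspend_seconds
        raise RuntimeError("no model-endpoint pair could serve the request")


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
attempts: List[Pair] = []
PRIMARY, FALLBACK = ("model-a", "endpoint-1"), ("model-b", "endpoint-2")


def invoke(pair: Pair) -> str:
    attempts.append(pair)
    if pair == PRIMARY:
        raise ConnectionError("primary unavailable")
    return f"served by {pair[0]}"


router = FailoverRouter([PRIMARY, FALLBACK], suspend_seconds=30,
                        clock=lambda: fake_now[0])
print(router.send(invoke))  # primary fails, fallback answers
print(router.send(invoke))  # primary is suspended, only the fallback is tried
fake_now[0] = 31.0
print(router.candidates())  # the primary pair is eligible again
```

Under this reading, a suspend duration saves each request from waiting on a pair that is already known to be failing, at the cost of not retrying that pair until the duration has elapsed.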