
Failover

Failover routing enhances reliability by automatically switching to an alternate AI model if the primary model becomes unresponsive or encounters an error. This strategy ensures continuous service availability without manual intervention.

Tip

You can configure more than one Failover policy.
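
Conceptually, the gateway tries the primary model-endpoint pair first and, on an error or timeout, falls through to the configured fallback models; a failed pair can be suspended for a while so it is not retried immediately. The following Python sketch only illustrates that idea under assumed names (ModelTarget, FailoverRouter, and the send callable are placeholders introduced here); it is not the gateway's actual implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelTarget:
    """One model-endpoint pair, as configured in the policy."""
    model: str
    endpoint: str


@dataclass
class FailoverRouter:
    primary: ModelTarget
    fallbacks: List[ModelTarget]
    request_timeout: float = 30.0   # seconds, maps to "Request Timeout"
    suspend_duration: float = 60.0  # seconds, maps to "Suspend Duration"
    _suspended_until: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def _is_suspended(self, target: ModelTarget) -> bool:
        until = self._suspended_until.get((target.model, target.endpoint), 0.0)
        return time.monotonic() < until

    def _suspend(self, target: ModelTarget) -> None:
        # Remember the failure so this pair is skipped for suspend_duration.
        self._suspended_until[(target.model, target.endpoint)] = (
            time.monotonic() + self.suspend_duration
        )

    def route(self, send: Callable[[ModelTarget, float], str]) -> str:
        """Try the primary target first, then each fallback in order.

        `send` stands in for whatever actually invokes the model; it is
        expected to raise an exception on error or timeout.
        """
        for target in [self.primary, *self.fallbacks]:
            if self._is_suspended(target):
                continue  # skip pairs that failed recently
            try:
                return send(target, self.request_timeout)
            except Exception:
                self._suspend(target)
        raise RuntimeError("All model-endpoint pairs failed or are suspended")
```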

Configure Failover

You can configure failover for your AI API by attaching the Model Failover policy. Follow the steps below:

  1. Log in to the Publisher Portal (https://<hostname>:9443/publisher).
  2. Select the AI API for which you want to configure failover.
  3. Navigate to API Configurations, and click Policies.
  4. Locate the Model Failover policy under the Common Policies section of the policy list, then drag and drop it onto the Request flow of the /chat/completions POST operation.
  5. Fill in the requested details and click Save.

    Section              Description
    Production/Sandbox   Select the target model and target endpoint. If a request made to this combination fails, fallback is triggered. Add any number of fallback models by clicking the Add Fallback Model button; for each fallback model, select a model and an endpoint from the respective dropdowns.
    Request Timeout      Request timeout in seconds.
    Suspend Duration     Suspend duration in seconds. A failed model-endpoint pair is suspended for this period. If not configured, knowledge about failed invocations is not persisted.

    Model Failover Policy Configuration

  6. Finally, scroll to the bottom of the page and click Save and deploy. A sketch of how you might invoke the deployed API follows these steps.

  7. For more information on how to work with API Policies, refer to the API Policies section.
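
Once the policy is deployed, failover is transparent to API consumers: you invoke the AI API as usual, and the gateway retries failed requests against the fallback models. Below is a minimal verification sketch using Python's requests library; the gateway URL, API context and version, model name, and access token are all placeholders you would replace with values from your own deployment.

```python
import requests

# Placeholders: substitute your gateway host, API context/version, and token.
GATEWAY_URL = "https://<hostname>:8243/<api-context>/<version>/chat/completions"
TOKEN = "<access-token>"

payload = {
    "model": "<primary-model>",  # a model configured in the failover policy
    "messages": [{"role": "user", "content": "Hello!"}],
}

# If the primary model-endpoint pair fails or times out, the gateway falls
# back to the configured fallback models before returning a response.
response = requests.post(
    GATEWAY_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # only for test setups with self-signed certificates
)
print(response.status_code, response.json())
```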