Semantic Model Routing

Semantic Routing is an intelligent routing policy for WSO2 API Manager that directs AI requests to specific models based on the semantic meaning of the user's input. Unlike traditional routing that relies on exact keyword matching or rule-based logic, Semantic Routing uses vector embeddings to understand the intent and context of requests, ensuring they are routed to the most appropriate AI model.

Features

Intent-Based Routing: Routes requests to specialized models based on semantic understanding of user queries.
Embedding-Powered Matching: Uses vector embeddings to compute similarity between user requests and predefined utterances.
Confidence Scoring: Ensures high-confidence routing with configurable similarity thresholds.
Default Fallback: Automatically routes to a default model when no semantic match is found.
Multi-Environment Support: Separate configurations for production and sandbox environments.
Multiple Embedding Providers: Support for Mistral, Azure OpenAI, and OpenAI embedding models.

Configure the Environment

Before using the Semantic Routing policy, you must configure an embedding provider in the deployment.toml file.

Embedding Provider Configuration

Choose one of the following embedding providers and add the configuration to your <APIM_HOME>/repository/conf/deployment.toml file:

Mistral:

[apim.ai.embedding_provider]
type = "mistral"
[apim.ai.embedding_provider.properties]
apikey = "<your-mistral-api-key>"
embedding_endpoint = "https://api.mistral.ai/v1/embeddings"
embedding_model = "mistral-embed"

Azure OpenAI:

[apim.ai.embedding_provider]
type = "azure"
[apim.ai.embedding_provider.properties]
apikey = "<your-azure-openai-api-key>"
embedding_endpoint = "<your-azure-openai-embedding-endpoint>"

OpenAI:

[apim.ai.embedding_provider]
type = "openai"
[apim.ai.embedding_provider.properties]
apikey = "<your-openai-api-key>"
embedding_endpoint = "https://api.openai.com/v1/embeddings"
embedding_model = "<openai-embedding-model>"

How It Works

Semantic Routing operates through the following process:

Initialization: When the policy is configured, it precomputes embeddings for all predefined utterances associated with each route.
Request Processing: When a user request arrives, the policy extracts the relevant content using a JSONPath expression.
Embedding Generation: The extracted content is converted into a vector embedding using the configured embedding provider.
Similarity Matching: The request embedding is compared against all precomputed route embeddings using cosine similarity.
Route Selection: The route with the highest similarity score is selected if that score meets the route's configured threshold.
Fallback: If no route meets its threshold, the request is routed to the default model.
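The similarity matching, route selection, and fallback steps can be sketched in a few lines of Python. This is a minimal illustration, not the policy's actual implementation: the `Route` class, the `select_model` function, and the three-dimensional toy vectors are all hypothetical, and scoring a route by its best-matching utterance is an assumption. Real embeddings come from the configured provider and have far more dimensions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Route:
    """Hypothetical stand-in for one configured route."""
    model: str
    utterance_embeddings: List[List[float]]  # precomputed at initialization
    threshold: float = 0.90                  # documented default score threshold


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_model(request_embedding: Sequence[float],
                 routes: List[Route],
                 default_model: str) -> str:
    """Pick the highest-scoring route, or the default if it misses its threshold."""
    best_route, best_score = None, -1.0
    for route in routes:
        # Assumption: a route's score is that of its closest utterance.
        score = max(
            (cosine_similarity(request_embedding, u)
             for u in route.utterance_embeddings),
            default=-1.0,
        )
        if score > best_score:
            best_route, best_score = route, score
    if best_route is not None and best_score >= best_route.threshold:
        return best_route.model
    return default_model


# Toy data: made-up vectors standing in for real embeddings.
routes = [
    Route(model="gpt-4o-mini",
          utterance_embeddings=[[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
          threshold=0.8),
]

print(select_model([0.95, 0.05, 0.0], routes, "gpt-4o"))  # close to the route's utterances
print(select_model([0.0, 0.2, 1.0], routes, "gpt-4o"))    # unrelated, falls back to default
```

In this sketch the first request vector points in nearly the same direction as the route's utterances, so it clears the 0.8 threshold. The second is almost orthogonal to them and falls back to the default model.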

How to Use

Follow these steps to configure the Semantic Routing policy for your AI API:

Log in to the Publisher Portal (https://<hostname>:9443/publisher).
Select the AI API for which you want to configure semantic routing.
Navigate to API Configurations, and click Policies.
Find the Semantic Routing policy under the Common Policies section of the policy list, then drag and drop it onto the Request flow of the /chat/completions POST operation.
Fill in the requested details and click Save.

AWS Bedrock Configuration

When configuring semantic routing with AWS Bedrock as a multi-model provider service, you must select both the Provider (model family) and the Model for each route and for the default model. The Provider dropdown lists the model families you have set up in the Admin Portal (such as Meta, Anthropic, and DeepSeek). Once a provider is selected, the Model dropdown displays the specific models available under that provider.

Policy Configuration

The Semantic Routing policy requires the following configuration:

Basic Configuration

Field	Description	Example
Content Path	JSONPath expression to extract the user's request content from the payload	`$.messages[-1].content`
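
To show what the example Content Path selects, the following Python sketch mimics `$.messages[-1].content` with plain dictionary and list indexing. It is an illustration only, not how the policy evaluates JSONPath: the helper name `extract_content` and the sample payload are hypothetical, and it assumes that `[-1]` selects the last element of the `messages` array.

```python
import json


def extract_content(payload: dict) -> str:
    """Hypothetical helper mirroring the JSONPath $.messages[-1].content:
    return the content of the last message in the request payload."""
    return payload["messages"][-1]["content"]


# Sample chat payload; the system message is added here for illustration.
request_body = json.loads("""
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather forecast for tomorrow in Paris"}
  ]
}
""")

# Only the last message's content would be embedded and matched against routes.
print(extract_content(request_body))
```

Under that assumption, only the most recent message is used for matching, so earlier turns in a multi-turn conversation do not influence the routing decision.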

Route Configuration

For each environment (Production/Sandbox), you can configure multiple routes:

Field	Description	Required
Model	The target AI model for this route	Yes
Endpoint	The endpoint to route requests to	Yes
Utterances	List of example phrases that represent this route's intent	Yes
Score Threshold	Minimum similarity score (0.0 to 1.0) required to match this route. Default: 0.90	No

Default Model Configuration

Field	Description	Required
Default Model	The model to use when no semantic route matches	Yes
Default Endpoint	The endpoint for the default model	Yes

When to use Semantic Routing vs Intelligent Model Routing

Semantic Routing works best when users send full sentences or phrases that express a clear intent (e.g., "What is the weather forecast for tomorrow?"). It relies on vector similarity between the user's query and the utterances you provide, so very short inputs such as single words or two-word phrases (e.g., "weather", "rain forecast") may not produce reliable similarity scores.

If your users tend to send short keyword-style queries, consider using Intelligent Model Routing instead, which uses an LLM to classify requests based on contextual descriptions rather than embedding similarity.

Example Configuration

Scenario: Route weather-related queries to a specialized weather model and everything else to a general-purpose model.

Create an AI API with multiple model endpoints configured.
Add the Semantic Routing policy with the following configuration:

Content Path: $.messages[-1].content

Production Routes:

Route 1 - Weather Information
- Model: gpt-4o-mini
- Endpoint: gpt-4o-mini
- Score Threshold: 0.8
- Utterances:
  - "weather"
  - "temperature"
  - "forecast"
  - "rain"
  - "snow"
  - "wind"
  - "humidity"

Default Model:
- Model: gpt-4o
- Endpoint: gpt-4o

Save and deploy the API.
Test the semantic routing with different queries:

Request 1 (Routes to Weather Information):

{
  "messages": [
    {
      "role": "user",
      "content": "What is the weather forecast for tomorrow in Paris"
    }
  ]
}

Request 2 (Routes to Default Model):

{
  "messages": [
    {
      "role": "user",
      "content": "Generate a small HTML code"
    }
  ]
}

Configuration Parameters

Score Threshold

The score threshold determines the minimum cosine similarity (0.0 to 1.0) required for a request to match a specific route:

0.0: No similarity (completely different)
1.0: Perfect similarity (identical meaning)
Default: 0.90 (high confidence matching)

Recommendations:
- Use 0.85-0.95 for specialized routes requiring high confidence
- Use 0.75-0.85 for broader, more flexible matching
- Lower thresholds may result in incorrect routing
- Higher thresholds may cause more requests to fall back to the default model
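
The trade-off between lower and higher thresholds can be explored offline with a small script. The sketch below is only an illustration: the `evaluate_threshold` function is hypothetical, and the sample scores are made up rather than measured. In practice you would collect similarity scores for real queries (for example, from debug logs) and label whether each should have matched the route.

```python
from typing import List, Tuple


def evaluate_threshold(samples: List[Tuple[float, bool]],
                       threshold: float) -> Tuple[int, int]:
    """Count routing mistakes at a given threshold.

    samples holds (similarity_score, should_match) pairs for one route.
    Returns (wrongly_routed, wrongly_fell_back).
    """
    wrongly_routed = sum(
        1 for score, should_match in samples
        if score >= threshold and not should_match
    )
    wrongly_fell_back = sum(
        1 for score, should_match in samples
        if score < threshold and should_match
    )
    return wrongly_routed, wrongly_fell_back


# Hypothetical labeled scores: True means the query belongs to the route.
samples = [
    (0.93, True),
    (0.88, True),
    (0.81, True),
    (0.79, False),
    (0.62, False),
]

for threshold in (0.75, 0.85, 0.90):
    routed_wrong, fell_back_wrong = evaluate_threshold(samples, threshold)
    print(f"threshold={threshold:.2f} "
          f"wrongly routed={routed_wrong} wrongly fell back={fell_back_wrong}")
```

With these made-up scores, the lowest threshold misroutes one unrelated query, while the higher thresholds send one or two legitimate queries to the default model.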

Best Practices

Utterance Selection: Provide 5-10 diverse example utterances per route that cover different ways users might express the same intent.
Threshold Tuning: Start with the default threshold (0.90) and adjust based on routing accuracy in your use case.
Default Model: Always configure a capable default model to handle requests that don't match any specific route.
Testing: Test with various phrasings of the same intent to ensure consistent routing behavior.
Monitoring: Monitor routing decisions in logs (enable debug logging) to optimize utterances and thresholds.

Troubleshooting

Common Issues

Issue	Possible Cause	Solution
All requests route to default model	Score threshold too high	Lower the score threshold for your routes (e.g., from 0.90 to 0.85)
Incorrect routing	Insufficient or unclear utterances	Add more diverse example utterances that better represent the route's intent
Requests routing to wrong model	Similar utterances across routes	Make utterances more distinct and specific to each route's purpose
Embedding provider errors	Invalid credentials or endpoint	Verify your embedding provider configuration in `deployment.toml`
Policy not working	Embedding provider not configured	Ensure embedding provider is properly configured and restart the server
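
When diagnosing embedding provider errors, it can help to test the credentials and endpoint outside API Manager. The following Python sketch builds an embeddings request in the JSON format used by the OpenAI and Mistral embeddings endpoints (a `model` field and an `input` field, with a Bearer token). The helper name `build_embedding_request` is hypothetical, and the sketch makes no claim about how API Manager itself calls the provider. Azure OpenAI authenticates differently and is not covered here.

```python
import json
import urllib.request


def build_embedding_request(endpoint: str, api_key: str,
                            model: str, text: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an embeddings request."""
    body = json.dumps({"model": model, "input": [text]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Values mirror the Mistral entries in deployment.toml; replace the placeholder key.
req = build_embedding_request(
    "https://api.mistral.ai/v1/embeddings",
    "<your-mistral-api-key>",
    "mistral-embed",
    "What is the weather forecast for tomorrow?",
)

# Uncomment to send the request. An authentication or URL problem surfaces
# here as urllib.error.HTTPError or urllib.error.URLError.
# with urllib.request.urlopen(req, timeout=10) as resp:
#     data = json.load(resp)
#     print("embedding dimensions:", len(data["data"][0]["embedding"]))
```

If the request succeeds here but the policy still fails, the problem is more likely in the `deployment.toml` entries or a missing server restart than in the credentials themselves.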