Model Routing

Model routing is a ZenMux feature that automatically selects the most suitable model for each request from a wide range of large language models. The system balances performance and cost based on the request content, task characteristics, and your preference settings.

Intelligent model selection

You do not need to compare model performance and pricing by hand: ZenMux automatically matches the most suitable model to each request, so you can focus on building your business logic.

Why Model Routing

In real-world applications, different tasks have different model requirements:

  • Simple conversations: using a high-performance model may be wasteful
  • Complex reasoning: a budget model may not meet quality requirements
  • Production environments: you must balance quality, cost, and speed
  • Model selection is hard: dozens of models on the market make manual selection time-consuming

Model routing solves these problems with automated decisions, intelligently matching the optimal model for each request.

Model List

Quick lookup

On the Models page you can view all supported models and their basic information. Use the filters on the left, the search box at the top, and sorting options to quickly locate the model you need. Model cards also provide quick access to supported input/output modalities, input/output pricing, Context, Max Tokens, and other key details.

Click any model card to open its details page and view model-specific information across different providers, including performance comparisons, price comparisons, and parameter differences. For details, see the Provider Routing documentation.

Core Benefits

| Benefit | Description |
| --- | --- |
| Intelligent decisions | Automatically analyzes request content and task characteristics to select the most suitable model |
| Cost optimization | Prioritizes better cost-performance models while ensuring quality |
| Flexible configuration | Supports custom model pools and preference strategies for different business scenarios |
| Transparent and controllable | Returns the actual model used for easy monitoring and optimization |
| Continuous optimization | Continuously improves routing strategies based on historical data |

Quick Start

Basic Usage

To use model routing, set the model parameter to zenmux/auto and specify the candidate model pool via model_routing_config. If you do not specify model_routing_config.available_models, the system uses the platform's full model pool.

How to get model slugs

Models on the ZenMux platform have unique slugs. You can get a model's slug from the Models list page or from the detail page of a specific model.

Each entry in available_models is a model slug.

```json
{
  "model": "zenmux/auto",
  "model_routing_config": {
    "available_models": [
      "anthropic/claude-4-sonnet",
      "openai/gpt-5",
      "google/gemini-2.5-flash-lite"
    ],
    "preference": "balanced"
  },
  "messages": [
    {
      "role": "user",
      "content": "Explain what quantum computing is"
    }
  ]
}
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<your ZENMUX_API_KEY>"
)

response = client.chat.completions.create(
    model="zenmux/auto",  # Special identifier that enables intelligent routing
    # extra_body passes fields the OpenAI SDK does not know about through to the API
    extra_body={
        "model_routing_config": {
            "available_models": [
                "anthropic/claude-4-sonnet",
                "openai/gpt-5",
                "google/gemini-2.5-flash-lite"
            ],
            "preference": "balanced"
        }
    },
    messages=[
        {"role": "user", "content": "Explain what quantum computing is"}
    ]
)

# response.model holds the model that routing actually selected
print(f"Selected model: {response.model}")
print(f"Answer: {response.choices[0].message.content}")
```

Model actually used

The model field in the response returns the model selected by intelligent routing, making it easy to monitor and analyze routing behavior.
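
As a rough sketch of that kind of monitoring, the snippet below counts how often each model is selected across a batch of responses. The helper name tally_selected_models is made up for this example, and the stand-in objects only mimic the model attribute of a real chat completion response so that the code runs offline.

```python
from collections import Counter
from types import SimpleNamespace


def tally_selected_models(responses):
    """Count how often each model was selected (hypothetical helper)."""
    return Counter(r.model for r in responses)


# Stand-ins for real responses: only the `model` attribute is mimicked here.
fake_responses = [
    SimpleNamespace(model="openai/gpt-5"),
    SimpleNamespace(model="google/gemini-2.5-flash-lite"),
    SimpleNamespace(model="google/gemini-2.5-flash-lite"),
]

usage = tally_selected_models(fake_responses)
for model, count in usage.most_common():
    print(f"{model}: {count}")
```

In real use you would pass the response objects returned by client.chat.completions.create instead of the stand-ins.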

How It Works

zenmux/auto model

zenmux/auto is a special model identifier in ZenMux. When you specify this model, the system enables intelligent routing.

Routing decision process:

  1. Request analysis: parse prompt content, context length, task type, and other features
  2. Model evaluation: score each model in the candidate pool
  3. Aggregated decision: balance performance, price, and availability according to the preference strategy
  4. Model selection: choose the optimal model and forward the request
  5. Result return: annotate the actual model used in the response

Factors considered in routing decisions

  • Task complexity: simple conversation vs. complex reasoning
  • Context length: short dialogue vs. long document analysis
  • Model performance: accuracy, response speed, creativity
  • Model pricing: input/output token unit price
  • Model availability: real-time load, regional restrictions
  • User preference: performance / balanced / price
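
To make the evaluation and aggregation steps concrete, here is a toy scoring sketch. It is not ZenMux's actual algorithm, which this page does not specify: the candidate slugs, quality and cost numbers, preference weights, and shortfall penalty are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    slug: str
    quality: float  # hypothetical quality score, 0..1
    cost: float     # hypothetical relative cost, 0..1 (1 = most expensive)


# Hypothetical (quality_weight, cost_weight) pairs for each preference.
WEIGHTS = {
    "performance": (0.9, 0.1),
    "balanced": (0.5, 0.5),
    "price": (0.2, 0.8),
}

# Made-up candidates; the slugs are placeholders, not real models.
CANDIDATES = [
    Candidate("example/flagship", quality=0.95, cost=0.9),
    Candidate("example/mid-tier", quality=0.80, cost=0.4),
    Candidate("example/budget", quality=0.60, cost=0.1),
]


def pick_model(candidates, preference="balanced", complexity=0.5):
    """Score every candidate and return the slug with the highest score."""
    quality_weight, cost_weight = WEIGHTS[preference]

    def score(c):
        # Penalize models whose quality falls short of the task complexity.
        shortfall = max(0.0, complexity - c.quality)
        return quality_weight * c.quality - cost_weight * c.cost - 2.0 * shortfall

    return max(candidates, key=score).slug


print(pick_model(CANDIDATES, "balanced", complexity=0.3))
print(pick_model(CANDIDATES, "balanced", complexity=0.9))
```

With these invented numbers, a simple task under balanced goes to the budget candidate and a complex one moves up to the flagship, which mirrors the behavior the preference strategies describe.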

Configuration Parameters

model_routing_config object

Configure intelligent routing behavior via the model_routing_config parameter:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| available_models | string[] | Yes | Candidate model list for routing |
| preference | string | No | Routing preference strategy; defaults to balanced |

available_models - Candidate model pool

Specify the list of models that intelligent routing can choose from. We recommend including 3–5 models across different performance and price tiers.

Notes

  • The model list must include at least 2 models
  • Mix models from different price tiers for optimal balance
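
A small client-side check can catch an undersized pool before the request is sent. The sketch below assumes a hypothetical helper named build_routing_config; removing duplicate slugs is a local convention of this example, not documented platform behavior.

```python
VALID_PREFERENCES = {"balanced", "performance", "price"}


def build_routing_config(available_models, preference="balanced"):
    """Build a model_routing_config dict after basic validation (hypothetical helper)."""
    # Drop duplicate slugs while keeping the original order.
    models = list(dict.fromkeys(available_models))
    if len(models) < 2:
        raise ValueError("available_models must include at least 2 models")
    if preference not in VALID_PREFERENCES:
        raise ValueError(f"unknown preference: {preference!r}")
    return {"available_models": models, "preference": preference}


config = build_routing_config(
    [
        "anthropic/claude-4-sonnet",
        "openai/gpt-5",
        "google/gemini-2.5-flash-lite",
    ]
)
print(config)
```

The returned dict can be passed as the model_routing_config value in the request body.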

preference - Routing preference strategy

Specify the priority strategy used in routing decisions:

balanced - Balanced mode (default)

Seeks the optimal balance between performance and cost; suitable for most application scenarios.

Characteristics:

  • Prioritizes budget models for simple tasks
  • Automatically upgrades to high-performance models for complex tasks
  • Balances quality and cost

Suitable scenarios:

  • General-purpose apps in production environments
  • Mixed scenarios such as conversational assistants and content generation
  • Situations where you must control cost without sacrificing quality

performance - Performance-first mode

Prioritizes the highest-performing models; suitable for scenarios with very high output quality requirements.

Characteristics:

  • Tends to choose top flagship models
  • Ensures the highest answer quality and accuracy
  • Relatively higher cost

Suitable scenarios:

  • Critical business decision support
  • Professional content creation (legal, medical, finance, etc.)
  • Complex code generation and debugging
  • Academic research and data analysis

price - Price-first mode

Prioritizes models with the best cost-effectiveness; suitable for large-scale, cost-sensitive applications.

Characteristics:

  • Prefers the cheapest models
  • Only upgrades to more expensive models when necessary
  • Maximizes cost efficiency

Suitable scenarios:

  • High-concurrency simple conversation applications
  • Internal tools and test environments
  • Education and learning scenarios
  • Budget-limited startup projects

Preference strategy comparison

| Strategy | Performance | Cost | Suitable scenarios |
| --- | --- | --- | --- |
| balanced | ⭐⭐⭐⭐ | ⭐⭐⭐ | Production, general apps |
| performance | ⭐⭐⭐⭐⭐ | ⭐⭐ | Critical business, professional content |
| price | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | High concurrency, cost-sensitive |
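
One way to see how the strategies differ for your own workload is to send the same prompt once per preference and record which model is selected. The sketch below assumes a hypothetical helper named route_with_each_preference that accepts any callable shaped like client.chat.completions.create from the OpenAI SDK; fake_create is an offline stand-in so the example runs without network access.

```python
from types import SimpleNamespace

PREFERENCES = ("balanced", "performance", "price")


def route_with_each_preference(create, prompt, available_models):
    """Send one prompt per preference and return {preference: selected model}."""
    selected = {}
    for preference in PREFERENCES:
        response = create(
            model="zenmux/auto",
            extra_body={
                "model_routing_config": {
                    "available_models": available_models,
                    "preference": preference,
                }
            },
            messages=[{"role": "user", "content": prompt}],
        )
        selected[preference] = response.model
    return selected


def fake_create(**kwargs):
    # Offline stand-in: echoes the requested preference instead of calling the API.
    preference = kwargs["extra_body"]["model_routing_config"]["preference"]
    return SimpleNamespace(model=f"stub/{preference}")


result = route_with_each_preference(
    fake_create,
    "Explain what quantum computing is",
    ["anthropic/claude-4-sonnet", "openai/gpt-5", "google/gemini-2.5-flash-lite"],
)
print(result)
```

To run it against the real API, pass client.chat.completions.create in place of fake_create.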

Best Practices

1. Configure the candidate model pool appropriately

Follow these principles when choosing candidate models:

Recommended:

  • Include 3–5 models across different tiers
  • Mix flagship, mid-tier, and budget models
  • Consider model strengths (creativity, reasoning, speed, etc.)
  • Ensure all models have the necessary API keys configured

Avoid:

  • Only choosing models from the same tier (loses routing advantages)
  • Including too many models (increases decision complexity)
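
These guidelines can be turned into a quick sanity check on a candidate pool. The sketch below is only an illustration: pool_warnings is a hypothetical helper, the prices are values you would look up yourself on the Models page, and the 3x price-spread threshold is an arbitrary assumption rather than a platform rule.

```python
def pool_warnings(prices, min_spread=3.0):
    """Return warnings for a candidate pool.

    `prices` maps model slug -> a price you supply (any consistent unit).
    `min_spread` is an assumed ratio between the priciest and cheapest model.
    """
    warnings = []
    count = len(prices)
    if count < 3:
        warnings.append(f"only {count} models; 3-5 across tiers is recommended")
    elif count > 5:
        warnings.append(f"{count} models; more than 5 adds decision complexity")
    if count >= 2:
        cheapest = min(prices.values())
        priciest = max(prices.values())
        if cheapest > 0 and priciest / cheapest < min_spread:
            warnings.append("models sit in a similar price tier")
    return warnings


# Placeholder slugs and made-up prices, for illustration only.
narrow_pool = {"example/model-a": 1.0, "example/model-b": 1.2}
mixed_pool = {"example/budget": 0.1, "example/mid-tier": 1.0, "example/flagship": 10.0}

for warning in pool_warnings(narrow_pool):
    print("narrow pool:", warning)
print("mixed pool warnings:", pool_warnings(mixed_pool))
```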

FAQ

Q: How much latency does intelligent routing add?

A: Routing decisions typically complete within 50–100 ms, with negligible impact for most applications. The actual request response time mainly depends on the selected model’s processing speed.

Q: How many models should the candidate pool include?

A: We recommend 3–5 models. Too few cannot fully leverage routing advantages; too many increase decision complexity with diminishing returns.

Q: What factors does intelligent routing consider?

A: The routing system considers multiple factors:

  • Prompt content and length
  • Task type (conversation, creation, reasoning, etc.)
  • Model performance metrics (accuracy, speed)
  • Model pricing
  • Current load and availability
  • Your preference setting

Q: Can I view detailed routing decision logs?

A: The response returns the actual model used (response.model). You can also view call logs in the ZenMux user console to see the routing details for each request.

Q: Can I use model routing and provider routing together?

A: Yes. Model routing chooses the most suitable model, while provider routing chooses the optimal provider for the selected model. Using both together enables end-to-end intelligent optimization. For details, see the Provider Routing documentation.

Contact Us

If you encounter any issues during use or have suggestions and feedback, visit our Contact Us page for contact options and details.