Create chat completion
POST https://zenmux.ai/api/v1/chat/completions
The Create chat completion endpoint is compatible with OpenAI's Create chat completion API and is designed for conversational large language model inference.
The parameters below are the full set a model may support. Support varies by model; please refer to each model's detail page for the parameters it accepts.
Request body
messages array
The prompt, supplied to the model as a list of conversation messages. Depending on the model's capabilities, supported message types vary and may include text, images, audio, and video. For specific supported parameters, please check each model provider's documentation.
Each element in messages represents a conversation message, consisting of role and content. For details, refer to OpenAI's definition: messages.
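As a sketch, a messages list combining a system instruction with a user turn, plus a multimodal user message for models that accept images (the image URL is a placeholder; field names follow OpenAI's message definition):

```python
# Each message is a dict with a "role" and "content".
# Roles follow OpenAI's convention: "system", "user", "assistant", "tool".
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the meaning of life?"},
]

# For models that accept images, a user message may carry a content
# list of typed parts instead of a plain string:
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this picture."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```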
model string
The model ID for this inference call, formatted as <provider>/<model_name>, such as openai/gpt-5. This can be obtained from each model's detail page.
stream boolean
Default false
Specifies whether to use a streaming response. Only when stream: true is explicitly set is the response streamed using the Server-Sent Events protocol; otherwise, all generated content is returned at once.
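When streaming, each event carries a chat completion chunk whose delta holds a content fragment, and the full reply is reassembled by concatenation. A minimal sketch of that accumulation, shown on dict-shaped chunks for illustration (the SDK yields equivalent objects):

```python
# Simulated chunks in the shape the streaming API returns; the final
# fragment carries a non-empty finish_reason and an empty delta.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

def accumulate(chunks):
    """Concatenate delta.content fragments into the full message text."""
    parts = []
    for chunk in chunks:
        for choice in chunk["choices"]:
            content = choice["delta"].get("content")
            if content:
                parts.append(content)
    return "".join(parts)

print(accumulate(chunks))  # Hello, world
```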
max_completion_tokens integer
Limits the length of model-generated content, including the reasoning process. If not provided, the model's default limit will be used. The maximum generation length for each model can be found on the detail page.
temperature float
Default 1
Determines the sampling temperature, typically ranging from 0 to 2, but different models may have different ranges. For example, Claude series models range from 0 to 1. Higher values increase the randomness of generated content.
It is generally not recommended to use together with top_p.
top_p float
Default 1
Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Higher values include more candidate tokens, increasing the randomness of generated content.
It is generally not recommended to use together with temperature.
frequency_penalty float
Default 0
Ranges from -2.0 to 2.0, used in text generation models to control repetitive vocabulary usage by reducing the generation probability of high-frequency words to enhance text diversity. Higher values result in less repetition.
presence_penalty float
Default 0
Ranges from -2.0 to 2.0. Reduces repetition by penalizing tokens that have already appeared, lowering their likelihood of being selected again and thereby enhancing text diversity.
seed integer
Given the same seed and identical parameters, the model attempts to generate the same content. If not provided, a different random seed is used on each call.
logit_bias map
Default null
Adjusts the likelihood of specific tokens appearing in the output. Maps token IDs to bias values; increasing or decreasing the bias for a token raises or lowers its chance of being selected.
For usage, refer to OpenAI's official documentation: logit_bias.
logprobs boolean
Default false
Whether to return log probability information for each generated token; primarily used for analyzing the model's confidence during generation and for debugging.
top_logprobs integer
An integer between 0 and 20, specifying the number of most likely tokens to return at each token position, each with an associated log probability. If this parameter is used, logprobs must be true.
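A sketch of a request body asking for per-token log probabilities; note that top_logprobs only takes effect when logprobs is true:

```python
# Request body asking for log probability info on each generated token.
payload = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Pick a color."}],
    "logprobs": True,   # return per-token log probability info
    "top_logprobs": 5,  # 0-20; requires logprobs to be true
}
```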
response_format object
Used to control model output of structured content. If not provided, structured output is not used by default. For detailed usage of structured output, see Structured Output.
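One common form of structured output is a JSON Schema constraint. A sketch of such a response_format object, with field names following OpenAI's json_schema convention (the weather_report schema is hypothetical; verify the exact shape against the Structured Output guide):

```python
# Hypothetical schema forcing the model to emit a JSON object with
# exactly a "city" string and a "temperature_c" number.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical name for illustration
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
            "additionalProperties": False,
        },
    },
}
```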
stop string/array
Default null
Supported by some models only, used to specify stop sequences. Can be a string or an array of strings (to specify multiple). The model's response will not include the stop sequences.
tools array
List of tools available to the model. If not provided, tool calling is not used. Currently only function-type tools are supported. For detailed usage of tool calling, see Tool Calls.
tool_choice string/object
Controls how the model uses tools, in conjunction with the tools parameter: 'none' tells the model not to use any tools, 'auto' lets the model freely decide whether to use tools and which ones, and 'required' means the model must use a tool. You can also pass an object to force the model to call one specified tool.
If tools is empty, defaults to none. If tools is not empty, defaults to auto.
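A sketch of a function tool definition and the forms tool_choice can take (the get_weather function is hypothetical, introduced only for illustration):

```python
# A single function-type tool; parameters are described with JSON Schema.
tools = [
    {
        "type": "function",  # currently the only supported tool type
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice as a string...
tool_choice_none = "none"      # never use tools (default when tools is empty)
tool_choice_auto = "auto"      # model decides (default when tools is non-empty)
tool_choice_required = "required"  # model must use some tool

# ...or as an object forcing one specific tool:
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```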
parallel_tool_calls boolean
Default true
Controls whether the model may call multiple tools in parallel within a single response.
stream_options object
Used to control the content returned in streaming responses, only available when stream: true.
reasoning object
Used to control reasoning output, supports specifying both effort and max_tokens simultaneously. Different models may have different effective fields. For details, see Reasoning Models.
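A sketch of a request body enabling reasoning output; whether effort, max_tokens, or both take effect depends on the model, so treat the values below as illustrative and check the Reasoning Models page:

```python
# Request body with both reasoning fields set; models honor whichever
# fields they support.
payload = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "reasoning": {
        "effort": "high",    # qualitative effort level
        "max_tokens": 2048,  # cap on reasoning tokens
    },
}
```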
Returns
If stream: true, responds using Server-Sent Event protocol, where each response content is a chat completion chunk. If stream: false, responds with JSON-formatted chat completion.
Chat completion chunk
Represents a data fragment returned by the large model's streaming response. When stream: true, many chat completion chunks are returned in sequence.
id string
Represents the generation id for this generation, globally unique. Can be used to query information about this generation, such as usage and cost, through the Get generation interface.
choices array
Represents the model's output as a list. The array will contain at most one element. Unlike OpenAI, we do not support multiple simultaneous outputs through n. Additionally, when stream_options.include_usage: true, the choices list of the last chunk will be empty.
choice property definition
delta object
Represents a content fragment of the model's output.
content string
Represents the normal output content from the model.
reasoning string
Represents the reasoning content output by the model.
tool_calls array
Represents tool calls output by the model.
finish_reason string
Generation end marker. If non-empty, indicates this is the last content fragment. Values typically include stop, length, content_filter, etc. For specific value ranges, please refer to each model provider's official definition.
index integer
The index of this choice, corresponding to OpenAI's n parameter. Since multiple simultaneous outputs via n are not supported, there is only one choice, with index 0.
logprobs object
Log probability information for the choice.
usage object
Represents usage information for this generation. If stream_options.include_usage: true, an additional chunk with an empty choices array will be output, containing usage information.
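With stream_options.include_usage: true, the trailing chunk has an empty choices list and carries the usage object. A sketch of picking that chunk out of a stream, shown on dict-shaped chunks for illustration:

```python
# Simulated stream: a normal content chunk followed by the trailing
# usage-only chunk with an empty choices array.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": "stop"}],
     "usage": None},
    {"choices": [],
     "usage": {"prompt_tokens": 8, "completion_tokens": 1, "total_tokens": 9}},
]

def extract_usage(chunks):
    """Return the usage object from the trailing empty-choices chunk, if any."""
    for chunk in chunks:
        if not chunk["choices"] and chunk.get("usage"):
            return chunk["usage"]
    return None

print(extract_usage(chunks)["total_tokens"])  # 9
```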
Chat Completion
Data structure returned by the interface when stream: false, returning all model-generated content at once, including usage information.
id string
Represents the generation id for this generation, globally unique. Can be used to query information about this generation, such as usage and cost, through the Get generation interface.
choices array
Represents the model's output as a list. The array will contain at most one element. Unlike OpenAI, we do not support multiple simultaneous outputs through n.
choice property definition
message object
Represents a message generated by the model.
content string
Represents the normal output content from the model.
reasoning string
Represents the reasoning content output by the model.
tool_calls array
Represents tool calls output by the model.
finish_reason string
Reason for ending generation. Values typically include stop, length, content_filter, etc. For specific value ranges, please refer to each model provider's official definition.
index integer
The index of this choice, corresponding to OpenAI's n parameter. Since multiple simultaneous outputs via n are not supported, there is only one choice, with index 0.
logprobs object
Log probability information for the choice.
usage object
Represents usage information for this generation.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: 'https://zenmux.ai/api/v1',
  apiKey: '<ZENMUX_API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "openai/gpt-5",
    messages: [
      {
        role: "user",
        content: "What is the meaning of life?",
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
from openai import OpenAI

client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<your_ZENMUX_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
curl https://zenmux.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ZENMUX_API_KEY" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  }'