Create chat completion
POST https://zenmux.ai/api/v1/chat/completions
The Create chat completion endpoint is compatible with OpenAI's Create chat completion API and is designed for conversational large language model inference.
The parameters below are the full set a model may support. Support varies by model; please refer to each model's detail page for the parameters it accepts.
Request body
messages array
The prompt, supplied to the model as a list of conversation messages. Depending on the model's capabilities, supported message types vary and may include text, images, audio, and video. For specific supported parameters, please check each model provider's documentation.
Each element in messages represents a conversation message, consisting of role and content. For details, refer to OpenAI's definition: messages.
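As a sketch, a messages list combining a system instruction with a user turn, plus a multimodal user message for models that accept images (the image URL is a placeholder; field names follow OpenAI's message definition):

```python
# Each message is a dict with a "role" and "content".
# Roles follow OpenAI's convention: "system", "user", "assistant", "tool".
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the meaning of life?"},
]

# For models that accept images, a user message may carry a content
# list of typed parts instead of a plain string:
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this picture."},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}
```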
model string
The model ID for this inference call, formatted as <provider>/<model_name>, such as openai/gpt-5. This can be obtained from each model's detail page.
stream boolean
Default false
Specifies whether to use a streaming response. Only when stream: true is explicitly set is the response streamed using the Server-Sent Events protocol; otherwise, all generated content is returned at once.
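When streaming, each event carries a chat completion chunk whose delta holds a content fragment, and the full reply is reassembled by concatenation. A minimal sketch of that accumulation, shown on dict-shaped chunks for illustration (the SDK yields equivalent objects):

```python
# Simulated chunks in the shape the streaming API returns; the final
# fragment carries a non-empty finish_reason and an empty delta.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

def accumulate(chunks):
    """Concatenate delta.content fragments into the full message text."""
    parts = []
    for chunk in chunks:
        for choice in chunk["choices"]:
            content = choice["delta"].get("content")
            if content:
                parts.append(content)
    return "".join(parts)

print(accumulate(chunks))  # Hello, world
```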
max_completion_tokens integer
Limits the length of model-generated content, including the reasoning process. If not provided, the model's default limit will be used. The maximum generation length for each model can be found on the detail page.
temperature float
Default 1
Determines the sampling temperature, typically ranging from 0 to 2, but different models may have different ranges. For example, Claude series models range from 0 to 1. Higher values increase the randomness of generated content.
It is generally not recommended to use together with top_p.
top_p float
Default 1
Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Higher values include more candidate tokens, increasing the randomness of generated content.
It is generally not recommended to use together with temperature.
frequency_penalty float
Default 0
Ranges from -2.0 to 2.0, used in text generation models to control repetitive vocabulary usage by reducing the generation probability of high-frequency words to enhance text diversity. Higher values result in less repetition.
presence_penalty float
Default 0
Ranges from -2.0 to 2.0. Reduces repetition by penalizing tokens that have already appeared, lowering their likelihood of being selected again and thereby enhancing text diversity.
seed integer
Given the same seed and identical parameters, the model attempts to generate the same content. If not provided, a different random seed is used on each call.
logit_bias map
Default null
Adjusts the likelihood of specific tokens appearing in the output. Maps token IDs to bias values; increasing or decreasing the bias for a token raises or lowers its chance of being selected.
For usage, refer to OpenAI's official documentation: logit_bias.
logprobs boolean
Default false
Whether to return log probability information for each generated token; primarily used for analyzing the model's confidence during generation and for debugging.
top_logprobs integer
An integer between 0 and 20, specifying the number of most likely tokens to return at each token position, each with an associated log probability. If this parameter is used, logprobs must be true.
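A sketch of a request body asking for per-token log probabilities; note that top_logprobs only takes effect when logprobs is true:

```python
# Request body asking for log probability info on each generated token.
payload = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Pick a color."}],
    "logprobs": True,   # return per-token log probability info
    "top_logprobs": 5,  # 0-20; requires logprobs to be true
}
```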
response_format object
Used to control model output of structured content. If not provided, structured output is not used by default. For detailed usage of structured output, see Structured Output.
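One common form of structured output is a JSON Schema constraint. A sketch of such a response_format object, with field names following OpenAI's json_schema convention (the weather_report schema is hypothetical; verify the exact shape against the Structured Output guide):

```python
# Hypothetical schema forcing the model to emit a JSON object with
# exactly a "city" string and a "temperature_c" number.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical name for illustration
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
            "additionalProperties": False,
        },
    },
}
```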
stop string/array
Default null
Supported by some models only, used to specify stop sequences. Can be a string or an array of strings (to specify multiple). The model's response will not include the stop sequences.
tools array
List of tools available to the model. If not provided, tool calling is not used. Currently only function-type tools are supported. For detailed usage of tool calling, see Tool Calls.
tool_choice string/object
Controls how the model uses tools, in conjunction with the tools parameter: 'none' tells the model not to use any tools, 'auto' lets the model freely decide whether to use tools and which ones, and 'required' means the model must use a tool. You can also pass an object to force the model to call one specified tool.
If tools is empty, defaults to none. If tools is not empty, defaults to auto.
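A sketch of a function tool definition and the forms tool_choice can take (the get_weather function is hypothetical, introduced only for illustration):

```python
# A single function-type tool; parameters are described with JSON Schema.
tools = [
    {
        "type": "function",  # currently the only supported tool type
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice as a string...
tool_choice_none = "none"      # never use tools (default when tools is empty)
tool_choice_auto = "auto"      # model decides (default when tools is non-empty)
tool_choice_required = "required"  # model must use some tool

# ...or as an object forcing one specific tool:
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```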
parallel_tool_calls boolean
Default true
Controls whether the model may call multiple tools in parallel within a single response.
stream_options object
Used to control the content returned in streaming responses, only available when stream: true.
reasoning object
Used to control reasoning output, supports specifying both effort and max_tokens simultaneously. Different models may have different effective fields. For details, see Reasoning Models.
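A sketch of a request body enabling reasoning output; whether effort, max_tokens, or both take effect depends on the model, so treat the values below as illustrative and check the Reasoning Models page:

```python
# Request body with both reasoning fields set; models honor whichever
# fields they support.
payload = {
    "model": "openai/gpt-5",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "reasoning": {
        "effort": "high",    # qualitative effort level
        "max_tokens": 2048,  # cap on reasoning tokens
    },
}
```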
Returns
If stream: true, responds using Server-Sent Event protocol, where each response content is a chat completion chunk. If stream: false, responds with JSON-formatted chat completion.
Chat completion chunk
Represents a data fragment returned by the large model's streaming response. When stream: true, many chat completion chunks are returned in sequence.
id string
Represents the generation id for this generation, globally unique. Can be used to query information about this generation, such as usage and cost, through the Get generation interface.
choices array
Represents the model's output as a list. The array will contain at most one element. Unlike OpenAI, we do not support multiple simultaneous outputs through n. Additionally, when stream_options.include_usage: true, the choices list of the last chunk will be empty.
choice property definition
delta object
Represents a content fragment of the model's output.
content string
Represents the normal output content from the model.
reasoning string
Represents the reasoning content output by the model.
tool_calls array
Represents tool calls output by the model.
finish_reason string
Generation end marker. If non-empty, indicates this is the last content fragment. Values typically include stop, length, content_filter, etc. For specific value ranges, please refer to each model provider's official definition.
index integer
The index of this choice, corresponding to OpenAI's n parameter. Since multiple simultaneous outputs via n are not supported, there is only one choice, with index 0.
logprobs object
Log probability information for the choice.
usage object
Represents usage information for this generation. If stream_options.include_usage: true, an additional chunk with an empty choices array will be output, containing usage information.
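With stream_options.include_usage: true, the trailing chunk has an empty choices list and carries the usage object. A sketch of picking that chunk out of a stream, shown on dict-shaped chunks for illustration:

```python
# Simulated stream: a normal content chunk followed by the trailing
# usage-only chunk with an empty choices array.
chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": "stop"}],
     "usage": None},
    {"choices": [],
     "usage": {"prompt_tokens": 8, "completion_tokens": 1, "total_tokens": 9}},
]

def extract_usage(chunks):
    """Return the usage object from the trailing empty-choices chunk, if any."""
    for chunk in chunks:
        if not chunk["choices"] and chunk.get("usage"):
            return chunk["usage"]
    return None

print(extract_usage(chunks)["total_tokens"])  # 9
```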
Chat Completion
Data structure returned by the interface when stream: false, returning all model-generated content at once, including usage information.
id string
Represents the generation id for this generation, globally unique. Can be used to query information about this generation, such as usage and cost, through the Get generation interface.
choices array
Represents the model's output as a list. The array will contain at most one element. Unlike OpenAI, we do not support multiple simultaneous outputs through n.
choice property definition
message object
Represents a message generated by the model.
content string
Represents the normal output content from the model.
reasoning string
Represents the reasoning content output by the model.
tool_calls array
Represents tool calls output by the model.
finish_reason string
Reason for ending generation. Values typically include stop, length, content_filter, etc. For specific value ranges, please refer to each model provider's official definition.
index integer
The index of this choice, corresponding to OpenAI's n parameter. Since multiple simultaneous outputs via n are not supported, there is only one choice, with index 0.
logprobs object
Log probability information for the choice.
usage object
Represents usage information for this generation.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: 'https://zenmux.ai/api/v1',
  apiKey: '<ZENMUX_API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "openai/gpt-5",
    messages: [
      {
        role: "user",
        content: "What is the meaning of life?",
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
from openai import OpenAI

client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<your_ZENMUX_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
curl https://zenmux.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ZENMUX_API_KEY" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  }'