Google Vertex AI API: Generate Content
Non-streaming:
POST https://zenmux.ai/api/vertex-ai/v1/publishers/{provider}/models/{model}:generateContent
Streaming:
POST https://zenmux.ai/api/vertex-ai/v1/publishers/{provider}/models/{model}:streamGenerateContent
ZenMux supports the Google Vertex AI API via the Gen AI SDK. For detailed request parameters and response schemas, see the official Google Vertex AI documentation.
Path parameters
provider string
Model provider (e.g., google).
model string
Model name (e.g., gemini-2.5-pro).
Request headers
Authorization string
Bearer token authentication.
Content-Type string
Default: application/json
Request body
The request body is JSON.
contents array<Content>
The current conversation content (single turn / multi-turn history + current input).
role string
The content producer (defaults to user).
- user: Indicates the message was sent by a human (typically user-generated).
- model: Indicates the message was generated by the model.
parts Part[]
At least 1 Part.
part (union: each Part carries exactly one of the following data fields)
text string
Text prompt or code snippet.
inlineData Blob
Inline data as raw bytes.
- mimeType string
The media type of the file specified in the data or fileUri field. Acceptable values include:
- application/pdf
- audio/mpeg
- audio/mp3
- audio/wav
- image/png
- image/jpeg
- image/webp
- text/plain
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
- video/x-ms-wmv
- data bytes
Inline data as raw bytes.
fileData FileData
Data stored in a file.
- mimeType string
The media type of the referenced file (see the list above).
- fileUri string
The URI or URL of the file to include in the prompt.
functionCall FunctionCall
Contains a string representing the FunctionDeclaration.name field, along with a structured JSON object containing all parameters for the function call predicted by the model.
- name string
The name of the function to call.
- args Record<string, any>
Function arguments and values as a JSON object.
functionResponse FunctionResponse
The output of a FunctionCall, containing a string representing the FunctionDeclaration.name field and a structured JSON object with any output from the function call. It is used as context for the model.
- name string
The name of the function that was called.
- response Record<string, any>
Function response as a JSON object.
videoMetadata VideoMetadata
For video inputs: start and end offsets (Duration format) and the frame rate.
- startOffset string (Duration)
Video start offset.
- endOffset string (Duration)
Video end offset.
- fps number
Video frame rate.
mediaResolution enum
Controls how input media is processed. If specified, this overrides the mediaResolution setting in generationConfig. LOW reduces the token count per image/video, which may lose detail, but allows longer videos in context. Supported values: HIGH, MEDIUM, LOW.
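To illustrate how contents and parts fit together, here is a minimal sketch using the Gen AI Python SDK with the same client setup as the SDK examples at the end of this page; the gs:// file URI is a placeholder, not a real resource.
from google import genai
from google.genai import types

client = genai.Client(
    api_key="$ZENMUX_API_KEY",
    vertexai=True,
    http_options=types.HttpOptions(api_version="v1", base_url="https://zenmux.ai/api/vertex-ai"),
)

# Multi-turn contents: prior turns plus the current input, each turn a Content
# with a role ("user" / "model") and one or more Parts.
contents = [
    types.Content(role="user", parts=[types.Part.from_text(text="Summarize this PDF in one sentence.")]),
    types.Content(role="model", parts=[types.Part.from_text(text="Please attach the document.")]),
    types.Content(
        role="user",
        parts=[
            types.Part.from_text(text="Here it is."),
            # fileData part: placeholder URI; mimeType must match the file
            types.Part.from_uri(file_uri="gs://your-bucket/sample.pdf", mime_type="application/pdf"),
        ],
    ),
]

response = client.models.generate_content(model="google/gemini-2.5-pro", contents=contents)
print(response.text)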
cachedContent string
Cached content resource name (used as context):
projects/{project}/locations/{location}/cachedContents/{cachedContent}
tools array<Tool>
A list of tools (e.g., function calling, retrieval, search, code execution, etc.).
toolConfig ToolConfig
Tool configuration (shared across all tools in this request).
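As a hedged sketch of how tools and toolConfig map onto the SDK, the example below declares a hypothetical get_weather function and lets the model decide whether to call it; whether a given model actually issues the call depends on the prompt and model.
from google import genai
from google.genai import types

client = genai.Client(
    api_key="$ZENMUX_API_KEY",
    vertexai=True,
    http_options=types.HttpOptions(api_version="v1", base_url="https://zenmux.ai/api/vertex-ai"),
)

# Hypothetical function declaration the model may choose to call.
weather_tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="get_weather",
        description="Look up the current weather for a city.",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"location": types.Schema(type=types.Type.STRING)},
            required=["location"],
        ),
    )
])

response = client.models.generate_content(
    model="google/gemini-2.5-pro",
    contents="What's the weather in Paris right now?",
    config=types.GenerateContentConfig(
        tools=[weather_tool],
        # toolConfig: let the model decide whether to call a function ("AUTO").
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(mode="AUTO")
        ),
    ),
)

# If the model decided to call the function, the part carries name + args.
part = response.candidates[0].content.parts[0]
if part.function_call:
    print(part.function_call.name, dict(part.function_call.args))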
safetySettings array<SafetySetting>
Per-request safety settings (applies to candidates).
category string
The safety category for which to configure a threshold.
- HARM_CATEGORY_UNSPECIFIED: Harm category unspecified.
- HARM_CATEGORY_HATE_SPEECH: Hate speech.
- HARM_CATEGORY_HARASSMENT: Harassment.
- HARM_CATEGORY_SEXUALLY_EXPLICIT: Sexually explicit content.
- HARM_CATEGORY_DANGEROUS_CONTENT: Dangerous content.
threshold string
The threshold for blocking responses that fall into the specified safety category, based on probability.
- OFF: Disables the safety filter (all categories turned off).
- BLOCK_NONE: Block nothing.
- BLOCK_ONLY_HIGH: Block only high-threshold content (i.e., block less content).
- BLOCK_MEDIUM_AND_ABOVE: Block medium-threshold content and above.
- BLOCK_LOW_AND_ABOVE: Block low-threshold content and above (i.e., block more content).
- HARM_BLOCK_THRESHOLD_UNSPECIFIED: Unspecified harm block threshold.
method string
Specifies whether the threshold is applied to the probability score or the severity score. If unspecified, the system applies the threshold to the probability score.
- HARM_BLOCK_METHOD_UNSPECIFIED: Unspecified harm block method.
- SEVERITY: Uses both probability and severity scores.
- PROBABILITY: Uses the probability score.
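For illustration, a safetySettings array expressed with the Python SDK types; the category, threshold, and method values are the enum strings listed above, and the specific thresholds here are examples only.
from google.genai import types

# One SafetySetting per category you want to override (illustrative thresholds).
safety_settings = [
    types.SafetySetting(
        category="HARM_CATEGORY_HATE_SPEECH",
        threshold="BLOCK_MEDIUM_AND_ABOVE",
    ),
    types.SafetySetting(
        category="HARM_CATEGORY_DANGEROUS_CONTENT",
        threshold="BLOCK_ONLY_HIGH",
        method="SEVERITY",  # apply the threshold to the severity score
    ),
]

# In the Python SDK these ride inside GenerateContentConfig; in the REST body
# they are the top-level safetySettings field.
config = types.GenerateContentConfig(safety_settings=safety_settings)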
generationConfig GenerationConfig
Generation parameters (controls sampling, length, stop conditions, structured output, logprobs, audio timestamps, thinking, media processing quality, etc.).
temperature number
Controls output randomness/diversity. Lower values make output more deterministic and focused; higher values make it more creative/diverse. A value of 0 tends to pick the highest-probability token every time, so it is close to deterministic (but may still vary slightly). If responses are too templated or too short, try increasing it; if you see issues like "infinite generation," raising temperature to at least 0.1 can also help. Ranges/defaults vary by model (e.g., some Gemini Flash models commonly use 0.0~2.0 with a default of 1.0).
topP number
Nucleus sampling threshold: the model samples only from the smallest set of tokens whose cumulative probability reaches topP. Lower values are more conservative/less random; higher values are more diverse. Range 0.0~1.0 (defaults vary by model). Generally, it’s recommended to primarily tune either temperature or topP, and not make large adjustments to both.
candidateCount integer
Number of candidates to return (response variations). Output tokens for all candidates are billable (inputs are typically billed once). Multi-candidate output is often a preview capability and generally only supported by generateContent (not streamGenerateContent). Different models impose different ranges/maxima (e.g., some support 1~8).
maxOutputTokens integer
Maximum number of output tokens to limit response length; roughly, a token can be understood as about 4 characters in English. Smaller values produce shorter outputs; larger values allow longer outputs.
stopSequences array<string>
Stop sequence list: generation stops immediately when the model output hits any stop sequence and is truncated at the first occurrence; case-sensitive. Up to 5 items.
presencePenalty number
Presence penalty: penalizes tokens that have already appeared in the generated text, increasing the likelihood of new content/diversity. Range: greater than -2.0 and less than 2.0.
frequencyPenalty number
Frequency penalty: penalizes tokens in proportion to how often they have already appeared, reducing repetitive generation. Range: greater than -2.0 and less than 2.0.
seed integer
Random seed: with a fixed seed, the model will “try” to return the same result for repeated requests, but full determinism is not guaranteed; changes in model versions or parameters (e.g., temperature) can also cause differences. If omitted, a random seed is used by default.
responseMimeType string
Specifies the MIME type of candidate outputs. Common options:
- text/plain (default): plain text output
- application/json: JSON output (for structured output / JSON mode)
- text/x.enum: for classification tasks, output one of the enum values defined by responseSchema
Note: If you want to constrain structured output via responseSchema, you must set responseMimeType to a supported non-text/plain type (e.g., application/json).
responseSchema object
Schema for structured output: constrains candidate text to conform to the schema (for “controlled generation / JSON Schema” scenarios). When using this field, you must set responseMimeType to a supported non-text/plain type.
logprobs integer
Returns the log probability of the top candidate tokens at each generation step. Range 1~20. You must also enable responseLogprobs=true to use this field; and the token chosen by the model may not be the top candidate token.
audioTimestamp boolean
Audio timestamp understanding: timestamp interpretation capability for audio-only files (preview). Supported only by some models (e.g., some Gemini Flash models).
thinkingConfig object
“Thinking” configuration for Gemini 2.5 and later. Fields listed in the official docs include:
- thinkingBudget integer: Token budget for thinking; by default the model controls it, with common maxima around 8192 tokens.
- thinkingLevel enum: Controls internal reasoning intensity; common values are LOW / HIGH. Higher values may improve quality on complex tasks but increase latency and cost.
mediaResolution enum
Controls how input media (images/videos) is processed: LOW reduces tokens per image/video (may lose detail but allows longer videos in context). Supported values are typically HIGH, MEDIUM, LOW.
systemInstruction Content
System instruction (guides the model’s overall behavior; recommended to use only text parts in parts, with each part as a separate paragraph).
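Pulling the generationConfig fields and systemInstruction above together, here is a minimal request-config sketch with the Gen AI Python SDK; all values are illustrative, and in the SDK systemInstruction is passed inside GenerateContentConfig.
from google.genai import types

config = types.GenerateContentConfig(
    # Sampling / length / stop controls (illustrative values)
    temperature=0.7,
    top_p=0.95,
    max_output_tokens=1024,
    stop_sequences=["END_OF_ANSWER"],
    seed=42,
    # Structured output: non-text/plain MIME type plus a schema
    response_mime_type="application/json",
    response_schema={
        "type": "OBJECT",
        "properties": {"summary": {"type": "STRING"}, "score": {"type": "NUMBER"}},
        "required": ["summary"],
    },
    # Thinking controls for Gemini 2.5+ (budget is illustrative)
    thinking_config=types.ThinkingConfig(thinking_budget=1024),
    # systemInstruction: text-only guidance for overall behavior
    system_instruction="You are a concise technical reviewer. Answer in JSON.",
)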
Response (non-streaming)
The response schema provided by the official docs is as follows:
{
"candidates": [
{
"content": {
"parts": [
{
"text": string
}
]
},
"finishReason": enum (FinishReason),
"safetyRatings": [
{
"category": enum (HarmCategory),
"probability": enum (HarmProbability),
"blocked": boolean
}
],
"citationMetadata": {
"citations": [
{
"startIndex": integer,
"endIndex": integer,
"uri": string,
"title": string,
"license": string,
"publicationDate": {
"year": integer,
"month": integer,
"day": integer
}
}
]
},
"avgLogprobs": double,
"logprobsResult": {
"topCandidates": [
{
"candidates": [
{
"token": string,
"logProbability": float
}
]
}
],
"chosenCandidates": [
{
"token": string,
"logProbability": float
}
]
}
}
],
"usageMetadata": {
"promptTokenCount": integer,
"candidatesTokenCount": integer,
"totalTokenCount": integer
// (possible extended stats)
// "cachedContentTokenCount": integer,
// "thoughtsTokenCount": integer,
// "toolUsePromptTokenCount": integer,
// "promptTokensDetails": [...],
// "candidatesTokensDetails": [...],
// "toolUsePromptTokensDetails": [...]
},
"modelVersion": string,
"createTime": string,
"responseId": string
}
candidates array<Candidate>
The list of candidate results returned for this generation.
Candidate.content object
Candidate content.
- content.parts array
An array of content parts.
- parts[].text string
The generated text.
Candidate.finishReason enum (FinishReason)
Why the model stopped generating tokens; if empty, it indicates generation has not stopped yet.
Common values (as listed in the official docs):
- FINISH_REASON_STOP: Natural stopping point or a stop sequence was hit.
- FINISH_REASON_MAX_TOKENS: Reached the max token limit specified in the request.
- FINISH_REASON_SAFETY: Stopped for safety reasons (if the filter blocks output, Candidate.content is empty).
- FINISH_REASON_RECITATION: Stopped due to flagged unauthorized recitation.
- FINISH_REASON_BLOCKLIST: Contains blocked terms.
- FINISH_REASON_PROHIBITED_CONTENT: Contains prohibited content (e.g., CSAM).
- FINISH_REASON_IMAGE_PROHIBITED_CONTENT: An image in the prompt contains prohibited content.
- FINISH_REASON_NO_IMAGE: The prompt was expected to include an image but none was provided.
- FINISH_REASON_SPII: Contains sensitive personally identifiable information (SPII).
- FINISH_REASON_MALFORMED_FUNCTION_CALL: The function call format is invalid or unparseable.
- FINISH_REASON_OTHER: Other reasons.
- FINISH_REASON_UNSPECIFIED: Unspecified.
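A small sketch of acting on finishReason with the Python SDK, which surfaces the enum without the FINISH_REASON_ prefix (e.g., STOP / MAX_TOKENS); `response` is assumed to come from the request examples elsewhere on this page.
from google.genai import types

# `response` is assumed to be a GenerateContentResponse from client.models.generate_content.
candidate = response.candidates[0]
if candidate.finish_reason == types.FinishReason.MAX_TOKENS:
    print("Output truncated; consider raising maxOutputTokens.")
elif candidate.finish_reason == types.FinishReason.SAFETY:
    print("Blocked by safety filters:", candidate.safety_ratings)
else:  # STOP and other reasons
    print(candidate.content.parts[0].text)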
Candidate.safetyRatings array<SafetyRating>
An array of safety ratings.
- safetyRatings[].category enum (HarmCategory)
Safety category (e.g., HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_DANGEROUS_CONTENT).
- safetyRatings[].probability enum (HarmProbability)
Harm probability level: NEGLIGIBLE / LOW / MEDIUM / HIGH, etc.
- safetyRatings[].blocked boolean
Indicates whether the model input or output was blocked.
Candidate.citationMetadata object
Citation info (when the output contains citations).
- citationMetadata.citations array<Citation>
- citations[].startIndex integer
Start position of the citation in content, measured in bytes of the UTF-8 response.
- citations[].endIndex integer
End position of the citation in content, also in bytes.
- citations[].uri string
Source URL (the docs describe this field as a URL; the example schema uses uri as the field name).
- citations[].title string
Source title.
- citations[].license string
Associated license.
- citations[].publicationDate object
Publication date (valid formats: YYYY, YYYY-MM, YYYY-MM-DD); contains publicationDate.year, publicationDate.month, and publicationDate.day, all integers.
Candidate.avgLogprobs double
Average log probability of the candidate.
Candidate.logprobsResult object
Top-ranked candidate tokens (topCandidates) and the tokens actually chosen (chosenCandidates) at each step.
- logprobsResult.topCandidates array
- topCandidates[].candidates array
- candidates[].token string: Token (character/word/phrase, etc.).
- candidates[].logProbability float: Log probability (confidence) of the token.
- logprobsResult.chosenCandidates array
- chosenCandidates[].token string
- chosenCandidates[].logProbability float
usageMetadata object
Token usage statistics.
- usageMetadata.promptTokenCount integer
Tokens in the request.
- usageMetadata.candidatesTokenCount integer
Tokens in the response.
- usageMetadata.totalTokenCount integer
Total tokens for request + response.
- (may appear) thoughtsTokenCount / toolUsePromptTokenCount / cachedContentTokenCount and per-modality details.
Note: The official docs add that, for billing purposes, in Gemini 3 Pro and later models, tokens consumed when processing “document inputs” are counted as image tokens.
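For reference, reading usageMetadata with the Python SDK (snake_case field names); `response` is assumed to come from the earlier request examples.
# `response` is assumed to be a GenerateContentResponse.
usage = response.usage_metadata
print("prompt tokens:", usage.prompt_token_count)
print("output tokens:", usage.candidates_token_count)
print("total tokens:", usage.total_token_count)
# Extended stats are only present in some responses (e.g., thinking models):
if usage.thoughts_token_count:
    print("thinking tokens:", usage.thoughts_token_count)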
modelVersion string
The model and version used for generation (example: gemini-2.0-flash-lite-001).
createTime string
Time when the server received the request (RFC3339 Timestamp).
responseId string
Response identifier.
Below is the per-chunk response body for Vertex AI streamGenerateContent (SSE streaming).
Key points: streaming returns multiple chunks, and each chunk's JSON schema is still GenerateContentResponse. Intermediate chunks usually have an empty finishReason; the final chunk carries termination info such as finishReason/finishMessage.
Response (streaming: response body for each stream chunk)
{
"candidates": [
{
"index": integer,
"content": {
"role": string,
"parts": [
{
"thought": boolean,
"thoughtSignature": string, // bytes(base64)
"mediaResolution": {
"level": enum,
"numTokens": integer
},
// Union field data (only one of the following fields will appear at a time)
"text": string,
"inlineData": { "mimeType": string, "data": string, "displayName": string },
"fileData": { "mimeType": string, "fileUri": string, "displayName": string },
"functionCall": {
"id": string,
"name": string,
"args": object,
"partialArgs": [
{
"jsonPath": string,
"stringValue": string,
"numberValue": number,
"boolValue": boolean,
"nullValue": string,
"willContinue": boolean
}
],
"willContinue": boolean
},
"functionResponse": {
"id": string,
"name": string,
"response": object,
"parts": [
{
"inlineData": { /* bytes blob */ },
"fileData": { /* file ref */ }
}
],
"scheduling": enum,
"willContinue": boolean
},
"executableCode": { "language": enum, "code": string },
"codeExecutionResult": { "outcome": enum, "output": string },
// Union field metadata (only when inlineData/fileData is video)
"videoMetadata": {
"startOffset": string,
"endOffset": string,
"fps": number
}
}
]
},
"avgLogprobs": number,
"logprobsResult": {
"topCandidates": [
{
"candidates": [
{ "token": string, "tokenId": integer, "logProbability": number }
]
}
],
"chosenCandidates": [
{ "token": string, "tokenId": integer, "logProbability": number }
]
},
"finishReason": enum,
"safetyRatings": [
{ "category": enum, "probability": enum, "blocked": boolean }
],
"citationMetadata": {
"citations": [
{
"startIndex": integer,
"endIndex": integer,
"uri": string,
"title": string,
"license": string,
"publicationDate": { "year": integer, "month": integer, "day": integer }
}
]
},
"groundingMetadata": {
"webSearchQueries": [ string ],
"retrievalQueries": [ string ],
"groundingChunks": [
{
"web": { "uri": string, "title": string, "domain": string },
"retrievedContext": object,
"maps": object
}
],
"groundingSupports": [ object ],
"sourceFlaggingUris": [ object ],
"searchEntryPoint": { "renderedContent": string, "sdkBlob": string },
"retrievalMetadata": { "googleSearchDynamicRetrievalScore": number },
"googleMapsWidgetContextToken": string
},
"urlContextMetadata": {
"urlMetadata": [
{ "retrievedUrl": string, "urlRetrievalStatus": enum }
]
},
"finishMessage": string
}
],
"modelVersion": string,
"createTime": string,
"responseId": string,
"promptFeedback": {
"blockReason": enum,
"blockReasonMessage": string,
"safetyRatings": [
{ "category": enum, "probability": enum, "blocked": boolean }
]
},
"usageMetadata": {
"promptTokenCount": integer,
"candidatesTokenCount": integer,
"totalTokenCount": integer
// (possible extended stats)
// "cachedContentTokenCount": integer,
// "thoughtsTokenCount": integer,
// "toolUsePromptTokenCount": integer,
// "promptTokensDetails": [...],
// "candidatesTokensDetails": [...],
// "toolUsePromptTokensDetails": [...]
}
}
promptFeedback: Returned only in the first stream chunk, and only when no candidates are generated due to policy violations.
finishMessage: Returned only when finishReason has a value.
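A hedged streaming sketch with the Gen AI Python SDK: each iteration yields one chunk shaped like the schema above, and only the final chunk carries a finishReason.
from google import genai
from google.genai import types

client = genai.Client(
    api_key="$ZENMUX_API_KEY",
    vertexai=True,
    http_options=types.HttpOptions(api_version="v1", base_url="https://zenmux.ai/api/vertex-ai"),
)

for chunk in client.models.generate_content_stream(
    model="google/gemini-2.5-pro",
    contents="How does AI work?",
):
    if chunk.text:  # incremental text from candidates[0].content.parts
        print(chunk.text, end="", flush=True)
    if chunk.candidates and chunk.candidates[0].finish_reason:
        print("\nfinishReason:", chunk.candidates[0].finish_reason)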
candidates array<Candidate>
The candidate list for this chunk.
Candidate.index integer
Candidate index (starting from 0).
Candidate.content object
Candidate content (multiple parts).
content.role string
Producer role: typically 'user' or 'model'.
content.parts array<Part>
An array of content parts; each part is a “single-type” block of data (text / inlineData / functionCall …).
Part object (content.parts[])
Part.thought boolean
Whether this part is a “thought/reasoning” part.
Part.thoughtSignature string (bytes)
Reusable signature for the thought (base64).
Part.mediaResolution PartMediaResolution
Input media resolution (affects media tokenization).
- mediaResolution.level enum (PartMediaResolutionLevel): LOW / MEDIUM / HIGH / ULTRA_HIGH / UNSPECIFIED.
- mediaResolution.numTokens integer: Expected length of the media token sequence.
Part.data (Union)
Part.text string
Text content (the most common place where streaming incremental output appears).
Part.inlineData Blob
Inline binary data (base64).
- inlineData.mimeType string: IANA MIME type.
- inlineData.data string (bytes): Base64-encoded bytes.
- inlineData.displayName string: Optional display name (returned only in some scenarios).
Part.fileData FileData
Reference to an external file (e.g., GCS).
- fileData.mimeType string: IANA MIME type.
- fileData.fileUri string: File URI.
- fileData.displayName string: Optional display name.
Part.functionCall FunctionCall
A function call predicted by the model.
- functionCall.id string: Function call ID (used to match the corresponding functionResponse).
- functionCall.name string: Function name.
- functionCall.args object: Function arguments (JSON object).
- functionCall.partialArgs array<PartialArg>: Streaming function-argument deltas (available in some APIs/modes).
- functionCall.willContinue boolean: Whether additional incremental fragments for this FunctionCall will follow.
PartialArg (for functionCall.partialArgs)
- jsonPath string: Path to the parameter being streamed incrementally (RFC 9535).
- stringValue / numberValue / boolValue / nullValue: The incremental value in this update (exactly one of the four).
- willContinue boolean: Whether this jsonPath has more incremental updates to come.
Part.functionResponse FunctionResponse
The structure used when you send tool execution results back to the model (and it may also appear in responses in some modes).
- functionResponse.id string: The corresponding functionCall.id.
- functionResponse.name string: Function name (matches functionCall.name).
- functionResponse.response object: Function result (JSON object; conventionally output/error).
- functionResponse.parts array<FunctionResponsePart>: Multi-part form of the function response (can include files/inline data).
- functionResponse.scheduling enum (FunctionResponseScheduling): SILENT / WHEN_IDLE / INTERRUPT / ...
- functionResponse.willContinue boolean: Whether more response fragments will follow.
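To illustrate the shape, a minimal sketch of building a functionResponse part to return a tool result to the model via the Python SDK; the function name and payload are placeholders.
from google.genai import types

# Returns the result of a previously predicted functionCall named "get_weather".
tool_result = types.Part.from_function_response(
    name="get_weather",
    response={"output": {"location": "Paris", "forecast": "sunny"}},
)
# Sent back to the model as the next conversation turn.
reply_turn = types.Content(role="user", parts=[tool_result])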
Part.executableCode ExecutableCode
Code generated by the model for a code-execution tool.
- executableCode.language enum (Language): e.g., PYTHON.
- executableCode.code string: Code string.
Part.codeExecutionResult CodeExecutionResult
Result of code execution.
- codeExecutionResult.outcome enum (Outcome): OUTCOME_OK / OUTCOME_FAILED / OUTCOME_DEADLINE_EXCEEDED.
- codeExecutionResult.output string: stdout or error message.
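A sketch of requesting the code-execution tool and reading back executableCode / codeExecutionResult parts with the Python SDK; availability depends on the model and on whether the gateway forwards this tool.
from google import genai
from google.genai import types

client = genai.Client(
    api_key="$ZENMUX_API_KEY",
    vertexai=True,
    http_options=types.HttpOptions(api_version="v1", base_url="https://zenmux.ai/api/vertex-ai"),
)

response = client.models.generate_content(
    model="google/gemini-2.5-pro",
    contents="Compute the 20th Fibonacci number by running Python code.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    ),
)

for part in response.candidates[0].content.parts:
    if part.executable_code:          # code the model generated
        print(part.executable_code.code)
    if part.code_execution_result:    # its stdout or error output
        print(part.code_execution_result.output)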
Part.metadata (Union)
Part.videoMetadata VideoMetadata
Metadata used only when the part carries video data.
- videoMetadata.startOffset string: Start offset.
- videoMetadata.endOffset string: End offset.
- videoMetadata.fps number: Frame rate.
Candidate.avgLogprobs number
Candidate average logprob (length-normalized).
Candidate.logprobsResult LogprobsResult
Logprobs details.
- logprobsResult.topCandidates array<LogprobsResultTopCandidates>: Per-step top-token lists.
- topCandidates[].candidates array<LogprobsResultCandidate>: Sorted by logProbability, descending.
- logprobsResult.chosenCandidates array<LogprobsResultCandidate>: The token actually sampled/chosen at each step.
- LogprobsResultCandidate.token / tokenId / logProbability: Token, token ID, and log probability.
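A sketch of requesting logprobs (responseLogprobs plus a 1~20 logprobs value, per the request-body notes above) and reading logprobsResult back; `client` is assumed to be configured as in the other examples, and model support varies.
from google.genai import types

# `client` is assumed to be configured as in the SDK examples on this page.
response = client.models.generate_content(
    model="google/gemini-2.5-pro",
    contents="Name one primary color.",
    config=types.GenerateContentConfig(response_logprobs=True, logprobs=5),
)

result = response.candidates[0].logprobs_result
for step, chosen in enumerate(result.chosen_candidates):
    print(f"step {step}: chose {chosen.token!r} ({chosen.log_probability:.3f})")
    # Top alternatives considered at the same step:
    for alt in result.top_candidates[step].candidates:
        print("   ", alt.token, alt.log_probability)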
Candidate.finishReason enum (FinishReason)
Stop reason; empty means “not finished yet.”
Common values (example enums):
STOP/MAX_TOKENS/SAFETY/RECITATION/BLOCKLIST/PROHIBITED_CONTENT/SPII/MALFORMED_FUNCTION_CALL/OTHER/FINISH_REASON_UNSPECIFIED…
Candidate.safetyRatings array<SafetyRating>
Safety ratings for the candidate output (at most one per category).
- safetyRatings[].category enum (HarmCategory): e.g., HATE_SPEECH / SEXUALLY_EXPLICIT / DANGEROUS_CONTENT / HARASSMENT / CIVIC_INTEGRITY ...
- safetyRatings[].probability enum (HarmProbability): NEGLIGIBLE / LOW / MEDIUM / HIGH ...
- safetyRatings[].blocked boolean: Whether the output was filtered because of this rating.
Some APIs/SDKs may also return finer-grained fields such as probabilityScore / severity / severityScore (not guaranteed in all Vertex REST outputs).
Candidate.citationMetadata CitationMetadata
Citation info.
- citationMetadata.citations array<Citation>
- citations[].startIndex integer: Citation start position.
- citations[].endIndex integer: Citation end position.
- citations[].uri string: Source URL/URI.
- citations[].title string: Source title.
- citations[].license string: License.
- citations[].publicationDate {year, month, day}: Publication date.
Candidate.groundingMetadata GroundingMetadata
Retrieval/evidence source info returned when grounding is enabled.
- groundingMetadata.webSearchQueries string[]: Queries used for Google Search.
- groundingMetadata.retrievalQueries string[]: Queries actually executed by the retrieval tool.
- groundingMetadata.groundingChunks array<GroundingChunk>: Evidence chunks.
- groundingChunks[].web {uri, title, domain}: Web evidence.
- groundingChunks[].retrievedContext / maps: Other evidence sources (object shape depends on the source).
- groundingMetadata.searchEntryPoint {renderedContent, sdkBlob}: Search entry-point info.
- groundingMetadata.retrievalMetadata {googleSearchDynamicRetrievalScore}: Retrieval-related metadata.
- Plus sourceFlaggingUris / googleMapsWidgetContextToken, etc. (when Google Maps grounding is used).
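To show where groundingMetadata comes from, a sketch that enables the Google Search tool and prints the evidence fields described above; availability depends on the model and gateway.
from google import genai
from google.genai import types

client = genai.Client(
    api_key="$ZENMUX_API_KEY",
    vertexai=True,
    http_options=types.HttpOptions(api_version="v1", base_url="https://zenmux.ai/api/vertex-ai"),
)

response = client.models.generate_content(
    model="google/gemini-2.5-pro",
    contents="What is the tallest building completed this decade?",
    config=types.GenerateContentConfig(tools=[types.Tool(google_search=types.GoogleSearch())]),
)

meta = response.candidates[0].grounding_metadata
if meta:
    print("search queries:", meta.web_search_queries)
    for chunk in meta.grounding_chunks or []:
        if chunk.web:  # web evidence: uri / title / domain
            print(chunk.web.title, "->", chunk.web.uri)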
Candidate.urlContextMetadata UrlContextMetadata
URL retrieval metadata returned when the model uses the urlContext tool.
- urlContextMetadata.urlMetadata array<UrlMetadata>: URL list.
- urlMetadata[].retrievedUrl string: The URL that was actually retrieved.
- urlMetadata[].urlRetrievalStatus enum (UrlRetrievalStatus): SUCCESS / ERROR / PAYWALL / UNSAFE / UNSPECIFIED.
Candidate.finishMessage string
A more detailed explanation of finishReason (returned only when finishReason has a value).
modelVersion string
Model version used for this generation.
createTime string
Time when the server received the request (RFC3339 Timestamp).
responseId string
Response identifier.
promptFeedback object
Prompt content filtering result: only appears in the first stream chunk and only when there are no candidates due to policy violations.
- promptFeedback.blockReason enum (BlockedReason): Blocking reason.
- promptFeedback.blockReasonMessage string: Human-readable reason (not supported in all environments).
- promptFeedback.safetyRatings array<SafetyRating>: Prompt-level safety ratings.
usageMetadata object
Token usage.
- usageMetadata.promptTokenCount integer: Prompt token count.
- usageMetadata.candidatesTokenCount integer: Total candidate output tokens.
- usageMetadata.totalTokenCount integer: Total token count.
- (may appear) thoughtsTokenCount / toolUsePromptTokenCount / cachedContentTokenCount and per-modality details.
Example (TypeScript, Gen AI SDK):
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({
apiKey: "$ZENMUX_API_KEY",
vertexai: true,
httpOptions: {
baseUrl: "https://zenmux.ai/api/vertex-ai",
apiVersion: "v1",
},
});
const response = await client.models.generateContent({
model: "google/gemini-2.5-pro",
contents: "How does AI work?",
});
console.log(response);

Example (Python, Gen AI SDK):
from google import genai
from google.genai import types
client = genai.Client(
api_key="$ZENMUX_API_KEY",
vertexai=True,
http_options=types.HttpOptions(
api_version='v1',
base_url='https://zenmux.ai/api/vertex-ai'
),
)
response = client.models.generate_content(
model="google/gemini-2.5-pro",
contents="How does AI work?"
)
print(response.text)

Example response (non-streaming):
{
"candidates": [
{
"content": {
"role": "model",
"parts": [
{
"text": "Of course. This is a fantastic question. Let's break down how AI works using a simple analogy and then add the technical details.\n\n### The Simple Analogy: Teaching a Child to Recognize a Cat\n\nImagine you're teaching a very young child what a \"cat\" is. You don't write down a long list of rules like \"a cat has pointy ears, four legs, a tail, and whiskers.\" Why? Because some cats have folded ears, some might be missing a leg, and a dog also fits that description.\n\nInstead, you do this:\n\n1. **Show Examples:** You show the child hundreds of pictures. You point and say, \"That's a cat.\" \"That's also a cat.\" \"This is *not* a cat; it's a dog.\"\n2. **Let Them Guess:** You show them a new picture and ask, \"Is this a cat?\"\n3. **Give Feedback:** If they're right, you say \"Yes, good job!\" If they're wrong, you say \"No, that's a fox.\"\n\nOver time, the child's brain, without being told the specific rules, starts to recognize the *patterns* that make a cat a cat. They build an internal, intuitive understanding.\n\n**AI works in almost the exact same way.** It's a system designed to learn patterns from data without being explicitly programmed with rules.\n\n---\n\n### The Core Components of How AI Works\n\nNow, let's replace the child with a computer program. The process has three key ingredients:\n\n#### 1. Data (The Pictures)\n\nThis is the most critical ingredient. AI is fueled by data. For our example, this would be a massive dataset of thousands or millions of images, each one labeled by a human: \"cat,\" \"dog,\" \"hamster,\" etc.\n\n* **More Data is Better:** The more examples the AI sees, the better it gets at identifying the patterns.\n* **Good Data is Crucial:** The data must be accurate and diverse. If you only show it pictures of black cats, it will struggle to recognize a white cat.\n\n#### 2. Model / Algorithm (The Child's Brain)\n\nThis is the mathematical framework that learns from the data. Think of it as the \"engine\" that finds the patterns. When you hear terms like **\"Neural Network,\"** this is what they're referring to.\n\nA neural network is inspired by the human brain. It's made of interconnected digital \"neurons\" organized in layers.\n\n* **Input Layer:** Takes in the raw data (e.g., the pixels of an image).\n* **Hidden Layers:** This is where the magic happens. Each layer recognizes increasingly complex patterns. The first layer might learn to spot simple edges and colors. The next might combine those to recognize shapes like ears and tails. A deeper layer might combine those shapes to recognize a \"cat face.\"\n* **Output Layer:** Gives the final answer (e.g., a probability score: \"95% chance this is a cat, 3% dog, 2% fox\").\n\n#### 3. The Training Process (Learning from Feedback)\n\nThis is where the **Model** learns from the **Data**. It's an automated version of showing pictures and giving feedback.\n\n1. **Prediction (The Guess):** The model is given an input (an image of a cat) and makes a random guess. Early on, its internal settings are all random, so its guess will be wild—it might say \"50% car, 50% dog.\"\n2. **Compare (Check the Answer):** The program compares its prediction to the correct label (\"cat\"). It then calculates its \"error\" or \"loss\"—a measure of how wrong it was.\n3. **Adjust (Learn):** This is the key step. The algorithm uses a mathematical process (often called **\"backpropagation\"** and **\"gradient descent\"**) to slightly adjust the millions of internal connections in the neural network. 
The adjustments are tiny, but they are specifically designed to make the model's guess *less wrong* the next time it sees that same image.\n4. **Repeat:** This process is repeated **millions or billions of times** with all the data. Each time, the model gets a little less wrong. Over many cycles, these tiny adjustments cause the network to get incredibly accurate at recognizing the patterns it's being shown.\n\nAfter training is complete, you have a **\"trained model.\"** You can now give it brand new data it has never seen before, and it will be able to make accurate predictions.\n\n---\n\n### Major Types of AI Learning\n\nWhile the above is the most common method, there are three main ways AI learns:\n\n**1. Supervised Learning (Learning with an Answer Key)**\nThis is the \"cat\" example we just used. The AI is \"supervised\" because it's trained on data that is already labeled with the correct answers.\n* **Examples:** Spam filters (emails labeled \"spam\" or \"not spam\"), predicting house prices (houses with known prices), language translation.\n\n**2. Unsupervised Learning (Finding Patterns on its Own)**\nThis is like giving the AI a giant pile of data with *no labels* and asking it to \"find interesting patterns.\" The AI might group the data into clusters based on hidden similarities.\n* **Examples:** Customer segmentation (finding groups of customers with similar buying habits), identifying anomalies in a computer network.\n\n**3. Reinforcement Learning (Learning through Trial and Error)**\nThis is how you train an AI to play a game or control a robot. The AI takes an action in an environment and receives a reward or a penalty. Its goal is to maximize its total reward over time.\n* **Examples:** An AI learning to play chess (it gets a reward for winning the game), a robot learning to walk (it gets a reward for moving forward without falling), self-driving car simulations.\n\n### Summary\n\nSo, \"How does AI work?\"\n\n**At its core, modern AI is a system that learns to recognize incredibly complex patterns by processing vast amounts of data, making guesses, and correcting its errors over and over again until it becomes highly accurate.**\n\nIt's less about being \"intelligent\" in a human sense and more about being a phenomenally powerful pattern-matching machine."
}
]
},
"finishReason": "STOP",
"avgLogprobs": -0.4167558059635994
}
],
"usageMetadata": {
"promptTokenCount": 5,
"candidatesTokenCount": 1353,
"totalTokenCount": 2794,
"trafficType": "ON_DEMAND",
"promptTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 5
}
],
"candidatesTokensDetails": [
{
"modality": "TEXT",
"tokenCount": 1353
}
],
"thoughtsTokenCount": 1436
},
"modelVersion": "google/gemini-2.5-pro",
"createTime": "2026-01-29T08:40:38.791866Z",
"responseId": "Bh17abqqMOSS4_UPqqeqoAc"
}