Portkey provides a robust and secure gateway for integrating various Large Language Models (LLMs) and embedding models, including Google Vertex AI, into your apps. With Portkey, you can take advantage of features like fast AI gateway access, observability, prompt management, and more, all while securely managing your Vertex AI credentials through Portkey's Model Catalog.
Provider Slug: vertex-ai

Portkey SDK Integration with Google Vertex AI

Portkey provides a consistent API to interact with models from various providers. To integrate Google Vertex AI with Portkey:

1. Install the Portkey SDK

Add the Portkey SDK to your application to interact with Google Vertex AI API through Portkey’s gateway.
npm install --save portkey-ai
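The Python examples later on this page use the companion Python SDK, which you can install from PyPI:
pip install portkey-ai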

2. Initialize Portkey Client

To integrate Vertex AI with Portkey, you'll need your Vertex Project ID (or Service Account JSON) and Vertex Region, with which you can set up Portkey's AI Provider. Here's a guide on how to find your Vertex Project details. If you are integrating through a Service Account File, refer to this guide.
import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
})
If you do not want to add your Vertex AI details to Portkey vault, you can directly pass them while instantiating the Portkey client. More on that here.

3. Invoke Chat Completions with Vertex AI

Use the Portkey instance to send requests to any model hosted on Vertex AI. You can also override Portkey's AI Provider directly in the API call if needed.
Vertex AI uses OAuth2 to authenticate its requests, so you need to additionally send the access token along with the request.
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: '@VERTEX_PROVIDER/gemini-1.5-pro-latest', // your model slug from Portkey's Model Catalog
}, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

console.log(chatCompletion.choices);
To use Anthropic models on Vertex AI, prepend anthropic. to the model name.
Example: @VERTEX_PROVIDER/anthropic.claude-3-5-sonnet@20240620
Similarly, for Meta models, prepend meta. to the model name.
Example: @VERTEX_PROVIDER/meta.llama-3-8b-8192
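For instance, calling a Claude model uses the same chat completions interface as above; only the model slug changes. A minimal sketch (the provider slug and access token are placeholders):
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: '@VERTEX_PROVIDER/anthropic.claude-3-5-sonnet@20240620', // note the 'anthropic.' prefix
}, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

console.log(chatCompletion.choices);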

Using the /messages Route with Vertex AI Models

Access Claude models on Vertex AI through Anthropic's native /messages endpoint using Portkey's SDK or Anthropic's SDK.
This route only works with Claude models. For other models, use the standard OpenAI-compliant endpoint.
curl --location 'https://api.portkey.ai/v1/messages' \
--header 'Content-Type: application/json' \
--header 'x-portkey-api-key: YOUR_PORTKEY_API_KEY' \
--data '{
    "model": "@YOUR_VERTEX_PROVIDER/MODEL_NAME",
    "max_tokens": 250,
    "messages": [
        {
            "role": "user",
            "content": "Hello, Claude"
        }
    ]
}'
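If you prefer Anthropic's SDK over raw curl, you can point it at Portkey's gateway. A minimal sketch, assuming the standard @anthropic-ai/sdk options (baseURL and defaultHeaders); the model slug is a placeholder:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
    baseURL: 'https://api.portkey.ai/v1', // route requests through Portkey's gateway
    apiKey: 'dummy', // authentication is handled by the Portkey header below
    defaultHeaders: { 'x-portkey-api-key': 'YOUR_PORTKEY_API_KEY' }
});

const message = await anthropic.messages.create({
    model: '@YOUR_VERTEX_PROVIDER/MODEL_NAME',
    max_tokens: 250,
    messages: [{ role: 'user', content: 'Hello, Claude' }]
});

console.log(message);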

Counting Tokens

Portkey also supports the token counting endpoint for Vertex AI. Check out the example at this link for more details.
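As a rough sketch, the request mirrors the /messages route above; the exact path here (/v1/messages/count_tokens, matching Anthropic's native token counting API) is an assumption, so confirm it against the linked example:
curl --location 'https://api.portkey.ai/v1/messages/count_tokens' \
--header 'Content-Type: application/json' \
--header 'x-portkey-api-key: YOUR_PORTKEY_API_KEY' \
--data '{
    "model": "@YOUR_VERTEX_PROVIDER/MODEL_NAME",
    "messages": [
        {
            "role": "user",
            "content": "Hello, Claude"
        }
    ]
}'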

Using Self-Deployed Models on Vertex AI (Hugging Face, Custom Models)

Portkey supports connecting to self-deployed models on Vertex AI, including models from Hugging Face or any custom models you've deployed to a Vertex AI endpoint.

Requirements for Self-Deployed Models

To use self-deployed models on Vertex AI through Portkey:
  1. Model Naming Convention: When making requests to your self-deployed model, you must prefix the model name with endpoints.
    endpoints.my_endpoint_name

  2. Required Permissions: The Google Cloud service account used in your Portkey Model Catalog must have the aiplatform.endpoints.predict permission.
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'endpoints.my_custom_llm', // Notice the 'endpoints.' prefix
}, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

console.log(chatCompletion.choices);
Why the prefix? Vertex AI's product offering for self-deployed models is called "Endpoints." This naming convention indicates to Portkey that it should route requests to your custom endpoint rather than a standard Vertex AI model. This approach works for all models you can self-deploy on Vertex AI Model Garden, including Hugging Face models and your own custom models.

Document, Video, Audio Processing

Vertex AI supports attaching webm, mp4, pdf, jpg, mp3, wav, etc. file types to your Gemini messages. Using Portkey, here’s how you can send these media files:
const chatCompletion = await portkey.chat.completions.create({
    messages: [
        { role: 'system', content: 'You are a helpful assistant' },
        { role: 'user', content: [
            {
                type: 'image_url',
                image_url: {
                    url: 'gs://cloud-samples-data/generative-ai/image/scones.jpg'
                }
            },
            {
                type: 'text',
                text: 'Describe the image'
            }
        ]}
    ],
    model: 'gemini-1.5-pro-001',
    max_tokens: 200
});

Document Processing (PDF)

Gemini’s vision capabilities excel at understanding the content of PDF documents, including text, tables, and images.

Gemini Documents Understanding Docs

Method 1: Sending a Document via Google Files URL

Upload your PDF using the Files API to get a Google Files URL.
const chatCompletion = await portkey.chat.completions.create({
    model: 'gemini-1.5-pro',
    messages: [{
        role: 'user',
        content: [
            {
                type: 'image_url',
                image_url: {
                    url: 'https://generativelanguage.googleapis.com/v1beta/files/your-pdf-file-id'
                }
            },
            { type: 'text', text: 'Summarize the key findings of this research paper.' }
        ]
    }],
});
console.log(chatCompletion.choices[0].message.content);
Method 2: Sending a Local Document as Base64 Data

This is suitable for smaller, local PDF files.
import fs from 'fs';

const pdfBytes = fs.readFileSync('whitepaper.pdf');
const base64Pdf = pdfBytes.toString('base64');
const pdfUri = `data:application/pdf;base64,${base64Pdf}`;

const chatCompletion = await portkey.chat.completions.create({
    model: '@VERTEX_PROVIDER/MODEL_NAME',
    messages: [{
        role: 'user',
        content: [
            { type: 'image_url', image_url: { url: pdfUri }},
            { type: 'text', text: 'What is the main conclusion of this document?' }
        ]
    }],
});
console.log(chatCompletion.choices[0].message.content);
While you can send other document types like .txt or .html, they will be treated as plain text. Gemini’s native document vision capabilities are optimized for the application/pdf MIME type.

Extended Thinking (Reasoning Models) (Beta)

The assistant's thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not in the response.choices[0].message.content string. Gemini models do not support feeding the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.
Models like google.gemini-2.5-flash-preview-04-17 and anthropic.claude-3-7-sonnet@20250219 support extended thinking. This is similar to OpenAI's reasoning models, except that you also get the model's reasoning as it processes the request. Note that you will have to set strict_open_ai_compliance=False in the headers to use this feature.

Single turn conversation

from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    strict_open_ai_compliance=False
)

# Create the request
response = portkey.chat.completions.create(
  model="@VERTEX_PROVIDER/anthropic.claude-3-7-sonnet@20250219", # your model slug from Portkey's Model Catalog
  max_tokens=3000,
  thinking={
      "type": "enabled",
      "budget_tokens": 2030
  },
  stream=True,
  messages=[
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "when does the flight from new york to bengaluru land tomorrow, what time, what is its flight number, and what is its baggage belt?"
              }
          ]
      }
  ]
)
print(response)
# in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
# response = portkey.chat.completions.create(
#   ...same config as above but with stream: true
# )
# for chunk in response:
#     if chunk.choices[0].delta:
#         content_blocks = chunk.choices[0].delta.get("content_blocks")
#         if content_blocks is not None:
#             for content_block in content_blocks:
#                 print(content_block)
To disable thinking for Gemini models like google.gemini-2.5-flash-preview-04-17, you must explicitly set budget_tokens to 0.
"thinking": {
    "type": "enabled",
    "budget_tokens": 0
}

Multi turn conversation

from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    strict_open_ai_compliance=False
)

# Create the request
response = portkey.chat.completions.create(
  model="@VERTEX_PROVIDER/anthropic.claude-3-7-sonnet@20250219", # your model slug from Portkey's Model Catalog
  max_tokens=3000,
  thinking={
      "type": "enabled",
      "budget_tokens": 2030
  },
  stream=True,
  messages=[
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "when does the flight from baroda to bangalore land tomorrow, what time, what is its flight number, and what is its baggage belt?"
              }
          ]
      },
      {
          "role": "assistant",
          "content": [
                  {
                      "type": "thinking",
                      "thinking": "The user is asking several questions about a flight from Baroda (also known as Vadodara) to Bangalore:\n1. When does the flight land tomorrow\n2. What time does it land\n3. What is the flight number\n4. What is the baggage belt number at the arrival airport\n\nTo properly answer these questions, I would need access to airline flight schedules and airport information systems. However, I don't have:\n- Real-time or scheduled flight information\n- Access to airport baggage claim allocation systems\n- Information about specific flights between these cities\n- The ability to look up tomorrow's specific flight schedules\n\nThis question requires current, specific flight information that I don't have access to. Instead of guessing or providing potentially incorrect information, I should explain this limitation and suggest ways the user could find this information.",
                      "signature": "EqoBCkgIARABGAIiQBVA7FBNLRtWarDSy9TAjwtOpcTSYHJ+2GYEoaorq3V+d3eapde04bvEfykD/66xZXjJ5yyqogJ8DEkNMotspRsSDKzuUJ9FKhSNt/3PdxoMaFZuH+1z1aLF8OeQIjCrA1+T2lsErrbgrve6eDWeMvP+1sqVqv/JcIn1jOmuzrPi2tNz5M0oqkOO9txJf7QqEPPw6RG3JLO2h7nV1BMN6wE="
                  }
          ]
      },
      {
          "role": "user",
          "content": "thanks that's good to know, how about to chennai?"
      }
  ]
)
print(response)

Sending base64 Image

Here, you can send base64 image data in the url field as well:
"url": "data:image/png;base64,UklGRkacAABXRUJQVlA4IDqcAAC....."
This same message format also works for all other media types: just send your media file in the url field, like "url": "gs://cloud-samples-data/video/animals.mp4" for Google Cloud URLs and "url": "https://download.samplelib.com/mp3/sample-3s.mp3" for public URLs. Your URL should include the file extension; it is used to infer the MIME_TYPE, which is a required parameter for prompting Gemini models with files.
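For example, sending an audio clip follows the same shape as the image example above. A minimal sketch (the model slug is a placeholder):
const chatCompletion = await portkey.chat.completions.create({
    messages: [
        { role: 'user', content: [
            {
                type: 'image_url',
                image_url: {
                    // the .mp3 extension is used to infer the MIME type
                    url: 'https://download.samplelib.com/mp3/sample-3s.mp3'
                }
            },
            { type: 'text', text: 'Describe this audio clip' }
        ]}
    ],
    model: '@VERTEX_PROVIDER/gemini-1.5-pro-001',
    max_tokens: 200
});

console.log(chatCompletion.choices[0].message.content);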

Text Embedding Models

You can use any of Vertex AI's English and multilingual embedding models through Portkey, in the familiar OpenAI schema.
The Gemini-specific parameter task_type is also supported on Portkey.
import Portkey from 'portkey-ai';

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY",
});

// Generate embeddings
async function getEmbeddings() {
    const embeddings = await portkey.embeddings.create({
        input: "embed this",
        model: "@VERTEX_PROVIDER/text-multilingual-embedding-002", // your model slug from Portkey's Model Catalog
        // @ts-ignore (if using typescript)
        task_type: "CLASSIFICATION", // Optional
    }, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

    console.log(embeddings);
}
await getEmbeddings();

Function Calling

Portkey supports function calling mode on Google's Gemini models. Explore this Cookbook for a deep dive and examples.
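As a quick orientation before the Cookbook, here is a minimal sketch of an OpenAI-style tool definition sent to a Gemini model through Portkey (the function name and schema are hypothetical):
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: "What's the weather in Delhi?" }],
    model: '@VERTEX_PROVIDER/gemini-1.5-pro-latest',
    tools: [{
        type: 'function',
        function: {
            name: 'get_weather', // hypothetical function for illustration
            description: 'Get the current weather for a city',
            parameters: {
                type: 'object',
                properties: { city: { type: 'string' } },
                required: ['city']
            }
        }
    }]
}, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

console.log(chatCompletion.choices[0].message.tool_calls);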

Managing Vertex AI Prompts

You can manage all prompts to Google Gemini in the Prompt Library. All the models in the model garden are supported and you can easily start testing different prompts. Once you’re ready with your prompt, you can use the portkey.prompts.completions.create interface to use the prompt in your application.
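A minimal sketch of rendering a saved prompt (the prompt ID and template variables here are illustrative):
const promptCompletion = await portkey.prompts.completions.create({
    promptID: 'YOUR_PROMPT_ID', // from the Prompt Library
    variables: { topic: 'quantum computing' } // hypothetical template variables
});

console.log(promptCompletion);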

Image Generation Models

Portkey supports the Imagen API on Vertex AI for image generation, letting you easily make requests in the familiar OpenAI-compliant schema.
curl https://api.portkey.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H "x-portkey-provider: $PORTKEY_PROVIDER" \
  -d '{
    "prompt": "Cat flying to mars from moon",
    "model":"@your-model-slug"
  }'
Image Generation API Reference

List of Supported Imagen Models

  • imagen-3.0-generate-001
  • imagen-3.0-fast-generate-001
  • imagegeneration@006
  • imagegeneration@005
  • imagegeneration@002
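For SDK users, the same request would look roughly like this, assuming Portkey mirrors the OpenAI images.generate signature (a sketch, not a definitive reference):
const image = await portkey.images.generate({
    prompt: 'Cat flying to mars from moon',
    model: '@your-model-slug'
});

console.log(image);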

Video Generation Models

Portkey supports Google’s Veo video generation models on Vertex AI. You can generate videos from text prompts using the Portkey SDK.
Video generation on Vertex AI is a long-running operation that requires polling to check for completion. The example below shows how to start generation, poll for completion, and retrieve the final video.
from portkey_ai import Portkey
import base64
import time

# Initialize Portkey client
client = Portkey(
    api_key="PORTKEY_API_KEY",
    custom_host="https://us-central1-aiplatform.googleapis.com/v1",  # or your region
    provider="@VERTEX_PROVIDER"
)

# Start video generation
operation = client.post(
    url="projects/PROJECT_ID/locations/us-central1/publishers/google/models/veo-3.1-fast-generate-preview:predictLongRunning",
    instances=[{"prompt": "A serene mountain landscape at sunset"}],
    parameters={"sampleCount": "1"}
)

# Get operation name - handle different response structures
if hasattr(operation, 'model_dump'):
    operation_dict = operation.model_dump()
    operation_name = operation_dict.get("name")
else:
    operation_name = operation.name

print(f"Started: {operation_name}")

# Poll for completion
while True:
    check = client.post(
        url="projects/PROJECT_ID/locations/us-central1/publishers/google/models/veo-3.1-fast-generate-preview:fetchPredictOperation",
        operationName=operation_name
    )

    # Handle response properly
    if hasattr(check, 'model_dump'):
        status = check.model_dump()
    else:
        status = check

    if status.get("done"):
        print("Complete!")
        break

    print("Processing...")
    time.sleep(5)

# Save video
video_b64 = status["response"]["videos"][0]["bytesBase64Encoded"]
video_data = base64.b64decode(video_b64)

with open("output.mp4", "wb") as f:
    f.write(video_data)

print(f"Video saved! ({len(video_data) / 1024 / 1024:.1f} MB)")
Replace PROJECT_ID with your Google Cloud project ID and adjust the region as needed.
Grounding with Google Search

Vertex AI supports grounding with Google Search, a feature that lets you ground your LLM responses in real-time search results. Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or google_search_retrieval (for older models like gemini-1.5-flash) in the tools array.
"tools": [
    {
        "type": "function",
        "function": {
            "name": "google_search" // or google_search_retrieval for older models
        }
    }]
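Putting it together, a grounded request might look like this (a sketch; the provider slug and access token are placeholders):
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Who won the F1 race last weekend?' }],
    model: '@VERTEX_PROVIDER/gemini-2.0-flash-001',
    tools: [{
        type: 'function',
        function: { name: 'google_search' } // use 'google_search_retrieval' for older models
    }]
}, {Authorization: "Bearer $YOUR_VERTEX_ACCESS_TOKEN"});

console.log(chatCompletion.choices);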
If you mix regular tools with grounding tools, Vertex AI may return an error saying only one tool can be used at a time.

gemini-2.0-flash-thinking-exp and other thinking/reasoning models

gemini-2.0-flash-thinking-exp models return a Chain of Thought response along with the actual inference text. This is not OpenAI-compatible; however, Portkey supports it by joining the two responses with a \r\n\r\n separator. You can split the response on this pattern to separate the Chain of Thought from the inference text. If you want the Chain of Thought along with the inference text, pass the strict open ai compliance flag as false in the request. If you want the inference text only, pass the flag as true.
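For example, with strict open ai compliance set to false and a non-streaming response, you can split the combined text like this (a sketch):
const content = chatCompletion.choices[0].message.content;

// Portkey joins the two parts with "\r\n\r\n": Chain of Thought first, answer second
const separatorIndex = content.indexOf('\r\n\r\n');
const chainOfThought = content.slice(0, separatorIndex);
const answer = content.slice(separatorIndex + 4);

console.log('Reasoning:', chainOfThought);
console.log('Answer:', answer);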

Multiple Modalities on chat completions endpoint

gemini-2.5-flash-image (nano banana)

The image data is available in the content_parts field of the response, and it can be plugged back in for multi-turn conversations.

Single turn conversation

from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    strict_open_ai_compliance=False
)

# Create the request
response = portkey.chat.completions.create(
  model="gemini-2.5-flash-image-preview", # your model slug from Portkey's Model Catalog
  max_tokens=32768,
  stream=False,
  modalities=["text", "image"],
  messages= [
      {
        "role": "system",
        "content": "You are a helpful assistant"
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Add some chocolate drizzle to the croissants. Include text across the top of the image that says \"Made Fresh Daily\"."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "gs://cloud-samples-data/generative-ai/image/croissant.jpeg"
            }
          }
        ]
      }
    ]
)
print(response)
# in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
# response = portkey.chat.completions.create(
#   ...same config as above but with stream: true
# )
# for chunk in response:
#     if chunk.choices[0].delta:
#         content_blocks = chunk.choices[0].delta.get("content_blocks")
#         if content_blocks is not None:
#             for content_block in content_blocks:
#                 print(content_block)

Computer Use (Browser Automation) (Preview)

This uses the Gemini computer-use preview model. Set strict_open_ai_compliance to false.

Important Configuration Notes

Region Requirement: The computer use models require the vertex_region to be set to "global". If you get a 404 error about the model not being found, ensure your region is set correctly.

Understanding the OpenAI-Compatible Format

Portkey uses the OpenAI function calling signature to ensure compatibility across frameworks. Here’s how Google’s Computer Use API maps to Portkey:
Google SDK → Portkey (OpenAI Signature):
  • computer_use=types.ComputerUse(environment=...) → tools: [{ type: 'function', function: { name: 'computer_use', parameters: { environment: '...' }}}]
  • excluded_predefined_functions=["drag_and_drop"] → parameters: { environment: '...', excluded_predefined_functions: ['drag_and_drop'] }
  • Multi-turn: append response to messages → Same: append assistant message with tool_calls to messages array

Computer Use Tool Configuration

The basic tool configuration for computer use is:
{
  "type": "function",
  "function": {
    "name": "computer_use",
    "parameters": {
      "environment": "ENVIRONMENT_BROWSER",
      "excluded_predefined_functions": ["drag_and_drop"]  // Optional: exclude specific functions
    }
  }
}
Available Parameters:
  • environment (required): Must be "ENVIRONMENT_BROWSER" for browser automation
  • excluded_predefined_functions (optional): Array of function names to exclude (e.g., ["drag_and_drop", "scroll"])

Single turn conversation

import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: 'PORTKEY_API_KEY',
  provider: '@VERTEX_PROVIDER',
  vertexRegion: 'global', // Required for computer use models
  strictOpenAiCompliance: false
});

const response = await portkey.chat.completions.create({
  model: 'gemini-2.5-computer-use-preview-10-2025',
  stream: false,
  messages: [
    { role: 'system', content: 'You are a helpful assistant' },
    { role: 'user', content: "Go to google.com and search for 'weather in New York'" }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'computer_use',
        parameters: { 
          environment: 'ENVIRONMENT_BROWSER',
          // Optional: exclude specific predefined functions
          // excluded_predefined_functions: ['drag_and_drop']
        }
      }
    }
  ]
});
console.log(response);

Multi turn conversation

When the model returns tool calls, append the assistant’s message (including the tool_calls) back to your messages array for the next turn. This maintains conversation context.
import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: 'PORTKEY_API_KEY',
  provider: '@VERTEX_PROVIDER',
  vertexRegion: 'global', // Required for computer use models
  strictOpenAiCompliance: false
});

const response = await portkey.chat.completions.create({
  model: 'gemini-2.5-computer-use-preview-10-2025',
  stream: false,
  messages: [
    { role: 'system', content: 'You are a helpful assistant' },
    { role: 'user', content: "Go to google.com and search for 'weather in New York'" },
    { role: 'assistant', tool_calls: [ { id: 'portkey-50925c03-b8cc-4057-948b-13a9d9de19e0', type: 'function', function: { name: 'open_web_browser', arguments: '{}' } } ] },
    { role: 'user', content: "I've opened the browser" }
  ],
  tools: [{ type: 'function', function: { name: 'computer_use', parameters: { environment: 'ENVIRONMENT_BROWSER' } } }]
});
console.log(response);

Working with Images in Computer Use

You can attach images to your messages in computer use contexts using the standard OpenAI format with image_url:
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@VERTEX_PROVIDER",
    vertex_region="global",
    strict_open_ai_compliance=False
)

response = portkey.chat.completions.create(
    model="gemini-2.5-computer-use-preview-10-2025",
    stream=False,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this screenshot?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
            ]
        }
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "computer_use",
            "parameters": {"environment": "ENVIRONMENT_BROWSER"}
        }
    }]
)
Images can be provided as:
  • Public URLs: https://example.com/image.png
  • Base64 data URIs: data:image/png;base64,iVBORw0KGgo...
  • Google Cloud Storage URLs: gs://bucket-name/image.png

Safety Settings

Gemini’s computer use models support safety settings to control content filtering. In Google’s SDK, you configure this with safety_settings. With Portkey’s OpenAI-compatible format, safety settings are automatically handled by the model defaults. Google SDK Mapping:
  • safety_settings=[{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}] → Handled automatically by model defaults
The model will return responses following Google’s default safety guidelines. If content is blocked due to safety filters, you’ll receive an appropriate error response with details about which safety category triggered the block. For custom safety configurations beyond the defaults, please contact Portkey support at support@portkey.ai.

Troubleshooting Common Issues

Problem: Error code: 404 - Publisher Model 'gemini-2.5-computer-use-preview-10-2025' was not found

Solution: Computer use models require the vertex_region to be set to "global". Update your configuration:
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@VERTEX_PROVIDER",
    vertex_region="global",  # Required!
    strict_open_ai_compliance=False
)
Problem: Invalid JSON payload received. Unknown name "environment"

Solution: Ensure you're using the correct tool configuration format:
{
  "type": "function",
  "function": {
    "name": "computer_use",  // Must be exactly "computer_use"
    "parameters": {
      "environment": "ENVIRONMENT_BROWSER"
    }
  }
}
Portkey uses OpenAI's function calling signature for cross-framework compatibility. Here's a quick reference.

Google SDK:
tools=[
    types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
            excluded_predefined_functions=["drag_and_drop"]
        )
    )
]
Portkey (OpenAI Format):
tools=[{
    "type": "function",
    "function": {
        "name": "computer_use",
        "parameters": {
            "environment": "ENVIRONMENT_BROWSER",
            "excluded_predefined_functions": ["drag_and_drop"]
        }
    }
}]

Multi turn conversation (gemini-2.5-flash-image)

from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    strict_open_ai_compliance=False
)

# Create the request
response = portkey.chat.completions.create(
  model="gemini-2.5-flash-image-preview", # your model slug from Portkey's Model Catalog
  max_tokens=32768,
  stream=False,
  modalities=["text", "image"],
  messages= [
      {
        "role": "system",
        "content": "You are a helpful assistant"
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Add some chocolate drizzle to the croissants. Include text across the top of the image that says \"Made Fresh Daily\"."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "gs://cloud-samples-data/generative-ai/image/croissant.jpeg"
            }
          }
        ]
      },
        {
        "role": "assistant",
        "content": [
                {
                    "type": "text",
                    "text": "Here are the croissants with chocolate drizzle and the requested text: "
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64,UKDhasdhj....."
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": "looking good, thanks fam"
        }

    ]
)
print(response)
# in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
# response = portkey.chat.completions.create(
#   ...same config as above but with stream: true
# )
# for chunk in response:
#     if chunk.choices[0].delta:
#         content_blocks = chunk.choices[0].delta.get("content_blocks")
#         if content_blocks is not None:
#             for content_block in content_blocks:
#                 print(content_block)

Making Requests Without Portkey’s Model Catalog

You can also pass your Vertex AI details & secrets directly, without using Portkey's Model Catalog. Vertex AI expects a region, a project ID, and an access token in the request for a successful completion request. This is how you can specify these fields directly in your requests:

Example Request

import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY",
    vertexProjectId: "sample-55646",
    vertexRegion: "us-central1",
    provider:"vertex-ai",
    Authorization: "$GCLOUD AUTH PRINT-ACCESS-TOKEN" // output of `gcloud auth print-access-token`
})

const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'gemini-pro',
});

console.log(chatCompletion.choices);
For further questions on custom Vertex AI deployments or fine-grained access tokens, reach out to us at support@portkey.ai.

How to Find Your Google Vertex Project Details

To obtain your Vertex Project ID and Region, navigate to Google Vertex Dashboard.
  • You can copy the Project ID located at the top left corner of your screen.
  • Find the Region dropdown on the same page to get your Vertex Region.

Get Your Service Account JSON

When selecting Service Account File as your authentication method, you’ll need to:
  1. Upload your Google Cloud service account JSON file
  2. Specify the Vertex Region
This method is particularly important for using self-deployed models, as your service account must have the aiplatform.endpoints.predict permission to access custom endpoints. Learn more about permissions for your Vertex IAM key here.
For Self-Deployed Models: Your service account must have the aiplatform.endpoints.predict permission in Google Cloud IAM. Without this specific permission, requests to custom endpoints will fail.

Using Project ID and Region Authentication

For standard Vertex AI models, you can simply provide:
  1. Your Vertex Project ID (found in your Google Cloud console)
  2. The Vertex Region where your models are deployed
This method is simpler but may not have all the permissions needed for custom endpoints.

Next Steps

The complete list of features supported in the SDK is available at the link below.

SDK

You’ll find more information in the relevant sections:
  1. Add metadata to your requests
  2. Add gateway configs to your Vertex AI requests
  3. Tracing Vertex AI requests
  4. Setup a fallback from OpenAI to Vertex AI APIs