Cohere Embed 4

Cohere Embed 4 (cohere.embed-v4.0) is a multimodal embedding model that generates embeddings from text, one image, or text and one image in the same API payload. Image input is available through the API only.

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Access this Model

You can access this model through:

Note

The API links list the endpoints for all supported commercial, sovereign, and government regions.

Key Features

Matryoshka embeddings: Supports output dimensions of 256, 512, 1,024, and 1,536. This feature isn't supported in Embed 3 models.
Input limits:
- Console: Up to 96 text inputs per run, with each text input under 512 tokens. This limit applies to on-demand mode.
- SDK and API: Up to 128,000 total input tokens per run.
Output dimensions:
- Console: 1,536
- API: 1,536 by default; supports 256, 512, 1,024, and 1,536
Input mode:
- API: Supports text only, one image only, or several text inputs with one image in the same payload.
- Only one image is allowed per payload.
- Image input is available through the API only.
Image input:
- Requires a base64-encoded image.
- A 512 x 512 image is about 1,610 tokens.
Language support:
- Text: English and multilingual
- Image: English only
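
As a rough illustration of how the limits above interact, the following sketch estimates the token budget left for text when one 512 x 512 image is included in an API payload, and checks a requested output dimension against the listed values. The constant and function names are hypothetical, and the image cost is the approximate figure quoted above, not an exact count.

```python
# Figures come from the feature list above; all names are illustrative only.
MAX_TOTAL_TOKENS = 128_000      # SDK and API limit per run
APPROX_IMAGE_TOKENS = 1_610     # approximate cost of one 512 x 512 image
SUPPORTED_DIMENSIONS = (256, 512, 1024, 1536)


def remaining_text_tokens(include_image: bool) -> int:
    """Estimate the token budget left for text inputs in one payload."""
    used = APPROX_IMAGE_TOKENS if include_image else 0
    return MAX_TOTAL_TOKENS - used


def check_dimension(dim: int) -> int:
    """Reject output dimensions that are not listed as supported."""
    if dim not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported output dimension: {dim}")
    return dim


if __name__ == "__main__":
    print(remaining_text_tokens(include_image=True))
    print(check_dimension(512))
```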

Use Text and Image in the EmbedText API

To include an image with text, use the embedContents attribute in the EmbedTextDetails request body for the EmbedText API.

The embedContents attribute is an array and is supported only for Embed 4 models. Each item in the array is an EmbedContent object. An EmbedContent object can contain either text content or image content.

Use embedContents when you want to send text and image content in the same EmbedText request. You can include several text entries and one image, up to the maximum input size.

The other parameters for the EmbedText API remain the same.
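
The following is a minimal sketch of assembling an embedContents array with several text entries and one base64-encoded image. Only the embedContents attribute name comes from this page. The keys inside each item (type, text, image) and the helper name are assumptions made for illustration, so check the EmbedContent reference for the actual field names before using it.

```python
import base64
from typing import Optional


def build_embed_contents(texts: list[str], image_bytes: Optional[bytes] = None) -> dict:
    """Build the embedContents part of a request body.

    The per-item keys below are hypothetical placeholders, not the
    documented EmbedContent schema.
    """
    items = [{"type": "TEXT", "text": text} for text in texts]
    if image_bytes is not None:
        # Image input must be base64 encoded; only one image per payload.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        items.append({"type": "IMAGE", "image": encoded})
    return {"embedContents": items}


if __name__ == "__main__":
    body = build_embed_contents(["a red bicycle", "product photo"], b"\x89PNG")
    print(len(body["embedContents"]))
```

Taking a single optional image argument, rather than a list, keeps the one-image-per-payload rule enforced by the function signature.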

Important

The embedContents attribute is supported only by Embed 4 models. Don't use embedContents with Embed 3 models.

On-Demand Mode

On-demand mode is pay-as-you-go and is useful for experimentation, proof-of-concept work, and model evaluation. On the pricing page, this model is listed as:


Model Name	OCI Model Name	Pricing Page Product Name
Cohere Embed 4	`cohere.embed-v4.0`	Embed Cohere

Important

Dynamic Throttling Limit Change for On-Demand Mode

OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access. Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.

Tip

Because rate limits can change, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potentially temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success, which improves the overall stability and performance of the integration with the service.
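
One way to implement such a strategy is exponential back-off with random jitter, sketched below. ThrottledError is a hypothetical stand-in for whatever exception your client raises on a throttled request, and the default delays are arbitrary example values, not service recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a rejected (throttled) request."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` after throttling, doubling the delay cap each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the rejection
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time up to the cap so that many
            # clients do not retry at the same instant.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The sleep function is injectable so that the retry logic can be tested without real delays.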

Dedicated AI Cluster for the Model

For models in on-demand mode, no clusters are required. Access them through the Console playground and API. For models available in the dedicated mode, use endpoints created on dedicated AI clusters. Learn about the Dedicated Mode.

The following table lists hardware unit sizes, available regions, and service limits for dedicated AI clusters. This model isn't available for fine-tuning.


Hardware Unit Size	Available Regions	Limit Name
Cohere_A10_X1	Germany Central (Frankfurt), UAE East (Dubai), US East (Ashburn), US Midwest (Chicago)	Limit Name: `dedicated-unit-a10-count`; Request Increase by: 1
Cohere_A100_40G_X1	UAE East (Dubai)	Limit Name: `dedicated-unit-a100-40g-count`; Request Increase by: 1
Cohere_A100_80G_X1	US Midwest (Chicago)	Limit Name: `dedicated-unit-a100-80g-count`; Request Increase by: 1
Cohere_B200_X1	UAE Central (Abu Dhabi)	Limit Name: `dedicated-unit-b200-count`; Request Increase by: 1
Cohere_H100_X1	Brazil East (Sao Paulo), Germany Central (Frankfurt), India South (Hyderabad), Japan Central (Osaka), UK South (London), US East (Ashburn), US Midwest (Chicago)	Limit Name: `dedicated-unit-h100-count`; Request Increase by: 1
Cohere_H200_X1	Saudi Arabia Central (Riyadh)	Limit Name: `dedicated-unit-h200-count`; Request Increase by: 1

Important

For hardware pricing, see the Cost estimator.
If tenancy limits are insufficient for hosting this model on a dedicated AI cluster, request an increase for the relevant hardware limit. For example, request an increase for the dedicated-unit-h100-count limit by 1. See Creating a Limit Increase Request.

Legacy Generic Shapes

Important

Legacy generic Cohere shapes are being retired from Generative AI. During the retirement period, these shapes remain available in the API only. If you use the API, you might see both the legacy generic shapes and the new hardware unit shapes until the legacy generic shapes are removed from the service.

Use this section only if you have a dedicated AI cluster that uses a legacy generic Cohere shape, or if you use the API to create a cluster with a legacy generic Cohere shape during the retirement period. For new dedicated AI clusters, use the hardware unit shapes listed in Dedicated AI Cluster for the Model.

To reach a model through a dedicated AI cluster in any listed region, you must create an endpoint for that model on a dedicated AI cluster. For the cluster unit size that matches this model, see the following table.


Base Model	Fine-Tuning Cluster	Hosting Cluster	Pricing Page Information	Request Cluster Limit Increase
Model Name: Cohere Embed 4; OCI Model Name: `cohere.embed-v4.0`	Not available for fine-tuning	Unit Size: Embed Cohere; Required Units: 1	Pricing Page Product Name: Embed Cohere - Dedicated; For Hosting, Multiply the Unit Price: x1	Limit Name: `dedicated-unit-embed-cohere-count`; For Hosting, Request Limit Increase by: 1

Tip

If you don't have enough hosting capacity, request an increase for the dedicated-unit-embed-cohere-count limit.

Endpoint Rules for Clusters

A dedicated AI cluster can hold up to 50 endpoints.
Use these endpoints to create aliases that all point either to the same base model or to the same version of a custom model, but not both types.
Several endpoints for the same model make it easy to assign them to different users or purposes.

Tip

To increase the call volume supported by a hosting cluster, increase its instance count by editing the dedicated AI cluster. See Updating a Dedicated AI Cluster.
For more than 50 endpoints per cluster, request an increase for the endpoint-per-dedicated-unit-count limit. See Creating a Limit Increase Request and Service Limits for Generative AI.

Cluster Performance Benchmarks

Review the Cohere Embed 4 cluster performance benchmarks for different use cases.

OCI Release and Retirement Dates

For release and retirement dates and replacement model options, see the following pages based on the mode (on-demand or dedicated):

Input Data for Text Embeddings

For text embeddings, you can add sentences, phrases, or paragraphs. In the Console, you can enter text directly or upload a .txt file.

If you use an input file, separate each input sentence, phrase, or paragraph with a newline character.

Console limits:

Maximum 96 text inputs per run
Each text input must be under 512 tokens
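
As an illustration of preparing newline-separated input, the sketch below splits file content into individual inputs and groups them into runs of at most 96. The function names are hypothetical, and skipping blank lines is an assumption about how you want to treat empty entries. The per-input 512-token check is not shown because token counts depend on the model's tokenizer.

```python
from typing import Iterator

MAX_INPUTS_PER_RUN = 96  # Console limit stated above


def parse_inputs(raw_text: str) -> list[str]:
    """Split newline-separated content into inputs, skipping blank lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def batches(inputs: list[str], size: int = MAX_INPUTS_PER_RUN) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` inputs."""
    for start in range(0, len(inputs), size):
        yield inputs[start:start + size]


if __name__ == "__main__":
    sample = "first sentence\n\nsecond sentence\nthird sentence\n"
    for batch in batches(parse_inputs(sample)):
        print(len(batch), batch)
```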

SDK and API limits:

Up to 128,000 total input tokens per run
Text and image inputs together count toward the total input token limit
Only one image is allowed per payload
Image input must be base64 encoded

If an input is too long, use the truncate parameter to truncate the start or end of the input. If the input exceeds the token limit and truncate is set to None, the request returns an error.
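
The truncation behavior described above can be pictured with the following sketch, which works on a list of already-tokenized items. It is only a model of the behavior, not the service's implementation. The mode strings "START", "END", and "NONE" and the function name are illustrative assumptions.

```python
def apply_truncate(tokens: list[str], max_tokens: int, mode: str = "NONE") -> list[str]:
    """Model how truncation trims an over-long input.

    START drops tokens from the beginning, END drops tokens from the end,
    and NONE raises an error when the input is too long.
    """
    if len(tokens) <= max_tokens:
        return tokens
    if mode == "START":
        return tokens[len(tokens) - max_tokens:]
    if mode == "END":
        return tokens[:max_tokens]
    if mode == "NONE":
        raise ValueError("input exceeds the token limit and truncation is disabled")
    raise ValueError(f"unknown truncate mode: {mode}")


if __name__ == "__main__":
    words = "one two three four five six".split()
    print(apply_truncate(words, 4, "START"))
    print(apply_truncate(words, 4, "END"))
```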

Embedding Model Parameters

You can change the following parameters when using embedding models.

Truncate (truncate): Truncates tokens at the start or end when input exceeds the maximum token limit.

Embedding Types (embeddingTypes)

Supported values:

float (Default)
int8
uint8
binary
ubinary
base64

Output Dimensions (outputDimensions)

Supported values:

256
512
1024
1536 (Default)
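
To see how these two parameters affect storage, the sketch below estimates the raw size of one embedding vector. The bit widths are assumptions that this page does not state: float is taken as 32-bit, int8 and uint8 as one byte per dimension, and binary and ubinary as one bit per dimension packed into bytes. The base64 type is left out because its size depends on the encoded payload.

```python
# Assumed bit widths per dimension; not stated on this page.
BITS_PER_DIMENSION = {
    "float": 32,
    "int8": 8,
    "uint8": 8,
    "binary": 1,
    "ubinary": 1,
}
SUPPORTED_DIMENSIONS = (256, 512, 1024, 1536)


def bytes_per_vector(dimensions: int, embedding_type: str = "float") -> int:
    """Estimate raw storage for one vector under the assumed bit widths."""
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported output dimension: {dimensions}")
    if embedding_type not in BITS_PER_DIMENSION:
        raise ValueError(f"no size estimate for type: {embedding_type}")
    # Every supported dimension is a multiple of 8, so this divides evenly.
    return dimensions * BITS_PER_DIMENSION[embedding_type] // 8


if __name__ == "__main__":
    for dim in SUPPORTED_DIMENSIONS:
        print(dim, bytes_per_vector(dim, "float"), bytes_per_vector(dim, "binary"))
```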

Migrating from Embed 3 to Embed 4

When migrating from Embed 3 to Embed 4, we recommend changing the vector size from 1,024 to 1,536 dimensions and using a new index to help avoid downtime.

Create a new vector index

Create a new index or collection in your vector database configured for 1,536 dimensions.

Re-embed the data

Reprocess the source documents with cohere.embed-v4.0 and set outputDimensions=1536. Store the new embeddings in the new index.

Update query logic

Update the application to use Embed 4 for incoming search queries. Use:
- input_type="search_query" for queries
- input_type="search_document" for stored documents

Cut over

After the new index is fully populated and tested, update the application to use the new 1,536-dimension index.
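
The migration steps can be sketched end to end as follows. This is a toy model under stated assumptions: fake_embed is a placeholder that returns dummy vectors instead of calling Embed 4, and VectorIndex is a hypothetical in-memory stand-in for your vector database. Only the dimension sizes and the input type values come from the steps above.

```python
import hashlib
from dataclasses import dataclass, field

NEW_DIMENSIONS = 1536


def fake_embed(text: str, input_type: str, dimensions: int = NEW_DIMENSIONS) -> list[float]:
    """Placeholder for a real Embed 4 call; returns a deterministic dummy vector."""
    digest = hashlib.sha256(f"{input_type}:{text}".encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dimensions)]


@dataclass
class VectorIndex:
    """Hypothetical in-memory index that enforces a fixed vector size."""
    dimensions: int
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float]) -> None:
        if len(vector) != self.dimensions:
            raise ValueError(f"expected {self.dimensions} dimensions, got {len(vector)}")
        self.vectors[doc_id] = vector


def migrate(documents: dict[str, str]) -> VectorIndex:
    """Create a new 1,536-dimension index and re-embed every document into it."""
    new_index = VectorIndex(dimensions=NEW_DIMENSIONS)
    for doc_id, text in documents.items():
        new_index.add(doc_id, fake_embed(text, "search_document"))
    return new_index


if __name__ == "__main__":
    docs = {"doc-1": "first document", "doc-2": "second document"}
    new_index = migrate(docs)
    query_vector = fake_embed("example query", "search_query")
    # Cut over only after the new index holds every document.
    if len(new_index.vectors) == len(docs):
        active_index = new_index
        print(len(active_index.vectors), len(query_vector))
```

Keeping the old index untouched until the final check is what allows the cutover to happen without downtime. The dimension check in add shows why the old 1,024-dimension index cannot simply be reused.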
