Cohere Embed 4 (New)
The cohere.embed-v4.0 model is multimodal: it can create text embeddings from a mixed-modality input, which combines text and images in a single payload.
Available in These Regions
- Brazil East (Sao Paulo) (dedicated AI cluster only)
- Germany Central (Frankfurt) (dedicated AI cluster only)
- India South (Hyderabad) (dedicated AI cluster only)
- Japan Central (Osaka)
- UAE East (Dubai) (dedicated AI cluster only)
- UK South (London) (dedicated AI cluster only)
- US Midwest (Chicago)
Key Features
- Mode
  - Input text or an image, but not both.
  - To get embeddings for an image, provide only one image. You can't combine text and an image in the same embedding. Image input is available through the API only.
- Input and Output
  - In the Console, each text input must be fewer than 512 tokens, with a maximum of 96 inputs per run.
  - In the SDK and API, all inputs together can add up to 128,000 tokens per embedding per run.
  - The model outputs a 1,536-dimensional vector for each embedding.
- Language Support
  - Text: English or multilingual.
  - Image: English only.
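The input rules above can be checked on the client before you send anything. The following Python sketch is hypothetical: the function and constant names are not part of any OCI or Cohere SDK, token counts are assumed to come from a tokenizer you supply, and applying "text or image, but not both" at the level of a whole request is an assumption.

```python
from typing import List, Sequence

# Limits copied from the Key Features list; the constant names are hypothetical.
CONSOLE_MAX_INPUTS_PER_RUN = 96
CONSOLE_TOKEN_LIMIT_PER_INPUT = 512  # each Console input must be *fewer* than this
API_MAX_TOTAL_TOKENS_PER_RUN = 128_000


def check_mode(texts: Sequence[str], images: Sequence[str]) -> None:
    """Reject requests that mix text and images or carry more than one image."""
    if texts and images:
        raise ValueError("text and image inputs can't be combined")
    if len(images) > 1:
        raise ValueError("only one image is allowed per image embedding")


def plan_console_runs(token_counts: Sequence[int]) -> List[List[int]]:
    """Group input indices into Console runs of at most 96 inputs each.

    token_counts[i] is the token count of input i, computed elsewhere
    (this sketch does not tokenize text itself).
    """
    for i, n in enumerate(token_counts):
        if n >= CONSOLE_TOKEN_LIMIT_PER_INPUT:
            raise ValueError(f"input {i} has {n} tokens; Console inputs must be under 512")
    indices = list(range(len(token_counts)))
    step = CONSOLE_MAX_INPUTS_PER_RUN
    return [indices[start:start + step] for start in range(0, len(indices), step)]


def fits_api_run(token_counts: Sequence[int]) -> bool:
    """Return True if all inputs together stay within the 128,000-token API/SDK budget."""
    return sum(token_counts) <= API_MAX_TOTAL_TOKENS_PER_RUN


# Usage: 200 short inputs need three Console runs (96 + 96 + 8).
runs = plan_console_runs([100] * 200)
print([len(r) for r in runs])  # [96, 96, 8]
```

Splitting by index rather than by text keeps the helper independent of whichever tokenizer produced the counts.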
Dedicated AI Cluster for the Model
To reach a model through a dedicated AI cluster in any listed region, you must create an endpoint for that model on a dedicated AI cluster. For the cluster unit size that matches this model, see the following table.
Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
---|---|---|---|---|
cohere.embed-v4.0 | Not available for fine-tuning | | | |
- If you don't have enough cluster limits in your tenancy to host an Embed model on a dedicated AI cluster, request an increase of 1 for the dedicated-unit-embed-cohere-count limit.
- Review the Cohere Embed 4 cluster performance benchmarks for different use cases.
Release and Retirement Dates
Model | Release Date | On-Demand Retirement Date | Dedicated Mode Retirement Date |
---|---|---|---|
cohere.embed-v4.0 | 2025-07-03 | At least 6 months after the release of the 1st replacement model. | At least 6 months after the release of the 1st replacement model. |
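Both retirement dates are given only as a lower bound relative to the release of the first replacement model, so the earliest possible retirement date can be computed once that release date is known. A minimal Python sketch, assuming "6 months" means calendar months with the day clamped to the end of a shorter month; the helper names and the replacement date below are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of a shorter month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_retirement(replacement_release: date) -> date:
    """Earliest retirement date: 6 months after the first replacement's release."""
    return add_months(replacement_release, 6)


# Hypothetical replacement release date, for illustration only.
print(earliest_retirement(date(2026, 8, 31)))  # 2027-02-28
```

The result is a floor, not a schedule: the model may remain available longer.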
Embedding Model Parameter
When using the embedding models, you can get a different output by changing the following parameter.
- Truncate
  - Whether to truncate the start or end tokens of a sentence when that sentence exceeds the maximum number of allowed tokens. For example, if a sentence has 516 tokens but the maximum is 512 tokens, selecting to truncate the end cuts off the last 4 tokens of that sentence.
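The truncation example above can be reproduced on an already-tokenized sentence. This Python sketch only mimics the described behavior locally: the "START" and "END" strings are illustrative labels for the two choices, not a confirmed set of API values, and real token boundaries depend on the model's tokenizer.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def truncate_tokens(tokens: Sequence[T], max_tokens: int, side: str) -> List[T]:
    """Shorten an already-tokenized sentence to at most max_tokens tokens.

    side="END" keeps the first max_tokens tokens (drops the tail);
    side="START" keeps the last max_tokens tokens (drops the head).
    The "START"/"END" strings are illustrative, not a confirmed API value set.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    if len(tokens) <= max_tokens:
        return list(tokens)  # already within the limit; nothing to cut
    if side == "END":
        return list(tokens[:max_tokens])
    if side == "START":
        return list(tokens[len(tokens) - max_tokens:])
    raise ValueError(f"unknown truncation side: {side!r}")


# Worked example from the text: 516 tokens with a 512-token maximum.
sentence = [f"tok{i}" for i in range(516)]
kept = truncate_tokens(sentence, 512, "END")
print(len(kept), sentence[len(kept):])  # 512 ['tok512', 'tok513', 'tok514', 'tok515']
```

Truncating the start instead would drop tok0 through tok3 and keep the final 512 tokens.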