Cohere Rerank 4

Cohere Rerank 4 is a rerank model available in two variants, Pro and Fast.

Reranking improves search relevance by reordering an initial set of retrieved results. After a retrieval step returns candidate documents, the reranking model compares the query with each candidate and ranks the results from most relevant to least relevant.
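The retrieve-then-rerank flow described above can be sketched in a few lines. This is only an illustration of the reordering step: the `overlap_score` function is a hypothetical stand-in based on word overlap, not the scoring that Cohere Rerank 4 performs, and the function names are assumptions made for this example.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query words that also occur in the document.

    Hypothetical stand-in for a real reranking model.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query and order from most to least relevant."""
    scored = [(score(query, doc), doc) for doc in candidates]
    # Python's sort is stable, so candidates with equal scores keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Candidates as they might come back from an initial retrieval step.
retrieved = [
    "Invoices are due within 30 days.",
    "Reset your password from the account settings page.",
    "Password rules require twelve characters.",
]
ranked = rerank("how to reset password", retrieved)
```

In a real workflow the scoring step would be replaced by a call to the reranking model, while the surrounding flow (retrieve, score each candidate against the query, sort) stays the same.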

Cohere Rerank 4 supports multilingual reranking and semi-structured content, including JSON, tables, and code-like content.

What’s New in Rerank 4

Compared with Cohere Rerank 3.5, Rerank 4 adds a larger context window, improved reranking quality, self-learning support, and two variants optimized for different workload requirements.

Increased context window

Rerank 4 supports a 32,000-token context window. The larger context window improves handling of long documents and larger candidate inputs, which is useful for dense enterprise content such as reports, contracts, manuals, and technical documentation.

Improved reranking quality

Rerank 4 improves result ordering for enterprise retrieval workloads. It provides stronger relevance ranking for business, finance, technical, and other domain-specific content, which can improve downstream retrieval-augmented generation workflows by surfacing more relevant context.

Self-learning support

Rerank 4 introduces self-learning support, which helps adapt reranking behavior to domain-specific data, terminology, and relevance preferences without requiring annotated training data.

Pro and Fast variants

Rerank 4 is available in two variants:

Pro is optimized for higher-precision reranking and more complex retrieval tasks.
Fast is optimized for lower-latency, higher-throughput workloads.

Multilingual and semi-structured data support

Rerank 4 supports reranking for English and non-English content across more than 100 languages. It also supports semi-structured content, including JSON, tables, and code-like content.

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Model Variants

Cohere Rerank 4 includes the following model variants:


Model	OCI Model Name	Description
Cohere Rerank 4 Pro	`cohere.rerank-v4.0-pro`	Multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for quality-focused and complex reranking workloads.
Cohere Rerank 4 Fast	`cohere.rerank-v4.0-fast`	Lightweight multilingual reranking model for English and non-English text and semi-structured JSON data. Best suited for lower-latency and higher-throughput workloads.

On-Demand Mode

Some Cohere Rerank 4 variants are available on-demand in supported regions. On-demand mode doesn't require a dedicated AI cluster.

See Models by Region to check which model variants are available on-demand and in which regions.


Model Name	OCI Model Name	Pricing Page Product Name
Cohere Rerank 4 Pro	`cohere.rerank-v4.0-pro`	Rerank 4 Pro
Cohere Rerank 4 Fast	`cohere.rerank-v4.0-fast`	Rerank 4 Fast

Pricing is based on 1,000 search units. See the Pricing Page.
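As a rough illustration of the per-1,000-unit arithmetic, the following sketch estimates a charge from a search unit count. The price used here is a placeholder, not a published rate, and the sketch assumes linear billing with no rounding or minimums; check the Pricing Page for the actual terms.

```python
def estimate_cost(search_units: int, price_per_1000_units: float) -> float:
    """Estimate a charge, assuming cost scales linearly with search units.

    Both the linear model and any price passed in are assumptions for illustration.
    """
    if search_units < 0 or price_per_1000_units < 0:
        raise ValueError("search_units and price_per_1000_units must be non-negative")
    return search_units / 1000 * price_per_1000_units


# Placeholder price of 2.0 currency units per 1,000 search units.
monthly_estimate = estimate_cost(search_units=2500, price_per_1000_units=2.0)
```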

Learn about On-Demand Mode.

Dedicated AI Cluster for the Model

Models in on-demand mode don't require a cluster; access them through the Console playground and the API. For models available in dedicated mode, use endpoints created on dedicated AI clusters. Learn about Dedicated Mode.

The following table lists hardware unit sizes, available regions, and service limits for dedicated AI clusters available for Cohere Rerank 4 Pro and Cohere Rerank 4 Fast. These models aren't available for fine-tuning.

Cohere Rerank 4 Pro and Fast


Hardware Unit Size	Available Regions	Limit Name
Cohere_A10_X1	Germany Central (Frankfurt), US East (Ashburn), US Midwest (Chicago)	Limit Name: `dedicated-unit-a10-count`; Request Increase by: 1
Cohere_A100_80G_X1	US Midwest (Chicago), US West (Phoenix)	Limit Name: `dedicated-unit-a100-80g-count`; Request Increase by: 1
Cohere_B200_X1	Brazil East (Sao Paulo), Germany Central (Frankfurt), India South (Hyderabad), Japan Central (Osaka), UK South (London), US East (Ashburn), US Midwest (Chicago), US West (Phoenix)	Limit Name: `dedicated-unit-b200-count`; Request Increase by: 1
Cohere_H100_X1	Brazil East (Sao Paulo), Germany Central (Frankfurt), India South (Hyderabad), Japan Central (Osaka), UK South (London), US East (Ashburn), US Midwest (Chicago)	Limit Name: `dedicated-unit-h100-count`; Request Increase by: 1
Cohere_H200_X1	Saudi Arabia Central (Riyadh)	Limit Name: `dedicated-unit-h200-count`; Request Increase by: 1

Important

For hardware pricing, see the Cost estimator.
If tenancy limits are insufficient for hosting this model on a dedicated AI cluster, request an increase for the relevant hardware limit. For example, request an increase of 1 for the `dedicated-unit-h100-count` limit. See Creating a Limit Increase Request.
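For planning purposes, the hardware table above can be captured as a small lookup that answers which unit sizes are listed for a region and which limit to raise. The sketch below copies its data from the table as shown on this page; the dictionary and function names are hypothetical, and the table itself may change, so treat Models by Region as the source of truth.

```python
# Hardware unit size -> regions, copied from the table above.
UNIT_REGIONS = {
    "Cohere_A10_X1": {
        "Germany Central (Frankfurt)", "US East (Ashburn)", "US Midwest (Chicago)",
    },
    "Cohere_A100_80G_X1": {"US Midwest (Chicago)", "US West (Phoenix)"},
    "Cohere_B200_X1": {
        "Brazil East (Sao Paulo)", "Germany Central (Frankfurt)",
        "India South (Hyderabad)", "Japan Central (Osaka)", "UK South (London)",
        "US East (Ashburn)", "US Midwest (Chicago)", "US West (Phoenix)",
    },
    "Cohere_H100_X1": {
        "Brazil East (Sao Paulo)", "Germany Central (Frankfurt)",
        "India South (Hyderabad)", "Japan Central (Osaka)", "UK South (London)",
        "US East (Ashburn)", "US Midwest (Chicago)",
    },
    "Cohere_H200_X1": {"Saudi Arabia Central (Riyadh)"},
}

# Hardware unit size -> service limit name, copied from the table above.
UNIT_LIMITS = {
    "Cohere_A10_X1": "dedicated-unit-a10-count",
    "Cohere_A100_80G_X1": "dedicated-unit-a100-80g-count",
    "Cohere_B200_X1": "dedicated-unit-b200-count",
    "Cohere_H100_X1": "dedicated-unit-h100-count",
    "Cohere_H200_X1": "dedicated-unit-h200-count",
}


def units_in_region(region: str) -> list:
    """Return the unit sizes listed for a region, sorted by name."""
    return sorted(unit for unit, regions in UNIT_REGIONS.items() if region in regions)


def limit_for_unit(unit: str) -> str:
    """Return the service limit name to raise for a given unit size."""
    return UNIT_LIMITS[unit]
```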

Access this Model

To use a Cohere Rerank 4 model, call the RerankText API from a supported region.

Endpoint: https://inference.generativeai.{region}.oci.oraclecloud.com
API operation: POST /20231130/actions/rerankText

In RerankTextDetails, for servingMode, set the servingType attribute based on how you want to access the model:

Use ON_DEMAND for an on-demand model in a supported region.
Use DEDICATED for a model hosted on a dedicated AI cluster endpoint.

For availability and setup details, see the preceding On-Demand Mode and Dedicated AI Cluster for the Model sections.
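The following sketch shows how the endpoint URL and a `RerankTextDetails` request body might be assembled for each serving type. Only the endpoint pattern, the operation path, `servingMode`, `servingType`, and the `ON_DEMAND` and `DEDICATED` values come from this page. The other field names (`compartmentId`, `input`, `documents`, `topN`, `modelId`, `endpointId`), the example region identifier, and the helper names are assumptions that should be checked against the RerankText API documentation. The sketch builds the request only; it does not sign or send it, and a real call needs OCI request signing or an OCI SDK client.

```python
import json
from typing import List, Optional

# Operation path as given on this page.
API_PATH = "/20231130/actions/rerankText"


def endpoint_url(region: str) -> str:
    """Fill the {region} placeholder in the inference endpoint pattern."""
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com{API_PATH}"


def build_rerank_body(
    query: str,
    documents: List[str],
    compartment_id: str,
    serving_type: str,
    target_id: str,
    top_n: Optional[int] = None,
) -> dict:
    """Assemble a request body; field names other than servingMode/servingType are assumed."""
    if serving_type == "ON_DEMAND":
        # target_id is a model name such as cohere.rerank-v4.0-pro.
        serving_mode = {"servingType": "ON_DEMAND", "modelId": target_id}
    elif serving_type == "DEDICATED":
        # target_id identifies an endpoint created on a dedicated AI cluster.
        serving_mode = {"servingType": "DEDICATED", "endpointId": target_id}
    else:
        raise ValueError("serving_type must be ON_DEMAND or DEDICATED")

    body = {
        "compartmentId": compartment_id,
        "servingMode": serving_mode,
        "input": query,
        "documents": list(documents),
    }
    if top_n is not None:
        body["topN"] = top_n
    return body


# Placeholder values; replace with a real region and compartment OCID.
url = endpoint_url("us-chicago-1")
payload = json.dumps(
    build_rerank_body(
        query="how to reset password",
        documents=["Reset your password from settings.", "Invoices are due in 30 days."],
        compartment_id="<compartment-ocid>",
        serving_type="ON_DEMAND",
        target_id="cohere.rerank-v4.0-fast",
        top_n=1,
    )
)
```

Keeping the serving mode selection in one place means that switching from on-demand access to a dedicated endpoint only changes the `serving_type` and `target_id` arguments.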

OCI Release and Retirement Dates

For release and retirement dates and replacement model options, see the release and retirement pages for the mode that you use (on-demand or dedicated).

Rerank Model Parameters

For the Rerank model parameters, see the RerankText API documentation.

Oracle Cloud Infrastructure Documentation