Review performance benchmarks for the cohere.rerank.3-5 (Cohere Rerank 3.5) model hosted on one RERANK_COHERE unit of a dedicated AI cluster in OCI Generative AI.
A rerank model takes a query and a list of texts as input and ranks the texts by their relevancy score to the query, that is, by how well each text matches the query.
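The following sketch illustrates the rerank contract described above: a query and a list of texts go in, and each text comes back with a relevancy score, sorted best-first. The scoring function here is a toy token-overlap stand-in for illustration only; the function name and response shape are assumptions, not an OCI Generative AI or Cohere SDK call.

```python
# Toy illustration of the rerank contract: query + texts in,
# (text_index, relevancy_score) pairs out, highest score first.
# The overlap-based score is a stand-in, not the model's scoring.

def toy_rerank(query: str, texts: list[str]) -> list[tuple[int, float]]:
    """Return (text_index, relevancy_score) pairs sorted by score, descending."""
    query_tokens = set(query.lower().split())
    scored = []
    for i, text in enumerate(texts):
        overlap = len(query_tokens & set(text.lower().split()))
        scored.append((i, overlap / max(len(query_tokens), 1)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

query = "how do dedicated ai clusters host rerank models"
texts = [
    "dedicated ai clusters host inference workloads for hosted models",
    "a rerank model scores how well each text matches a query",
    "object storage keeps unstructured data in buckets",
]
print(toy_rerank(query, texts))  # the most relevant text is listed first
```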
Rerank 3.5 Benchmark Scenarios:
- The query is 100 tokens for all scenarios.
- Each scenario uses a single supporting document that's 10,000 tokens long.
- Each scenario chunks this 10,000-token document based on a max_tokens_per_doc parameter. The tested values are 64, 128, 256, 512, 1024, 2048, and 4096.
- The maximum chunk size is 4,096 tokens, which is the maximum number of tokens that the Rerank 3.5 model can process in one pass.
- Because the document is 10,000 tokens long and the model's context length is 4,096 tokens, the document is broken into chunks in every scenario.
- Each chunk includes:
- Padding tokens: To ensure the input fits the model's expected format.
- The query: 100 tokens.
- A document section: For example, for a max_tokens_per_doc of 4,096 tokens, each chunk includes one of the following document sections (the sketch after this list reconstructs these boundaries):
- Document section 1: Document from 0 to 3,992 tokens.
- Document section 2: Document from 3,993 to 7,985 tokens.
- Document section 3: Document from 7,986 to 9,999 tokens. This section is smaller than the other two because the document is only 10,000 tokens long.
- Each benchmark scenario is defined by R(max_tokens_per_doc, 100), where 100 is the query's token count.
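The following sketch reconstructs the chunking arithmetic for the example above. It assumes the document capacity of each chunk is max_tokens_per_doc minus the 100-token query and a small fixed padding overhead (3 tokens, inferred from the section boundaries for a max_tokens_per_doc of 4,096); the service may compute this differently, and smaller max_tokens_per_doc values likely chunk the document by a different rule, so treat the constants as assumptions.

```python
# Sketch of the chunking arithmetic for the 10,000-token document.
# Assumption: per-chunk document capacity = max_tokens_per_doc
# minus the 100-token query minus a 3-token padding overhead
# (inferred from the boundaries 0-3992, 3993-7985, 7986-9999).

DOC_TOKENS = 10_000      # supporting document length
QUERY_TOKENS = 100       # query length in every scenario
OVERHEAD_TOKENS = 3      # inferred padding overhead (assumption)

def chunk_boundaries(max_tokens_per_doc: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) token indices for each document section."""
    capacity = max_tokens_per_doc - QUERY_TOKENS - OVERHEAD_TOKENS
    if capacity <= 0:
        raise ValueError("max_tokens_per_doc leaves no room for document tokens")
    sections, start = [], 0
    while start < DOC_TOKENS:
        end = min(start + capacity, DOC_TOKENS) - 1
        sections.append((start, end))
        start = end + 1
    return sections

print(chunk_boundaries(4096))
# [(0, 3992), (3993, 7985), (7986, 9999)] -- matches the three sections listed above
```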
- See details for the model and review the following sections:
- Available regions for this model.
- Dedicated AI clusters for hosting this model.
- Review the metrics.