Model Limitations in Generative AI

Review the following requirements for OCI Generative AI custom and base models to get the most out of your models.

Note

For key features of the pretrained base models, see Pretrained Foundational Models in Generative AI.

Matching Base Models to Clusters

Expand the following sections to review the dedicated AI cluster unit size and units that match each foundational model.

Chat
Important

Some OCI Generative AI foundational pretrained base models supported for the dedicated serving mode are now deprecated, and each will retire no sooner than six months after the release of its first replacement model. You can host a base model, or fine-tune a base model and host the fine-tuned model, on a dedicated AI cluster (dedicated serving mode) until the base model is retired. For dedicated serving mode retirement dates, see Retiring the Models.
For each base model, the following lists show the fine-tuning cluster, hosting cluster, pricing page product name, and cluster limit increase to request.

  • Model Name: Cohere Command R
  • OCI Model Name: cohere.command-r-16k (deprecated)
  • Fine-Tuning Cluster: Small Cohere V2 unit size, 8 units
  • Hosting Cluster: Small Cohere V2 unit size, 1 unit
  • Pricing Page Product Name: Small Cohere - Dedicated (for fine-tuning, multiply the unit price by 8)
  • Limit Name: dedicated-unit-small-cohere-count
  • Request Limit Increase by: 1 for hosting, 8 for fine-tuning

  • Model Name: Command R 08-2024
  • OCI Model Name: cohere.command-r-08-2024
  • Fine-Tuning Cluster: Small Cohere V2 unit size, 8 units
  • Hosting Cluster: Small Cohere V2 unit size, 1 unit
  • Pricing Page Product Name: Small Cohere - Dedicated (for fine-tuning, multiply the unit price by 8)
  • Limit Name: dedicated-unit-small-cohere-count
  • Request Limit Increase by: 1 for hosting, 8 for fine-tuning

  • Model Name: Cohere Command R+
  • OCI Model Name: cohere.command-r-plus (deprecated)
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Large Cohere V2_2 unit size, 1 unit
  • Pricing Page Product Name: Large Cohere - Dedicated (for hosting, multiply the unit price by 2)
  • Limit Name: dedicated-unit-large-cohere-count
  • Request Limit Increase by: 2 for hosting

  • Model Name: Command R+ 08-2024
  • OCI Model Name: cohere.command-r-plus-08-2024
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Large Cohere V2_2 unit size, 1 unit
  • Pricing Page Product Name: Large Cohere - Dedicated (for hosting, multiply the unit price by 2)
  • Limit Name: dedicated-unit-large-cohere-count
  • Request Limit Increase by: 2 for hosting

  • Model Name: Meta Llama 3.2 11B Vision
  • OCI Model Name: meta.llama-3.2-11b-vision-instruct
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Small Generic V2 unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (for hosting, multiply the unit price by 0.5)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 1 for hosting

  • Model Name: Meta Llama 3.2 90B Vision
  • OCI Model Name: meta.llama-3.2-90b-vision-instruct
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Large Generic V2 unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (for hosting, multiply the unit price by 2)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 2 for hosting

  • Model Name: Meta Llama 3.1 (70B)
  • OCI Model Name: meta.llama-3.1-70b-instruct
  • Fine-Tuning Cluster: Large Generic unit size, 2 units
  • Hosting Cluster: Large Generic unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (multiply the unit price by 2 for hosting, by 4 for fine-tuning)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 2 for hosting, 4 for fine-tuning

  • Model Name: Meta Llama 3.1 (405B)
  • OCI Model Name: meta.llama-3.1-405b-instruct
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Large Generic 4 unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (for hosting, multiply the unit price by 8)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 8 for hosting

  • Model Name: Meta Llama 3
  • OCI Model Name: meta.llama-3-70b-instruct (deprecated)
  • Fine-Tuning Cluster: Large Generic unit size, 2 units
  • Hosting Cluster: Large Generic unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (multiply the unit price by 2 for hosting, by 4 for fine-tuning)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 2 for hosting, 4 for fine-tuning
Important

You must request a limit increase to use the following resources:

Meta Llama Family

  • To host a Meta Llama 3.2 11B Vision model, you must request dedicated-unit-llama2-70-count to increase by 1.

  • To host a Meta Llama 3.2 90B Vision model, you must request dedicated-unit-llama2-70-count to increase by 2.

  • To host a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 2.

  • To fine-tune a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 4.

  • To host a Meta Llama 3.1 (405B) model, you must request dedicated-unit-llama2-70-count to increase by 8.

Cohere Command R Family

  • To host a Cohere Command R (deprecated) model, you must request dedicated-unit-small-cohere-count to increase by 1.

  • To fine-tune a Cohere Command R (deprecated) model, you must request dedicated-unit-small-cohere-count to increase by 8.

  • To host a Cohere Command R 08-2024 model, you must request dedicated-unit-small-cohere-count to increase by 1.

  • To fine-tune a Cohere Command R 08-2024 model, you must request dedicated-unit-small-cohere-count to increase by 8.

  • To host a Cohere Command R+ (deprecated) model, you must request dedicated-unit-large-cohere-count to increase by 2.

  • To host a Cohere Command R+ 08-2024 model, you must request dedicated-unit-large-cohere-count to increase by 2.

References: Service Limits for Generative AI and Request Cluster Limit Increase
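The limit-increase requests listed above can be kept in a small lookup table for scripting or review. The following sketch is illustrative only; the table names and helper function are hypothetical, but the limit names and increments mirror the lists above:

```python
# Illustrative lookup of the service-limit increases listed above.
# Keys are (OCI model name, operation); values are (limit name, increase).
LIMIT_INCREASES = {
    ("meta.llama-3.2-11b-vision-instruct", "hosting"): ("dedicated-unit-llama2-70-count", 1),
    ("meta.llama-3.2-90b-vision-instruct", "hosting"): ("dedicated-unit-llama2-70-count", 2),
    ("meta.llama-3.1-70b-instruct", "hosting"): ("dedicated-unit-llama2-70-count", 2),
    ("meta.llama-3.1-70b-instruct", "fine-tuning"): ("dedicated-unit-llama2-70-count", 4),
    ("meta.llama-3.1-405b-instruct", "hosting"): ("dedicated-unit-llama2-70-count", 8),
    ("cohere.command-r-16k", "hosting"): ("dedicated-unit-small-cohere-count", 1),
    ("cohere.command-r-16k", "fine-tuning"): ("dedicated-unit-small-cohere-count", 8),
    ("cohere.command-r-08-2024", "hosting"): ("dedicated-unit-small-cohere-count", 1),
    ("cohere.command-r-08-2024", "fine-tuning"): ("dedicated-unit-small-cohere-count", 8),
    ("cohere.command-r-plus", "hosting"): ("dedicated-unit-large-cohere-count", 2),
    ("cohere.command-r-plus-08-2024", "hosting"): ("dedicated-unit-large-cohere-count", 2),
}

def limit_request(model: str, operation: str) -> str:
    """Return a human-readable description of the limit increase to request."""
    limit_name, increase = LIMIT_INCREASES[(model, operation)]
    return f"request {limit_name} to increase by {increase}"
```

For example, `limit_request("meta.llama-3.1-405b-instruct", "hosting")` describes the request needed before hosting Meta Llama 3.1 (405B).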

Embedding
For each base model, the following lists show the fine-tuning cluster, hosting cluster, pricing page product name, and cluster limit increase to request.

  • Model Name: Cohere English Embed V3
  • OCI Model Name: cohere.embed-english-v3.0
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Embed Cohere unit size, 1 unit
  • Pricing Page Product Name: Embed Cohere - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-embed-cohere-count
  • Request Limit Increase by: 1 for hosting

  • Model Name: Cohere Multilingual Embed V3
  • OCI Model Name: cohere.embed-multilingual-v3.0
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Embed Cohere unit size, 1 unit
  • Pricing Page Product Name: Embed Cohere - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-embed-cohere-count
  • Request Limit Increase by: 1 for hosting

  • Model Name: Cohere English Light Embed V3
  • OCI Model Name: cohere.embed-english-light-v3.0
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Embed Cohere unit size, 1 unit
  • Pricing Page Product Name: Embed Cohere - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-embed-cohere-count
  • Request Limit Increase by: 1 for hosting

  • Model Name: Cohere Multilingual Light Embed V3
  • OCI Model Name: cohere.embed-multilingual-light-v3.0
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Embed Cohere unit size, 1 unit
  • Pricing Page Product Name: Embed Cohere - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-embed-cohere-count
  • Request Limit Increase by: 1 for hosting
Text Generation (Deprecated)
Important

  • Not available on-demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
  • Can be hosted on clusters: If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions.
For each base model, the following lists show the fine-tuning cluster, hosting cluster, pricing page product name, and cluster limit increase to request.

  • Model Name: Cohere Command XL (52B)
  • OCI Model Name: cohere.command (deprecated)
  • Fine-Tuning Cluster: Large Cohere unit size, 2 units
  • Hosting Cluster: Large Cohere unit size, 1 unit
  • Pricing Page Product Name: Large Cohere - Dedicated (multiply the unit price by 1 for hosting, by 2 for fine-tuning)
  • Limit Name: dedicated-unit-large-cohere-count
  • Request Limit Increase by: 1 for hosting, 2 for fine-tuning

  • Model Name: Cohere Command Light (6B)
  • OCI Model Name: cohere.command-light (deprecated)
  • Fine-Tuning Cluster: Small Cohere unit size, 2 units
  • Hosting Cluster: Small Cohere unit size, 1 unit
  • Pricing Page Product Name: Small Cohere - Dedicated (multiply the unit price by 1 for hosting, by 2 for fine-tuning)
  • Limit Name: dedicated-unit-small-cohere-count
  • Request Limit Increase by: 1 for hosting, 2 for fine-tuning

  • Model Name: Meta Llama 2
  • OCI Model Name: meta.llama-2-70b-chat (deprecated)
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Llama2 70 unit size, 1 unit
  • Pricing Page Product Name: Large Meta - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-llama2-70-count
  • Request Limit Increase by: 1 for hosting
Summarization (Deprecated)
Important

The cohere.command model supported for the on-demand serving mode is now retired, and the model is deprecated for the dedicated serving mode. If you're hosting cohere.command on a dedicated AI cluster (dedicated serving mode) for summarization, you can continue to use the hosted model replica with the summarization API and in the playground until the cohere.command model retires for the dedicated serving mode. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead, which offer the same summarization capabilities, including control over summary length and style.
For the base model, the following list shows the fine-tuning cluster, hosting cluster, pricing page product name, and cluster limit increase to request.

  • Model Name: Cohere Command XL (52B)
  • OCI Model Name: cohere.command (deprecated)
  • Fine-Tuning Cluster: Not available for fine-tuning
  • Hosting Cluster: Large Cohere unit size, 1 unit
  • Pricing Page Product Name: Large Cohere - Dedicated (for hosting, multiply the unit price by 1)
  • Limit Name: dedicated-unit-large-cohere-count
  • Request Limit Increase by: 1 for hosting
Units for Fine-Tuning Clusters
Creating a fine-tuning dedicated AI cluster automatically provisions a fixed number of units based on the base model: 8 units for cohere.command-r-16k and 2 units for other models. You can't change this number, but you can use the same cluster to fine-tune several models.
Units for Hosting Clusters
  • When you create a hosting cluster, one unit is provisioned for the selected base model by default.
  • To increase throughput or requests per minute (RPM), add model replicas; for example, 2 replicas require 2 units. You can add model replicas when creating or editing a hosting cluster.
  • You can host up to 50 models on the same cluster, with the following restrictions:
    • Host up to 50 models of the same version of a fine-tuned or a pretrained model on the same cluster.
    • Host different versions of the same base model only if you use the T-Few fine-tuning method with the cohere.command and cohere.command-light base models.
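The replica arithmetic above can be sketched as a trivial helper (hypothetical, for illustration only): each replica consumes the base model's required unit count, so total units scale linearly with replicas.

```python
def hosting_cluster_units(units_per_replica: int, replica_count: int) -> int:
    """Total dedicated AI cluster units needed to host `replica_count`
    replicas of a model whose hosting cluster needs `units_per_replica`
    units per replica."""
    if replica_count < 1:
        raise ValueError("a hosting cluster needs at least one replica")
    return units_per_replica * replica_count

# Example: a base model that needs 1 unit per replica, hosted with
# 2 replicas, consumes 2 units on the cluster.
```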
Note

Instead of committing to dedicated AI clusters, you can pay as you go with on-demand inferencing. With on-demand inferencing, you access the foundational models through the Console (in the playground) or through the API. For on-demand features, see Calculating Cost in Generative AI.

Adding Endpoints to Hosting Clusters

To host a model for inference on a hosting dedicated AI cluster, you must create an endpoint for that model. Then, you can add either a custom model or a pretrained foundational model to that endpoint.

About Endpoint Aliases and Stack Serving

A hosting dedicated AI cluster can have up to 50 endpoints. Use these endpoints for the following use cases:

Creating Endpoint Aliases

Create many endpoints that act as aliases for one model. All endpoints on the cluster must point either to the same base model or to the same version of a custom model. Creating many endpoints that point to the same model makes the endpoints easier to manage, because you can dedicate each endpoint to a different user or purpose.

Stack Serving

Host several versions of a custom model on one cluster. This option applies to cohere.command and cohere.command-light models that are fine-tuned with the T-Few training method. Hosting various versions of a fine-tuned model can help you assess the custom models for different use cases.

Tip

To increase the call volume supported by a hosting cluster, you can increase its instance count.

Expand the following sections to review the requirements for hosting models on the same cluster.

Chat
Important

Some OCI Generative AI foundational pretrained base models supported for the dedicated serving mode are now deprecated, and each will retire no sooner than six months after the release of its first replacement model. You can host a base model, or fine-tune a base model and host the fine-tuned model, on a dedicated AI cluster (dedicated serving mode) until the base model is retired. For dedicated serving mode retirement dates, see Retiring the Models.

To host pretrained base chat models or fine-tuned chat models on a hosting dedicated AI cluster, use the following cluster unit sizes and endpoint rules that match each base model.

Hosting Cluster Unit Size Matching Rules
Small Generic V2 for the base model, meta.llama-3.2-11b-vision-instruct

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the meta.llama-3.2-11b-vision-instruct model on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the meta.llama-3.2-11b-vision-instruct model.

Large Generic for the base model, meta.llama-3.1-70b-instruct

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the meta.llama-3.1-70b-instruct model on the same hosting cluster.

Hosting Custom Models

To host several custom models on the same cluster:

  • Fine-tune one model with the LoRA training method.
  • Use the meta.llama-3.1-70b-instruct model as the base.
  • Create as many endpoints as needed for the custom model (same version).
Large Generic for the base model, meta.llama-3-70b-instruct

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the meta.llama-3-70b-instruct model on the same hosting cluster.

Hosting Custom Models

To host several custom models on the same cluster:

  • Fine-tune one model with the LoRA training method.
  • Use the meta.llama-3-70b-instruct model as the base.
  • Create as many endpoints as needed for the custom model (same version).
Large Generic V2 for the base model, meta.llama-3.2-90b-vision-instruct

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the meta.llama-3.2-90b-vision-instruct model on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the meta.llama-3.2-90b-vision-instruct model.

Large Generic 4 for the base model, meta.llama-3.1-405b-instruct

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the meta.llama-3.1-405b-instruct model on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the meta.llama-3.1-405b-instruct model.

Small Cohere V2 for the base model, cohere.command-r-16k (deprecated)

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command-r-16k model on the same hosting cluster.

Hosting Custom Models

To host several custom models on the same cluster:

  • Fine-tune one model with the T-Few or Vanilla training method.
  • Use the cohere.command-r-16k model as the base.
  • Create as many endpoints as needed for the custom model (same version).

You can't host different versions of a custom model trained on the cohere.command-r-16k base model on the same cluster, as stack serving isn't supported.

Small Cohere V2 for the base model, cohere.command-r-08-2024

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command-r-08-2024 model on the same hosting cluster.

Hosting Custom Models

To host several custom models on the same cluster:

  • Fine-tune one model with the T-Few or Vanilla training method.
  • Use the cohere.command-r-08-2024 model as the base.
  • Create as many endpoints as needed for the custom model (same version).

You can't host different versions of a custom model trained on the cohere.command-r-08-2024 base model on the same cluster, as stack serving isn't supported.

Large Cohere V2_2 for the base model, cohere.command-r-plus (deprecated)

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command-r-plus model on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the cohere.command-r-plus model.

Large Cohere V2_2 for the base model, cohere.command-r-plus-08-2024

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command-r-plus-08-2024 model on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the cohere.command-r-plus-08-2024 model.

Embedding

To host the embedding models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.

Hosting Cluster Unit Size Matching Rules
Embed Cohere for the base models cohere.embed-english-light-v3.0, cohere.embed-english-v3.0, cohere.embed-multilingual-light-v3.0, and cohere.embed-multilingual-v3.0

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for one of the pretrained Cohere Embed models on the same hosting cluster.

Hosting Custom Models

Fine-tuning not available for the Cohere Embed models.

Text Generation (Deprecated)
Important

  • Not available on-demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
  • Can be hosted on clusters: If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions.

To host the text generation models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules that match your base model.

Hosting Cluster Unit Size Matching Rules
Small Cohere for the base model, cohere.command-light

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:
  • Create as many endpoints as needed for the cohere.command-light model on the same hosting cluster.

Hosting Custom Models

To host different custom models on the same cluster:

  • Fine-tune all the models with the T-Few training method.
  • Use the cohere.command-light model as the base.
  • Ensure that all base models have the same version.
  • Create an endpoint for each model on the same hosting cluster.
Large Cohere for the base model, cohere.command

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command model with the same version on the same hosting cluster.

Hosting Custom Models

To host different custom models on the same cluster:

  • Fine-tune all the models with the T-Few training method.
  • Use the cohere.command model as the base.
  • Ensure that all base models have the same version.
  • Add an endpoint to the hosting cluster for each model.
Llama2 70 for the base model, meta.llama-2-70b-chat

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:
  • Create as many endpoints as needed for the meta.llama-2-70b-chat model on the same hosting cluster.
Summarization (Deprecated)
Important

The cohere.command model supported for the on-demand serving mode is now retired, and the model is deprecated for the dedicated serving mode. If you're hosting cohere.command on a dedicated AI cluster (dedicated serving mode) for summarization, you can continue to use the hosted model replica with the summarization API and in the playground until the cohere.command model retires for the dedicated serving mode. When hosted on a dedicated AI cluster, these models are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead, which offer the same summarization capabilities, including control over summary length and style.

To host the pretrained cohere.command summarization model on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.

Hosting Cluster Unit Size Matching Rules
Large Cohere for the base model, cohere.command

Hosting Base Models

To host the same pretrained base model through several endpoints on the same cluster:

  • Create as many endpoints as needed for the cohere.command model with the same version on the same hosting cluster.

Hosting Custom Models

To host different custom models on the same cluster:

  • Fine-tune all the models with the T-Few training method.
  • Use the cohere.command model as the base.
  • Ensure that all base models have the same version.
  • Add an endpoint to the hosting cluster for each model.

Training Data

Datasets for training custom models have the following requirements:

  • A maximum of one fine-tuning dataset is allowed per custom model. This dataset is randomly split at an 80:20 ratio for training and validation.
  • Each file must have at least 32 prompt/completion pair examples.
  • The file format is JSONL.
  • Each line in the JSONL file has the following format:

    {"prompt": "<a prompt>", "completion": "<expected response given the prompt>"}\n

  • The file must be stored in an OCI Object Storage bucket.
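The format and minimum-example rules above can be checked before you upload a dataset. The following sketch is a hypothetical helper (not part of any OCI SDK) that validates a local JSONL file against those rules:

```python
import json

def validate_finetune_dataset(path: str, min_examples: int = 32) -> int:
    """Validate a JSONL fine-tuning dataset: every non-blank line must be
    a JSON object with string "prompt" and "completion" values, and the
    file must contain at least `min_examples` examples (32 per the
    requirements above). Returns the example count."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)  # raises on malformed JSON
            for key in ("prompt", "completion"):
                if not isinstance(record.get(key), str):
                    raise ValueError(f"line {lineno}: missing or non-string {key!r}")
            count += 1
    if count < min_examples:
        raise ValueError(f"dataset has {count} examples; at least {min_examples} required")
    return count
```

Run this locally before placing the file in your Object Storage bucket; it catches malformed lines and undersized datasets early.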

Learn about Training Data Requirements in Generative AI.

Input Data for Text Embeddings

Input data for creating text embeddings has the following requirements:

  • You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
  • Only files with a .txt extension are allowed.
  • If you use an input file, each input sentence, phrase, or paragraph in the file must be separated with a newline character.
  • A maximum of 96 inputs is allowed for each run.
  • Each input must be less than 512 tokens. If an input is too long, select whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512 token limit and the Truncate parameter is set to None, you get an error message.
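The input rules above can be sketched as a pre-flight check. The helpers below are hypothetical and operate on already-tokenized input, since the service's own tokenizer isn't part of this sketch; they mirror the newline-separated input, 96-input cap, and Truncate parameter semantics described above:

```python
def load_embedding_inputs(text: str, max_inputs: int = 96) -> list[str]:
    """Split newline-separated text into individual inputs and enforce
    the per-run cap of 96 inputs described above."""
    inputs = [line.strip() for line in text.splitlines() if line.strip()]
    if len(inputs) > max_inputs:
        raise ValueError(f"{len(inputs)} inputs exceed the maximum of {max_inputs} per run")
    return inputs

def truncate_input(tokens: list, limit: int = 512, truncate: str = "End") -> list:
    """Apply the Truncate parameter semantics to one tokenized input:
    'Start' drops tokens from the beginning, 'End' drops them from the
    end, and anything else (the 'None' setting) rejects inputs that
    exceed the token limit."""
    if len(tokens) <= limit:
        return tokens
    if truncate == "Start":
        return tokens[-limit:]
    if truncate == "End":
        return tokens[:limit]
    raise ValueError(f"input has {len(tokens)} tokens, over the {limit}-token limit")
```

For example, a 600-token input with Truncate set to Start keeps the last 512 tokens, while Truncate set to None raises an error, matching the behavior described above.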

Learn about Creating text embeddings in OCI Generative AI.