Paying for Dedicated AI Clusters
You get the following benefits when you use dedicated AI clusters in OCI Generative AI:
- Predictable pricing that doesn’t fluctuate with demand.
- Great for fine-tuning or hosting models.
- Minimum hosting commitment: 744 unit-hours per hosting cluster.
- Minimum fine-tuning commitment: 1 unit-hour per fine-tuning job. (Depending on the model, fine-tuning requires at least two units to run.)
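For context on the hosting minimum, 744 unit-hours corresponds to one unit running continuously for a 31-day month: 744 unit-hours = 31 days × 24 hours/day.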
The following examples calculate dedicated AI cluster cost in OCI Generative AI. For calculating on-demand inferencing cost, see Paying for On-Demand Inferencing.
Matching Models to Dedicated Cluster Unit Prices
If you're hosting foundational models or fine-tuning them on dedicated AI clusters, you're charged by the unit-hour rather than per transaction. In that case, use the following table to calculate dedicated AI cluster cost for the chat models.
Some OCI Generative AI foundational pretrained base models supported for the dedicated serving mode are now deprecated and will retire no sooner than six months after the release of the first replacement model. You can host a base model, or fine-tune a base model and host the fine-tuned model, on a dedicated AI cluster (dedicated serving mode) until the base model is retired. For dedicated serving mode retirement dates, see Retiring the Models.
Chat Models
[Table: for each chat base model, the fine-tuning cluster unit size, the hosting cluster unit size, the pricing page information, and the cluster limit increase to request. Some chat models are not available for fine-tuning. For the cluster unit sizes, see Matching Base Models to Clusters; the limit increases to request are listed below.]
You must request a limit increase to use the following resources:
Meta Llama Family
- To host a Meta Llama 3.2 11B Vision model, you must request `dedicated-unit-llama2-70-count` to increase by 1.
- To host a Meta Llama 3.2 90B Vision model, you must request `dedicated-unit-llama2-70-count` to increase by 2.
- To host a Meta Llama 3.1 (70B) model, you must request `dedicated-unit-llama2-70-count` to increase by 2.
- To fine-tune a Meta Llama 3.1 (70B) model, you must request `dedicated-unit-llama2-70-count` to increase by 4.
- To host a Meta Llama 3.1 (405B) model, you must request `dedicated-unit-llama2-70-count` to increase by 8.
Cohere Command R Family
- To host a Cohere Command R (deprecated) model, you must request `dedicated-unit-small-cohere-count` to increase by 1.
- To fine-tune a Cohere Command R (deprecated) model, you must request `dedicated-unit-small-cohere-count` to increase by 8.
- To host a Cohere Command R 08-2024 model, you must request `dedicated-unit-small-cohere-count` to increase by 1.
- To fine-tune a Cohere Command R 08-2024 model, you must request `dedicated-unit-small-cohere-count` to increase by 8.
- To host a Cohere Command R+ (deprecated) model, you must request `dedicated-unit-large-cohere-count` to increase by 2.
- To host a Cohere Command R+ 08-2024 model, you must request `dedicated-unit-large-cohere-count` to increase by 2.
References: Service Limits for Generative AI and Request Cluster Limit Increase
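As an illustration of how these limit increases combine, the sketch below totals the increases you'd request before creating clusters for a chosen set of models. The per-model values come from the list above; the model keys, the helper function, and the example selection are hypothetical and are not part of the OCI SDK or CLI.

```python
# Hypothetical helper: total the service-limit increases to request before
# creating dedicated AI clusters. Values are taken from the list above.
LIMIT_INCREASES = {
    ("meta-llama-3.2-11b-vision", "hosting"): ("dedicated-unit-llama2-70-count", 1),
    ("meta-llama-3.2-90b-vision", "hosting"): ("dedicated-unit-llama2-70-count", 2),
    ("meta-llama-3.1-70b", "hosting"): ("dedicated-unit-llama2-70-count", 2),
    ("meta-llama-3.1-70b", "fine-tuning"): ("dedicated-unit-llama2-70-count", 4),
    ("meta-llama-3.1-405b", "hosting"): ("dedicated-unit-llama2-70-count", 8),
    ("cohere-command-r-08-2024", "hosting"): ("dedicated-unit-small-cohere-count", 1),
    ("cohere-command-r-08-2024", "fine-tuning"): ("dedicated-unit-small-cohere-count", 8),
    ("cohere-command-r-plus-08-2024", "hosting"): ("dedicated-unit-large-cohere-count", 2),
}

def total_limit_increases(plans):
    """Sum the limit increases needed for a list of (model, purpose) plans."""
    totals = {}
    for plan in plans:
        limit_name, increase = LIMIT_INCREASES[plan]
        totals[limit_name] = totals.get(limit_name, 0) + increase
    return totals

# Example: fine-tune and host Meta Llama 3.1 (70B), and host Command R 08-2024.
print(total_limit_increases([
    ("meta-llama-3.1-70b", "fine-tuning"),
    ("meta-llama-3.1-70b", "hosting"),
    ("cohere-command-r-08-2024", "hosting"),
]))
# {'dedicated-unit-llama2-70-count': 6, 'dedicated-unit-small-cohere-count': 1}
```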
For text generation, summarization, and text embedding models, see the tables in Matching Base Models to Clusters.
Hosting a Foundational Model Example 1
John wants to host an instance of the Command R+ 08-2024 (`cohere.command-r-plus-08-2024`) model on dedicated infrastructure. John deletes the cluster after 40 days and wants to know the cost of the cluster. To host a `cohere.command-r-plus-08-2024` model, John first needs to identify the unit size that can host it. The unit size for the `cohere.command-r-plus-08-2024` model is a Large Cohere V2_2 unit. See Matching Base Models to Clusters.
John needs a minimum of one Large Cohere V2_2 unit to host the `cohere.command-r-plus-08-2024` model. Here are the steps to calculate the cost of a hosting cluster with one Large Cohere V2_2 unit.
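The unit price isn't listed in this section, so the sketch below uses a placeholder hourly price per Large Cohere V2_2 unit (a hypothetical value; take the real price from the pricing page). It walks the arithmetic: 40 days of hosting one unit, compared against the 744 unit-hour minimum hosting commitment.

```python
# Worked cost sketch for Example 1 (hypothetical unit price).
UNIT_PRICE_PER_HOUR = 10.00   # placeholder; use the Large Cohere V2_2 price from the pricing page
MIN_HOSTING_UNIT_HOURS = 744  # minimum hosting commitment per hosting cluster

days = 40
units = 1                     # one Large Cohere V2_2 unit

unit_hours = days * 24 * units                               # 40 * 24 * 1 = 960 unit-hours
# Assumption: the 744 unit-hour minimum applies to the cluster's total unit-hours.
billed_unit_hours = max(unit_hours, MIN_HOSTING_UNIT_HOURS)  # 960 > 744, so 960 is billed
cost = billed_unit_hours * UNIT_PRICE_PER_HOUR

print(f"{billed_unit_hours} unit-hours x ${UNIT_PRICE_PER_HOUR:.2f}/unit-hour = ${cost:,.2f}")
# 960 unit-hours x $10.00/unit-hour = $9,600.00
```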
Hosting a Foundational Model Example 2
Alice wants to host an instance of the Command R 08-2024 (`cohere.command-r-08-2024`) model on dedicated infrastructure. To host a `cohere.command-r-08-2024` model, Alice first needs to identify the unit size that can host the Command R 08-2024 model. The unit size for Command R 08-2024 is a Small Cohere V2 unit. See Matching Base Models to Clusters.
Alice decides to buy three units of Small Cohere V2 to handle a higher call volume to the model than a single unit would provide. Alice plans to delete the cluster after five days. Here are the steps to calculate the cost of a hosting cluster with three Small Cohere V2 units for five days.
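A similar sketch for Alice's cluster, again with a placeholder hourly price for a Small Cohere V2 unit (hypothetical). Three units for five days is 360 unit-hours, which falls below the minimum hosting commitment; the sketch assumes the 744 unit-hour minimum applies to the cluster's total unit-hours and is billed when the computed total is lower.

```python
# Worked cost sketch for Example 2 (hypothetical unit price).
UNIT_PRICE_PER_HOUR = 5.00    # placeholder; use the Small Cohere V2 price from the pricing page
MIN_HOSTING_UNIT_HOURS = 744  # minimum hosting commitment per hosting cluster

days = 5
units = 3                     # three Small Cohere V2 units

unit_hours = days * 24 * units                               # 5 * 24 * 3 = 360 unit-hours
# Assumption: the 744 unit-hour minimum applies to the cluster's total unit-hours,
# so the shortfall below the minimum is still billed.
billed_unit_hours = max(unit_hours, MIN_HOSTING_UNIT_HOURS)  # 360 < 744, so 744 is billed
cost = billed_unit_hours * UNIT_PRICE_PER_HOUR

print(f"{billed_unit_hours} unit-hours x ${UNIT_PRICE_PER_HOUR:.2f}/unit-hour = ${cost:,.2f}")
# 744 unit-hours x $5.00/unit-hour = $3,720.00
```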
Fine-Tuning and Hosting a Model Example
Bob wants to fine-tune a Command R 08-2024 (`cohere.command-r-08-2024`) model. Bob creates a fine-tuning dedicated AI cluster with the preset value of eight Small Cohere V2 units. Bob creates a custom model on the fine-tuning dedicated AI cluster and fine-tunes the Command R 08-2024 foundational model with training data. The fine-tuning job takes 5 hours to complete. Bob creates a fine-tuning cluster every week.
To host a `cohere.command-r-08-2024` model, Bob needs to identify the unit size that can host the `cohere.command-r-08-2024` model. The unit size for the `cohere.command-r-08-2024` model is a Small Cohere V2 unit. See Matching Base Models to Clusters. Bob can host up to 50 fine-tuned models on a single hosting cluster. Here are the steps to calculate the monthly cost for fine-tuning and hosting the models.
In addition to calculating the price, you can estimate the cost by selecting the AI and Machine Learning category and loading the cost estimator for OCI Generative AI.