Paying for Dedicated AI Clusters

Dedicated AI clusters in OCI Generative AI come with the following benefits and minimum commitments:

  • Predictable pricing that doesn’t fluctuate with demand.
  • Great for fine-tuning or hosting models.
  • Minimum hosting commitment: 744 unit-hours per hosting cluster.
  • Minimum fine-tuning commitment: 1 unit-hour per fine-tuning job. (Depending on the model, fine-tuning requires at least 2 units to run.)
Note

To find out which models are available for fine-tuning, see Matching Base Models to Clusters.
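The minimum commitments listed above can be sketched as a small helper. This is an illustrative sketch only: the function and constant names are hypothetical, and it assumes the charge is the larger of the used unit hours and the minimum, which matches the hosting examples later on this page.

```python
# Minimum commitments from the list above, in unit hours.
HOSTING_MIN_UNIT_HOURS = 744     # per hosting cluster
FINE_TUNING_MIN_UNIT_HOURS = 1   # per fine-tuning job


def billable_unit_hours(units: int, hours: float, minimum: float) -> float:
    """Return the unit hours charged (hypothetical helper).

    Assumption: the charge is the used unit hours, or the minimum
    commitment when usage falls below it.
    """
    used = units * hours
    return max(used, minimum)


# One hosting unit kept for 10 days (240 unit hours) is below the minimum.
print(billable_unit_hours(1, 10 * 24, HOSTING_MIN_UNIT_HOURS))  # 744
# Two fine-tuning units running for 3 hours exceed the minimum.
print(billable_unit_hours(2, 3, FINE_TUNING_MIN_UNIT_HOURS))    # 6
```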

The following examples calculate dedicated AI cluster cost in OCI Generative AI. For calculating on-demand inferencing cost, see Paying for On-Demand Inferencing.

Matching Models to Dedicated Cluster Unit Prices

If you host foundational models or fine-tune them on dedicated AI clusters, you're charged by the unit hour rather than by transaction. Use the following table to calculate the dedicated AI cluster cost for the chat models.

Important

Some OCI Generative AI foundational pretrained models supported for the dedicated serving mode are now deprecated and will retire no sooner than 6 months after the release of the first replacement model. You can fine-tune and host a foundational pretrained model on a dedicated AI cluster (dedicated serving mode) until that model is retired. For dedicated serving mode retirement dates, see Retiring the Models.

Chat Models

Cohere Command R

Base Model
  • Model Name: Cohere Command R
  • OCI Model Name: cohere.command-r-16k
Fine-Tuning Cluster
  • Unit Size: Small Cohere V2
  • Required Units: 8
Hosting Cluster
  • Unit Size: Small Cohere V2
  • Required Units: 1
Pricing Page Information
  • Pricing Page Product Name: Small Cohere - Dedicated
  • For Fine-Tuning, Multiply the Unit Price: x8
Request Cluster Limit Increase
  • Limit Name: dedicated-unit-small-cohere-count
  • For Hosting, Request Limit Increase by: 1
  • For Fine-Tuning, Request Limit Increase by: 8

Cohere Command R+

Base Model
  • Model Name: Cohere Command R+
  • OCI Model Name: cohere.command-r-plus
Fine-Tuning Cluster
  • Not available for fine-tuning
Hosting Cluster
  • Unit Size: Large Cohere V2_2
  • Required Units: 1
Pricing Page Information
  • Pricing Page Product Name: Large Cohere - Dedicated
  • For Hosting, Multiply the Unit Price: x2
Request Cluster Limit Increase
  • Limit Name: dedicated-unit-large-cohere-count
  • For Hosting, Request Limit Increase by: 2

Meta Llama 3.1 (405B)

Base Model
  • Model Name: Meta Llama 3.1 (405B)
  • OCI Model Name: meta.llama-3.1-405b-instruct
Fine-Tuning Cluster
  • Not available for fine-tuning
Hosting Cluster
  • Unit Size: Large Generic 4
  • Required Units: 1
Pricing Page Information
  • Pricing Page Product Name: Large Meta - Dedicated
  • For Hosting, Multiply the Unit Price: x8
Request Cluster Limit Increase
  • Limit Name: dedicated-unit-llama2-70-count
  • For Hosting, Request Limit Increase by: 8

Meta Llama 3.1 (70B)

Base Model
  • Model Name: Meta Llama 3.1 (70B)
  • OCI Model Name: meta.llama-3.1-70b-instruct
Fine-Tuning Cluster
  • Unit Size: Large Generic
  • Required Units: 2
Hosting Cluster
  • Unit Size: Large Generic
  • Required Units: 1
Pricing Page Information
  • Pricing Page Product Name: Large Meta - Dedicated
  • For Hosting, Multiply the Unit Price: x2
  • For Fine-Tuning, Multiply the Unit Price: x4
Request Cluster Limit Increase
  • Limit Name: dedicated-unit-llama2-70-count
  • For Hosting, Request Limit Increase by: 2
  • For Fine-Tuning, Request Limit Increase by: 4

Meta Llama 3

Base Model
  • Model Name: Meta Llama 3
  • OCI Model Name: meta.llama-3-70b-instruct (deprecated)
Fine-Tuning Cluster
  • Unit Size: Large Generic
  • Required Units: 2
Hosting Cluster
  • Unit Size: Large Generic
  • Required Units: 1
Pricing Page Information
  • Pricing Page Product Name: Large Meta - Dedicated
  • For Hosting, Multiply the Unit Price: x2
  • For Fine-Tuning, Multiply the Unit Price: x4
Request Cluster Limit Increase
  • Limit Name: dedicated-unit-llama2-70-count
  • For Hosting, Request Limit Increase by: 2
  • For Fine-Tuning, Request Limit Increase by: 4

Important

You must request a limit increase to use the following resources:

Meta Llama Family

  • To host a Meta Llama 3.1 (405B) model, you must request dedicated-unit-llama2-70-count to increase by 8.

  • To host a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 2.

  • To fine-tune a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 4.

Cohere Command R Family

  • To host a Cohere Command R+ model, you must request dedicated-unit-large-cohere-count to increase by 2.

  • To host a Cohere Command R model, you must request dedicated-unit-small-cohere-count to increase by 1.

  • To fine-tune a Cohere Command R model, you must request dedicated-unit-small-cohere-count to increase by 8.

References: Service Limits for Generative AI and Request Cluster Limit Increase

For text generation, summarization, and text embedding models, see the tables in Matching Base Models to Clusters.
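The limit names and increase amounts for the chat models above can be kept in a small lookup, for example when scripting limit requests. This is a sketch only: `ClusterLimit`, `CHAT_MODEL_LIMITS`, and `limit_increase` are hypothetical names, not part of any OCI SDK, and the values are copied from the chat model table on this page.

```python
from typing import NamedTuple, Optional, Tuple


class ClusterLimit(NamedTuple):
    limit_name: str
    hosting_increase: int
    fine_tuning_increase: Optional[int]  # None: not available for fine-tuning


# Values copied from the chat model table on this page.
CHAT_MODEL_LIMITS = {
    "cohere.command-r-16k": ClusterLimit("dedicated-unit-small-cohere-count", 1, 8),
    "cohere.command-r-plus": ClusterLimit("dedicated-unit-large-cohere-count", 2, None),
    "meta.llama-3.1-405b-instruct": ClusterLimit("dedicated-unit-llama2-70-count", 8, None),
    "meta.llama-3.1-70b-instruct": ClusterLimit("dedicated-unit-llama2-70-count", 2, 4),
    "meta.llama-3-70b-instruct": ClusterLimit("dedicated-unit-llama2-70-count", 2, 4),
}


def limit_increase(model: str, purpose: str) -> Tuple[str, int]:
    """Return (limit name, increase to request) for 'hosting' or 'fine-tuning'."""
    info = CHAT_MODEL_LIMITS[model]
    if purpose == "hosting":
        return info.limit_name, info.hosting_increase
    if purpose == "fine-tuning":
        if info.fine_tuning_increase is None:
            raise ValueError(f"{model} is not available for fine-tuning")
        return info.limit_name, info.fine_tuning_increase
    raise ValueError(f"unknown purpose: {purpose!r}")


print(limit_increase("cohere.command-r-16k", "fine-tuning"))
# ('dedicated-unit-small-cohere-count', 8)
```

Asking for a fine-tuning increase for a model that the table marks as not available for fine-tuning, such as cohere.command-r-plus, raises a `ValueError` in this sketch.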

Hosting a Foundational Model Example 1

John wants to host an instance of the Cohere Command R+ (cohere.command-r-plus) model on dedicated infrastructure. John deletes the cluster after 40 days and wants to know the cost of the cluster. To host the model, John first needs to identify the unit size that can host it. The unit size for the cohere.command-r-plus model is a Large Cohere V2_2 unit. See Matching Base Models to Clusters.

John needs a minimum of one Large Cohere V2_2 unit to host the cohere.command-r-plus model. Here are the steps to calculate the cost of a hosting cluster with one Large Cohere V2_2 unit.

  1. Calculate the unit hours for 40 days.
    40 days x 24 hours per day x 1 unit = 960 unit hours.
  2. Compare the unit hours to the minimum commitment for hosting the models.
    960 unit hours > 744 minimum unit hours
  3. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Large Cohere - Dedicated, find the <Large-Cohere-dedicated-unit-per-hour-price>.
  4. From the Matching Base Models to Clusters page, find the multiplier for the cohere.command-r-plus model:
    For Hosting, Multiply the Unit Price: x2
  5. Calculate the price for 40 days.
    price = (960 unit hours) x $<Large-Cohere-dedicated-unit-per-hour-price> x 2
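The steps above can be sketched in code. The unit price used here is a made-up placeholder, not an Oracle price; substitute the <Large-Cohere-dedicated-unit-per-hour-price> from the AI Pricing page. The function name `hosting_price` is hypothetical.

```python
from decimal import Decimal

HOSTING_MIN_UNIT_HOURS = 744  # minimum hosting commitment per cluster


def hosting_price(days: int, units: int, unit_price: Decimal, multiplier: int = 1) -> Decimal:
    """Price of a hosting cluster: billable unit hours x unit price x multiplier."""
    unit_hours = days * 24 * units                           # step 1
    billable = max(unit_hours, HOSTING_MIN_UNIT_HOURS)       # step 2
    return billable * unit_price * multiplier                # steps 3 to 5


# PLACEHOLDER price for illustration only; look up the real value on AI Pricing.
placeholder_unit_price = Decimal("10.00")

# John: 40 days, 1 Large Cohere V2_2 unit, hosting multiplier x2.
print(hosting_price(40, 1, placeholder_unit_price, multiplier=2))  # 19200.00
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.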

Hosting a Foundational Model Example 2

Alice wants to host an instance of the Cohere Command R (cohere.command-r-16k) model on dedicated infrastructure. To host the model, Alice first needs to identify the unit size that can host it. The unit size for the cohere.command-r-16k model is a Small Cohere V2 unit. See Matching Base Models to Clusters.

Alice decides to buy three units of Small Cohere V2 to handle a higher call volume to the model than a single unit would provide. Alice plans to delete the cluster after five days. Here are the steps to calculate the cost of a hosting cluster with three Small Cohere V2 units for five days.

  1. Calculate the unit hours.
    5 days x 24 hours per day x 3 units = 360 unit hours. 
  2. Compare the unit hours to the minimum commitment for hosting the models.
    360 unit hours < 744 minimum unit hours
    Alice is charged for 744 unit hours.
  3. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Small Cohere - Dedicated, find the <Small-Cohere-dedicated-unit-per-hour-price>.
  4. From the Matching Base Models to Clusters page, find the multiplier for the cohere.command-r-16k model:
    You don't need to multiply the price for hosting the cohere.command-r-16k model.
  5. Calculate the cost for five days.
    price = (744 unit hours) x $<Small-Cohere-dedicated-unit-per-hour-price>
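Alice's case shows what happens when usage falls below the minimum commitment. The sketch below reports the used, billed, and unused unit hours, and how many days a cluster of a given size needs to run before usage reaches the commitment. The function names are hypothetical, and the billing rule (charge the larger of usage and the minimum) is taken from step 2 above.

```python
HOSTING_MIN_UNIT_HOURS = 744  # minimum hosting commitment per cluster


def commitment_summary(days: int, units: int) -> dict:
    """Compare used unit hours with the hosting minimum (hypothetical helper)."""
    used = days * 24 * units
    billed = max(used, HOSTING_MIN_UNIT_HOURS)
    return {"used": used, "billed": billed, "unused": billed - used}


def days_to_reach_minimum(units: int) -> float:
    """Days a cluster with the given units must run to use the full commitment."""
    return HOSTING_MIN_UNIT_HOURS / (24 * units)


# Alice: 3 Small Cohere V2 units for 5 days.
print(commitment_summary(5, 3))             # {'used': 360, 'billed': 744, 'unused': 384}
print(round(days_to_reach_minimum(3), 2))   # 10.33
```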

Fine-Tuning and Hosting a Model Example

Bob wants to fine-tune a Cohere Command R (cohere.command-r-16k) model. Bob creates a fine-tuning dedicated AI cluster with the preset value of eight Small Cohere V2 units. Bob creates a custom model on the fine-tuning dedicated AI cluster and fine-tunes the Cohere Command R foundational model with training data. The fine-tuning job takes 5 hours to complete. Bob creates a fine-tuning cluster every week.

To host the fine-tuned cohere.command-r-16k models, Bob needs to identify the unit size that can host them. The unit size for the cohere.command-r-16k model is a Small Cohere V2 unit. See Matching Base Models to Clusters. Bob can host up to 50 fine-tuned models on a single hosting cluster. Here are the steps to calculate the monthly cost for fine-tuning and hosting the models.

  1. Calculate the unit hours for each fine-tuning job.
    Each fine-tuning cluster requires 8 units and is active for 5 hours.
    8 units x 5 hours = 40 unit hours per fine-tuning cluster
  2. Compare the unit hours to the minimum commitment for fine-tuning the models.
    40 unit hours > 1 unit hour
  3. Calculate the unit hours for hosting.
    31 days x 24 hours per day x 1 unit = 744 unit hours
  4. Compare the unit hours to the minimum commitment for hosting the models.
    744 unit hours = 744 minimum unit hours
  5. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Small Cohere - Dedicated, find the <Small-Cohere-dedicated-unit-per-hour-price>.
  6. Find the total monthly price.
    fine-tuning price = (40 unit hours per week) x (4 weeks) x $<Small-Cohere-dedicated-unit-per-hour-price>
    fine-tuning price = (160 unit hours) x $<Small-Cohere-dedicated-unit-per-hour-price>
    hosting price = (744 unit hours) x $<Small-Cohere-dedicated-unit-per-hour-price>
    total monthly price = (160 + 744 = 904 unit hours) x $<Small-Cohere-dedicated-unit-per-hour-price>
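Bob's monthly calculation can be sketched as follows. The unit price is a made-up placeholder, not an Oracle price; substitute the <Small-Cohere-dedicated-unit-per-hour-price> from the AI Pricing page. The function and variable names are hypothetical.

```python
from decimal import Decimal

HOSTING_MIN_UNIT_HOURS = 744     # minimum per hosting cluster
FINE_TUNING_MIN_UNIT_HOURS = 1   # minimum per fine-tuning job


def monthly_unit_hours(ft_units: int, ft_hours: int, ft_jobs: int,
                       hosting_units: int, hosting_days: int) -> int:
    """Total billable unit hours for fine-tuning jobs plus one hosting cluster."""
    per_job = max(ft_units * ft_hours, FINE_TUNING_MIN_UNIT_HOURS)           # steps 1 and 2
    hosting = max(hosting_days * 24 * hosting_units, HOSTING_MIN_UNIT_HOURS) # steps 3 and 4
    return per_job * ft_jobs + hosting


# Bob: 8 units x 5 hours per job, 4 jobs a month, 1 hosting unit for 31 days.
total_hours = monthly_unit_hours(ft_units=8, ft_hours=5, ft_jobs=4,
                                 hosting_units=1, hosting_days=31)
print(total_hours)  # 904

# PLACEHOLDER price for illustration only; look up the real value on AI Pricing.
placeholder_unit_price = Decimal("10.00")
print(total_hours * placeholder_unit_price)  # 9040.00
```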
Tip

In addition to calculating the price, you can estimate the cost by selecting the AI and Machine Learning category and loading the cost estimator for OCI Generative AI.