Creating a Dedicated AI Cluster in Generative AI for Hosting Models

Create a dedicated AI cluster resource in OCI Generative AI to host endpoints for pretrained base models and custom models.

Important

All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead.
  1. In the navigation bar of the Console, select a region with Generative AI, for example, US Midwest (Chicago) or UK South (London). See which models are offered in your region.
  2. Open the navigation menu and click Analytics & AI. Under AI Services, click Generative AI.
  3. Select a compartment in which you want to host the models.
    Ensure that you have permission to use or manage generative-ai-family and object-family resources in this compartment.
  4. In the left navigation, select a compartment that you have permission to work in.
  5. Click Dedicated AI clusters.
  6. Click Create dedicated AI cluster.
  7. Select a compartment to create the dedicated AI cluster in. The default compartment is the one you selected in step 3, but you can select any compartment that you have permission to work in.
  8. (Optional) Enter a name and description for the cluster. If you don't enter a name, the system generates one that you can change later.

    The generated name has the format generativeaidedicatedaicluster<timestamp>. For example: generativeaidedicatedaicluster20240601202357

  9. For Cluster type, click Hosting.
  10. For Base model, select the base model for the models that you want to host on this cluster.

    Chat

    • meta.llama-3.1-70b-instruct - Provisions 1 Large Generic unit.
    • meta.llama-3.1-405b-instruct - Provisions 1 Large Generic 4 unit.
    • cohere.command-r-16k - Provisions 1 Small Cohere V2 unit.
    • cohere.command-r-plus - Provisions 1 Large Cohere V2_2 unit.
    • meta.llama-3-70b-instruct - Provisions 1 Large Generic unit. (This model is deprecated.)

    Summarization (This model is deprecated.)

    Embedding

    • cohere.embed.english-light-v3.0 - Provisions 1 Embed Cohere unit.
    • cohere.embed.english-v3.0 - Provisions 1 Embed Cohere unit.
    • cohere.embed.multilingual-light-v3.0 - Provisions 1 Embed Cohere unit.
    • cohere.embed.multilingual-v3.0 - Provisions 1 Embed Cohere unit.
    Note

    The model list includes only the supported versions of the base models.

  11. (Optional) Increase the number of instances in the Model replica field.
    Important

    When you create a cluster for hosting models for inference, one unit is created by default for the base model that you select. To increase the throughput, you can increase the number of instances in the Model replica field now, or later when you edit the cluster. For example, creating two model replicas on this cluster requires two units.
  12. Read the commitment unit hours for the hosting cluster and select the checkbox to agree to the commitment.
  13. (Optional) Click Show advanced options and assign tags to this cluster.
  14. Click Create.
    Note

    Clusters take a few minutes to create. After the cluster is in an active state, you can select that cluster to host a model when you create an endpoint for that model.
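
The choices made in the steps above (base model, model replicas, and the optional name) determine which unit type is provisioned and how many units the cluster commits to. The following Python sketch works through that arithmetic offline; it does not call any OCI API. The function names generated_cluster_name and plan_hosting_cluster and the shape of the returned dictionary are hypothetical, introduced only for illustration. The unit types are copied from the Base model step, and the timestamp layout (year, month, day, hour, minute, second) is an assumption inferred from the example name generativeaidedicatedaicluster20240601202357.

```python
from datetime import datetime
from typing import Optional

# Unit type provisioned per base model, copied from the Base model step.
UNIT_TYPE_BY_BASE_MODEL = {
    "meta.llama-3.1-70b-instruct": "Large Generic",
    "meta.llama-3.1-405b-instruct": "Large Generic 4",
    "cohere.command-r-16k": "Small Cohere V2",
    "cohere.command-r-plus": "Large Cohere V2_2",
    "meta.llama-3-70b-instruct": "Large Generic",  # deprecated model
    "cohere.embed.english-light-v3.0": "Embed Cohere",
    "cohere.embed.english-v3.0": "Embed Cohere",
    "cohere.embed.multilingual-light-v3.0": "Embed Cohere",
    "cohere.embed.multilingual-v3.0": "Embed Cohere",
}

NAME_PREFIX = "generativeaidedicatedaicluster"


def generated_cluster_name(created_at: datetime) -> str:
    """Mimic the system-generated name.

    Assumption: the <timestamp> suffix is YYYYMMDDHHMMSS, as the
    documented example suggests. The time zone used by the service
    is not stated in the source, so none is applied here.
    """
    return NAME_PREFIX + created_at.strftime("%Y%m%d%H%M%S")


def plan_hosting_cluster(
    base_model: str,
    model_replicas: int = 1,
    name: Optional[str] = None,
    created_at: Optional[datetime] = None,
) -> dict:
    """Summarize what a hosting cluster with these choices would provision."""
    if base_model not in UNIT_TYPE_BY_BASE_MODEL:
        raise ValueError(f"Unsupported base model: {base_model}")
    if model_replicas < 1:
        raise ValueError("model_replicas must be at least 1")
    if not name:
        # No name entered: fall back to the generated-name format.
        name = generated_cluster_name(created_at or datetime.now())
    return {
        "name": name,
        "cluster_type": "Hosting",
        "base_model": base_model,
        "unit_type": UNIT_TYPE_BY_BASE_MODEL[base_model],
        # Each model replica requires one unit of the model's unit type.
        "unit_count": model_replicas,
    }


if __name__ == "__main__":
    plan = plan_hosting_cluster(
        "cohere.command-r-plus",
        model_replicas=2,
        created_at=datetime(2024, 6, 1, 20, 23, 57),
    )
    print(plan)
```

Running a check like this before you agree to the commitment in step 12 can help confirm how many units, and of which type, the cluster will commit to.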