About Model Retirement

OCI Generative AI retires its large language models (LLMs) based on each model's type and serving mode. The LLMs serve user requests in either on-demand mode or dedicated mode. Review the following sections to learn about each serving mode and how you can get notified before a model retires.

On-Demand Mode

You can reach the pretrained foundational models in Generative AI through two modes: on-demand and dedicated. Here are key features for the on-demand mode:

You pay as you go for each inference call when you use the models in the playground or when you call the models through the API.
Low barrier to start using Generative AI.
Great for experimentation, proof of concept, and model evaluation.
Available for the pretrained models in regions not listed as (dedicated AI cluster only).

Important

Dynamic Throttling Limit Adjustment for On-Demand Mode

OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access.

This adjustment depends on the following factors:

The current maximum throughput supported by the target model.
Any unused system capacity at the time of adjustment.
Each tenancy’s historical throughput usage and any specified override limits set for that tenancy.

Note: Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.

Tip

Because of the dynamic throttling limit adjustment, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid requests can lead to further rejections over time, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success. This approach follows industry best practices and improves the overall stability and performance of your integration with the service.
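As an illustration of the back-off recommendation, here is a minimal Python sketch of exponential back-off with random jitter. It is not taken from the OCI SDK: ThrottledError, backoff_delays, call_with_backoff, and the default values for base, cap, and max_retries are all hypothetical names and numbers chosen for this example. In a real integration, you would catch whatever throttling error your client raises.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling rejection from the service."""


def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Return the wait ceiling before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(send_request, max_retries=5, base=1.0, cap=30.0,
                      sleep=time.sleep):
    """Call send_request, retrying on ThrottledError with exponential back-off.

    send_request is any zero-argument callable that performs one request.
    The sleep parameter is injectable so the logic can be tested without
    actually waiting.
    """
    for ceiling in backoff_delays(max_retries, base, cap):
        try:
            return send_request()
        except ThrottledError:
            # Jitter: wait a random time up to the current ceiling so that
            # many clients do not retry at the same instant.
            sleep(random.uniform(0, ceiling))
    # Final attempt: a rejection here propagates to the caller.
    return send_request()
```

With the defaults assumed here, the wait ceilings grow as 1, 2, 4, 8, and 16 seconds, and the cap keeps any single wait at or below 30 seconds. Because the actual rate limits change dynamically, treat these values as starting points to tune rather than recommended settings.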

Retirement for On-Demand Mode: When a model is retired in the on-demand mode, it's no longer available for use in the Generative AI service playground or through the Generative AI Inference API.
Deprecation for On-Demand Mode: When a model is deprecated in the on-demand mode, it remains available in the Generative AI service, but has a defined amount of time that it can be used before it's retired. This amount of time is longer for the dedicated mode.

For the OCI Generative AI models, see the model retirement dates (on-demand mode).

Dedicated Mode

You get a dedicated set of GPUs for the dedicated AI clusters.
You can create custom models on the dedicated AI clusters by fine-tuning one of the pretrained foundational models in Generative AI that is listed as available for fine-tuning.
You can host replicas of the foundational and fine-tuned models on the dedicated AI clusters.
You commit in advance to certain hours of using the dedicated AI clusters. For prices, see the pricing page.
Available for the pretrained models in all listed regions.
You get predictable performance, which makes this mode suited for production workloads.

Retirement for Dedicated Mode

When a model is retired in the dedicated mode, you can no longer create a dedicated AI cluster for the retired model, but an active dedicated AI cluster running a retired model continues to run. A custom model that's based on a retired model also continues to be available for active dedicated AI clusters, and you can continue to create new dedicated AI clusters with a custom model that was created on a retired model. However, Oracle offers limited support for these scenarios, and Oracle engineering might ask you to upgrade to a supported model to resolve issues related to your model.

To request that a model remain available beyond its retirement date in the dedicated mode, create a support ticket.

Deprecation for Dedicated Mode

When a model is deprecated in the dedicated mode, it remains available in the Generative AI service, but has a defined amount of time that it can be used before it's retired. The dedicated mode deprecation time is longer than the on-demand deprecation time of the same model.

For the OCI Generative AI models, see the model retirement dates (dedicated mode).

Getting Notifications for Retirement Dates

You can subscribe to the OCI Notifications service to get notified of model retirement dates. When you subscribe, you get model deprecation and retirement messages with the following cadence:

On-Demand Mode: 30 and 14 days before the model retirement date. (2 notifications)
Dedicated Mode: 180, 90, 60, 30, and 14 days before the model retirement date. (5 notifications)
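To make the cadence concrete, the following Python sketch computes the dates on which notifications would be expected for a given retirement date. The lead times come from the list above. The function name, the dictionary name, and the retirement date of January 1, 2026 are hypothetical and used only for illustration.

```python
from datetime import date, timedelta

# Days before the retirement date on which a notification is sent,
# as listed for each serving mode.
NOTIFICATION_LEAD_DAYS = {
    "on-demand": (30, 14),
    "dedicated": (180, 90, 60, 30, 14),
}


def notification_dates(retirement_date, mode):
    """Return the expected notification dates, earliest first."""
    return [retirement_date - timedelta(days=days)
            for days in NOTIFICATION_LEAD_DAYS[mode]]


# Hypothetical retirement date, for illustration only.
example_retirement = date(2026, 1, 1)
on_demand = notification_dates(example_retirement, "on-demand")
dedicated = notification_dates(example_retirement, "dedicated")
```

For the hypothetical date above, the on-demand notifications would fall on December 2 and December 18, 2025, and the first dedicated-mode notification would fall on July 5, 2025.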

Learn how to subscribe to OCI Announcements to receive notifications. When you create the announcement, for the service, select Oracle Cloud Infrastructure Generative AI Service.
