Paying for On-Demand Inferencing

On-demand inferencing in OCI Generative AI offers the following benefits:

Low barrier to start using Generative AI.
Access to all available Generative AI foundational models.
Great for experimenting and evaluating the models.
Pay as you go for transactions. See the following note for details.

Note

With on-demand inferencing you pay as you go for the following character lengths:

Chat: prompt length (in characters) + response length (in characters)
Text Embeddings: input length (in characters)

On the Pricing page, 1 character is calculated as 1 transaction.

If you're hosting foundational models or fine-tuning them on dedicated AI clusters, you're charged by the unit hour rather than by transaction. In this case, see Paying for Dedicated AI Clusters to learn how to calculate the dedicated AI cluster costs.
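The character-based billing described in the note can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes that Python's `len()` counts characters the same way the service does, which this page does not confirm.

```python
def chat_transactions(prompt: str, response: str) -> int:
    """Billable transactions for one chat call.

    Chat is billed on prompt length + response length, in characters,
    and 1 character counts as 1 transaction.
    """
    return len(prompt) + len(response)


def embedding_transactions(inputs: list[str]) -> int:
    """Billable transactions for text embeddings.

    Embeddings are billed on input length only, in characters.
    """
    return sum(len(text) for text in inputs)
```

For example, a 5-character prompt with a 9-character response would count as 14 transactions under these assumptions.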

Matching Models to On-Demand Prices

The pricing page lists the price for 10,000 on-demand transactions when using the playground, API, or CLI for inferencing.

Go to the pretrained models page and select the model that you want to work with. In the On-Demand Mode section, find the Pricing Page Information for the model. Then, review the examples in this section to learn how to calculate cost based on the number of input and output characters.

Chat Example

Paul calls the meta.llama-3.3-70b-instruct model with the following prompt, which is 220 characters long:

Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.

The response from the model is 2,205 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost.

Calculate the prompt + response length (in characters).
Let's add up the prompt length (220 characters) and the model response length (2,205 characters).
```
prompt + response length = 220 + 2,205 = 2,425 characters
```

Calculate the number of transactions.

Prices are listed for 10,000 transactions.

10,000 transactions = 10,000 characters, so 1 transaction = 1 character
2,425 characters = 2,425 transactions

Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Large Meta, find the <Large-Meta-unit-price>.
Paul uses the meta.llama-3.3-70b-instruct model, which matches the product Oracle Cloud Infrastructure Generative AI - Large Meta on the AI Pricing page for Generative AI.

Calculate the price for 2,425 characters.

price = (2,425 transactions) / (10,000 transactions) x $<Large-Meta-unit-price>
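The chat calculation above can be reproduced with a short Python sketch. The function and variable names are hypothetical, and the unit price is a placeholder set to 1 so that the result shows only the multiplier applied to the listed price. Replace it with the actual <Large-Meta-unit-price> from the AI Pricing page.

```python
from decimal import Decimal

# The pricing page lists prices per 10,000 transactions.
TRANSACTIONS_PER_LISTED_PRICE = 10_000


def on_demand_price(transactions: int, unit_price: Decimal) -> Decimal:
    """Price for a number of transactions, given the listed price per 10,000."""
    return Decimal(transactions) / TRANSACTIONS_PER_LISTED_PRICE * unit_price


prompt_chars = 220
response_chars = 2_205
transactions = prompt_chars + response_chars  # 1 character = 1 transaction

# PLACEHOLDER, not a real price: substitute the value from the AI Pricing page.
placeholder_unit_price = Decimal("1")

price = on_demand_price(transactions, placeholder_unit_price)
print(transactions)  # 2425
print(price)         # 0.2425, that is, 0.2425 x the listed unit price
```

Using `Decimal` rather than `float` is a design choice for this sketch: it keeps currency arithmetic exact.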

Tip

In addition to calculating the price, you can estimate the cost by selecting the AI and Machine Learning category and loading the cost estimator for OCI Generative AI.

Text Embeddings Example

Gina is converting customer contracts into embeddings for a new semantic search application. On average, Gina ingests 16 documents every hour. Each document is about 1,000 characters long. Gina wants to get an estimate of the monthly bill for generating those embeddings. Here are the steps to calculate the cost.

Calculate the input length (in characters).

Let's add up the input character length for each hour.

input character length for 16 documents = 16 x 1,000 = 16,000 characters per hour

Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Embed Cohere, find the <Embed-Cohere-unit-price>.
Gina uses the cohere.embed-v4.0 model, which matches the product Oracle Cloud Infrastructure Generative AI - Embed Cohere on the AI Pricing page for Generative AI.
Calculate the number of transactions per hour.
Gina ingests 16,000 characters per hour. Prices are listed for 10,000 transactions.
```
10,000 transactions = 10,000 characters, so 1 transaction = 1 character
16,000 characters = 16,000 transactions
```

Find the hourly price for the 16,000 characters that Gina ingests hourly.

hourly price = (16,000 transactions) / (10,000 transactions) x $<Embed-Cohere-unit-price>

Find the monthly price for the longest month of the year.

One month = 31 x 24 hours = 744 hours
monthly price = 744 hours x hourly price
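The embeddings estimate above can be sketched in Python as follows. The function name and parameters are hypothetical, and the unit price is a placeholder set to 1, so the result is the multiplier to apply to the actual <Embed-Cohere-unit-price> from the AI Pricing page.

```python
from decimal import Decimal


def monthly_embedding_price(
    docs_per_hour: int,
    chars_per_doc: int,
    unit_price: Decimal,
    days_in_month: int = 31,  # longest month, as in the example above
) -> Decimal:
    """Estimate a monthly bill for text embeddings at a steady hourly rate."""
    # 1 character = 1 transaction; prices are listed per 10,000 transactions.
    hourly_transactions = docs_per_hour * chars_per_doc
    hourly_price = Decimal(hourly_transactions) / 10_000 * unit_price
    hours_in_month = days_in_month * 24
    return hourly_price * hours_in_month


# PLACEHOLDER, not a real price: substitute the value from the AI Pricing page.
placeholder_unit_price = Decimal("1")

estimate = monthly_embedding_price(16, 1_000, placeholder_unit_price)
print(estimate)  # 1190.4, that is, 1,190.4 x the listed unit price
```

The estimate assumes that Gina's ingestion rate stays constant at 16 documents of about 1,000 characters each, every hour of the month.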

Oracle Cloud Infrastructure Documentation
