Invoking a Model Deployment

Invoking a model deployment means passing feature vectors or data samples to its predict endpoint; the model returns predictions for those data samples.

After a model deployment is in an active lifecycleState, the predict endpoint can successfully receive requests from clients.
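
For example, you can use the OCI Python SDK to confirm that the deployment is active before sending predict requests. This is a minimal sketch; the model deployment OCID is a placeholder, and it assumes a default OCI config file:

import oci

# Read the default OCI config file and create a Data Science client.
config = oci.config.from_file()
ds_client = oci.data_science.DataScienceClient(config)

# Placeholder OCID of the model deployment to check.
deployment = ds_client.get_model_deployment("<model-deployment-ocid>").data

if deployment.lifecycle_state == "ACTIVE":
    print("Predict endpoint is ready:", deployment.model_deployment_url)
else:
    print("Model deployment is not active yet:", deployment.lifecycle_state)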

From a model deployment detail page, select Invoking Your Model. The following details are displayed:

  • The model HTTP endpoint. For a private model deployment, the HTTP endpoint contains a private FQDN that was set while creating the private endpoint. For more information, see Creating a Private Endpoint.
  • Sample code to invoke the model endpoint using the OCI CLI. Or, use the OCI Python and Java SDKs to invoke the model with the provided code sample.
  • The payload size limit is 10 MB (see the size check sketch after this list).
  • The timeout on invoking a model is 60 seconds for HTTP calls.
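
Because of the 10 MB payload limit, it can help to check the serialized payload size before calling the endpoint. This is a minimal sketch; the payload contents are placeholders, and the limit is computed here as 10 * 1024 * 1024 bytes:

import json

MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # 10 MB payload limit on the predict endpoint

payload = {"data": "<your-input-data>"}  # placeholder payload
body = json.dumps(payload).encode("utf-8")

if len(body) > MAX_PAYLOAD_BYTES:
    raise ValueError(f"Payload is {len(body)} bytes, which exceeds the 10 MB limit")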

Use the sample code to invoke a model deployment.

Invoking a model deployment calls the predict endpoint of the model deployment URI. This endpoint takes sample data as input and processes it with the predict() function in the score.py model artifact file. The sample data is usually in JSON format, though it can be in other formats. Processing can transform the sample data before passing it to the model's inference method. The model generates predictions, which can be post-processed before being returned to the client.
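
For reference, a minimal score.py sketch that follows this flow is shown below. The model file name, the payload schema, and the load_model and predict signatures are assumptions; adapt them to your own model artifact:

# score.py (minimal sketch; file name and payload schema are placeholders)
import json
import os
import pickle

def load_model():
    """Load the serialized model stored next to this file."""
    model_dir = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)

def predict(data, model=load_model()):
    """Transform the incoming payload, run inference, and return a JSON-serializable result."""
    if isinstance(data, (str, bytes)):
        data = json.loads(data)   # accept a raw JSON string as well as a parsed object
    features = data["data"]       # placeholder: adapt to your payload schema
    predictions = model.predict(features)
    return {"prediction": predictions.tolist() if hasattr(predictions, "tolist") else list(predictions)}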

The following API responses are possible. Each entry lists the HTTP status code, the error code, a description, and whether the request should be retried:

  • 200 (no error code): Success. Retry: not applicable.

    A successful response looks like the following (the opc-request-id value is a placeholder):

    {
      "data": {
        "prediction": [
          "virginica"
        ]
      },
      "headers": {
        "content-length": "28",
        "content-type": "application/json",
        "opc-request-id": "<opc-request-id>"
      },
      "status": "200 OK"
    }

  • 404 NotAuthorizedOrNotFound: Model deployment not found or authorization failed. Retry: No.
  • 405 MethodNotAllowed: Method not allowed. Retry: No.
  • 411 LengthRequired: Missing Content-Length header. Retry: No.
  • 413 PayloadTooLarge: The payload exceeds the 10 MB size limit. Retry: No.
  • 429 TooManyRequests: Too many requests. Retry: Yes, with backoff.

    This error occurs in two cases:

    • LB bandwidth limit exceeded: Consider increasing the provisioned load balancer bandwidth to avoid these errors by editing the model deployment.
    • Tenancy request-rate limit exceeded: The maximum number of requests per second per tenancy is set to 150.

    If you're consistently receiving error messages after increasing the LB bandwidth, use the OCI Console to submit a support ticket for the tenancy. Include the following details in the ticket:

    • Describe the issue, include the error message that occurred, and indicate the new requests per second needed for the tenancy.
    • Indicate that it's a minor loss of service.
    • Indicate Analytics & AI and Data Science.
    • Indicate that the issue is creating and managing models.

  • 500 InternalServerError: Internal server error. Retry: Yes, with backoff.

    Common causes:

    • Service timeout. There is a 60 second timeout for the /predict endpoint. This timeout value can't be changed.
    • The score.py file returns an exception.

  • 503 ServiceUnavailable: Model server unavailable. Retry: Yes, with backoff.
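
Because 429, 500, and 503 responses are retryable, clients typically retry them with exponential backoff. The following minimal sketch shows one way to do that; the attempt count and delays are illustrative, not service recommendations, and the endpoint, JSON-serializable payload, and auth signer are supplied by the caller (for example, a signer such as the one created in the Python SDK example later on this page):

import time
import requests

RETRYABLE_STATUS_CODES = {429, 500, 503}

def predict_with_backoff(endpoint, payload, auth, max_attempts=5, base_delay=1.0):
    """POST a JSON-serializable payload to the predict endpoint, retrying retryable status codes with exponential backoff."""
    response = None
    for attempt in range(max_attempts):
        response = requests.post(endpoint, json=payload, auth=auth)
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response  # 200, or a non-retryable error such as 404 or 413
        time.sleep(base_delay * (2 ** attempt))  # back off: 1s, 2s, 4s, ...
    return response  # last response if every attempt was throttled or failed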

Model Inference Endpoint (Predict) Request Throttling

Predict endpoint requests might be throttled based on activity and resource consumption over time.

Throttling maintains high availability and fair use of resources by protecting model serving application servers from being overwhelmed by too many requests, and prevents denial-of-service attacks. If you make too many requests too quickly, some might succeed while others fail. When a request fails because of throttling, the service returns response code 429 with one of the following error codes and descriptions:

 { "code": "TooManyRequests", "message": "Tenancy request-rate limit exceeded. 
Please use the OCI Console to submit a support ticket for your tenancy to increase the RPS."} 

Or

 { "code": "TooManyRequests", "message": "LB bandwidth limit exceeded. 
Consider increasing the provisioned load balancer bandwidth to avoid these errors." } 
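
In client code, the message in the 429 body tells you which limit was hit. This is a minimal sketch that maps the two messages above to the documented remediation; the function name is illustrative:

import json

def describe_throttle(response):
    """Return a remediation hint for a 429 response from the predict endpoint."""
    if response.status_code != 429:
        return None
    error = json.loads(response.content)  # for example {"code": "TooManyRequests", "message": "..."}
    message = error.get("message", "")
    if "LB bandwidth limit exceeded" in message:
        return "Increase the provisioned load balancer bandwidth by editing the model deployment."
    if "Tenancy request-rate limit exceeded" in message:
        return "Submit a support ticket to increase the tenancy requests-per-second limit."
    return message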

Invoking with the OCI Python SDK

This example code is a reference to help you invoke your model deployment:

import requests
import oci
from oci.signer import Signer
import json

# Model deployment endpoint. Here we assume that the notebook region is the same as the
# region where the model deployment occurs. Alternatively, go to the details page of your
# model deployment in the OCI Console; under "Invoking Your Model", you find the HTTP
# endpoint of your model.
endpoint = "<your-model-deployment-uri>"
# Your payload, as a JSON string:
input_data = "<your-json-payload-str>"

# Set to True when a resource principal is available (for example, in a notebook session);
# set to False to authenticate with an OCI config file and API key.
using_rps = True

if using_rps:  # using resource principal:
    auth = oci.auth.signers.get_resource_principals_signer()
else:  # using config + key:
    config = oci.config.from_file("~/.oci/config")  # replace with the location of your OCI config file
    auth = Signer(
        tenancy=config['tenancy'],
        user=config['user'],
        fingerprint=config['fingerprint'],
        private_key_file_location=config['key_file'],
        pass_phrase=config.get('pass_phrase'))  # only needed when the key file is encrypted

# POST request to the model endpoint. The payload is already a JSON string, so send it
# as the raw request body with an explicit content type.
response = requests.post(
    endpoint,
    data=input_data,
    auth=auth,
    headers={"Content-Type": "application/json"})

# Check the response status. Success should be an HTTP 200 status code.
assert response.status_code == 200, "Request made to the model predict endpoint was unsuccessful"

# Print the model predictions, assuming the model returns a JSON object.
print(json.loads(response.content))

Invoking with the OCI CLI

You can invoke a model deployment from the CLI.

The CLI is included in the OCI Cloud Shell environment and is preauthenticated. This example invokes a model deployment with the CLI:

oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body '{"data": "data"}'

Invoking a Model Deployment using a Private Endpoint

A model deployment configured with a private endpoint is only accessible through a private network. It can't be accessed through a public endpoint.

For more information on creating a private endpoint, see Creating a Private Endpoint.
Note

This feature is only available in the OC1 realm. For other realms, create a service request.

The following steps in the Console ensure the application can access the private endpoint:

  1. Configure the Virtual cloud network (VCN) and subnet.

    The private endpoint connection is at the VCN level. If you have many subnets per VCN, you need to create only one private endpoint for that VCN. Ensure that security rules meet your requirements.

  2. (Optional) Configure Network security groups.
  3. Ensure that the subnet gives access to the private endpoint resource by setting up a security rule for ingress.
  4. Ensure that the subnet has available IP addresses.

    If no IP addresses are available in the specified subnet, then the work request for creating the private endpoint fails. For more information, see Private Endpoint Creation Failure.

    When the endpoint resource is reachable from the application, the predict request to the model deployment can be invoked through the private endpoint URL.

To invoke a model deployment through a private endpoint from the CLI, use the example command and required parameters. If a notebook session is used to access a private model deployment, create it with the custom networking network type so that it resides on the same VCN and subnet as the private endpoint resource. For more information, see Creating a Notebook Session.

Run the following command from a notebook session or a Cloud Shell instance that has access to the same VCN and subnet as the private endpoint resource:

oci raw-request --http-method POST --target-uri <private-endpoint-url>/<model-deployment-ocid>/predict --request-body '{"data": "data"}'
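
From a notebook session on the same VCN and subnet, the same request can also be sent with Python. This is a minimal sketch using a resource principal signer; the private endpoint URL and model deployment OCID are placeholders, and note that the path includes the model deployment OCID before /predict:

import json
import oci
import requests

# Placeholders: the private endpoint URL and the model deployment OCID.
endpoint = "<private-endpoint-url>/<model-deployment-ocid>/predict"

# Resource principal authentication from a notebook session on the same VCN and subnet.
auth = oci.auth.signers.get_resource_principals_signer()

response = requests.post(endpoint, json={"data": "data"}, auth=auth)
print(response.status_code, json.loads(response.content))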