Creating and Saving a Model with the OCI Python SDK
Create a model with Python and save it directly to the model catalog.
To create and save a model, you must first create the model artifact.
We recommend that you create and save models to the model catalog programmatically, either using ADS or the OCI Python SDK.

You can use ADS to create large models. Large models support artifacts of up to 400 GB.
1. (Optional) Upgrade the OCI Python SDK with:

      pip install oci --upgrade
2. Save a model object to disk. You can use various tools to save a model (Joblib, cloudpickle, pickle, ONNX, and so on). We recommend that you save the model object in the top-level directory of your model artifact, at the same level as the score.py and runtime.yaml files (a minimal save sketch follows).
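   For example, a minimal sketch that serializes a scikit-learn estimator with joblib. The estimator, the model.joblib file name, and the <your-model-artifact-path> placeholder are illustrative assumptions; use whichever serialization library your model requires.

      import os

      from joblib import dump
      from sklearn.datasets import load_iris
      from sklearn.ensemble import RandomForestClassifier

      # Train a toy estimator (illustrative only).
      X, y = load_iris(return_X_y=True)
      estimator = RandomForestClassifier(n_estimators=10).fit(X, y)

      # Save the estimator in the top-level directory of the model artifact,
      # next to score.py and runtime.yaml.
      artifact_dir = "<your-model-artifact-path>"  # replace with your artifact directory
      dump(estimator, os.path.join(artifact_dir, "model.joblib"))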
3. Change your score.py file to define the load_model() and predict() functions. Change the body of both functions to support your model as follows:

   load_model()
      Reads the model file on disk and returns the estimator object. Ensure that you use the same library for serializing and deserializing the model object.

   predict()
      Contains two parameters, data and model. The required parameter is data, which represents a dataset payload, while model is an optional parameter. By default, model is the object returned by load_model(). Ensure that the data type of the data parameter matches the payload format you expect with model deployment.

      By default, model deployment assumes that data is a JSON payload (MIME type application/json). The predict() function converts the JSON payload into the data format of the model object, for example a Pandas dataframe or a Numpy array when that's the data format supported by the model object. The body of predict() can include data transformations and other data manipulation tasks before a model prediction is made.

   A few more things to consider:

   - You can't edit the function signatures of load_model() and predict(). You can only edit the body of these functions to customize them.
   - Any custom Python modules can be imported in score.py if they're available in the artifact file or as part of the conda environment used for inference purposes.
   - You can save more than one model object in your artifact. You can load more than one estimator object to memory to perform an ensemble evaluation. In this case, load_model() can return an array of model objects that predict() processes.

   A minimal score.py sketch follows these considerations.
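   For reference, a minimal score.py sketch that matches the contract described above, assuming the model was saved with joblib as model.joblib (both assumptions carried over from the earlier save sketch). Adapt load_model() to your own serialization format.

      import json
      import os

      from joblib import load

      model_name = "model.joblib"  # assumed file name; match whatever you saved to disk

      def load_model():
          # Deserialize the model file that sits next to score.py in the artifact
          # and return the estimator object.
          model_dir = os.path.dirname(os.path.realpath(__file__))
          with open(os.path.join(model_dir, model_name), "rb") as f:
              return load(f)

      def predict(data, model=load_model()):
          # data arrives as a JSON payload; convert it to the format the estimator
          # supports (here, a list of feature vectors) before predicting.
          features = json.loads(data) if isinstance(data, str) else data
          return {"prediction": model.predict(features).tolist()}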
4. (Optional) Test the score.predict() function.

   We recommend that you test the predict() function in your local environment before saving the model to the model catalog. The following code snippet shows how to pass a JSON payload to predict(), which mimics the behavior of your model deployed with model deployment. It's a good way to ensure that the model object is read by load_model() and that the predictions returned by your model are correct and in the format you expect. If you run this code snippet in a notebook session, you also get the output of any loggers you define in score.py in the output cell.

      import sys
      from json import dumps

      # The local path to your model artifact directory is added to the Python path.
      # Replace <your-model-artifact-path>.
      sys.path.insert(0, f"<your-model-artifact-path>")

      # Importing load_model() and predict() that are defined in score.py.
      from score import load_model, predict

      # Loading the model to memory.
      _ = load_model()

      # Take a sample of your training or validation dataset and store it as data.
      # Making predictions on a JSON string object (dumps(data)). Here we assume
      # that predict() takes data in JSON format.
      predictions_test = predict(dumps(data), _)

      # Compare the predictions captured in predictions_test with what you expect for data:
      predictions_test
5. Change the runtime.yaml file.

   This file provides a reference to the conda environment that you want to use as the runtime environment for model deployment. Minimally, the file must contain the following fields for a model deployment:

      MODEL_ARTIFACT_VERSION: '3.0'
      MODEL_DEPLOYMENT:
        INFERENCE_CONDA_ENV:
          INFERENCE_ENV_SLUG: <the-slugname> # for example mlcpuv1, see: https://docs.oracle.com/en-us/iaas/data-science/using/conda-gml-fam.htm
          INFERENCE_ENV_TYPE: <env-type> # can either be "published" or "data_science"
          INFERENCE_ENV_PATH: <conda-path-on-object-storage>
          INFERENCE_PYTHON_VERSION: <python-version-of-conda-environment>

   Following is an example of a runtime.yaml file in which the data scientist selects the Data Science TensorFlow 2.3 for CPU conda environment:

      MODEL_ARTIFACT_VERSION: '3.0'
      MODEL_DEPLOYMENT:
        INFERENCE_CONDA_ENV:
          INFERENCE_ENV_SLUG: tensorflow23_p37_cpu_v1
          INFERENCE_ENV_TYPE: data_science
          INFERENCE_ENV_PATH: oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/Tensorflow for CPU Python 3.7/1.0/tensorflow23_p37_cpu_v1
          INFERENCE_PYTHON_VERSION: '3.7'
6. (Optional) Before saving a model to the catalog, we recommend that you run a series of introspection tests on your model artifact.

   The purpose of these tests is to identify errors by validating the score.py and runtime.yaml files with a set of checks that ensure they have the right syntax, parameters, and versions. Introspection tests are defined as part of the model artifact code template.

   a. Python version 3.5 or greater is required to run the tests. Before running the tests locally on your machine, you must install the pyyaml and requests Python libraries. This installation is a one-time operation.

      Go to your artifact directory, then run the following command to install the required third-party dependencies:

         python3 -m pip install --user -r artifact-introspection-test/requirements.txt

   b. Run the tests locally by replacing <artifact-path> with the path to the model artifact directory:

         python3 artifact-introspection-test/model_artifact_validate.py --artifact <artifact-path>
   c. Inspect the test results.

      The model_artifact_validate.py script generates two output files in the top-level directory of your model artifacts:

      - test_json_output.json
      - test_html_output.html

      You can open either file to inspect the errors. If you open the HTML file, error messages are displayed on a red background. A small sketch for scanning the JSON output follows.
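      For example, a minimal sketch that loads the JSON report and pretty-prints it so failed checks are easy to scan (the exact structure of the report depends on the version of the introspection tests):

         import json
         from pprint import pprint

         # Load the introspection report generated in the artifact's top-level directory.
         with open("test_json_output.json") as f:
             results = json.load(f)

         # Pretty-print the full report and review any checks that did not pass.
         pprint(results)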
   d. Repeat steps 2-6 until all tests run successfully. After the tests run successfully, the model artifact is ready to be saved to the model catalog.
7. Create and save the model to the model catalog using the OCI SDK with an OCI configuration file, which is part of standard SDK access management.

   a. Initialize the client:

         # Create a default config using the DEFAULT profile in the default location.
         # Refer to
         # https://docs.cloud.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm#SDK_and_CLI_Configuration_File
         # for more information.
         import oci
         from oci.data_science.models import CreateModelDetails, Metadata, CreateModelProvenanceDetails, UpdateModelDetails, UpdateModelProvenanceDetails

         # Initialize the service client with user principal (config file).
         config = oci.config.from_file()
         data_science_client = oci.data_science.DataScienceClient(config=config)

         # Alternatively, initialize the service client with resource principal (for example, in a notebook session).
         # auth = oci.auth.signers.get_resource_principals_signer()
         # data_science_client = oci.data_science.DataScienceClient({}, signer=auth)
   b. (Optional) Document the model provenance.

      For example:

         provenance_details = CreateModelProvenanceDetails(repository_url="EXAMPLE-repositoryUrl-Value",
                                                           git_branch="EXAMPLE-gitBranch-Value",
                                                           git_commit="EXAMPLE-gitCommit-Value",
                                                           script_dir="EXAMPLE-scriptDir-Value",
                                                           # OCID of the ML Job Run or notebook session on which this model was trained
                                                           training_id="<Notebook session or ML Job Run OCID>")

   c. (Optional) Document the model taxonomy.

      For example:

         # Create the list of defined metadata around model taxonomy:
         defined_metadata_list = [
             Metadata(key="UseCaseType", value="image_classification"),
             Metadata(key="Framework", value="keras"),
             Metadata(key="FrameworkVersion", value="0.2.0"),
             Metadata(key="Algorithm", value="ResNet"),
             Metadata(key="hyperparameters", value="{\"max_depth\":\"5\",\"learning_rate\":\"0.08\",\"objective\":\"gradient descent\"}")
         ]
   d. (Optional) Add your custom metadata (attributes).

      For example:

         # Adding your own custom metadata:
         custom_metadata_list = [
             Metadata(key="Image Accuracy Limit", value="70-90%", category="Performance",
                      description="Performance accuracy accepted"),
             Metadata(key="Pre-trained environment",
                      value="https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/",
                      category="Training environment",
                      description="Environment link for pre-trained model"),
             Metadata(key="Image Sourcing", value="https://lionbridge.ai/services/image-data/",
                      category="other", description="Source for image training data")
         ]
   e. (Optional) Document the model input and output data schema definitions.

      Important: The schema definitions for both the input feature vector and the model predictions are used for documentation purposes. This guideline applies to tabular datasets only.

      For example:

         import json
         from json import load

         # Declare the input/output schema for the model - this is optional.
         # It must be a valid JSON or YAML string.
         # Like the model artifact, the schema is immutable, so it's allowed only at model creation time and can't be updated.
         # A sample schema JSON is in the appendix.
         input_schema = load(open('SR_input_schema.json', 'rb'))
         input_schema_str = json.dumps(input_schema)

         output_schema = load(open('SR_output_schema.json', 'rb'))
         output_schema_str = json.dumps(output_schema)
   f. (Optional) Document the introspection test results.

      For example:

         # Provide the introspection test results:
         test_results = load(open('test_json_output.json', 'rb'))
         test_results_str = json.dumps(test_results)
         defined_metadata_list.extend([Metadata(key="ArtifactTestResults", value=test_results_str)])
   g. (Optional) Set the client timeout value to avoid a Data Science service timeout error when saving large model artifacts:

         import oci

         config = oci.config.from_file()
         data_science_client = oci.data_science.DataScienceClient(config=config)

         # Change the timeout value to 1800 seconds (30 minutes).
         data_science_client.base_client.timeout = 30 * 60
   h. Create a zip archive of the model artifact:

         import zipfile
         import os

         def zipdir(target_zip_path, ziph, source_artifact_directory):
             '''
             Creates a zip archive of a model artifact directory.

             Parameters:
             - target_zip_path: the path where you want to store the zip archive of your artifact
             - ziph: a zipfile.ZipFile object
             - source_artifact_directory: the path to the artifact directory

             Returns a zip archive in the target_zip_path you specify.
             '''
             for root, dirs, files in os.walk(source_artifact_directory):
                 for file in files:
                     ziph.write(os.path.join(root, file),
                                os.path.relpath(os.path.join(root, file),
                                                os.path.join(target_zip_path, '.')))

         zipf = zipfile.ZipFile('<relpath-to-artifact-directory>.zip', 'w', zipfile.ZIP_DEFLATED)
         zipdir('.', zipf, "<relpath-to-artifact-directory>")
         zipf.close()
   i. Create (save) the model in the model catalog:

         # Creating a model details object:
         model_details = CreateModelDetails(
             compartment_id='<compartment-ocid-of-model>',
             project_id='<project-ocid>',
             display_name='<display-name-of-model>',
             description='<description-of-model>',
             custom_metadata_list=custom_metadata_list,
             defined_metadata_list=defined_metadata_list,
             input_schema=input_schema_str,
             output_schema=output_schema_str)

         # Creating the model object:
         model = data_science_client.create_model(model_details)

         # Adding the provenance:
         data_science_client.create_model_provenance(model.data.id, provenance_details)

         # Adding the artifact:
         with open('<relpath-to-artifact-directory>.zip', 'rb') as artifact_file:
             artifact_bytes = artifact_file.read()
             data_science_client.create_model_artifact(model.data.id, artifact_bytes,
                                                       content_disposition='attachment; filename="<relpath-to-artifact-directory>.zip"')
Now you can view the model details, including any optional metadata that you defined (a short retrieval sketch follows).

Use these sample code files and notebook examples to further help you design a model store.
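For example, a minimal sketch that reads the saved model and its provenance back with the same client. The get_model() and get_model_provenance() calls are standard DataScienceClient methods; the model and data_science_client variables come from the earlier steps.

   # Retrieve the model to confirm the metadata that you set.
   model_response = data_science_client.get_model(model.data.id)
   print(model_response.data.display_name)
   print(model_response.data.defined_metadata_list)
   print(model_response.data.custom_metadata_list)

   # Retrieve the model provenance.
   provenance_response = data_science_client.get_model_provenance(model.data.id)
   print(provenance_response.data.training_id)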