Preparing Model Metadata
Model metadata is optional though recommended.
Model Provenance Metadata
You can optionally document the model provenance. The following table lists the supported model provenance metadata:
Metadata | Description |
---|---|
git_branch | Branch of the Git repository. |
git_commit | Commit ID. |
repository_url | URL of the remote Git repository. |
script_dir | Local path to the artifact directory. |
training_id | OCID of the resource used to train the model (a notebook session or a job run). When you save a model with the OCI SDK, you can read this value from the NB_SESSION_OCID (notebook sessions) or JOB_RUN_OCID (job runs) environment variable. |
Example

from oci.data_science.models import CreateModelProvenanceDetails

provenance_details = CreateModelProvenanceDetails(
    repository_url="EXAMPLE-repositoryUrl-Value",
    git_branch="EXAMPLE-gitBranch-Value",
    git_commit="EXAMPLE-gitCommit-Value",
    script_dir="EXAMPLE-scriptDir-Value",
    # OCID of the ML job run or notebook session on which this model was trained
    training_id="<<Notebook session or ML job run OCID>>"
)
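When the training code runs in a notebook session or a job run, the training_id doesn't need to be hardcoded. A minimal sketch, assuming the NB_SESSION_OCID and JOB_RUN_OCID environment variables noted in the table are set by the platform:

import os

from oci.data_science.models import CreateModelProvenanceDetails

# Use whichever OCID the platform injected into the environment.
training_id = os.environ.get("NB_SESSION_OCID") or os.environ.get("JOB_RUN_OCID")

provenance_details = CreateModelProvenanceDetails(
    repository_url="EXAMPLE-repositoryUrl-Value",
    git_branch="EXAMPLE-gitBranch-Value",
    git_commit="EXAMPLE-gitCommit-Value",
    script_dir="EXAMPLE-scriptDir-Value",
    training_id=training_id,
)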
Model Taxonomy Metadata
You can optionally document the model taxonomy.
The metadata fields associated with model taxonomy let you describe the machine learning use case and framework behind the model. For defined metadata, the allowed values are fixed lists for the use case type and framework; for custom metadata, the allowed values apply to the category attribute.
Preset Model Taxonomy
The following table lists the supported model taxonomy metadata:
Metadata | Description |
---|---|
UseCaseType | Describes the machine learning use case associated with the model, using one of the allowed values such as binary_classification, regression, clustering, image_classification, or sentiment_analysis. |
Framework | The machine learning framework associated with the model, using one of the allowed values such as scikit-learn, keras, pytorch, tensorflow, or xgboost. |
FrameworkVersion | The machine learning framework version. This is a free text value. For example, PyTorch 1.9. |
Algorithm | The algorithm or model instance class. This is a free text value. For example, CART algorithm. |
Hyperparameters | The hyperparameters of the model object, in JSON format. |
ArtifactTestResults | The JSON output of the artifact tests run on the client side. |
Example
This example shows how to document the model taxonomy by capturing each key-value pair in a list of Metadata() objects:
from oci.data_science.models import Metadata

# Create the list of defined metadata around model taxonomy:
defined_metadata_list = [
    Metadata(key="UseCaseType", value="image_classification"),
    Metadata(key="Framework", value="keras"),
    Metadata(key="FrameworkVersion", value="0.2.0"),
    Metadata(key="Algorithm", value="ResNet"),
    Metadata(key="Hyperparameters",
             value="{\"max_depth\":\"5\",\"learning_rate\":\"0.08\",\"objective\":\"gradient descent\"}")
]
Custom Model Taxonomy
You can add your own custom metadata to document your model. The maximum allowed file size for the combined defined and custom metadata is 32000 bytes.
Each custom metadata entry has these four attributes:
Field or Key | Required? | Description |
---|---|---|
key | Required | The key and label of your custom metadata. |
value | Required | The value attached to the key. |
category | Optional | The category of the metadata. Select one of these five values: Performance, Training Profile, Training and Validation Datasets, Training Environment, or Other. The category attribute is useful for filtering custom metadata, which is handy when a model has a large number of custom metadata entries. |
description | Optional | A description of the custom metadata. |
Example
This example shows how you can add custom metadata to capture the model accuracy, the environment, and the source of the training data:
# Adding your own custom metadata:
custom_metadata_list = [
    Metadata(key="Image Accuracy Limit", value="70-90%", category="Performance",
             description="Performance accuracy accepted"),
    Metadata(key="Pre-trained environment",
             value="https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/",
             category="Training environment", description="Environment link for pre-trained model"),
    Metadata(key="Image Sourcing", value="https://lionbridge.ai/services/image-data/", category="other",
             description="Source for image training data")
]
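Both metadata lists are attached to the model when it's saved to the model catalog. A minimal sketch using the OCI SDK's CreateModelDetails, assuming placeholder compartment and project OCIDs:

from oci.data_science.models import CreateModelDetails

create_model_details = CreateModelDetails(
    display_name="my-model",
    compartment_id="<compartment-OCID>",  # placeholder
    project_id="<project-OCID>",          # placeholder
    # Attach the defined (taxonomy) and custom metadata built above.
    defined_metadata_list=defined_metadata_list,
    custom_metadata_list=custom_metadata_list,
)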
Model Data Schemas Definition
You can document the model input and output data schemas. The input data schema definition provides the blueprint of the data parameter of the score.py file's predict() function. You can think of the input data schema as the definition of the input feature vector that your model requires to make successful predictions. The output schema definition documents what the predict() function returns.
The maximum allowed file size for the combined input and output schemas is 32000 bytes.
The schema definitions for both the input feature vector and the model predictions are used for documentation purposes only. This guideline applies to tabular datasets only.
The schema of the model input feature vector and output predictions is a JSON object. The object has a top-level key called schema whose value is a list; the schema definition of each column is a separate entry in that list.
You can use ADS to automatically extract the schema definition from a specific training dataset.
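For example, ADS registers a pandas accessor that can generate the schema from a training DataFrame. A minimal sketch, assuming a hypothetical train.csv training dataset:

import ads  # importing ADS registers the .ads accessor on pandas objects
import pandas as pd

X_train = pd.read_csv("train.csv")  # hypothetical training dataset

# Extract the schema definition, including per-column statistics.
input_schema = X_train.ads.model_schema()
print(input_schema)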
For each column, the schema can be fully defined by assigning values to all these attributes:
Field or Key | Type | Required? | Description |
---|---|---|---|
name | STRING | Required | The name of the column. |
description | STRING | Optional | The description of the column. |
required | BOOL | Required | Whether the column is a required input feature to make a model prediction. |
dtype | STRING | Required | The data type of the column. |
domain | OBJECT | Optional | The range of allowed values that the feature can take. |
The domain field is a dictionary containing the following keys:
Field or Key | Type | Required? | Description | Notes |
---|---|---|---|---|
domain.constraints | LIST | Optional | Supports a list of predicates to constrain the range of allowed values for the feature. Each constraint is a language-specific string expression template that can be evaluated by the language interpreter and compiler. With Python, the string format is expected to follow string.Template. | You can apply more than one constraint. Example of an expression: '($x > 10 and $x < 100) or ($x < -1 and $x > -500)'. |
domain.stats | OBJECT | Optional | A dictionary of summary statistics describing the feature, for example mean, median, and min for a numeric feature. | In ADS, the statistics are automatically generated based on the feature_stat in feature types. |
domain.values | STRING | Optional | Represents the semantic type of the column, for example numbers or Free text. | |
domain.name | STRING | Optional | Name of the attribute. | |
domain.dtype | STRING | Required | The Pandas data type of the data. For example, int64 or category. | |
domain.feature_type | STRING | Required | The feature type of the data. For example, Integer or Category. | |
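To make the constraint format concrete, this sketch shows one way a Python constraint expression could be checked against a candidate value; the satisfies helper is hypothetical and for illustration only, not part of the platform:

from string import Template

expression = "($x > 10 and $x < 100) or ($x < -1 and $x > -500)"

def satisfies(value):
    # Substitute the candidate value for $x, then evaluate the predicate.
    return eval(Template(expression).substitute(x=value))

print(satisfies(42))    # True
print(satisfies(-600))  # False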
Example of an Input Data Schema
schema:
- description: Description of the column
  domain:
    constraints:
    # A language-specific string expression template that the interpreter or
    # compiler can evaluate. With Python, the format follows string.Template.
    - expression: '($x > 10 and $x < 100) or ($x < -1 and $x > -500)'
      language: python
    # A flexible key-value section. By default, ADS generates the statistics
    # based on the feature_stat in feature types.
    stats:
      mean: 20
      median: 21
      min: 5
    # The domain of acceptable values, for example rational numbers, discrete
    # numbers, or a list of values.
    values: numbers
  name: MSZoning # Name of the attribute
  required: false # false if the column is nullable
Example of an Output Data Schema
{
  "predictionschema": [
    {
      "description": "Category of SR",
      "domain": {
        "constraints": [],
        "stats": [],
        "values": "Free text"
      },
      "name": "category",
      "required": true,
      "type": "category"
    }
  ]
}
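When the model is saved with the OCI SDK, the two schema definitions are passed as JSON strings. A minimal sketch, assuming hypothetical input_schema_dict and output_schema_dict dictionaries shaped like the examples above:

import json

from oci.data_science.models import CreateModelDetails

create_model_details = CreateModelDetails(
    display_name="my-model",
    compartment_id="<compartment-OCID>",  # placeholder
    project_id="<project-OCID>",          # placeholder
    # Serialized schemas count toward the combined 32000-byte limit.
    input_schema=json.dumps(input_schema_dict),
    output_schema=json.dumps(output_schema_dict),
)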
Model Introspection Testing
Using ADS for Introspection Testing
You can invoke introspection manually by calling the .introspect() method on the ModelArtifact object:
# Run the artifact tests.
rf_model_artifact.introspect()
# The results are stored in the ArtifactTestResults taxonomy metadata.
rf_model_artifact.metadata_taxonomy['ArtifactTestResults']
The result of model introspection is automatically saved to the taxonomy metadata and model artifacts. Model introspection is automatically triggered when the .prepare() method is invoked to prepare the model artifact.
The .save() method doesn't perform model introspection because introspection is normally done during the model artifact preparation stage. However, setting ignore_introspection to False causes model introspection to be performed during the save operation.
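A minimal sketch; the display name and description are placeholders:

# Re-run introspection as part of the save operation.
rf_model_artifact.save(
    display_name="random-forest-model",
    description="Random forest model with introspection re-run at save time",
    ignore_introspection=False,
)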