Custom Models
You can upload custom models to a cluster with the Bring Your Own Model (BYOM) process.
To use custom models, you need a cluster configured to use OpenSearch version 2.11. By default, new clusters use version 2.11. To create a cluster, see Creating an OpenSearch Cluster.
For existing clusters configured for version 2.3, you can perform an inline upgrade to version 2.11. For more information, see Inline Upgrade for OpenSearch Clusters.
To upgrade existing clusters configured for version 1.2.3 to 2.11, use the upgrade process described in Upgrading an OpenSearch Cluster.
The BYOM process to import custom models includes the following steps:
- Complete the following prerequisites:
  - Configure the required IAM policy.
  - Configure the recommended cluster settings.
  - Upload a custom model to an Object Storage bucket.
- Register the model.
- Deploy the model.
- (Optional) Test the model.
1: Prerequisites
IAM Policy
You need to create a policy to grant OCI Search with OpenSearch access to the Object Storage bucket you upload the custom model to. The following policy example includes the required permissions:
ALLOW ANY-USER to manage object-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
If you're new to policies, see Getting Started with Policies and Common Policies.
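If you want to restrict access to a single bucket rather than all Object Storage in the tenancy, you can add a bucket-level condition. The following is a sketch that assumes your bucket is named <bucket_name>; target.bucket.name is a standard Object Storage policy variable, but verify the condition against the policy reference for your tenancy:
ALLOW ANY-USER to manage object-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>', target.bucket.name='<bucket_name>'}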
Regions for Generative AI Connectors
To use OCI Generative AI, your tenancy must be subscribed to the US Midwest (Chicago) region or the Germany Central (Frankfurt) region. You don't need to create the cluster in either of those regions, just ensure that your tenancy is subscribed to one of the regions.
Configure Cluster Settings
Use the settings operation of the Cluster APIs to configure the recommended cluster settings for semantic search. The following example includes the recommended settings:
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "rag_pipeline_feature_enabled": "true",
        "memory_feature_enabled": "true",
        "allow_registering_model_via_local_file": "true",
        "allow_registering_model_via_url": "true",
        "model_auto_redeploy.enable": "true",
        "model_auto_redeploy.lifetime_retry_times": 10
      }
    }
  }
}
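To confirm that the settings were applied, you can read them back with the settings operation. This check uses the standard filter_path query parameter to limit the response to the ml_commons settings:
GET _cluster/settings?filter_path=persistent.plugins.ml_commons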
Upload Model to Object Storage Bucket
To make a custom model available to register for a cluster, you need to upload the model to an Object Storage bucket in the tenancy. If you don't have an existing Object Storage bucket, you need to create the bucket. For a tutorial that walks you through how to create a bucket, see Creating a Bucket.
Next, upload the custom model to the bucket. See Uploading Files to a Bucket for a tutorial that walks you through how to upload files to a bucket. For the purposes of this walkthrough, you can download any supported Hugging Face model to upload.
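If you prefer the command line, you can upload the model file with the OCI CLI and compute the SHA-256 checksum that the register request expects in the model_content_hash_value field. The following is a minimal sketch; it assumes the OCI CLI is installed and configured, and the archive name shown is illustrative:
# Upload the model archive to the Object Storage bucket
oci os object put --namespace <tenancy_namespace> --bucket-name <bucket_name> --file sentence-transformers_all-MiniLM-L12-v2-1.0.1-torch_script.zip

# Compute the SHA-256 hash to pass as model_content_hash_value
shasum -a 256 sentence-transformers_all-MiniLM-L12-v2-1.0.1-torch_script.zip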
2: Register the Model
After you have uploaded a custom model to an Object Storage bucket, you need to get the URL for accessing the uploaded file and pass that URL in the register operation of the Model APIs. You can then use the Get operation of the Tasks APIs to track the completion of the register operation and get the model ID to use when you deploy the model.
To get the URL of the uploaded model file
1. Open the navigation menu and click Storage. Under Object Storage & Archive Storage, click Buckets.
2. Click the bucket that contains the uploaded model. The bucket's Details page appears.
3. Click the Actions menu next to the object name, and then select View Object Details. The Object Details dialog box appears. The URL to access the model file is displayed in the URL Path (URI) field. Copy the URL to use in the next step when you register the custom model.
Important
You might see a warning indicating that the current URL in the URL Path (URI) field is deprecated, with a new URL specified in the warning message. If you see this warning, use the new URL instead to register the custom model.
Register the custom model
Use the register operation to register the custom model. In the following example, the custom model uploaded to the Object Storage bucket is the huggingface/sentence-transformers/all-MiniLM-L12-v2 model. The values specified in model_config for this example are from the model's config file. Ensure that you're using the applicable model configuration values for the custom model you're registering.
Specify the Object Storage URL in the actions section; this section is an OCI Search with OpenSearch addition to the API to support the BYOM scenario.
POST /_plugins/_ml/models/_register
{
  "model_group_id": "<Model_Group_ID>",
  "name": "sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.",
  "model_task_type": "TEXT_EMBEDDING",
  "model_format": "TORCH_SCRIPT",
  "model_content_size_in_bytes": 134568911,
  "model_content_hash_value": "f8012a4e6b5da1f556221a12160d080157039f077ab85a5f6b467a47247aad49",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers",
    "all_config": "{\"_name_or_path\":\"microsoft/MiniLM-L12-H384-uncased\",\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":12,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
  },
  "url_connector": {
    "protocol": "oci_sigv1",
    "parameters": {
      "auth_type": "resource_principal"
    },
    "actions": [
      {
        "method": "GET",
        "action_type": "DOWNLOAD",
        "url": "<Object_Storage_URL_Path>"
      }
    ]
  }
}
The following is an example of an Object Storage URL path:
https://<tenancy_name>.objectstorage.us-ashburn-1.oraclecloud.com/n/<tenancy_name>/b/<bucket_name>/o/sentence-transformers_all-MiniLM-L12-v2-1.0.1-torch_script.zip
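The register request assigns the model to a model group through the model_group_id field. If you don't already have a model group, you can create one first with the Model Group APIs of the ML Commons plugin; the following is a minimal sketch in which the group name and description are illustrative:
POST /_plugins/_ml/model_groups/_register
{
  "name": "custom_embedding_models",
  "description": "Model group for custom models imported with BYOM"
}
The model_group_id value returned in the response is the value to pass in the register request.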
Make note of the task_id returned in the response to the register request; you can use the task_id to check the status of the operation. For example, from the following response:
{
  "task_id": "<task_ID>",
  "status": "CREATED"
}
Track the register task and get the model ID
To check the status of the register operation, use the task_id with the Get operation of the Tasks APIs, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the register operation is complete, the state value in the response to the Get operation is COMPLETED, as shown in the following example:
{
  "model_id": "<embedding_model_ID>",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "3qSqVfK2RvGJv1URKfS1bw"
  ],
  "create_time": 1706829732915,
  "last_update_time": 1706829780094,
  "is_async": true
}
Make note of the model_id value returned in the response to use when you deploy the model.
3: Deploy the Model
After the register operation is completed for the model, you can deploy the model to the cluster using the deploy operation of the Model APIs, passing the model_id from the Get operation response in the previous step, as shown in the following example:
POST /_plugins/_ml/models/<embedding_model_ID>/_deploy
Make note of the task_id returned in the response; you can use the task_id to check the status of the operation. For example, from the following response:
{
  "task_id": "<task_ID>",
  "task_type": "DEPLOY_MODEL",
  "status": "CREATED"
}
To check the status of the deploy operation, use the task_id with the Get operation of the Tasks APIs, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the deploy operation is complete, the state value in the response to the Get operation is COMPLETED.
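You can also confirm the deployment with the Get operation of the Model APIs. This is a standard ML Commons request; when the model is ready to serve requests, the model_state field in the response is DEPLOYED:
GET /_plugins/_ml/models/<embedding_model_ID>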
4: Test the Model
After the model is successfully deployed, you can test the model by using the text_embedding endpoint, as shown in the following example:
POST /_plugins/_ml/_predict/text_embedding/<embedding_model_ID>
{
  "text_docs": ["hello world", "new message", "this too"]
}
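A successful call returns one embedding per input document. The following is a sketch of the typical response shape; the data array is truncated and its values are illustrative, and for this model each embedding has 384 dimensions:
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [384],
          "data": [-0.023, 0.081, ...]
        }
      ]
    }
  ]
}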
Alternatively, you can use the _predict endpoint, as shown in the following example:
POST /_plugins/_ml/models/<embedding_model_ID>/_predict
{
  "parameters": {
    "passage_text": "Testing the cohere embedding model"
  }
}