Conversational Search with OCI Generative AI
OCI Search with OpenSearch provides support for creating an OCI Generative AI connector.
You can use the connector to access Generative AI features such as Retrieval-Augmented Generation (RAG), text summarization, text generation, conversational search, and semantic search.
Prerequisites
- To use OCI Generative AI, the tenancy must be subscribed to the US Midwest (Chicago) region or the Germany Central (Frankfurt) region. You don't need to create the cluster in either of those regions; just ensure that the tenancy is subscribed to one of them.
- To use an OCI Generative AI connector with OCI Search with OpenSearch, you need a cluster configured to use OpenSearch version 2.11. By default, new clusters are configured to use version 2.11. To create a cluster, see Creating an OpenSearch Cluster.
For existing clusters configured for version 2.3, you can perform an inline upgrade to version 2.11. For more information, see Inline Upgrade for OpenSearch Clusters.
To upgrade existing clusters configured for version 1.2.3 to 2.11, use the upgrade process described in Upgrading an OpenSearch Cluster.
- Create a policy to grant access to Generative AI resources. The following policy example includes the required permissions:
ALLOW ANY-USER to manage generative-ai-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
If you're new to policies, see Getting Started with Policies and Common Policies.
- Use the settings operation of the Cluster APIs to configure the recommended cluster settings that allow you to create a connector. The following example includes the recommended settings:
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "rag_pipeline_feature_enabled": "true",
        "memory_feature_enabled": "true",
        "allow_registering_model_via_local_file": "true",
        "allow_registering_model_via_url": "true",
        "model_auto_redeploy.enable": "true",
        "model_auto_redeploy.lifetime_retry_times": 10
      }
    }
  }
}
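Optionally, you can confirm the settings were applied by reading them back with the cluster settings API:
GET _cluster/settings?flat_settings=true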
Register the Model Group
Register a model group using the register operation in the Model Group APIs, as shown in the following example:
POST /_plugins/_ml/model_groups/_register
{
"name": "public_model_group-emb",
"description": "This is a public model group"
}
Make note of the model_group_id returned in the response:
{
"model_group_id": "<model_group_ID>",
"status": "CREATED"
}
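If you need to look up the model group ID later, you can search model groups by name using the ML Commons model group search API, as in this optional example:
POST /_plugins/_ml/model_groups/_search
{
  "query": {
    "match": {
      "name": "public_model_group-emb"
    }
  }
}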
Create the Connector
Create the Generative AI connector as shown in one of the following examples. You have two endpoint options, actions/generateText or actions/chat. We recommend that you use the actions/chat endpoint.
actions/chat Endpoint Option
- cohere.command-r-plus model:
POST _plugins/_ml/connectors/_create
{
  "name": "Cohere Command-R-Plus Chat Connector",
  "description": "Check errors in logs",
  "version": 2,
  "protocol": "oci_sigv1",
  "parameters": {
    "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    "auth_type": "resource_principal"
  },
  "credential": {
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/20231130/actions/chat",
      "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command-r-plus\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
      "post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
    }
  ]
}
- cohere.command-r-16k model:
POST _plugins/_ml/connectors/_create
{
  "name": "Cohere Chat Connector",
  "description": "Check errors in logs",
  "version": 2,
  "protocol": "oci_sigv1",
  "parameters": {
    "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    "auth_type": "resource_principal"
  },
  "credential": {
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/20231130/actions/chat",
      "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command-r-16k\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
      "post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
    }
  ]
}
- meta.llama-3-70b-instruct model:
POST _plugins/_ml/connectors/_create
{
  "name": "Llama3 Chat Connector",
  "description": "Check errors in logs",
  "version": 2,
  "protocol": "oci_sigv1",
  "parameters": {
    "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    "auth_type": "resource_principal"
  },
  "credential": {
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/20231130/actions/chat",
      "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"meta.llama-3-70b-instruct\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":-1,\"isStream\":false,\"apiFormat\":\"GENERIC\",\"messages\":[{\"role\":\"USER\",\"content\":[{\"type\":\"TEXT\",\"text\":\"${parameters.prompt}\"}]}]}}",
      "post_process_function": "def text = params['chatResponse']['choices'][0]['message']['content'][0]['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
    }
  ]
}
Authentication is done using a resource principal. Specify the cluster's compartment ID in request_body.
Make note of the connector_id returned in the response:
{
  "connector_id": "<connector_ID>"
}
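Optionally, you can confirm the connector was created by retrieving it with the ML Commons get connector API:
GET /_plugins/_ml/connectors/<connector_ID>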
actions/generateText Endpoint Option
- cohere.command model:
POST _plugins/_ml/connectors/_create
{
  "name": "Cohere Command Generate Text Connector",
  "description": "Connector for the cohere.command text generation model",
  "version": 2,
  "protocol": "oci_sigv1",
  "parameters": {
    "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    "auth_type": "resource_principal"
  },
  "credential": {
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/20231130/actions/generateText",
      "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command\",\"servingType\":\"ON_DEMAND\"},\"inferenceRequest\":{\"prompt\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"returnLikelihoods\":\"GENERATION\",\"isStream\":false ,\"stopSequences\":[],\"runtimeType\":\"COHERE\"}}"
    }
  ]
}
- meta.llama-2-70b-chat model:
POST _plugins/_ml/connectors/_create
{
  "name": "Llama2 Generate Text Connector",
  "description": "Connector for the meta.llama-2-70b-chat model",
  "version": 2,
  "protocol": "oci_sigv1",
  "parameters": {
    "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    "auth_type": "resource_principal"
  },
  "credential": {
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/20231130/actions/generateText",
      "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"meta.llama-2-70b-chat\",\"servingType\":\"ON_DEMAND\"},\"inferenceRequest\":{\"prompt\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":-1,\"isStream\":false,\"numGenerations\":1,\"stop\":[],\"runtimeType\":\"LLAMA\"}}",
      "post_process_function": "def text = params['inferenceResponse']['choices'][0]['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
    }
  ]
}
Authentication is done using a resource principal. Specify the cluster's compartment ID in request_body.
Make note of the connector_id returned in the response:
{
  "connector_id": "<connector_ID>"
}
Dedicated Generative AI Model Endpoint Option
To use a dedicated Generative AI model endpoint, reconfigure the connector payload with the following changes:
- Use endpointId instead of modelId, and then specify the dedicated model endpoint's OCID instead of the model name. For example, change:
\"modelId\":\"meta.llama-2-70b-chat\"
to:
\"endpointId\":\"<dedicated_model_endpoint_OCID>\"
- Change servingType from ON_DEMAND to DEDICATED. For example, change:
\"servingType\":\"ON_DEMAND\"
to:
\"servingType\":\"DEDICATED\"
The following is a complete example that shows how to create a connector using a dedicated model endpoint:
POST _plugins/_ml/connectors/_create
{
"name": "Cohere Commar-R-Plus Chat Connector",
"description": "Check errors in logs",
"version": 2,
"protocol": "oci_sigv1",
"parameters": {
"endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
"auth_type": "resource_principal"
},
"credential": {
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://${parameters.endpoint}/20231130/actions/chat",
"request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"endpointId\":\"<dedicated_model_enpoint_OCID>\",\"servingType\":\"DEDICATED\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
"post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
}
]
}
Register the Model
Register the remote model using the Generative AI connector with the connector ID and model group ID from the previous steps, as shown in the following example:
POST /_plugins/_ml/models/_register
{
"name": "oci-genai-embed-test",
"function_name": "remote",
"model_group_id": "<model_group_ID>",
"description": "test semantic",
"connector_id": "<connector_ID>"
}
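Make note of the model_id returned in the response. Depending on the OpenSearch version, registration may run asynchronously; if the response contains only a task_id, you can retrieve the model ID from the task using the ML Commons tasks API, as in the following sketch (the task ID placeholder is illustrative):
GET /_plugins/_ml/tasks/<task_ID>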
Deploy the Model
Deploy the remote model using the model ID returned when you registered the model, as shown in the following example:
POST /_plugins/_ml/models/<model_ID>/_deploy
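After deployment, you can verify the model end to end with the ML Commons predict API. The following is a minimal sketch; the prompt parameter matches the ${parameters.prompt} placeholder used in the connector examples, and the prompt text itself is illustrative:
POST /_plugins/_ml/models/<model_ID>/_predict
{
  "parameters": {
    "prompt": "Summarize the benefits of conversational search."
  }
}
With the model deployed and the rag_pipeline_feature_enabled and memory_feature_enabled cluster settings in place, you can use the model for conversational search through a RAG search pipeline. The following is a minimal sketch using OpenSearch's retrieval_augmented_generation response processor; the pipeline name, index name, and field name are illustrative placeholders:
PUT /_search/pipeline/oci_genai_rag_pipeline
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "oci_genai_rag",
        "description": "RAG pipeline using the OCI Generative AI model",
        "model_id": "<model_ID>",
        "context_field_list": ["<text_field>"]
      }
    }
  ]
}
You can then query an index through the pipeline, for example:
GET /<index_name>/_search?search_pipeline=oci_genai_rag_pipeline
{
  "query": {
    "match": { "<text_field>": "conversational search" }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "What does the documentation say about conversational search?"
    }
  }
}
Depending on the OpenSearch version, generative_qa_parameters accepts additional options such as llm_model, context_size, and memory-related parameters; see the OpenSearch conversational search documentation for the exact set.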