Conversational Search with OCI Generative AI

OCI Search with OpenSearch supports creating an OCI Generative AI connector.

You can use the connector to access Generative AI features such as Retrieval-Augmented Generation (RAG), text summarization, text generation, conversational search, and semantic search.

This topic covers the steps required to create a Generative AI connector. Each step includes a generic template of the required code. You can use the Console to automatically generate this code with values configured for your environment; see Creating a RAG pipeline.

Prerequisites

  • To use OCI Generative AI, the tenancy must be subscribed to the US Midwest (Chicago) region or the Germany Central (Frankfurt) region. You don't need to create the cluster in either of those regions, just ensure that the tenancy is subscribed to one of the regions.
  • To use an OCI Generative AI connector with OCI Search with OpenSearch, you need a cluster configured to use OpenSearch version 2.11. By default, new clusters are configured to use version 2.11. To create a cluster, see Creating an OpenSearch Cluster.

    For existing clusters configured for version 2.3, you can perform an inline upgrade to version 2.11. For more information, see Inline Upgrade for OpenSearch Clusters.

    To upgrade existing clusters configured for version 1.2.3 to 2.11, you need to use the upgrade process described in Upgrading an OpenSearch Cluster.

  • Create a policy to grant access to Generative AI resources. The following policy example includes the required permissions:

    ALLOW ANY-USER to manage generative-ai-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}

    If you're new to policies, see Getting Started with Policies and Common Policies.

  • Use the settings operation of the Cluster APIs to configure the recommended cluster settings that allow you to create a connector. The following example includes the recommended settings:

    PUT _cluster/settings
    {
      "persistent": {
        "plugins": {
          "ml_commons": {
            "only_run_on_ml_node": "false",
            "model_access_control_enabled": "true",
            "native_memory_threshold": "99",
            "rag_pipeline_feature_enabled": "true",
            "memory_feature_enabled": "true",
            "allow_registering_model_via_local_file": "true",
            "allow_registering_model_via_url": "true",
            "model_auto_redeploy.enable":"true",
            "model_auto_redeploy.lifetime_retry_times": 10
          }
        }
      }
    }
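
    To confirm that the settings were applied, you can query the cluster settings. The following is a minimal sketch; flat_settings is a standard OpenSearch query parameter used here only to make the output easier to scan:

    GET _cluster/settings?flat_settings=true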

Register the Model Group

Register a model group using the register operation in the Model Group APIs, as shown in the following example:

POST /_plugins/_ml/model_groups/_register
{
   "name": "public_model_group-emb",
   "description": "This is a public model group"
}

Make note of the model_group_id returned in the response:

{
  "model_group_id": "<model_group_ID>",
  "status": "CREATED"
}
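
If you lose track of the ID, you can search for the model group by name using the Model Group APIs. This is a minimal sketch; the query simply matches the name used at registration:

POST /_plugins/_ml/model_groups/_search
{
  "query": {
    "match": {
      "name": "public_model_group-emb"
    }
  }
}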

Create the Connector

Create the Generative AI connector as shown in one of the following examples.

You have two endpoint options: actions/generateText or actions/chat. We recommend that you use the actions/chat endpoint.

actions/chat Endpoint Option

  • cohere.command-r-plus:

    POST _plugins/_ml/connectors/_create
    {
         "name": "Cohere Commar-R-Plus Chat Connector",
         "description": "Check errors in logs",
         "version": 2,
         "protocol": "oci_sigv1",
         "parameters": {
             "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
             "auth_type": "resource_principal"
         },
         "credential": {
         },
         "actions": [
             {
                 "action_type": "predict",
                 "method": "POST",
                 "url": "https://${parameters.endpoint}/20231130/actions/chat",
                 "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command-r-plus\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
                 "post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
     
             }
         ]
     } 
  • cohere.command-r-16k model:

    POST _plugins/_ml/connectors/_create
    {
         "name": "Cohere Chat Connector",
         "description": "Check errors in logs",
         "version": 2,
         "protocol": "oci_sigv1",
         "parameters": {
             "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
             "auth_type": "resource_principal"
         },
         "credential": {
         },
         "actions": [
             {
                 "action_type": "predict",
                 "method": "POST",
                 "url": "https://${parameters.endpoint}/20231130/actions/chat",
                 "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command-r-16k\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
                 "post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
     
             }
         ]
     }
  • meta.llama-3-70b-instruct model:

    POST _plugins/_ml/connectors/_create
    {
         "name": "Llama3 Chat Connector",
         "description": "Check errors in logs",
         "version": 2,
         "protocol": "oci_sigv1",
         "parameters": {
             "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
             "auth_type": "resource_principal"
         },
         "credential": {
         },
         "actions": [
             {
                 "action_type": "predict",
                 "method": "POST",
                 "url": "https://${parameters.endpoint}/20231130/actions/chat",
                 "request_body": "{\"compartmentId\":\<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"meta.llama-3-70b-instruct\",\"servingType\":\"ON_DEMAND\"},\"chatRequest\":{\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":-1,\"isStream\":false,\"apiFormat\":\"GENERIC\",\"messages\":[{\"role\":\"USER\",\"content\":[{\"type\":\"TEXT\",\"text\":\"${parameters.prompt}\"}]}]}}",
                 
                  "post_process_function": "def text = params['chatResponse']['choices'][0]['message']['content'][0]['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
    
    
    
             }
         ]
     }

Authentication uses a resource principal; replace <cluster_compartment_id> in request_body with the cluster's compartment ID. The post_process_function in each example reshapes the model's chat response into the generatedTexts format that downstream features such as the RAG pipeline expect.

Make note of the connector_id returned in the response:

{
  "connector_id": "<connector_ID>",
}
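
If you need to review the connector configuration later, you can retrieve it by its ID. This is a minimal sketch using the ML Commons connector API:

GET /_plugins/_ml/connectors/<connector_ID>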

actions/generateText Endpoint Option

  • cohere.command model:

    POST _plugins/_ml/connectors/_create
    {
         "name": "OpenAI Chat Connector",
         "description": "when did us pass espio",
         "version": 2,
         "protocol": "oci_sigv1",
         "parameters": {
             "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
             "auth_type": "resource_principal"
         },
         "credential": {
         },
         "actions": [
             {
                 "action_type": "predict",
                 "method": "POST",
                 "url": "https://${parameters.endpoint}/20231130/actions/generateText",
                 "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"cohere.command\",\"servingType\":\"ON_DEMAND\"},\"inferenceRequest\":{\"prompt\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"returnLikelihoods\":\"GENERATION\",\"isStream\":false ,\"stopSequences\":[],\"runtimeType\":\"COHERE\"}}"
             }
         ]
     }
  • meta.llama-2-70b-chat:

    POST _plugins/_ml/connectors/_create
    {
         "name": "OpenAI Chat Connector",
         "description": "testing genAI connector",
         "version": 2,
         "protocol": "oci_sigv1",
         "parameters": {
             "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
             "auth_type": "resource_principal"
         },
         "credential": {
         },
         "actions": [
             {
                 "action_type": "predict",
                 "method": "POST",
                 "url": "https://${parameters.endpoint}/20231130/actions/generateText",
                 "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"modelId\":\"meta.llama-2-70b-chat\",\"servingType\":\"ON_DEMAND\"},\"inferenceRequest\":{\"prompt\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":-1,\"isStream\":false,\"numGenerations\":1,\"stop\":[],\"runtimeType\":\"LLAMA\"}}",
                "post_process_function": "def text = params['inferenceResponse']['choices'][0]['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
                              
                   }
         ]
     }

Authentication uses a resource principal; replace <cluster_compartment_id> in request_body with the cluster's compartment ID.

Make note of the connector_id returned in the response:

{
  "connector_id": "<connector_ID>",
}

Dedicated Generative AI Model Endpoint Option

To use a dedicated Generative AI model endpoint, reconfigure the connector payload with the following changes:

  1. Use endpointId instead of modelId, and then specify the dedicated model endpoint's OCID instead of the model name. For example, change:
    \"modelId\":\"meta.llama-2-70b-chat\"
    to:
    \"endpointId\":\"<dedicated_model_enpoint_OCID>\"
  2. Change servingType from ON_DEMAND to DEDICATED. For example, change:

    \"servingType\":\"ON_DEMAND\"
    to:
    \"servingType\":\"DEDICATED\"

The following is a complete example that shows how to create a connector using a dedicated model endpoint:

POST _plugins/_ml/connectors/_create
{
     "name": "Cohere Commar-R-Plus Chat Connector",
     "description": "Check errors in logs",
     "version": 2,
     "protocol": "oci_sigv1",
     "parameters": {
         "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
         "auth_type": "resource_principal"
     },
     "credential": {
     },
     "actions": [
         {
             "action_type": "predict",
             "method": "POST",
             "url": "https://${parameters.endpoint}/20231130/actions/chat",
             "request_body": "{\"compartmentId\":\"<cluster_compartment_id>\",\"servingMode\":{\"endpointId\":\"<dedicated_model_enpoint_OCID>\",\"servingType\":\"DEDICATED\"},\"chatRequest\":{\"message\":\"${parameters.prompt}\",\"maxTokens\":600,\"temperature\":1,\"frequencyPenalty\":0,\"presencePenalty\":0,\"topP\":0.75,\"topK\":0,\"isStream\":false,\"chatHistory\":[],\"apiFormat\":\"COHERE\"}}",
             "post_process_function": "def text = params['chatResponse']['text'].replace('\n', '\\\\n').replace('\"','');\n return '{\"name\":\"response\",\"dataAsMap\":{\"inferenceResponse\":{\"generatedTexts\":[{\"text\":\"' + text + '\"}]}}}'"
 
         }
     ]
 }

Register the Model

Register the remote model using the Generative AI connector with the connector ID and model group ID from the previous steps, as shown in the following example:

POST /_plugins/_ml/models/_register
{
   "name": "oci-genai-embed-test",
   "function_name": "remote",
   "model_group_id": "<model_group_ID>",
   "description": "test semantic",
   "connector_id": "<connector_ID>"
}
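
Make note of the model_id returned in the response. The general shape of the response is shown below; depending on the OpenSearch version, the model_id is returned directly alongside the task_id, or you can retrieve it from the task (values are placeholders):

{
  "task_id": "<task_ID>",
  "status": "CREATED",
  "model_id": "<model_ID>"
}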

Deploy the Model

Deploy the remote model using the model ID from the previous step, as shown in the following example:

POST /_plugins/_ml/models/<model_ID>/_deploy
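
After the model deploys, you can send a test request with the predict operation. This is a minimal sketch; the prompt value is an arbitrary example, and it maps to ${parameters.prompt} in the connector's request_body:

POST /_plugins/_ml/models/<model_ID>/_predict
{
  "parameters": {
    "prompt": "What is conversational search?"
  }
}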