Semantic Search in OCI OpenSearch
OCI Search with OpenSearch supports semantic search starting with OpenSearch version 2.11.
With semantic search, search engines use the context and content of a query to understand its meaning, rather than relying on content that matches the query's keywords, as keyword search does. OpenSearch implements semantic search using neural search, a technique that uses large language models to understand the relationships between terms. For more information about neural search in OpenSearch, see Neural search tutorial.
Using Neural Search in OCI Search with OpenSearch
To use neural search for semantic search in OCI Search with OpenSearch, you need to:
1. Register and deploy your choice of model to the cluster.
2. Create an index and set up an ingestion pipeline using the deployed model. Use the ingestion pipeline to ingest documents into the index.
3. Run semantic search queries on the index using either hybrid search or neural search.
Prerequisites
To use semantic search, the OpenSearch version for the cluster must be 2.11 or newer. By default, new clusters use version 2.11. See Creating an OpenSearch Cluster.
For existing clusters configured for version 2.3, you can perform an inline upgrade to version 2.11. For more information, see Inline Upgrade for OpenSearch Clusters.
To upgrade existing clusters configured for version 1.2.3 to 2.11, you need to use the upgrade process described in Upgrading an OpenSearch Cluster.
Before you start setting up the model for semantic search, you need to complete the prerequisites, which include specifying the applicable IAM policy if required, and configuring the recommended cluster settings.
IAM Policy for Custom Models and Generative AI Connectors
If you're using one of the pretrained models hosted within OCI Search with OpenSearch, you don't need to configure permissions, and you can skip to the next prerequisite, Cluster Settings for Semantic Search. See also Semantic Search Walkthrough.
Otherwise, you need to create a policy to grant the required access.
If you're new to policies, see Getting Started with Policies and Common Policies.
IAM Policy for Custom Models
If you're using a custom model, you need to grant OCI Search with OpenSearch access to the Object Storage bucket that contains the model.
The following policy example includes the required permission:
ALLOW ANY-USER to manage object-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
IAM Policy for Generative AI Connectors
If you're using a Generative AI connector, you need to grant OCI Search with OpenSearch access to Generative AI resources.
The following policy example includes the required permission:
ALLOW ANY-USER to manage generative-ai-family in tenancy WHERE ALL {request.principal.type='opensearchcluster', request.resource.compartment.id='<cluster_compartment_id>'}
Regions for Generative AI Connectors
To use OCI Generative AI, your tenancy must be subscribed to the US Midwest (Chicago) region or the Germany Central (Frankfurt) region. You don't need to create the cluster in either of those regions, just ensure that your tenancy is subscribed to one of the regions.
Cluster Settings for Semantic Search
Use the settings operation of the Cluster APIs to configure the recommended cluster settings for semantic search. The following example includes the recommended settings:
PUT _cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "rag_pipeline_feature_enabled": "true",
        "memory_feature_enabled": "true",
        "allow_registering_model_via_local_file": "true",
        "allow_registering_model_via_url": "true",
        "model_auto_redeploy.enable": "true",
        "model_auto_redeploy.lifetime_retry_times": 10
      }
    }
  }
}
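To confirm that the settings were applied, you can read them back with the standard cluster settings Get request; the optional flat_settings parameter, which flattens the nested keys, is shown here as a convenience:
GET _cluster/settings?flat_settings=true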
Setting up a Model
The first step when configuring neural search is setting up the large language model you want to use. The model is used to generate vector embeddings from text fields.
Register a Model Group
Model groups enable you to manage access to specific models. Registering a model group is optional; however, if you don't register one, ML Commons registers a new model group for you, so we recommend that you register the model group yourself.
Register a model group using the register operation in the Model Group APIs, as shown in the following example:
POST /_plugins/_ml/model_groups/_register
{
  "name": "new_model_group",
  "description": "A model group for local models"
}
Make note of the model_group_id returned in the response:
{
  "model_group_id": "<model_group_ID>",
  "status": "CREATED"
}
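If you need to recover the ID later, you can look up the model group by name using the search operation of the Model Group APIs; a minimal sketch:
POST /_plugins/_ml/model_groups/_search
{
  "query": {
    "match": {
      "name": "new_model_group"
    }
  }
}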
Register the Model to the Model Group
Register the model using the register operation from the Model APIs. The content of the POST request to the register operation depends on the type of model you're using.
- Option 1: Built-in OpenSearch pretrained models
  Several pretrained sentence transformer models are available for you to register and deploy directly to a cluster, without needing to download and then upload them manually into a private storage bucket, as the custom models option requires. With this option, when you register a pretrained model, you only need the model's model_group_id, name, version, and model_format. See Using an OpenSearch Pretrained Model for how to use a pretrained model. A minimal example register request for this option is sketched after this list.
- Option 2: Custom models
  You need to pass the Object Storage URL in the actions section of the register operation, as follows:
  POST /_plugins/_ml/models/_register
  {
    .....
    "actions": [
      {
        "method": "GET",
        "action_type": "DOWNLOAD",
        "url": "<Object_Storage_URL_Path>"
      }
    ]
  }
  For a complete example of a register operation, see Custom Models - 2: Register the Model.
- Option 3: Generative AI connector
  To use a Generative AI connector to register a remote embedding model such as the cohere.embed-english-v3.0 model, you need to create a connector first and then register the model, using the following steps:
  - Create a connector to the Cohere embedding model:
    POST /_plugins/_ml/connectors/_create
    {
      "name": "OCI GenAI Chat Connector cohere-embed-v5",
      "description": "The connector to public Cohere model service for embed",
      "version": "2",
      "protocol": "oci_sigv1",
      "parameters": {
        "endpoint": "inference.generativeai.us-chicago-1.oci.oraclecloud.com",
        "auth_type": "resource_principal",
        "model": "cohere.embed-english-v3.0",
        "input_type": "search_document",
        "truncate": "END"
      },
      "credential": {
      },
      "actions": [
        {
          "action_type": "predict",
          "method": "POST",
          "url": "https://${parameters.endpoint}/20231130/actions/embedText",
          "request_body": "{ \"inputs\":[\"${parameters.passage_text}\"], \"truncate\": \"${parameters.truncate}\", \"compartmentId\": \"<compartment_ID>\", \"servingMode\": { \"modelId\": \"${parameters.model}\", \"servingType\": \"ON_DEMAND\" } }",
          "pre_process_function": "return '{\"parameters\": {\"passage_text\": \"' + params.text_docs[0] + '\"}}';",
          "post_process_function": "connector.post_process.cohere.embedding"
        }
      ]
    }
    The response:
    {
      "connector_id": "<connector_ID>"
    }
  - Register the model:
    POST /_plugins/_ml/models/_register
    {
      "name": "oci-genai-embed-model",
      "function_name": "remote",
      "model_group_id": "<model_group_ID>",
      "description": "test semantic",
      "connector_id": "<connector_ID>"
    }
To use a dedicated Generative AI model endpoint, reconfigure the connector payload with the following changes:
- Use endpointId instead of modelId, and then specify the dedicated model endpoint's OCID instead of the model name. For example, change:
  \"modelId\": \"${parameters.model}\"
  to:
  \"endpointId\": \"<dedicated_model_endpoint_OCID>\"
- Change servingType from ON_DEMAND to DEDICATED. For example, change:
  \"servingType\": \"ON_DEMAND\"
  to:
  \"servingType\": \"DEDICATED\"
Make note of the task_id returned in the response to the register operation; you can use the task_id with the Get operation of the Tasks APIs to check the status of the register operation, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the register operation is complete, the state value in the response to the Get operation is COMPLETED, as shown in the following example:
{
  "model_id": "<embedding_model_ID>",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "3qSqVfK2RvGJv1URKfS1bw"
  ],
  "create_time": 1706829732915,
  "last_update_time": 1706829780094,
  "is_async": true
}
Make note of the model_id value returned in the response to use when you deploy the model.
Deploy the Model
After the register operation is complete, you can deploy the model to the cluster using the deploy operation of the Model APIs, passing the model_id from the Get operation response in the previous step, as shown in the following example:
POST /_plugins/_ml/models/<embedding_model_ID>/_deploy
Make note of the task_id returned in the response; you can use the task_id to check the status of the operation. For example, from the following response:
{
  "task_id": "<task_ID>",
  "task_type": "DEPLOY_MODEL",
  "status": "CREATED"
}
To check the status of the deploy operation, use the task_id with the Get operation of the Tasks APIs, as shown in the following example:
GET /_plugins/_ml/tasks/<task_ID>
When the deploy operation is complete, the state value in the response to the Get operation is COMPLETED.
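As an optional extra check, you can also retrieve the model itself with the Get operation of the Model APIs; a deployed model should report a model_state of DEPLOYED:
GET /_plugins/_ml/models/<embedding_model_ID>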
Ingest Data
After the model is deployed, the next step is to ingest data: create an ingestion pipeline that uses the deployed model to generate vector embeddings from text fields, create an index that uses the pipeline, and then ingest documents into the index.
Create Ingestion Pipeline
Using the model ID of the deployed model, create an ingestion pipeline, as shown in the following example:
PUT _ingest/pipeline/test-nlp-pipeline
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_ID>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
The ingestion pipeline defines a processor and the field mappings (in this case, "passage_text" → "passage_embedding"). This means that if you use this pipeline on an index to ingest data, the pipeline automatically finds the "passage_text" field, uses the pipeline's model to generate the corresponding embeddings, and writes them to the "passage_embedding" field before indexing.
Remember that "passage_text" and "passage_embedding" are user defined and can be anything you want. Ensure that you're consistent with this naming when you create the index where you plan to use the pipeline, so that the pipeline processor can map the fields as described.
Create an Index
During index creation, you can specify the pipeline you want to use to ingest documents into the index.
The following example API call shows how to create an index using the test-nlp-pipeline pipeline created in the previous step.
PUT /test-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "test-nlp-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": <model_dimension>,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "m": 512,
            "ef_construction": 245
          }
        }
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
When creating the index, you also need to specify which approximate nearest neighbor (ANN) library implementation you want to use. OCI Search with OpenSearch supports the NMSLIB, Faiss, and Lucene libraries; for more information, see Search Engines. The previous example uses the Lucene engine.
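If you choose Faiss instead, only the method block of the knn_vector mapping changes. A minimal sketch of an equivalent index follows, with illustrative parameter values:
PUT /test-index-faiss
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "test-nlp-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": <model_dimension>,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "m": 16,
            "ef_construction": 128
          }
        }
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}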
Ingest Data into Index
After successfully creating the index, you can ingest data into it, as shown in the following example:
POST /test-index/_doc/1
{
"passage_text": "there are many sharks in the ocean"
}
POST /test-index/_doc/2
{
"passage_text": "fishes must love swimming"
}
POST /test-index/_doc/3
{
"passage_text": "summers are usually very hot"
}
POST /test-index/_doc/4
{
"passage_text": "florida has a nice weather all year round"
}
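For larger document sets, the standard _bulk API works with the same pipeline, because the index's default_pipeline also applies to bulk writes; a minimal sketch:
POST /test-index/_bulk
{ "index": { "_id": "5" } }
{ "passage_text": "whales are the largest mammals in the ocean" }
{ "index": { "_id": "6" } }
{ "passage_text": "winters in the mountains are very cold" }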
Use a GET request to verify that documents are being ingested correctly and that embeddings are automatically generated during ingestion:
GET /test-index/_doc/3
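Finally, to run the semantic search queries mentioned in step 3 at the start of this topic, you can issue a neural query against the index. The following is a minimal sketch, assuming the deployed model's ID and the field names used above; k is the number of nearest neighbors to return:
GET /test-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "what is the weather like in florida",
        "model_id": "<embedding_model_ID>",
        "k": 2
      }
    }
  }
}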