Reading data from OCI Autonomous Data Warehouse(ADW)

Overview

ML Monitoring Application provides ADWApplicationDataReader reader to read data from OCI ADW tables using user-provided SQL queries.

To use ADW Reader, fulfil the prerequisites mentioned in ADW Prerequisites

ADW Reader Authentication

ADW Readers connect to an OCI ADW instance using a mTLS connection. This requires the use of wallet files and database credentials which contain the required authentication information.

ADWApplicationDataReader reads wallet files and database credentials from OCI Vault. The steps are as follows:

  1. Upload each wallet file to OCI Vault. This creates a secret for each file.

  2. Write the appropriate policy to allow the principal (user/resource) to read secrets from OCI Vault.

  3. Add the following details in ML Monitoring Application Configuration file:

    {
       "type": "ADWApplicationDataReader",
       "params": {
         "auth": {
           "vault_ocid": "<vault_ocid>",
           "vault_tenancy_ocid": "<tanancy_ocid>"
         },
        }
    }
    
    • vault_ocid: The OCID of the vault that contains the secret.

    • vault_tenancy_ocid: The OCID of the tenancy where vault resides.

  4. By default, The ADW Reader expects the following wallet files to be present in OCI Vault in base 64 encoded format:

[
    "cwallet.sso",
    "ewallet.pem",
    "ewallet.p12",
    "keystore.jks",
    "ojdbc.properties",
    "tnsnames.ora",
    "truststore.jks",
    "sqlnet.ora",
    "db_username",
    "db_password"
]

6. If the secret names in OCI Vault are not same as default file names, provide the names explicitly in the configuration. As shown below, the cwallet.sso is uploaded file as custom-cwallet.sso in OCI Vault. In this case, the ADW Reader is configured to read the custom wallet file names from OCI Vault.

"baseline_reader": {
   "type": "ADWApplicationDataReader",
   "params": {
     "auth": {
       "vault_ocid": "<vault_ocid>",
        "vault_tenancy_ocid": "<tenancy_ocid>",
        "vault_secret_names": {
            "db_username": "custom_db_username",
            "db_password": "custom_db_password",
            "cwallet.sso": "custom-cwallet.sso",
            "ewallet.pem": "custom-ewallet.pem",
            "ewallet.p12": "custom-ewallet.p12",
            "keystore.jks": "custom-keystore.jks",
            "ojdbc.properties": "custom-ojdbc.properties",
            "tnsnames.ora": "custom-tnsnames.ora",
            "truststore.jks": "custom-truststore.jks",
            "sqlnet.ora": "custom-sqlnet.ora"
        }
     }
    }
}

Read data using ADW Application Reader

ADW readers read data from OCI cloud hosted Autonomous Data Warehouse using SQL query as mandatory input, which in return, outputs the data into the dataframe on which you can run the data analytics specifying the metrics and test results.

sample_config.json
    {
        "input_schema": {
        "salary": {
            "data_type": "INTEGER",
            "variable_type": "CONTINUOUS",
            "column_type": "INPUT"
        },
        "first_name": {
            "data_type": "STRING",
            "variable_type": "NOMINAL",
            "column_type": "INPUT"
        }
    },
    "dataset_metrics": [
        {
            "type": "RowCount"
        }
    ],
    "feature_metrics": {
        "salary": [
            {
                "type": "Count"
            },
            {
                "type": "Quartiles"
            },
            {
                "type": "FrequencyDistribution"
            },
            {
                "type": "ProbabilityDistribution"
            },
            {
                "type": "DistinctCount"
            },
            {
                "type": "Skewness"
            },
            {
                "type": "Variance"
            },
            {
                "type": "TypeMetric"
            },
            {
                "type": "KolmogorovSmirnov"
            }
        ],
        "first_name": [
            {
                "type": "Count"
            },
            {
                "type": "TopKFrequentElements"
            },
            {
                "type": "TypeMetric"
            },
            {
                "type": "ChiSquare"
            }
        ]
    },
    "prediction_reader": {
        "type": "ADWApplicationDataReader",
        "params": {
            "auth": {
                "vault_ocid": "<vault_ocid>",
                "vault_tenancy_ocid": "<tenancy_ocid>",
                "vault_secret_names": {
                    "db_username": "mlm-int2-db_username",
                    "db_password": "mlm-int2-db_password",
                    "cwallet.sso": "mlm-int2-cwallet.sso",
                    "ewallet.pem": "mlm-int2-ewallet.pem",
                    "ewallet.p12": "mlm-int2-ewallet.p12",
                    "keystore.jks": "mlm-int2-keystore.jks",
                    "ojdbc.properties": "mlm-int2-ojdbc.properties",
                    "tnsnames.ora": "mlm-int2-tnsnames.ora",
                    "truststore.jks": "mlm-int2-truststore.jks",
                    "sqlnet.ora": "mlm-int2-sqlnet.ora"
                }
            },
            "sql_query": "select * from employees where hire_date  > TO_DATE(:start,'DD-MM-YY') and  appt_date < TO_DATE(:end,'DD-MM-YY')",
            "index_col": "employee_id",
            "bind_variables": {
                "start": "",
                "end": ""
            }
        }
    },
    "baseline_reader": {
        "type": "ADWApplicationDataReader",
        "params": {
            "auth": {
                "vault_ocid": "<vault_ocid>",
                "vault_tenancy_ocid": "<tenancy_ocid>",
                "vault_secret_names": {
                    "db_username": "mlm-int2-db_username",
                    "db_password": "mlm-int2-db_password",
                    "cwallet.sso": "mlm-int2-cwallet.sso",
                    "ewallet.pem": "mlm-int2-ewallet.pem",
                    "ewallet.p12": "mlm-int2-ewallet.p12",
                    "keystore.jks": "mlm-int2-keystore.jks",
                    "ojdbc.properties": "mlm-int2-ojdbc.properties",
                    "tnsnames.ora": "mlm-int2-tnsnames.ora",
                    "truststore.jks": "mlm-int2-truststore.jks",
                    "sqlnet.ora": "mlm-int2-sqlnet.ora"
                }
            },
            "sql_query": "select * from employees where hire_date > TO_DATE('12-03-10','DD-MM-YY')",
            "index_col": "employee_id"
        }
    }
}

Override bind parameters using Runtime Parameters

When doing a prediction run, you can override bind_parameters to be used in sql_query using the date_range parameters for the prediction reader.

  • date_range start and end values should have date values compatible with date format specified in the sql_query

   select * from employees where hire_date  > TO_DATE(:start,'DD-MM-YY') and  appt_date < TO_DATE(:end,'DD-MM-YY');

- Runtime parameters are specified as follows:
   {"ACTION_TYPE":"RUN_PREDICTION", "DATE_RANGE": {"start": "15-03-23", "end": "17-03-23"}}

- Bind parameters are specified as follows:
"bind_variables": {
                "start": "",
                "end": ""
            }

The user API parameters of ADWApplicationDataReader are defined in the following table:

Parameter

Description

Examples

sql_query

  • The ADW readers read data from OCI Cloud hosted Autonomous Data Warehouse using a SQL query as a mandatory input parameter.

  • The datatypes provided in the input schema are honoured over the datatypes of the SQL schema in the query view.

  • Query the data using valid Oracle SQL compatible with the oracle-db python driver.

  • Required: Yes

query example

select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN from employees

bind_variables

  • Include the place holders for bind variables in the SQL query and mention the bind variables as key value parameters.

  • Required: No

bind variables example
  • sql_query: ``select * EMPLOYEES where hire_date > TO_DATE(:input_date,’DD-MM-YY’) ``

  • bind_variables:{ "input_date" : "13-01-08"}

auth

  • Use this parameter to provide authentication details required to connect to ADW to read data

  • Details are provided in ADW Reader Authentication

  • Required: Yes

ADW Reader Authentication

index_col

  • To read a large dataset efficiently in a partitioned manner, index_col serves as the partition key.

  • index_col column must be present in the select query view.

  • index_col must have NUMBER as its SQL datatype.

  • Required: No

  • Default: If not provided, all the data is read using a single partition only.

index column example
  • sql_query:select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN from employees

  • index_col:en

no_of_partitions

  • no_of_partitions is used to specify the number of partitions required to read a large dataset.

  • index_col must be specified to use the number of partitions.

  • Required: No

  • Default: If not provided, a default value of 1 is used.

number of partition example
  • sql_query:select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN from employees

  • index_col:en

  • no_of_partitions :3

partition_size

  • partition_size is used to specify the memory size of a partition required to read a large dataset

  • index_col must be specified to use the partition_size

  • The value of the partition size can take a numeric values like 10240 for 10 MB, or string values like 10 MiB for 10 MB.

  • The number of partitions is calculated dynamically using size of the data read and the partition_size value.

  • Required: No

  • Default: If not provided, a default value of 256 MiB is used.

partition size example
  • sql_query:select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN from employees

  • index_col:en

  • partition_size :10240

Note

  • If the partition_size and no_of_partitions are not provided, the number of partitions are calculated using

the default value of partition_size which is 256 MiB.

  • If the partition_size and no_of_partitions are both provided, no_of_partitions is honoured.