.. _Test/Test Suites Component:

Test/Test Suites Component
==========================

Up until now, we have seen how Profile computation produces metric results. As a machine learning engineer or data scientist, I would like to apply tests/checks on those metrics to understand whether the metric values breach certain thresholds. For example, I might want to be alerted if the Minimum metric of a feature is beyond a certain threshold, if the Q1 Quartile of a feature in a prediction run is not within the threshold of the corresponding value in the reference profile, or if, for a classification model, the precision score deviates by more than 10% from the precision score of the baseline run.

Insights Test/Test Suites enables comprehensive validation of a customer's machine learning models and data via a suite of tests and test suites for various types of use cases, such as:

- Data Integrity
- Data Quality
- Model Performance (Classification, Regression)
- Drift
- Correlation, etc.

They provide a structured, easier way to add thresholds on metrics. This can be used for notifications and alerts for continuous model monitoring, allowing users to take remedial actions.

* :ref:`How it works`
* :ref:`Tests/Test Suites API`

  - :ref:`Using Insights Builder API`
  - :ref:`Using Insights Config API to get insights builder`
  - :ref:`Using Insights Config API to get test builder`

* :ref:`Insights Test Types`

  - :ref:`Predicate-based Tests`
  - :ref:`Metric-based Tests`

* :ref:`Understanding Test Configuration`

  - :ref:`Defining Feature Tests`
  - :ref:`Defining Dataset Tests`
  - :ref:`Defining Global Test Tags`
  - :ref:`Defining Predicate-based Tests`
  - :ref:`Defining Metric-based Tests`

* :ref:`List of Available Tests`

  - :ref:`List of Predicate-based Tests`
  - :ref:`List of Metric-based Tests`

* :ref:`Test Results`

  - :ref:`Test Summary`
  - :ref:`Test Result`
  - :ref:`Test Results Grouping`

.. _How it works:

------------
How it works
------------

.. image:: T-TS-Overview.png
  :width: 800
  :alt: How does it work

1. User has created Baseline/Prediction Profile(s).
2. User works with Test Suites, Tests, Test Conditions, Thresholds, Test Results and Test Reports.
3. Insights Test Suites are composed of Insights Tests.
4. Each test has:

   * Test Condition (implicit or user-provided). Examples of user-provided conditions are >=, <=, etc. An implicit condition is used when running tests for a specific metric.
   * Threshold (either user-provided or captured from a reference profile). For example, a user can provide a value of 200 when evaluating the Mean of a feature with a greater-than test.
   * Test Configuration. Each test can take a test-specific config which tweaks its behavior. For example, when using `TestGreaterThan`, the user can decide whether to apply > or >= by setting the appropriate config. A preview of a single configured test is shown after this list.

5. Tests are of various types, allowing flexibility and ease of use:

   - :ref:`Predicate-based Tests`
   - :ref:`Metric-based Tests`
   - Custom Tests. Users can write tests specific to their needs if the library-provided tests do not meet their unique requirements.

6. Tests can be added to, edited in, or removed from a Test Suite.
7. Test Suites can be consumed as out-of-the-box Test Suites, created from scratch, or composed from an existing suite.
8. Tests/Test Suites are executed, producing test evaluation results. Each test evaluation result consists of:

   - Test Name
   - Test Description
   - Test Status (Pass/Fail/Error)
   - Test Assertion (expected vs actual)
   - System Tags
   - User-defined Tags
   - Test Configuration (if any)
   - Test Errors (if any)

9. Test results can be stored in a customer-provided bucket.
10. Further post processors can be added to push alerts to OCI Monitoring based on each test evaluation result.
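As a taste of the declarative configuration covered later in :ref:`Understanding Test Configuration`, a single configured test might look like the sketch below. It encodes the example from point 4 above: evaluate whether the `Mean` of a feature is greater than 200 (the surrounding feature wrapper and other keys are omitted here).

.. code-block:: JSON

    {
      "test_name": "TestGreaterThan",
      "metric_key": "Mean",
      "threshold_value": 200
    }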
.. _Tests/Test Suites API:

---
API
---

The Test/Test Suites API is available for use in 4 primary ways:

- Insights Builder API
- Insights Test Builder API
- Insights Config API to get insights builder
- Insights Config API to get test builder

.. _Using Insights Builder API:

++++++++++++++++++++++++++
Using Insights Builder API
++++++++++++++++++++++++++

Insights Test/Test Suites can be executed as part of an Insights run by using the builder APIs:

- `with_test_config` API to set up the tests
- `with_reference_profile` API to set up the reference profile

1. Begin by loading the required libraries and modules:

.. code-block:: python3

    from mlm_insights.builder.insights_builder import InsightsBuilder

2. Create the Insights builder using the required components, test config and reference profile (optional):

.. code-block:: python3

    # load the reference profile using either `ProfileReader` or a profile object
    # `test` is the list of configured tests to run
    test_config = TestConfig(tests=test)
    run_result = InsightsBuilder(). \
        with_input_schema(input_schema). \
        with_data_frame(data_frame=data_frame). \
        with_test_config(test_config). \
        with_reference_profile(reference_profile=reference_profile). \
        build(). \
        run()

3. The test result can be extracted from the run result object:

.. code-block:: python3

    test_result = run_result.test_results

.. _Using Insights Config API to get insights builder:

+++++++++++++++++++++++++++++++++++++++++++++++++
Using Insights Config API to get insights builder
+++++++++++++++++++++++++++++++++++++++++++++++++

Insights Test/Test Suites can be executed as part of an Insights run.

1. Begin by loading the required libraries and modules:

.. code-block:: python3

    from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader

2. Create the Insights builder using the config, test config and reference profile (optional):

.. code-block:: python3

    run_result = InsightsConfigReader(config_location=test_config_path) \
        .get_builder() \
        .build() \
        .run()

3. The test result can be extracted from the run result object:

.. code-block:: python3

    test_result = run_result.test_results
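As a quick sanity check on a run, the returned results can be grouped by status right away. Below is a minimal sketch; it assumes that `group_tests_by` returns a plain dictionary keyed by status, as described in :ref:`Test Results Grouping`.

.. code-block:: python3

    from mlm_insights.tests.constants import GroupByKey

    # `test_result` comes from the run above; group by PASSED/FAILED/ERROR
    grouped = test_result.group_tests_by(group_by=GroupByKey.TEST_STATUS_KEY)

    # assumption: the grouped result behaves like a dict of lists
    failed_tests = grouped.get("FAILED", [])
    print(f"{len(failed_tests)} test(s) failed")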
.. _Using Insights Config API to get test builder:

+++++++++++++++++++++++++++++++++++++++++++++
Using Insights Config API to get test builder
+++++++++++++++++++++++++++++++++++++++++++++

Along with the library-provided APIs, Insights Test/Test Suites can be set up and customized by authoring and passing a JSON configuration. This section shows how to use the Insights configuration reader to load the config and run the tests.

1. Begin by loading the required libraries and modules:

.. code-block:: python3

    from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader

2. Create the Test Context by providing the current profile and reference profile (optional):

.. code-block:: python3

    from mlm_insights.tests.test_context.interfaces.test_context import TestContext
    from mlm_insights.tests.test_context.profile_test_context import ProfileTestContext

    # load the current and reference profiles using either `ProfileReader` or custom implementations
    def get_test_context() -> TestContext:
        # Create a test context and pass the profiles loaded in the step above
        return ProfileTestContext(current_profile=current_profile, reference_profile=reference_profile)

3. Initialize `InsightsConfigReader` by specifying the location of the monitor config JSON file which contains the test configuration:

.. code-block:: python3

    from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader

    test_results = InsightsConfigReader(config_location=test_config_path) \
        .get_test_builder(test_context=get_test_context()) \
        .build() \
        .run()

4. Process the `test_results` to group them by test status/features, or send them to Insights Post Processors for further usage.

5. Let's see an example of grouping the `test_results` by test status:

.. code-block:: python3

    from mlm_insights.tests.constants import GroupByKey

    grouped_results = test_results.group_tests_by(group_by=GroupByKey.TEST_STATUS_KEY)

.. _Insights Test Types:

-------------------
Insights Test Types
-------------------

Before we take a deep dive into the test configuration schema, this section explains the test types. Currently, Insights supports the following test types:

- Predicate-based Tests
- Metric-based Tests

.. _Predicate-based Tests:

+++++++++++++++++++++
Predicate-based Tests
+++++++++++++++++++++

- General-purpose tests that evaluate a single condition against a single metric of a feature.
- Each test provides a single predicate (test condition) of the form `lhs <operator> rhs`.
- For example, consider a test to evaluate whether the `Mean` of a feature is greater than 100.23. In this case:

  - `lhs` is the value of the `Mean` metric,
  - `rhs` is `100.23`,
  - `<operator>` is greater than (`>`)

- For example, `TestGreaterThan` is a predicate-based test which tests if a metric value is greater than a specific threshold.
- For a list of all predicate-based tests and their examples, please refer to section :ref:`List of Predicate-based Tests`.
- Allows fetching the compared value (`rhs`) from a dynamic source such as a reference profile.

.. _Metric-based Tests:

++++++++++++++++++
Metric-based Tests
++++++++++++++++++

- Tests specific to an Insights metric.
- Have a built-in metric key and test condition.
- For example, `TestIsPositive` is a metric-based test which works on the `IsPositive` metric only and tests if a feature has all positive values.
- For a list of all metric-based tests and their examples, please refer to section :ref:`List of Metric-based Tests`.
- When no threshold values are provided, fetches the built-in metric values from the reference profile.

.. note::
    - The metric associated with any metric-based or predicate-based test that is configured by the user must be present in the Profile. For example, the `Count` metric should be present in the profile if the user wishes to run the `TestIsComplete` test.
    - If the metric associated with a particular metric-based or predicate-based test is not found during test execution, the test's status is set to `ERROR` and error details are added to the test result.
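To make the distinction concrete, here is a sketch of two tests that express a similar intent in the two styles: a metric-based `TestIsPositive`, which carries its own metric and condition, versus a predicate-based `TestGreaterThan` on the `Min` metric with an explicit threshold of 0. Both are shown as bare test objects; the full schema for placing them in a configuration follows in the next section.

.. code-block:: JSON

    // metric-based: the metric (`IsPositive`) and the condition are built in
    {
      "test_name": "TestIsPositive"
    }

    // predicate-based: metric key, condition and threshold are spelled out
    {
      "test_name": "TestGreaterThan",
      "metric_key": "Min",
      "threshold_value": 0
    }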
.. _Understanding Test Configuration:

--------------------------------
Understanding Test Configuration
--------------------------------

Insights Tests can be provided in a declarative fashion using JSON format. All the tests need to be defined under a new key, `test_config`, in the Insights Configuration.

.. code-block:: JSON

    {
      "input_schema": {...},
      // other components go here
      "test_config": {}
    }

We will now look at the details of the `test_config` key in the sections below.

.. _Defining Feature Tests:

++++++++++++++++++++++
Defining Feature Tests
++++++++++++++++++++++

- All Insights Tests for a specific feature need to be defined under the `feature_metric_tests` key. The general structure is as below:

.. code-block:: JSON

    {
      "test_config": {
        "feature_metric_tests": [
          {
            "feature_name": "Feature_1",
            "tests": [
              {
                // Each test is defined here
              }
            ]
          },
          {
            "feature_name": "Feature_2",
            "tests": [
              {
                // Each test is defined here
              }
            ]
          }
        ]
      }
    }

.. note::
    - The feature name provided in the `feature_name` key must be present in the Profile, i.e. it should come from features defined either in `input_schema` or via conditional features.
    - If the feature provided in `feature_name` is not found during test execution, the test's status is set to `ERROR` and error details are added to the test result.

.. _Defining Dataset Tests:

++++++++++++++++++++++
Defining Dataset Tests
++++++++++++++++++++++

- All Insights Tests for the entire dataset need to be defined under the `dataset_metric_tests` key.
- Dataset metric tests are evaluated against dataset metrics.
- The general structure is as below:

.. code-block:: JSON

    {
      "test_config": {
        "dataset_metric_tests": [
          {
            // Each test is defined here
          },
          {
            // Each test is defined here
          }
        ]
      }
    }

.. _Defining Global Test Tags:

+++++++++++++++++++++++++
Defining Global Test Tags
+++++++++++++++++++++++++

- Users can set user-defined, free-form tags for all the tests in the `tags` key.
- Both key and value can be any user-defined values of type `string` only.
- These tags are then attached to each test and are available in each test's `TestResult` via the `user_defined_tags` property.
- The general structure is as below; a sketch combining all three keys follows.

.. code-block:: JSON

    {
      "test_config": {
        "tags": {
          "tag_1": "tag_1_value",
          "tag_2": "tag_2_value"
        }
      }
    }
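Putting the three keys together, a single `test_config` can carry feature-level tests, dataset-level tests and global tags side by side. The sketch below shows the combined shape, with the individual test objects elided; their schemas are defined in the next two sections.

.. code-block:: JSON

    {
      "test_config": {
        "feature_metric_tests": [
          {
            "feature_name": "Feature_1",
            "tests": [
              // predicate-based or metric-based tests go here
            ]
          }
        ],
        "dataset_metric_tests": [
          // dataset-level tests go here
        ],
        "tags": {
          "tag_1": "tag_1_value"
        }
      }
    }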
.. _Defining Predicate-based Tests:

++++++++++++++++++++++++++++++
Defining Predicate-based Tests
++++++++++++++++++++++++++++++

- A predicate-based test is defined under the `tests` key in `feature_metric_tests`, and directly in `dataset_metric_tests`.
- The general structure is as shown below:

.. code-block:: JSON

    {
      "test_name": "<test name>",
      "metric_key": "<metric key>",
      "threshold_value": "<threshold value>",
      "threshold_source": "REFERENCE",
      "threshold_metric": "<metric key>",
      "tags": {
        "key_1": "value_1"
      },
      "config": {}
    }

- The details of each of the above properties are described below:

.. list-table::
    :widths: 20 10 40 30
    :header-rows: 1

    * - Key
      - Required
      - Description
      - Examples
    * - test_name
      - Yes
      - * Insights-provided test name.
        * Must be one of the names defined in section :ref:`List of Available Tests`.
      - `TestGreaterThan`
    * - metric_key
      - Yes
      - * Metric key on which to run the test evaluation.
        * Each Insights metric is emitted in a standard metric result format. The metric key must be one of the values in `variable_names`.
        * If a metric has more than one variable, qualify the metric key with the metric name.
        * For example, consider the `Quartiles` metric, which emits the metric result shown below. To run a test evaluation against, say, the `q1` value, set `metric_key` to `<metric_name>.<variable_name>`, i.e. `Quartiles.q1`.

        .. code-block:: JSON

            {
              metric_name: 'Quartiles',
              variable_names: ['q1', 'q2', 'q3'],
              // other details omitted for brevity
            }
      - `Min`, `Quartiles.q1`
    * - threshold_value
      - Yes, if `threshold_metric` is not provided. Otherwise No
      - * A static user-defined threshold value against which the metric value is compared.
        * The type of the threshold value depends on each predicate-based test. For example:

          - For `TestIsBetween`, the user needs to provide a range of values as `[min, max]`.
          - For `TestGreaterThan`, the user needs to provide a single number value.
      - `100.0`, `[200, 400]`
    * - threshold_source
      - No
      - * Set `threshold_source` to `REFERENCE` to evaluate the metric value against the corresponding metric value from the reference profile.
        * When this is set, ensure the reference profile is made available to the prediction run.
      - Always set to `REFERENCE`
    * - threshold_metric
      - Yes, if `threshold_value` is not provided. Otherwise No
      - * Set `threshold_metric` to evaluate the metric value against another metric.
        * For example, to test whether the `Min` metric is greater than the `Mean` metric, set `metric_key` to `Min` and `threshold_metric` to `Mean`.
        * When used in conjunction with `threshold_source` set to `REFERENCE`, the metric value for the metric provided in `threshold_metric` is fetched from the reference profile.
      - `Min`, `Quartiles.q1`
    * - tags
      - No
      - * Users can set user-defined, free-form tags for a specific test in the `tags` key.
        * Both key and value can be any user-defined values of type `string` only.
        * These tags are then attached to the test and are available in the test's `TestResult` via the `user_defined_tags` property.
        * Multiple tags can be provided in this format.
      - .. code-block:: JSON

            "tags": {
              "key_1": "value_1"
            }

.. _Defining Metric-based Tests:

+++++++++++++++++++++++++++
Defining Metric-based Tests
+++++++++++++++++++++++++++

- A metric-based test is defined under the `tests` key in `feature_metric_tests`.
- The general structure is as shown below:

.. code-block:: JSON

    {
      "test_name": "<test name>",
      "threshold_value": "<threshold value>",
      "tags": {
        "key_1": "value_1"
      }
    }

- The details of each of the above properties are described below:

.. list-table::
    :widths: 20 10 40 30
    :header-rows: 1

    * - Key
      - Required
      - Description
      - Examples
    * - test_name
      - Yes
      - * Insights-provided test name.
        * Must be one of the names defined in section :ref:`List of Available Tests`.
      - `TestNoNewCategory`
    * - threshold_value
      - No
      - * A static user-defined threshold value against which the metric value is compared.
        * The type of the threshold value depends on each metric-based test. For example:

          - For `TestIsComplete`, the user needs to provide a single number, e.g. 100.0.
          - For `TestNoNewCategory`, the user needs to provide a list of string values.
        * When `threshold_value` is not provided, the general behavior is to fetch the corresponding metric from the reference profile.
      - `100.0`, `["cat_a", "cat_b"]`
    * - tags
      - No
      - * Users can set user-defined, free-form tags for a specific test in the `tags` key.
        * Both key and value can be any user-defined values of type `string` only.
        * These tags are then attached to the test and are available in the test's `TestResult` via the `user_defined_tags` property.
        * Multiple tags can be provided in this format.
      - .. code-block:: JSON

            "tags": {
              "key_1": "value_1"
            }
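As a worked end-to-end sketch, the configuration below combines both styles: a predicate-based `TestGreaterThan` and a metric-based `TestIsComplete` on one feature (the first with a per-test tag), a dataset-level `TestGreaterThan` on `RowCount` validated against the reference profile, and a global tag. The feature name and tag values are illustrative only; `RowCount` as a dataset metric follows the example in :ref:`List of Predicate-based Tests`.

.. code-block:: JSON

    {
      "test_config": {
        "feature_metric_tests": [
          {
            "feature_name": "Feature_1",
            "tests": [
              {
                "test_name": "TestGreaterThan",
                "metric_key": "Min",
                "threshold_value": 7500,
                "tags": { "severity": "high" }
              },
              {
                "test_name": "TestIsComplete",
                "threshold_value": 95.0
              }
            ]
          }
        ],
        "dataset_metric_tests": [
          {
            "test_name": "TestGreaterThan",
            "metric_key": "RowCount",
            "threshold_source": "REFERENCE"
          }
        ],
        "tags": {
          "tag_1": "tag_1_value"
        }
      }
    }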
.. _List of Available Tests:

-----------------------
List of Available Tests
-----------------------

.. _List of Predicate-based Tests:

+++++++++++++++++++++
Predicate-based Tests
+++++++++++++++++++++

.. list-table::
    :widths: 10 40 25 25
    :header-rows: 1

    * - Test Name
      - Test Description
      - Test Configuration
      - Examples
    * - TestGreaterThan
      - * Tests if the left value is greater than (or equal to) the right value.
        * Is of the form `lhs >[=] rhs`, where lhs = left-hand side and rhs = right-hand side.
        * Both left and right values must be one of int, float or boolean.
      - * `strictly : bool`
        * When set to true, the condition is >=, else the condition is >.
        * Default value is false.
      - Tests whether the Min metric of a feature >= 7500:

        .. code-block:: JSON

            {
              "test_name": "TestGreaterThan",
              "metric_key": "Min",
              "threshold_value": 7500,
              "config": {
                "strictly": true
              }
            }

        Tests whether the Min metric of a feature > the Median metric of the same feature:

        .. code-block:: JSON

            {
              "test_name": "TestGreaterThan",
              "metric_key": "Min",
              "threshold_metric": "Median"
            }

        Tests whether the Min metric of a feature > p25, i.e. the Q1 metric of the same feature:

        .. code-block:: JSON

            {
              "test_name": "TestGreaterThan",
              "metric_key": "Min",
              "threshold_metric": "Quartiles.q1"
            }

        Tests whether the Min metric of a feature > p25, i.e. the Q1 metric of the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestGreaterThan",
              "metric_key": "Min",
              "threshold_metric": "Quartiles.q1",
              "threshold_source": "REFERENCE"
            }

        Tests whether the RowCount metric > the RowCount of the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestGreaterThan",
              "metric_key": "RowCount",
              "threshold_source": "REFERENCE"
            }
    * - TestLessThan
      - * Tests if the left value is less than (or equal to) the right value.
        * Is of the form `lhs <[=] rhs`, where lhs = left-hand side and rhs = right-hand side.
        * Both left and right values must be one of int, float or boolean.
      - * `strictly : bool`
        * When set to true, the condition is <=, else the condition is <.
        * Default value is false.
      - Tests whether the Min metric of a feature <= 7500:

        .. code-block:: JSON

            {
              "test_name": "TestLessThan",
              "metric_key": "Min",
              "threshold_value": 7500,
              "config": {
                "strictly": true
              }
            }

        Tests whether the Min metric of a feature < the Median metric of the same feature:

        .. code-block:: JSON

            {
              "test_name": "TestLessThan",
              "metric_key": "Min",
              "threshold_metric": "Median"
            }

        Tests whether the Min metric of a feature < p25, i.e. the Q1 metric of the same feature:

        .. code-block:: JSON

            {
              "test_name": "TestLessThan",
              "metric_key": "Min",
              "threshold_metric": "Quartiles.q1"
            }

        Tests whether the Min metric of a feature < p25, i.e. the Q1 metric of the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestLessThan",
              "metric_key": "Min",
              "threshold_metric": "Quartiles.q1",
              "threshold_source": "REFERENCE"
            }
    * - TestEqual
      - * Tests if the left value is equal to the right value.
        * Is of the form `lhs == rhs`, where lhs = left-hand side and rhs = right-hand side.
        * Both left and right values must be one of int, float or boolean.
      - None
      - Tests whether the Min metric of a feature = 7500:

        .. code-block:: JSON

            {
              "test_name": "TestEqual",
              "metric_key": "Min",
              "threshold_value": 7500
            }
    * - TestIsBetween
      - * Tests if a numerical value is between a minimum and maximum value.
        * Is of the form `min <[=] lhs <[=] max`, where lhs = left-hand side and min and max define the range of values.
        * lhs must be one of int, float.
        * rhs must be a list of 2 values, each of which must be one of int or float.
      - * `strictly : bool`
        * When set to true, the condition is (minimum value <= test value <= maximum value).
        * When set to false, the condition is (minimum value < test value < maximum value).
        * Default value is false.
      - Tests whether the Min metric of a feature lies within the range 7500 to 8000:

        .. code-block:: JSON

            {
              "test_name": "TestIsBetween",
              "metric_key": "Min",
              "threshold_value": [7500, 8000],
              "config": {
                "strictly": true
              }
            }
    * - TestDeviation
      - * Tests if the deviation between two values is within a threshold.
        * Both left and right values must be one of int or float.
        * The right value is fetched from the reference profile for the configured metric.
      - * `deviation_threshold : float`
        * The threshold value to compare the deviation against.
        * Default value is 0.1 (i.e. 10%).
      - * Suppose the Mean metric of a feature is 200.0 in the prediction profile and 205.0 in the reference profile.
        * The deviation threshold has been set to 10%, i.e. 0.10.
        * The deviation is calculated as (205.0 - 200.0) / 200.0 = 0.025, i.e. 2.5%.
        * The actual deviation is less than the deviation threshold, i.e. 0.025 < 0.10, so the test passes.

        .. code-block:: JSON

            {
              "test_name": "TestDeviation",
              "metric_key": "Mean",
              "config": {
                "deviation_threshold": 0.10
              }
            }
.. _List of Metric-based Tests:

++++++++++++++++++
Metric-based Tests
++++++++++++++++++

.. list-table::
    :widths: 10 40 10 15 25
    :header-rows: 1

    * - Test Name
      - Test Description
      - Test Configuration
      - Metric
      - Examples
    * - TestIsComplete
      - * Tests whether the completion percentage of a feature is greater than the threshold value (in percentage).
        * The threshold value can either be provided via `threshold_value` OR
        * validated against the completion % in the reference profile.
      - None
      - `Count`
      - Tests whether the completion percentage of a feature >= 95%, i.e. 95% of values are non-NaN:

        .. code-block:: JSON

            {
              "test_name": "TestIsComplete",
              "threshold_value": 95.0
            }

        Tests whether the completion % of a feature >= the completion % of the feature in the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestIsComplete"
            }
    * - TestIsMatchingInferenceType
      - * Tests whether all the values in a feature match a data type specified by the threshold value.
        * The threshold value can either be provided via `threshold_value` OR
        * validated against the inferred type in the reference profile.
        * Accepted values for `threshold_value`: Integer, String, Float, Boolean, None.
        * The test errors out if `threshold_value` is `None` and no reference profile is provided.
      - None
      - `TypeMetric`
      - Tests whether the type of a feature is `Integer`:

        .. code-block:: JSON

            {
              "test_name": "TestIsMatchingInferenceType",
              "threshold_value": "Integer"
            }

        Tests whether the type of a feature matches the type in the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestIsMatchingInferenceType"
            }
    * - TestIsNegative
      - * Tests whether all the values in a feature are negative.
      - None
      - `IsNegative`
      - .. code-block:: JSON

            {
              "test_name": "TestIsNegative"
            }
    * - TestIsPositive
      - * Tests whether all the values in a feature are positive.
      - None
      - `IsPositive`
      - .. code-block:: JSON

            {
              "test_name": "TestIsPositive"
            }
    * - TestIsNonZero
      - * Tests whether all the values in a feature are non-zero.
      - None
      - `IsNonZero`
      - .. code-block:: JSON

            {
              "test_name": "TestIsNonZero"
            }
    * - TestNoNewCategory
      - * Tests whether any new categories are found in a feature for a prediction run that are not present in `threshold_value`.
        * Test status is set to `FAILED` if new category(ies) are present.
        * The threshold value can either be provided via `threshold_value` (must be a list) OR
        * validated against the categories found in the reference profile.
        * Use this test for categorical features only.
      - None
      - `TopKFrequentElements`
      - Tests whether the categories in a feature match the `threshold_value` list values:

        .. code-block:: JSON

            {
              "test_name": "TestNoNewCategory",
              "threshold_value": ["cat_a", "cat_b"]
            }

        Tests whether the categories in a feature match the values present in the reference profile:

        .. code-block:: JSON

            {
              "test_name": "TestNoNewCategory"
            }
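For a categorical feature, several of the metric-based tests above can be combined in a single `feature_metric_tests` entry. The sketch below (the feature name is illustrative) checks completeness, inferred type and category drift for one feature; the reference profile supplies the expected categories, since `TestNoNewCategory` omits `threshold_value`.

.. code-block:: JSON

    {
      "test_config": {
        "feature_metric_tests": [
          {
            "feature_name": "payment_type",
            "tests": [
              { "test_name": "TestIsComplete", "threshold_value": 95.0 },
              { "test_name": "TestIsMatchingInferenceType", "threshold_value": "String" },
              { "test_name": "TestNoNewCategory" }
            ]
          }
        ]
      }
    }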
.. _Test Results:

------------
Test Results
------------

In this section, we will describe the test results returned after test execution. The code is shown below.

.. code-block:: python3

    from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader

    test_results = InsightsConfigReader(config_location=test_config_path) \
        .get_test_builder(test_context=get_test_context()) \
        .build() \
        .run()

`test_results` is an instance of type `TestResults`, which returns two high-level properties:

- Test Summary
- List of test results for each configured test

.. _Test Summary:

++++++++++++
Test Summary
++++++++++++

The Test Summary returns the following information about the executed tests:

- Count of tests executed
- Count of passed tests
- Count of failed tests
- Count of errored tests

  - Tests error out when test validation fails or an error is encountered during test execution.

.. _Test Result:

+++++++++++
Test Result
+++++++++++

Each test returns a result in a standard format which includes the following properties:

.. list-table::
    :widths: 10 50 40
    :header-rows: 1

    * - Key
      - Description
      - Example
    * - name
      - Name of the test
      - `TestGreaterThan`, `TestIsPositive`
    * - description
      - * Test description in a structured format.
        * For predicate-based tests, the descriptions are structured in the following formats, depending on the test configuration:

          - `The <metric> of feature <feature> is <value>. Test condition : <value> [predicate condition] <threshold>`
          - `The <metric> of feature <feature> is <value>. <threshold metric> of feature <feature> is <threshold value>. Test condition : <value> [predicate condition] <threshold value>`
          - `The <metric> of feature <feature> is <value>. <metric> of feature <feature> is <reference value> in Reference profile. Test condition is <value> [predicate condition] <reference value>`
          - `The <metric> is <value>. <metric> is <reference value> in Reference profile. Test condition is <value> [predicate condition] <reference value>`
      - * The Min of feature feature_1 is 23.45. Test condition : 23.45 >= 4.5
        * The Min of feature feature_1 is 23.45. Median of feature feature_1 is 34.5. Test condition : 23.45 >= 34.5
        * The Min of feature feature_1 is 23.45. Min of feature feature_1 is 4.5 in Reference profile. Test condition is 23.45 >= 4.5
        * The RMSE is 23.45. RMSE is 12.34 in Reference profile. Test condition is 23.45 >= 12.34
        * The Min of feature feature_1 is 23.45. Test Condition: 23.45 deviates by +/- 4% from 1.2
    * - status
      - * Each test, when executed, produces a status which is one of the following: PASSED, FAILED, ERROR.
        * When a test passes the given condition, the status is set to `PASSED`.
        * When a test fails the given condition, the status is set to `FAILED`.
        * When test execution encounters an error, the status is set to `ERROR`.
      -
    * - Test Assertion Info
      - Each test returns `expected` and `actual` information which helps in understanding why a particular test passed or failed.
      -
    * - error
      - When a test encounters error(s) during its execution, returns an error description.
      -
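To work with individual results programmatically, the per-test properties above can be read off each result object. A minimal sketch follows, assuming the grouped output of `group_tests_by` is a dict of `TestResult` lists and that the table's keys are exposed as same-named attributes (only `user_defined_tags` is explicitly documented as a property).

.. code-block:: python3

    from mlm_insights.tests.constants import GroupByKey

    # group the results by status and inspect the failures
    grouped = test_results.group_tests_by(group_by=GroupByKey.TEST_STATUS_KEY)
    for result in grouped.get("FAILED", []):
        # assumed attribute names, mirroring the keys in the table above
        print(result.name, result.description)
        print(result.user_defined_tags)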
.. _Test Results Grouping:

+++++++++++++++++++++
Test Results Grouping
+++++++++++++++++++++

Test Results can be grouped by the supported keys. This helps in arranging the tests for easier visualization, processing and sending to downstream components. The currently supported group-by keys are:

* Feature
* Test Status
* Test Type

Below are some code snippets along with the grouped results.

.. code-block:: python3

    from mlm_insights.tests.constants import FEATURE_TAG_KEY, TEST_TYPE_TAG_KEY, GroupByKey, TEST_STATUS_KEY

    # Group by features
    grouped_results_by_features = test_results.group_tests_by(group_by=GroupByKey.FEATURE_TAG_KEY)
    # Produces the grouped results as: {'feature_1': [List of Test Result], 'feature_2': [List of Test Result]}

    # Group by test status
    grouped_results_by_status = test_results.group_tests_by(group_by=GroupByKey.TEST_STATUS_KEY)
    # Produces the grouped results as: {'PASSED': [List of Test Result], 'FAILED': [List of Test Result], 'ERROR': [List of Test Result]}
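For completeness, a sketch of grouping by test type as well. Note that `GroupByKey.TEST_TYPE_TAG_KEY` is an assumed enum member, inferred from the `TEST_TYPE_TAG_KEY` constant imported above; the exact key strings in the output depend on the library.

.. code-block:: python3

    # Group by test type (predicate-based vs metric-based)
    grouped_results_by_type = test_results.group_tests_by(group_by=GroupByKey.TEST_TYPE_TAG_KEY)
    # Produces grouped results keyed by test type, one [List of Test Result] per type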