Metric Component
=============================

Metric components are responsible for calculating all statistical metrics and algorithms on the data. These include metrics from different groups such as data integrity, summary or quality, drift score, and performance. Metrics are calculated on the input data and can produce scalar, list, or dictionary output.

Different metrics might expect a specific type of input data (for example, the model performance metric needs the prediction and target column types) or a specific variable type (for example, regression performance expects the prediction and target to be continuous variable types). They might be incompatible with specific data types (for example, a count metric cannot work with the string data type) and might require data from different dimensional partitions (for example, drift detection might need data from a different time frame to generate a score).

While metrics can be quite diverse and fall into different groups based on their use case, they all use the same interface. The next sections describe the different types of metric, show how to use them, and briefly discuss their internal workings.

~~~~~~~~~~~~~~~~~~~~~~~~~
Types of Metric
~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, ML Insights supports two distinct types of metric: the univariate metric and the dataset metric.

Univariate metrics expect only a single feature as input and provide some form of statistic on it, for example, sum, mean, or kurtosis.

Dataset metrics take more than one input feature (which can extend to all columns). Examples of dataset metrics are metrics that describe the entire dataset, such as the number of rows or the data or column type of a dataset, and multivariate metrics such as correlation or model performance.

The types of metric are captured in the ``MetricDetail`` class.
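To make the distinction concrete, here is a small standalone sketch in plain Python (illustrative only, not ML Insights code) contrasting a univariate metric, which sees one feature at a time, with a dataset metric, which sees the dataset as a whole:

.. code-block:: python

   from statistics import mean
   from typing import Dict, List


   def univariate_mean(column: List[float]) -> float:
       """A univariate-style metric: operates on a single feature."""
       return mean(column)


   def dataset_row_count(data: Dict[str, List[float]]) -> int:
       """A dataset-style metric: operates on the whole dataset at once."""
       # All columns have the same length, so any column gives the row count.
       return len(next(iter(data.values())))


   data = {
       "sepal length (cm)": [5.1, 4.9, 4.7],
       "sepal width (cm)": [3.5, 3.0, 3.2],
   }

   print(univariate_mean(data["sepal length (cm)"]))  # per-feature statistic
   print(dataset_row_count(data))                     # whole-dataset statistic
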
.. code-block:: python

   class MetricDetail:
       univariate_metric: Dict[str, List[MetricMetadata]]
       dataset_metrics: Optional[List[MetricMetadata]]

These metrics have to be passed in the proper way for the framework to behave correctly: dataset-level metrics must be passed in the ``dataset_metrics`` list, and univariate metrics must be passed in the ``univariate_metric`` dictionary.

~~~~~~~~~~~~~~~~~~~~~~~~~~
How to use
~~~~~~~~~~~~~~~~~~~~~~~~~~

This section shows how to construct a metric and pass it to the builder object.

.. hint::
   If no metric is passed to the builder, the builder can automatically process features with a specific set of metrics chosen heuristically. This is done based on the data type, variable type, and column type of the feature.

#. Import the metric classes we want to use.

   .. code-block:: python

      from mlm_insights.core.metrics.kurtosis import Kurtosis
      from mlm_insights.core.metrics.max import Max
      from mlm_insights.core.metrics.mean import Mean
      from mlm_insights.core.metrics.min import Min
      from mlm_insights.core.metrics.mode import Mode
      from mlm_insights.core.metrics.range import Range
      from mlm_insights.core.metrics.is_quasi_constant_feature import IsQuasiConstantFeature

   Import some other needed dependencies:

   .. code-block:: python

      from mlm_insights.core.metrics.metric_metadata import MetricMetadata
      from mlm_insights.builder.builder_component import MetricDetail

#. Construct a ``MetricMetadata`` object for each metric we want to use, with the proper parameters (if any), and collect them in a list.

   .. code-block:: python

      metrics = [
          MetricMetadata(klass=Max),
          MetricMetadata(klass=Min),
          MetricMetadata(klass=Mean),
          MetricMetadata(klass=IsQuasiConstantFeature),
          MetricMetadata(klass=Kurtosis),
          MetricMetadata(klass=Mode),
          MetricMetadata(klass=Range)
      ]

#. Create a dictionary with the features as keys and the ``MetricMetadata`` list as values (the example features are taken from the iris dataset).
   .. code-block:: python

      uni_variate_metrics = {
          "sepal length (cm)": metrics,
          "sepal width (cm)": metrics,
          "petal length (cm)": metrics,
          "petal width (cm)": metrics
      }

#. Create the ``MetricDetail`` object with the univariate metric dictionary and dataset metric list we just created.

   .. code-block:: python

      metric_details = MetricDetail(univariate_metric=uni_variate_metrics, dataset_metrics=[])

#. Pass the ``MetricDetail`` object to the corresponding API of the builder object.

   .. code-block:: python

      InsightsBuilder().with_metrics(metrics=metric_details)

.. note::
   Instead of creating the actual metric objects, we passed a different construct called ``MetricMetadata``. This is because, while the actual logic and code for calculating a metric remain with the metric class, the framework controls the entire lifecycle of the metric. Based on when a metric needs to be created, the runner object constructs the actual metric object by itself.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How metric works
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section briefly discusses some important aspects of metrics, giving more insight into how they work and how they scale. All metrics in ML Insights must fulfill the following requirements:

#. Mergeable - Two metrics of the same type calculated on two sets of data (say D1 and D2) can be merged to produce a metric that represents the metric score of the combined data (D1 + D2).
#. Serializable - Metrics can serialize their current state.
#. De-serializable - Metrics can be de-serialized from a previously stored state.
#. Single pass - All metrics are single pass, as the runner reads through the input data only once.
#. Approximate or accurate - Metrics can be approximate or accurate. The type of score a metric produces can be identified from the API documentation.
#. No input data persistence - Metrics do not store subsets of the raw data in memory, except for the one being processed.
   If the input data is partitioned into a number of parts, only one part is in memory at any time.
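The mergeable, serializable, and single-pass properties can be illustrated with a toy mean metric. This is a standalone sketch under assumed method names (``update``, ``merge``, ``serialize``), not the ML Insights implementation: the metric keeps only a running state rather than the raw data, two partial states computed on separate partitions merge into the state of the combined data, and the state round-trips through serialization.

.. code-block:: python

   import json
   from dataclasses import dataclass
   from typing import List


   @dataclass
   class ToyMean:
       """Toy mergeable, serializable mean metric (illustrative only)."""
       total: float = 0.0
       count: int = 0

       def update(self, values: List[float]) -> None:
           # Single pass: only the running sum and count are kept,
           # never the raw input data itself.
           for v in values:
               self.total += v
               self.count += 1

       def merge(self, other: "ToyMean") -> "ToyMean":
           # Merging the states of D1 and D2 yields the state of D1 + D2.
           return ToyMean(self.total + other.total, self.count + other.count)

       def serialize(self) -> str:
           return json.dumps({"total": self.total, "count": self.count})

       @classmethod
       def deserialize(cls, payload: str) -> "ToyMean":
           state = json.loads(payload)
           return cls(state["total"], state["count"])

       def get_result(self) -> float:
           return self.total / self.count


   # Compute on two partitions, then merge.
   m1, m2 = ToyMean(), ToyMean()
   m1.update([1.0, 2.0, 3.0])   # partition D1
   m2.update([4.0, 5.0])        # partition D2
   merged = m1.merge(m2)
   print(merged.get_result())   # 3.0, same as the mean over D1 + D2

   # Round-trip the state through serialization.
   restored = ToyMean.deserialize(merged.serialize())
   print(restored.get_result())  # 3.0
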