mlm_insights.core.metrics.datetime_metrics package¶
Submodules¶
mlm_insights.core.metrics.datetime_metrics.common module¶
- mlm_insights.core.metrics.datetime_metrics.common.get_maximum(max_date_1: str, max_date_2: str, errors: str, date_format: str = '') str ¶
- mlm_insights.core.metrics.datetime_metrics.common.get_maximum_date(column: Series, errors: str, date_format: str = '', unit: str = '', origin: str = '') Tuple[str, int] ¶
- mlm_insights.core.metrics.datetime_metrics.common.get_minimum(min_date_1: str, min_date_2: str, errors: str, date_format: str = '') str ¶
- mlm_insights.core.metrics.datetime_metrics.common.get_minimum_date(column: Series, errors: str, date_format: str = '', unit: str = '', origin: str = '') Tuple[str, int] ¶
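The pairwise helpers above take two date strings and return one of them. As a rough illustration of that idea, and not the library's actual implementation, the sketch below parses both strings with `pandas.to_datetime` and returns the later or earlier one. The names `later_of` and `earlier_of` are hypothetical, and falling back to the other value when one side cannot be parsed is an assumption.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"


def _pick(date_1: str, date_2: str, date_format: str, want_later: bool) -> str:
    """Hypothetical helper: compare two date strings after parsing them."""
    parsed_1 = pd.to_datetime(date_1, format=date_format, errors="coerce")
    parsed_2 = pd.to_datetime(date_2, format=date_format, errors="coerce")
    # Assumption: if one side is unparsable (NaT), return the other side.
    if pd.isna(parsed_1):
        return date_2
    if pd.isna(parsed_2):
        return date_1
    first_wins = parsed_1 >= parsed_2 if want_later else parsed_1 <= parsed_2
    return date_1 if first_wins else date_2


def later_of(date_1: str, date_2: str, date_format: str = FMT) -> str:
    """Return whichever date string is later."""
    return _pick(date_1, date_2, date_format, want_later=True)


def earlier_of(date_1: str, date_2: str, date_format: str = FMT) -> str:
    """Return whichever date string is earlier."""
    return _pick(date_1, date_2, date_format, want_later=False)


latest = later_of("2024-08-05 00:00:00", "2025-01-22 00:00:00")
earliest = earlier_of("2024-08-05 00:00:00", "2025-01-22 00:00:00")
```

Comparing parsed timestamps rather than raw strings keeps the result correct for formats whose lexical order differs from chronological order, such as `%d/%m/%Y`.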
mlm_insights.core.metrics.datetime_metrics.constants module¶
mlm_insights.core.metrics.datetime_metrics.datetime_duration module¶
- class mlm_insights.core.metrics.datetime_metrics.datetime_duration.DateTimeDuration(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, min_date: str = '', max_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', duration_unit: str = 'D', valid_duration_units: ~typing.List[str] = <factory>, invalid_rows_count: int = 0)¶
Bases:
MetricBase
Feature Metric to compute the longest duration in terms of min and max date values (MAX - MIN) in a feature.
It takes into consideration removing NaN values while computing total count.
It is an exact univariate metric which can process only DATETIME & TIMESTAMP data types.
Configuration¶
- duration_unit: str
Unit for the output duration. Must be one of Y, M, W, D, h, m, s
Default unit is D i.e. days
- errors: str
Specify how to handle values that cannot be parsed as datetimes. Default is “coerce”, i.e. treat non-parseable dates as NaT
NOTE: For details on arguments, check https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in feature schema
- date_format: str
Format string for datetime. Default is “%Y-%m-%d %H:%M:%S”
- unit: str
In case input is Timestamp, specify the unit. Default is “s”
- origin: str
In case input is Timestamp, specify the origin. Default is “unix”
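The `unit` and `origin` options mirror the arguments of the same name in `pandas.to_datetime`, which the note above points to. The sketch below shows how numeric timestamps are interpreted under the defaults (`unit="s"`, `origin="unix"`) and with a custom origin. It only exercises pandas; how the metric passes these values through internally is not shown in the source.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

# Seconds since the Unix epoch (the defaults: unit="s", origin="unix").
epoch_seconds = pd.Series([1722816000, 1737504000, 1731196800])
parsed = pd.to_datetime(epoch_seconds, unit="s", origin="unix")
formatted = parsed.dt.strftime(FMT).tolist()

# The same instants expressed in milliseconds need unit="ms".
parsed_ms = pd.to_datetime(epoch_seconds * 1000, unit="ms", origin="unix")

# A custom origin: values are day offsets counted from 2024-01-01.
day_offsets = pd.Series([0, 1, 31])
from_custom_origin = pd.to_datetime(day_offsets, unit="D", origin="2024-01-01")
custom_formatted = from_custom_origin.dt.strftime("%Y-%m-%d").tolist()
```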
Returns¶
- max_duration: float
Longest duration for the date time feature i.e. MAX - MIN
- invalid_rows_count: int
Count of the values which are not valid date times. This includes: missing values, invalid dates
and date values whose format is different from the one specified
- date_format: str
Format string used in output
- unit: str
Unit specified in config
- origin: str
Origin specified in config
- duration_unit: str
Duration unit specified in config
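To make the MAX - MIN definition concrete, here is a minimal sketch of the computation using plain pandas. It is an illustration under stated assumptions, not the metric's implementation: the function name `longest_duration` and the `SECONDS_PER_UNIT` table are hypothetical, and the `Y` and `M` units are left out because the source does not say how the library defines a year or a month in fixed time.

```python
from typing import Tuple

import pandas as pd

# Fixed-length units only; Y and M are omitted on purpose (see lead-in).
SECONDS_PER_UNIT = {"W": 604800, "D": 86400, "h": 3600, "m": 60, "s": 1}


def longest_duration(column: pd.Series, date_format: str,
                     duration_unit: str = "D") -> Tuple[float, int]:
    """Return (MAX - MIN in the requested unit, number of unparsable rows)."""
    if duration_unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported duration unit: {duration_unit}")
    # errors="coerce" turns missing or unparsable values into NaT.
    parsed = pd.to_datetime(column, format=date_format, errors="coerce")
    invalid_rows = int(parsed.isna().sum())
    # min() and max() skip NaT, so invalid rows do not affect the span.
    span = parsed.max() - parsed.min()
    return span.total_seconds() / SECONDS_PER_UNIT[duration_unit], invalid_rows


column = pd.Series(["2024-08-05", "2025-01-22", "2024-11-10", None])
days, invalid = longest_duration(column, "%Y-%m-%d", "D")
hours, _ = longest_duration(column, "%Y-%m-%d", "h")
```

Note that this sketch parses date-only strings with `%Y-%m-%d`; a format that does not match the data would coerce every row to NaT.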
Examples
import pandas as pd
from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_duration import DateTimeDuration
from mlm_insights.core.metrics.metric_metadata import MetricMetadata

# NOTE: CONFIG_DURATION_UNIT_KEY is used below, but this example does not show
# where it is imported from; import it before running.


def main():
    # Describe the single datetime feature and its expected format
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(
        univariate_metric={
            "date_created": [MetricMetadata(klass=DateTimeDuration,
                                            config={CONFIG_DURATION_UNIT_KEY: 'D'})]},
        dataset_metrics=[])

    # Parentheses let the builder chain span several lines
    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeDuration"])


if __name__ == "__main__":
    main()

# Returns the standard metric result as:
# {
#   'metric_name': 'DateTimeDuration',
#   'metric_description': 'Feature Metric to compute the longest duration in terms of min and max date values',
#   'variable_count': 5,
#   'variable_names': ['max_duration', 'invalid_rows_count', 'date_format', 'unit', 'origin', 'duration_unit'],
#   'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#   'variable_dtypes': ['FLOAT', 'INTEGER', 'STRING', 'STRING', 'STRING'],
#   'variable_dimensions': [0, 0, 0, 0, 0, 0],
#   'metric_data': [12.0, 0, '%Y-%m-%d %H:%M:%S', 's', 'unix', 'D'],
#   'metadata': {}
# }
- compute(column: Series, **kwargs: Any) None ¶
Computes the minimum and maximum datetime for the dataset. In case of a partitioned dataset, computes the minimum and maximum datetime for the specific partition.
Parameters¶
- column: pd.Series
Input column.
- classmethod create(config: Dict[str, ConfigParameter] | None = None) DateTimeDuration ¶
Factory Method to create an object. The configuration will be available in config.
Returns¶
An Instance of DateTimeDuration.
- date_format: str = '%Y-%m-%d %H:%M:%S'¶
- duration_unit: str = 'D'¶
- errors: str = 'coerce'¶
- get_result(**kwargs: Any) Dict[str, Any] ¶
Returns the computed DateTimeDuration metric.
Returns¶
Dict[str, Any]: result of the DateTimeDuration metric.
- get_standard_metric_result(**kwargs: Any) StandardMetricResult ¶
Returns Standard Metric for DateTimeDuration metric.
Returns¶
StandardMetricResult: DateTimeDuration Metric in standard format.
- classmethod get_supported_variable_types() List[VariableType] ¶
Method to retrieve the list of Feature Variable type supported for the metric
Returns¶
List of Feature Variable type supported by the metric
- invalid_rows_count: int = 0¶
- max_date: str = ''¶
- merge(other_metric: DateTimeDuration, **kwargs: Any) DateTimeDuration ¶
Merges two DateTimeDuration metrics into one without mutating either of them.
Parameters¶
- other_metric: DateTimeDuration
The other DateTimeDuration metric to be merged.
Returns¶
- DateTimeDuration
A new instance of DateTimeDuration after merging.
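The merge contract described above (combine two partial results into a new object and leave both inputs untouched) can be sketched with a small frozen dataclass. This is an illustrative model only: `DurationState` is a hypothetical stand-in for the metric's state, and summing `invalid_rows_count` across partitions is an assumption, not something the source states.

```python
from dataclasses import dataclass
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


@dataclass(frozen=True)
class DurationState:
    """Hypothetical per-partition state: earliest and latest date seen."""
    min_date: str
    max_date: str
    invalid_rows_count: int = 0

    def merge(self, other: "DurationState") -> "DurationState":
        """Return a new state covering both partitions; inputs stay unchanged."""
        def parse(value: str) -> datetime:
            return datetime.strptime(value, FMT)

        return DurationState(
            min_date=min(self.min_date, other.min_date, key=parse),
            max_date=max(self.max_date, other.max_date, key=parse),
            # Assumption: invalid rows are additive across partitions.
            invalid_rows_count=self.invalid_rows_count + other.invalid_rows_count,
        )


partition_a = DurationState("2024-08-05 00:00:00", "2024-11-10 00:00:00", 1)
partition_b = DurationState("2024-09-01 00:00:00", "2025-01-22 00:00:00", 0)
merged = partition_a.merge(partition_b)
```

Because the dataclass is frozen, any accidental in-place update inside `merge` would raise an error, which makes the no-mutation guarantee easy to check.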
- min_date: str = ''¶
- origin: str = 'unix'¶
- unit: str = 's'¶
- valid_duration_units: List[str]¶
mlm_insights.core.metrics.datetime_metrics.datetime_max module¶
- class mlm_insights.core.metrics.datetime_metrics.datetime_max.DateTimeMax(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, max_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', invalid_rows_count: int = 0)¶
Bases:
MetricBase
Feature Metric to compute maximum datetime in a column.
It takes into consideration removing NaN values while computing total count.
It is an exact univariate metric which can process only DATETIME & TIMESTAMP data types.
Configuration¶
- errors: str
Specify how to handle values that cannot be parsed as datetimes.
NOTE: For details on arguments, check https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in feature schema
- date_format: str
Format string for datetime, same format will be used in output. Default is “%Y-%m-%d %H:%M:%S”
- unit: str
In case input is Timestamp, specify the unit. Default is “s”
- origin: str
In case input is Timestamp, specify the origin. Default is “unix”
Returns¶
- datetime_max: str
Maximum datetime
- invalid_rows_count: int
Count of the values which are not valid date times. This includes: missing values, invalid dates
and date values whose format is different from the one specified
- date_format: str
Format string used in output
- unit: str
Unit specified in config
- origin: str
Origin specified in config
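The three kinds of invalid rows listed above (missing values, invalid dates, and values in a different format) can be reproduced with plain pandas. The sketch below is an assumption about how such a count could be derived with `errors="coerce"`; it is not taken from the library's source.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

column = pd.Series([
    "2025-01-22 08:30:00",   # valid
    None,                    # missing value
    "2024-02-30 00:00:00",   # invalid date: February has no 30th day
    "22/01/2025",            # valid date, but not in the configured format
    "2024-08-05 10:00:00",   # valid
])

# With errors="coerce", every value that cannot be parsed becomes NaT.
parsed = pd.to_datetime(column, format=FMT, errors="coerce")
invalid_mask = parsed.isna().tolist()
invalid_rows_count = int(parsed.isna().sum())

# max() skips NaT; the result is rendered with the same format string.
datetime_max = parsed.max().strftime(FMT)
```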
Examples
import pandas as pd
from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_max import DateTimeMax
from mlm_insights.core.metrics.metric_metadata import MetricMetadata


def main():
    # Describe the single datetime feature and its expected format
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(
        univariate_metric={"date_created": [MetricMetadata(klass=DateTimeMax)]},
        dataset_metrics=[])

    # Parentheses let the builder chain span several lines
    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeMax"])


if __name__ == "__main__":
    main()

# Returns the standard metric result as:
# {
#   'metric_name': 'DateTimeMax',
#   'metric_description': 'Feature Metric to compute maximum date value',
#   'variable_count': 4,
#   'variable_names': ['datetime_max', 'invalid_rows_count', 'date_format', 'unit', 'origin'],
#   'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#   'variable_dtypes': ['STRING', 'INTEGER', 'STRING', 'STRING', 'STRING'],
#   'variable_dimensions': [0, 0, 0, 0, 0],
#   'metric_data': ['2025-01-22 00:00:00', 0, '%Y-%m-%d %H:%M:%S', 's', 'unix'],
#   'metadata': {}
# }
- compute(column: Series, **kwargs: Any) None ¶
Computes the maximum datetime for the dataset. In case of a partitioned dataset, computes the maximum datetime for the specific partition
Parameters¶
- column: pd.Series
Input column.
- classmethod create(config: Dict[str, ConfigParameter] | None = None) DateTimeMax ¶
Factory Method to create an object. The configuration will be available in config.
Returns¶
An Instance of DateTimeMax.
- date_format: str = '%Y-%m-%d %H:%M:%S'¶
- errors: str = 'coerce'¶
- get_result(**kwargs: Any) Dict[str, Any] ¶
Returns the computed DateTimeMax metric.
Returns¶
Dict[str, Any]: result containing the maximum datetime in the specified format.
- get_standard_metric_result(**kwargs: Any) StandardMetricResult ¶
Returns Standard Metric for DateTimeMax metric.
Returns¶
StandardMetricResult: DateTimeMax Metric in standard format.
- classmethod get_supported_variable_types() List[VariableType] ¶
Method to retrieve the list of Feature Variable type supported for the metric
Returns¶
List of Feature Variable type supported by the metric
- invalid_rows_count: int = 0¶
- max_date: str = ''¶
- merge(other_metric: DateTimeMax, **kwargs: Any) DateTimeMax ¶
Merges two DateTimeMax metrics into one without mutating either of them.
Parameters¶
- other_metric: DateTimeMax
The other DateTimeMax metric to be merged.
Returns¶
- DateTimeMax
A new instance of DateTimeMax after merging.
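Since `compute` works on one partition at a time and `merge` combines partial results, the maximum over merged partitions should equal the maximum over the whole column. The sketch below checks that property with plain pandas and `functools.reduce`; `partition_max` is a hypothetical helper, and each partition is assumed to contain at least one valid date.

```python
from functools import reduce

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

column = pd.Series([
    "2024-08-05 10:00:00", "bad value", "2024-11-10 00:00:00",
    "2025-01-22 08:30:00", None, "2024-12-31 23:59:59",
])
# Two partitions of the same column, as a partitioned engine might see them.
partitions = [column.iloc[:3], column.iloc[3:]]


def partition_max(part: pd.Series) -> pd.Timestamp:
    """Hypothetical per-partition step: parse, then take the latest value."""
    return pd.to_datetime(part, format=FMT, errors="coerce").max()


# Merging partial maxima pairwise is itself just a max operation.
merged_max = reduce(max, (partition_max(part) for part in partitions))
whole_max = partition_max(column)
```

The pairwise `max` is associative and commutative, so the order in which partitions are merged does not change the result.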
- origin: str = 'unix'¶
- unit: str = 's'¶
mlm_insights.core.metrics.datetime_metrics.datetime_min module¶
- class mlm_insights.core.metrics.datetime_metrics.datetime_min.DateTimeMin(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, min_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', invalid_rows_count: int = 0)¶
Bases:
MetricBase
Feature Metric to compute minimum datetime in a column.
It takes into consideration removing NaN values while computing total count.
It is an exact univariate metric which can process only DATETIME & TIMESTAMP data types.
Configuration¶
- errors: str
Specify how to handle values that cannot be parsed as datetimes.
NOTE: For details on arguments, check https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in feature schema
- date_format: str
Format string for datetime, same format will be used in output. Default is “%Y-%m-%d %H:%M:%S”
- unit: str
In case input is Timestamp, specify the unit. Default is “s”
- origin: str
In case input is Timestamp, specify the origin. Default is “unix”
Returns¶
- datetime_min: str
Minimum datetime
- invalid_rows_count: int
Count of the values which are not valid date times. This includes: missing values, invalid dates
and date values whose format is different from the one specified
- date_format: str
Format string used in output
- unit: str
Unit specified in config
- origin: str
Origin specified in config
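The `errors` option follows `pandas.to_datetime`, so the difference between the documented default `"coerce"` and the stricter `"raise"` can be seen directly in pandas. The sketch below shows only the pandas behavior; whether the metric surfaces the exception unchanged is not stated in the source.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"
column = pd.Series(["2024-11-10 00:00:00", "not a date", "2024-08-05 10:00:00"])

# "coerce": unparsable values become NaT and are skipped by min().
coerced = pd.to_datetime(column, format=FMT, errors="coerce")
datetime_min = coerced.min().strftime(FMT)
invalid_rows_count = int(coerced.isna().sum())

# "raise": the first unparsable value aborts parsing with a ValueError.
try:
    pd.to_datetime(column, format=FMT, errors="raise")
    raised = False
except ValueError:
    raised = True
```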
Examples
import pandas as pd
from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_min import DateTimeMin
from mlm_insights.core.metrics.metric_metadata import MetricMetadata


def main():
    # Describe the single datetime feature and its expected format
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(
        univariate_metric={"date_created": [MetricMetadata(klass=DateTimeMin)]},
        dataset_metrics=[])

    # Parentheses let the builder chain span several lines
    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeMin"])


if __name__ == "__main__":
    main()

# Returns the standard metric result as:
# {
#   'metric_name': 'DateTimeMin',
#   'metric_description': 'Feature Metric to compute minimum date value',
#   'variable_count': 4,
#   'variable_names': ['datetime_min', 'invalid_rows_count', 'date_format', 'unit', 'origin'],
#   'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#   'variable_dtypes': ['STRING', 'INTEGER', 'STRING', 'STRING', 'STRING'],
#   'variable_dimensions': [0, 0, 0, 0, 0],
#   'metric_data': ['2024-08-05 00:00:00', 0, '%Y-%m-%d %H:%M:%S', 's', 'unix'],
#   'metadata': {}
# }
- compute(column: Series, **kwargs: Any) None ¶
Computes the minimum datetime for the dataset. In case of a partitioned dataset, computes the minimum datetime for the specific partition
Parameters¶
- column: pd.Series
Input column.
- classmethod create(config: Dict[str, ConfigParameter] | None = None) DateTimeMin ¶
Factory Method to create an object. The configuration will be available in config.
Returns¶
An Instance of DateTimeMin.
- date_format: str = '%Y-%m-%d %H:%M:%S'¶
- errors: str = 'coerce'¶
- get_result(**kwargs: Any) Dict[str, Any] ¶
Returns the computed DateTimeMin metric.
Returns¶
Dict[str, Any]: result containing the minimum datetime in the specified format.
- get_standard_metric_result(**kwargs: Any) StandardMetricResult ¶
Returns Standard Metric for DateTimeMin metric.
Returns¶
StandardMetricResult: DateTimeMin Metric in standard format.
- classmethod get_supported_variable_types() List[VariableType] ¶
Method to retrieve the list of Feature Variable type supported for the metric
Returns¶
List of Feature Variable type supported by the metric
- invalid_rows_count: int = 0¶
- merge(other_metric: DateTimeMin, **kwargs: Any) DateTimeMin ¶
Merges two DateTimeMin metrics into one without mutating either of them.
Parameters¶
- other_metric: DateTimeMin
The other DateTimeMin metric to be merged.
Returns¶
- DateTimeMin
A new instance of DateTimeMin after merging.
- min_date: str = ''¶
- origin: str = 'unix'¶
- unit: str = 's'¶