# Release Notes

## 24.3.0

### Features and Improvements

* New: Added explanations for forecasting predictions! (Comparative Feature Importance)
    * Added a model-agnostic prediction explainer to support forecasting tasks.
    * For any data point within the forecasting horizon, this explainer can indicate which factors increased or decreased the model's prediction compared to a previous reference point.
    * The generated explanations can be retrieved either with show-in-notebook (designed based on a custom waterfall plot) or as a DataFrame.

### Bug fixes

* Fixed a bug in initializing the Ray engine on macOS.
* Fixed a bug where `plot_forecast` would not show `y_train` for series without a date-time index.
* Fixed a bug that was causing `Pipeline.refit` to always reuse the same dataset given as input to `Pipeline.train` when the two functions were called in sequence.
* Fixed multiple small bugs to improve the reproducibility of AutoML results.
* Fixed a race condition occurring when model training jobs are cancelled (e.g., due to time budgets).

## 24.2.0

### Features and Improvements

* Added a new AutoRecommender class to support recommendation tasks.
    * Added support for ALS, BPR, ItemKNN, and Trexx models.
* Added an option to provide fixed hyperparameters in a model's search space. A mix of tunable and non-tunable hyperparameters is also supported.
* Enhanced time-series preprocessing and ML forecasting models to improve running time and forecast quality on long time series and those with irregular periodicity/seasonality.

### Bug fixes

* Fixed a bug that was occasionally causing AutoMLx to raise an exception in case of small time budgets or long computations.
* Fixed an indefinite stall in HyperGD when the parameter space contains a categorical variable with only one possible value.
* Fixed a bug that was incorrectly raising import errors when some optional dependencies were not installed.
* Fixed a bug that was causing forecasting to fail on time series with an `int64` index and `auto` cross-validation.
* Fixed the set of supported search strategies for hyperparameter optimization.
* Fixed a bug that was causing AutoMLx to select more computationally expensive models than previous versions in case of small time budgets.

## 24.1.0

### Features and Improvements

* Added regression-based ML models for forecasting: ExtratreesForecaster, LGBMForecaster, and XGBForecaster.
* Added new `train_model` and `evaluate_model_quality` functions to simplify AutoML for business users (a usage sketch follows the bug-fix list below), which feature:
    * The ability to accept data as either a path to a CSV file or a pandas DataFrame.
    * The `train_model` function will automatically identify, tune, train and return the best model it can for the provided data.
    * The `evaluate_model_quality` function will return the score of the given model on a new, user-provided dataset.
* Added James-Stein encoding as a default categorical encoder for classification and regression tasks.

### Bug fixes

* Fixed a bug that was causing feature selection to fail if any of the feature ranking computations failed, even though only one is required.
* Fixed a bug that was causing the cache directory cleanup to occasionally fail with the Ray backend engine.
* Fixed a bug in AutoML that occasionally caused sub-optimal models to be returned when `n_algos_tuned > 1`.
* Fixed a bug that caused the SARIMAX forecasting model to converge to a poor configuration and return `NaN` values during model tuning.
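The following minimal sketch illustrates the intended flow of the express functions above; the import path and the `target` keyword are assumptions for illustration, since these notes only confirm the function names and that data may be a CSV path or a pandas DataFrame.

```python
import pandas as pd
from automlx import train_model, evaluate_model_quality  # import path assumed

# train_model accepts a path to a CSV file or a pandas DataFrame and returns
# the best model it can find; the `target` keyword name is an assumption.
model = train_model("train.csv", target="label")

# evaluate_model_quality returns the score of the model on a new dataset.
holdout = pd.read_csv("holdout.csv")
print(evaluate_model_quality(model, holdout, target="label"))
```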
### Possibly breaking changes

* Calling `automlx.init(logger=None)` will no longer initialize handlers for Python's root logger. Backward compatibility is achieved by calling `automlx.init(logger="auto")` (the default behavior, equivalent to `automlx.init()`), which will initialize the root logger to log to standard output at the specified loglevel.
* The argument `summary_frame` for `AutoForecaster.plot_forecast` was renamed to `predictions`.

### Miscellaneous

* The sktime library was upgraded to version 0.24.0.

## 23.4.1

### Bug fixes

* Fixed incorrect package meta-data.

## 23.4.0

### Features and Improvements

* Added support for the image classification task.
    * Added support for torchvision ResNet and EfficientNet models.
    * Image data is lazily loaded from disk.
* Added a new Ray-based engine.
    * Supports single-machine and distributed execution of multiple concurrent jobs/trials.
    * Includes utilities to control the AutoMLx temporary caching directory and Ray object-spilling settings.
    * Includes support for caching image dataset transformations to disk. Utilities for controlling the cache directory and related security settings are provided.
* Added new express AutoClassifier, AutoRegressor, AutoForecaster and AutoAnomalyDetector classes; for example, the first can be imported with `from automlx import AutoClassifier`.
* Adaptive sampling is now skipped in AutoML when it is not needed (that is, if feature selection, HPO and threshold tuning are not active).
* The AutoML Pipeline now accepts a `search_strategy` parameter, which determines the search algorithm used by the tuning step. These include all sampling strategies from Optuna, for example, TPE and NSGA-II (a combined usage sketch follows this list).
* Added `ModelBiasMitigator`, a bias mitigation tool to help improve a trained model's fairness metric score. It can be imported with `from automlx.fairness.bias_mitigation import ModelBiasMitigator`.
* Added a new log level, `sensitive_info` (15), which is used to prevent exposing sensitive information at higher log levels (info, warning, etc.).
* Threshold tuning has been improved to scale the prediction probabilities instead of modifying the prediction threshold. This means that model prediction probabilities are more interpretable when threshold tuning is enabled.
* The time budget for individual AutoML steps can now be controlled by passing a dictionary with the budget for each individual step to the AutoML `time_budget` argument.
* Added support for TreeSHAP, which provides fast local feature importance explanations for tree-based models.
* Added the install options `automlx[classic]` and `automlx[explain]` alongside `automlx[forecasting]`, `automlx[onnx]`, `automlx[deep-learning]`, and `automlx[viz]`. Install options create minimal-sized wheels for the associated task; they can be combined if functionality from multiple options is desired, e.g., `automlx[forecasting,viz]`.
* Added enhancements to speed up adaptive sampling.
* Improvements (for example, lazy loading and intelligent sampling) to enable AutoML to run on very large datasets (for example, one billion rows) for classification and regression.
* Enhanced local feature importance to compute explanations in parallel when multiple rows are explained together.
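A minimal sketch combining the new `search_strategy` parameter and the per-step `time_budget` dictionary described above; the `Pipeline` import path, the `task` keyword, the step-name keys, and the budget units are assumptions for illustration.

```python
import automlx

# `search_strategy` accepts Optuna sampling strategies per the notes above;
# the step names and seconds-based budgets below are illustrative assumptions.
pipeline = automlx.Pipeline(
    task="classification",
    search_strategy="TPE",
    time_budget={
        "feature_selection": 60,  # hypothetical step name
        "tuning": 300,            # hypothetical step name
    },
)
```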
### Bug fixes

* Fixed a bug in SVC and LinearSVC models that caused prediction probabilities (but not predicted labels) to change depending on the rows passed to `Pipeline.predict_proba`.
* The AutoML Pipeline now raises a warning instead of automatically dropping slow models for large datasets if the user explicitly passes them into the `model_list` argument.
* Fixed a bug in the local feature importance and counterfactual explainers, ensuring target labels can be passed as strings as well as integers.
* Addressed a bug related to the rendering of `ipywidgets` that prevented some explainer visualizations from loading.

### Possibly breaking changes

* The AutoMLx package now needs to be imported as `import automlx` instead of `import automl`.
* Removed support for the following deprecated items:
    * Internal (never-documented) attributes of the AutoML pipeline.
    * The dask and spark execution engines and related options.
    * The ModelTune interface.
    * All Pipeline attributes matching `*_trials_`, which contain information about the trials performed by the AutoML pipeline. These are replaced by two new dataframe attributes, `completed_trials_summary_` and `completed_trials_detailed_`.
    * AutoML optimization levels 1 and 2.
    * The Pipeline attribute `selected_features_`. Instead, users should use `selected_features_names_` or `selected_features_names_raw_` to access the names of the selected engineered or original features, respectively.
* ONNX conversion:
    * ONNX models produced from Pipeline objects now take as input a dictionary of NumPy arrays instead of a single tensor. Every array is an input column from the prediction dataframe.
* The `y` argument within the `explain_prediction` method of the tabular explainer is deprecated.

## 23.2.3

### Possibly breaking changes

* The automlx package has been renamed to "oracle-automlx". You can still import the package with `import automl`; however, you will need to install it as `pip install oracle-automlx`.

## 23.2.2

### Bug fixes

* Fixed a bug that was causing logging messages to be written to stderr rather than stdout by default.

## 23.2.1

### Features and Improvements

* Added the install options `automlx[forecasting]`, `automlx[onnx]`, and `automlx[deep-learning]` alongside `automlx[viz]`. Install options create minimal-sized wheels for the associated task; they can be combined if functionality from multiple options is desired, e.g., `automlx[forecasting,viz]`.

### Bug fixes

* Fixed a bug where ETSForecaster could fail the entire pipeline when it failed to converge.
* Fixed a bug that caused the pipeline to set the forecast horizon to zero when forecasting short time series (fewer than 8 data points).
* Fixed a bug that could cause model fit failures for some seasonal decomposition (e.g., STL) models on short series (less than 3 times the seasonality period).
* Fixed a bug where the Box-Cox transformer could produce NaNs as the result of an inverse transformation.
* Fixed a bug that caused the advanced feature importance sampling strategies to raise an exception.

### Possibly breaking changes

* Deep-learning models for classification (TorchMLPClassifier, CatboostClassifier, TabNetClassifier), regression (TorchMLPRegressor) and anomaly detection (AutoEncoderOD) now require the install option `automlx[deep-learning]`.
* If a logger is not pre-initialized or a loglevel is not explicitly set in `init()`, logs are written to stderr, matching the default behavior of the `logging` module in the Python standard library (see the sketch below).
* Changed the initialization of the logging module to:
    * no longer log to file by default;
    * not overwrite the global logging configuration if it was already set up.
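A minimal sketch of working with this logging behavior, assuming you want output on stdout instead of the stderr default; the `loglevel` keyword for `init()` is an assumption inferred from the wording above.

```python
import logging
import sys

# Configure the root logger before initializing the package; per the note
# above, an existing global logging configuration is no longer overwritten.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

import automl

automl.init(loglevel=logging.INFO)  # keyword name assumed from the note above
```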
## 23.2.0

### Features and Improvements

* Added support for the TabNet classifier.
    * Training TabNet with CPUs is slow, so it is disabled by default until GPU support is added.
    * To enable TabNet, add `'TabNetClassifier'` to the `model_list` when initializing the AutoML Pipeline.
* New counterfactual explainer (ACE)
    * Added the AutoMLx Counterfactual Explainer (ACE) for classification and anomaly detection tasks.
    * ACE is faster and finds more valid counterfactuals than DiCE.
    * It is guaranteed to find a counterfactual for each query instance if the reference dataset contains an example with the desired class.
* Fairness Feature Importance is now available for tabular datasets! `MLExplainer` has a new `explain_model_fairness()` function to compute global feature importance attributions for fairness metrics.
* Added threshold tuning for binary and multi-class classification tasks. Threshold tuning can be enabled by passing `threshold_tuning=True` to the Pipeline object when it is created.
* Added support for Python 3.10.

### Deprecations

* Removed support for the Uber Orbit forecaster due to instability in its built-in Bayesian inference engine.
* Added deprecation warnings to objects that will be removed or replaced in 23.4.0. Deprecations include:
    * Internal (never-documented) attributes of the AutoML pipeline.
    * The dask and spark execution engines and related options.
    * The ModelTune interface. Similar functionality can be achieved by using the AutoML pipeline and disabling all stages except the tuning stage.
    * All Pipeline attributes matching `*_trials_`, which contain information about the trials performed by the AutoML pipeline. These will be replaced by two new dataframe attributes, `completed_trials_summary_` and `completed_trials_detailed_`.
    * AutoML optimization levels 1 and 2.
    * The Pipeline attribute `selected_features_`. Instead, users should use `selected_features_names_` or `selected_features_names_raw_` to access the names of the selected engineered or raw features, respectively.
* Deprecation warnings can be suppressed using `from automl import init; init(check_deprecation_warnings=False)`.

### Miscellaneous

* Bumped package versions:
    * fbprophet==0.7.1 to prophet==1.1.2
    * torch to 1.13.1
    * onnx to 1.12.0
    * onnxruntime to 1.12.1

### Possibly breaking changes

* `score_metric` is no longer accepted in the `MLExplainer` factory function. It is now an optional argument to the `TabularExplainer`'s `explain_model` and `explain_model_fairness` methods.

## 23.1.1

### Features and Improvements

* Unsupervised anomaly detection
    * Implemented N-1 experts for hyperparameter tuning.
    * Added N-1 experts-based contamination factor identification.
* Overhauled package documentation.

### Bug fixes

* Fixed a bug in feature importance explainers that occurred when the dataset contained feature names that are numpy integers and an AutoML pipeline was being explained.

## 23.1.0

### Features and Improvements

* Fairness metrics are now available to measure bias in both datasets and trained models. Fairness metrics can be imported from `automl.fairness.metrics` (a sketch follows this list).
* Explanations can now be computed from custom user-defined metrics.
* Introduced the `max_tuning_trials` option, which controls the maximum number of HPO trials per algorithm.
* New explainer (Counterfactual)
    * Added a model-agnostic counterfactual explainer for classification, regression, and anomaly detection tasks.
    * The explainer can find diverse counterfactuals for the desired prediction, while the user is able to choose which features to vary and their permitted range.
    * Counterfactual explanations can be visualized either with the What-If explainer or as a dataframe.
* Added support for a surrogate explainer for local text explanations.
* Updated code to comply with Python Bandit security checks.
* Added CatBoost as a new classification model.
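A hedged illustration of the fairness metrics import named above; only the `automl.fairness.metrics` module path is confirmed by these notes, so the metric name and call signature below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical metric name and signature; only the module path is confirmed.
from automl.fairness.metrics import statistical_parity

y_true = pd.Series([1, 0, 1, 0])
y_pred = pd.Series([1, 0, 0, 0])
subgroups = pd.DataFrame({"gender": ["f", "m", "f", "m"]})

# Measure bias in a model's predictions with respect to a protected attribute.
print(statistical_parity(y_true, y_pred, subgroups=subgroups))
```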
### Bug fixes

* Fixed a bug in LIME's explanation bar chart where annotations were misplaced for datasets with stringified-integer feature names.
* Fixed a bug where features would be placed incorrectly on plot axes when visualizing explanations for categorical features.
* Deleted internal state to reduce memory consumption in explanations.
* Fixed a bug where dataset downcasting to `int32` and `float32` was only applied during training, but not when doing the final fit or collecting predictions.
* Preprocessing of `datetime` columns is now much faster.
* Fixed a bug where dependencies of automl would, on import, initialize a root logger, preventing subsequent applications from using `logging.basicConfig()`.
* Fixed a bug where the AutoTune step would override the default hyperparameters even if it did not find any better ones.
* Propagated dataset downcasting to all relevant pipeline stages, potentially reducing memory consumption for very large datasets.
* Changed AutoTune behavior to consider using the default hyperparameters scored at the end of the feature selection step if they performed better than those AutoTune tried within the time budget.

### Deprecations

* Added deprecation warnings for the following:
    * Some attributes in the pipeline that are not publicly documented.
    * Attributes of the pipeline containing trial information, which were renamed to `completed_trials_summary_` and `completed_trials_detailed_`. The `stage` column is renamed to `step`.
    * Optimization levels 1 and 2.
    * Dask and spark engines and engine options.
    * The ModelTune class.
* To disable the warnings, set the argument `check_deprecation_warnings` to `False` at initialization.

## 22.4.2

### Features and Improvements

* Added support for explaining selected features in local and global permutation importance, as well as automatically detecting which features were selected by an AutoML model.

### Bug fixes

* Fixed a bug in local perturbation-based feature attribution explainers for the `n_iter='auto'` option that caused the number of iterations to be set too high.
* Enhanced performance of local feature importance explainers to improve running times by batching inference calls together.

## 22.4.1

### Features and Improvements

* Pipeline now accepts a `min_class_instances` input argument to manually specify the number of examples every class must have when doing classification (a usage sketch follows the bug-fix list below). The value for `min_class_instances` must be at least 2.

### Bug fixes

* Fixed a bug where IPython and ipywidgets were not properly guarded as optional dependencies, which made them required.
* Fixed a bug, introduced by the last dependency update, that caused fbprophet to not produce forecasts with the correct index type when fbprophet was installed manually.
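A minimal sketch of the new `min_class_instances` argument; the `Pipeline` import path and the `task` keyword are assumptions for illustration.

```python
import automl

# Require every class to have at least 5 training examples; per the note
# above, values below 2 are not accepted. Other arguments are assumed.
pipeline = automl.Pipeline(task="classification", min_class_instances=5)
```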
## 22.4.0

### Features and Improvements

* New feature dependence explainers
    * Added an Accumulated Local Effects (ALE) explainer.
    * ALE explanations can be computed for up to two features if at least one is not categorical.
* New explainer (What-If)
    * Added a What-If explainer for classification and regression tasks.
    * What-If explanations include exploration of the behavior of an ML model on a single sample as well as on the entire dataset.
    * Sample exploration (edit a sample value and see how the model's predictions change) and relationship visualization (how a feature is related to predictions or other features) are supported.
* New feature importance aggregators
    * Added ALFI (Aggregate Local Feature Importance), which gives a visual summary of multiple local explanations.
* New local feature importance explainer
    * Added support for surrogate-based (LIME+) local feature importance explainers.

### Bug fixes

* Import failure due to CUDA: the package no longer crashes when imported on a machine with CUDA installed.
* Fixed a bug where `TorchMLPClassifier` would fail when trying to predict a single instance.
* Fixed a bug where `OracleAutoMLx_Forecasting.ipynb` would fail if visualization packages were not already installed.
* Fixed a bug that caused `Pipeline.transform` to raise an exception if a single row was passed.
* Explanation documentation
    * Our documentation website now includes documentation for the explanation objects returned by our explainers.
* Enhanced performance of local feature importance explainers to address long running times.
* Improved the visualization of facets for columns with cardinality equal to 1 by selecting the bars' widths and padding properly.

## 22.3.0

### Features and Improvements

* New explainer
    * Added support for KernelSHAP (a new feature importance explainer), which provides fast approximations for the Shapley feature importance method.
* Support for the ARM architecture (`aarch64`)
    * Released a platform-specific wheel file for ARM machines.

### Miscellaneous

* Clarified documentation on the accepted data formats for input datasets and added a more meaningful corresponding error message.

## 22.2.0

### Features and Improvements

* New profiler
    * The profiler tracks CPU and memory utilization.
* Timeseries forecasting pipeline
    * Added support for multivariate datasets.
    * Added support for exogenous variables.
    * Enhanced the heteroskedasticity detection technique.
    * Applied the Box-Cox transform and its inverse, with parameters determined via MLE, to handle heteroskedasticity.
* Explainers / MLX integration
    * Added a new global text explainer.
    * New feature importance attribution explainers
        * Added several local and global feature importance explainers, including permutation importance, exact Shapley, and SHAP-PI.
        * The explainers support classification, regression and anomaly detection tasks.
        * The explainers can also be configured to explain the importance of features to any model (`explanation_type='observational'`) as well as for a particular model (`explanation_type='interventional'`); a sketch follows this list.
        * Observational explanations are supported for all tasks; interventional explanations are only supported for classification and regression.
    * New feature dependence explainers
        * Added a partial dependence plot (PDP) and individual conditional expectations (ICE) explainer.
        * PDP explanations include visualization support for up to 4 dimensions. PDPs in higher dimensions can be returned as dataframes.
* Unsupervised Anomaly Detection
    * Added N-1 Experts: a new experimental metric for UAD model selection.
* Documentation
    * Added a description of the automl `init` function to the documentation.
    * Cleaned up the documentation for more consistency among different sections and added cross-references.
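A hedged sketch of choosing between observational and interventional explanations; `MLExplainer`, `Pipeline`, and `Pipeline.train` are all referenced elsewhere in these notes, but the exact signatures and the placement of `explanation_type` below are assumptions for illustration.

```python
import pandas as pd
import automl

X_train = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40e3, 60e3, 90e3, 75e3]})
y_train = pd.Series([0, 1, 1, 0])

pipeline = automl.Pipeline(task="classification")  # constructor args assumed
pipeline.train(X_train, y_train)                   # argument order assumed

# Observational: importance of features for the task itself (any model);
# interventional: importance of features for this particular model.
explainer = automl.MLExplainer(pipeline, X_train, y_train)  # signature assumed
result = explainer.explain_model(explanation_type="interventional")
```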
### Bug fixes

* Timeseries forecasting pipeline
    * Fixed a Statsmodels exception for some frequencies; users are now able to pass in the time period as a parameter.
* Preprocessing
    * Datetime preprocessor
        * Fixed a bug regarding column expansion and None/Null/NaN values.
    * Standard preprocessor refitting
        * The standard preprocessor used to first be fit on a subsample of the training set, and then re-fit at the very end of the pipeline using the full training set. This occasionally resulted in a different number of engineered features being produced; as a result, the features identified during the model selection module could no longer exist. The standard preprocessor is now fit only once.
* ONNX predictions inconsistency
    * Changed the ONNX conversion function to reduce the difference between the predictions of the dumped ONNX model and those of the original pipeline object.
* Improved ONNX conversion runtime
    * ONNX conversion now only requires a sample from the training or test set as input. This sample is used to infer the final types and shapes.

### Possibly breaking changes

* Removed matplotlib as a dependency of the AutoMLx package.
    * Forecasting predictions can now be visualized only with plotly, using the same interface as before: `automl.utils.plot_forecast` (a usage sketch follows these notes). The alternate plotly visualizations that were provided by `automl.utils.plot_forecast_interactive` have been removed.
* Updated the AutoMLx package dependencies.
    * All dependency versions have been reviewed and updated to address all known CVEs.
    * A few unneeded dependencies have also been removed.
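A minimal sketch of the retained plotly-based interface; the keyword name `summary_frame` follows the later 24.1.0 rename note (which applies to `AutoForecaster.plot_forecast`), and the column layout of the predictions frame and the return type are assumptions for illustration.

```python
import pandas as pd
from automl.utils import plot_forecast

# A stand-in predictions frame; real frames come from the forecasting
# pipeline, and the column name below is a hypothetical placeholder.
summary_frame = pd.DataFrame(
    {"forecast": [1.2, 1.4, 1.3]},
    index=pd.date_range("2022-01-01", periods=3, freq="D"),
)

fig = plot_forecast(summary_frame=summary_frame)  # keyword name assumed
fig.show()  # the return value is assumed to be a plotly figure
```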