Release Notes
24.3.0
Features and Improvements
-
New: Added explanations for forecasting predictions! (Comparative Feature Importance)
-
Added a model-agnostic prediction explainer to support forecasting tasks.
-
For any data point within the forecasting horizon, this explainer can indicate which factors increased or decreased the model’s prediction compared to a previous reference point.
-
The generated explanations can be retrieved either with show-in-notebook, which renders a custom waterfall plot, or as a DataFrame.
-
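The release note does not describe the attribution algorithm. As an illustration only, the sketch below uses a simple sequential (one-feature-at-a-time) decomposition, which is an assumption and not necessarily what AutoMLx implements. It shows how per-feature increases and decreases relative to a reference point can add up exactly to the change in prediction, which is what a waterfall plot displays. The names sequential_attribution and demand are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def sequential_attribution(
    model: Callable[[Row], float],
    reference: Row,
    target: Row,
    order: Sequence[str],
) -> Tuple[float, List[Tuple[str, float]], float]:
    """Split model(target) - model(reference) into per-feature contributions.

    Features are switched one at a time, in `order`, from their reference value
    to their target value; each contribution is the resulting change in the
    prediction. Returns (start, steps, end), the bars of a waterfall plot.
    For non-additive models the contributions depend on `order`.
    """
    current = dict(reference)
    start = model(current)
    previous = start
    steps: List[Tuple[str, float]] = []
    for name in order:
        current[name] = target[name]
        value = model(current)
        steps.append((name, value - previous))
        previous = value
    return start, steps, previous


# Hypothetical forecaster: demand depends on price and a holiday flag.
def demand(row: Row) -> float:
    return 100.0 - 2.0 * row["price"] + 15.0 * row["holiday"]


reference_point = {"price": 10.0, "holiday": 0.0}  # e.g., the last observed step
horizon_point = {"price": 12.0, "holiday": 1.0}    # a step inside the horizon
start, steps, end = sequential_attribution(
    demand, reference_point, horizon_point, ["price", "holiday"]
)
```

Here the price increase lowers the forecast by 4 and the holiday raises it by 15, taking the prediction from 80 to 91.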
Bug fixes
-
Fixed a bug when initializing the Ray engine on macOS
-
Resolved a bug where plot_forecast would not show y_train on series without a datetime index
-
Fixed a bug that was causing Pipeline.refit to always reuse the same dataset given as input to Pipeline.train when the two functions were called in sequence
-
Fixed multiple small bugs to improve reproducibility of results from AutoML.
-
Fixed a race condition occurring when model training jobs are cancelled (e.g., due to time-budgets)
24.2.0
Features and Improvements
-
Added a new AutoRecommender class to support recommendation tasks
-
Added support for ALS, BPR, ItemKNN, and Trexx models.
-
-
Improved time-series preprocessing and ML forecasting models to reduce running time and improve forecast quality on long time series and on series with irregular periodicity or seasonality.
Bug fixes
-
Added an option to provide fixed hyperparameters in a model’s search space. A mix of tunable and non-tunable hyperparameters is also supported.
-
Fixed a bug that was occasionally causing AutoMLx to raise an exception in case of small time budgets or long computations.
-
Fixed an indefinite stall in HyperGD when the parameter space contains a categorical variable with only one possible value
-
Fixed a bug that incorrectly raised import errors for some optional dependencies when they were not installed.
-
Fixed a bug that was causing forecasting to fail on time series with an int64 index when auto cross-validation was used.
-
Fixed the set of supported search strategies for hyperparameter optimization
-
Fixed a bug that was causing AutoMLx to select more computationally expensive models in case of small time budgets compared to previous versions.
24.1.0
Features and Improvements
-
Added regression-based ML models for forecasting: ExtratreesForecaster, LGBMForecaster, and XGBForecaster.
-
Added new train_model and evaluate_model_quality functions to simplify AutoML for business users, which feature:
-
The ability to accept data as either a path to a CSV file or a pandas dataframe
-
The train_model function automatically identifies, tunes, trains and returns the best model it can find for the provided data.
-
The evaluate_model_quality function returns the score of the given model on a new, user-provided dataset.
-
-
Added James-Stein encoding as a default categorical encoder for classification and regression tasks
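A James-Stein encoder replaces each category with its mean target value, shrunk toward the overall mean in proportion to how noisy that category’s estimate is. The sketch below is a simplified, assumption-based version of this empirical-Bayes shrinkage for a numeric target; the exact variance estimates AutoMLx uses (and how it handles classification targets) are not documented here, and james_stein_encode is a hypothetical name.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, Hashable, List, Sequence


def james_stein_encode(
    categories: Sequence[Hashable], y: Sequence[float]
) -> Dict[Hashable, float]:
    """Map each category to its target mean, shrunk toward the global mean.

    The shrinkage factor is the uncertainty of the category mean
    (within-category variance / count) relative to that uncertainty plus the
    spread between category means.
    """
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for category, target in zip(categories, y):
        groups[category].append(target)

    global_mean = fmean(y)
    group_means = {c: fmean(values) for c, values in groups.items()}
    between = pvariance(list(group_means.values()))  # spread of category means

    encoding: Dict[Hashable, float] = {}
    for category, values in groups.items():
        noise = pvariance(values) / len(values)  # variance of this category's mean
        denom = noise + between
        shrink = noise / denom if denom > 0 else 0.0
        mean = group_means[category]
        encoding[category] = (1.0 - shrink) * mean + shrink * global_mean
    return encoding
```

For example, with categories ["a", "a", "a", "a", "b", "b"] and targets [1, 1, 1, 1, 0, 4], category "a" keeps its mean of 1.0 because its values agree, while the noisy category "b" (mean 2.0) is pulled most of the way toward the global mean of about 1.33, ending at about 1.41.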
Bug fixes
-
Fixed a bug that was causing feature selection to fail if any of the feature ranking computations failed, even though only one is required.
-
Fixed a bug that was causing the cache directory cleanup to occasionally fail with the Ray backend engine
-
Fixed a bug in AutoML that occasionally caused sub-optimal models to be returned when n_algos_tuned > 1.
-
Fixed a bug that caused the SARIMAX forecasting model to converge to a poor configuration and return NaN values during model tuning.
Possibly breaking changes
-
Calling automlx.init(logger=None) will no longer initialize handlers for Python’s root logger. Backward compatibility is achieved by calling automlx.init(logger="auto") (the default behaviour, equivalent to automlx.init()), which will initialize the root logger to log to standard output at the specified log level.
-
The argument summary_frame of AutoForecaster.plot_forecast was renamed to predictions.
Miscellaneous
-
The sktime library was upgraded to version 0.24.0.
23.4.1
Bug fixes
-
Fixed incorrect package meta-data.
23.4.0
Features and Improvements
-
Added support for image classification tasks
-
Added support for torchvision ResNet and EfficientNet models
-
Image data is lazily loaded from disk
-
-
Added new Ray-based engine
-
Support for single-machine and distributed execution of multiple concurrent jobs/trials
-
Includes utilities to control the AutoMLx temporary caching directory and Ray object spilling settings
-
Includes support for caching image dataset transformations to disk. Utilities for controlling the cache directory and related security settings are provided.
-
-
Added new express AutoClassifier, AutoRegressor, AutoForecaster and AutoAnomalyDetector classes, which can be imported directly from the package (for example, from automlx import AutoClassifier)
-
Adaptive sampling is now skipped in AutoML when it is not needed (that is, if feature selection, HPO and threshold tuning are not active).
-
The AutoML Pipeline now accepts a search_strategy parameter, which determines the search algorithm used by the tuning step. Supported strategies include all sampling strategies from Optuna, for example, TPE and NSGA-II.
-
Added ModelBiasMitigator, a bias mitigation tool to help improve a trained model’s fairness metric score. It can be imported with from automlx.fairness.bias_mitigation import ModelBiasMitigator
-
Added a new log level, sensitive_info (15), which is used to avoid exposing sensitive information at higher log levels (info, warning, etc.).
-
Threshold tuning has been improved to scale the prediction probabilities instead of modifying the prediction threshold. This means that model prediction probabilities are more interpretable when threshold tuning is enabled.
-
The time budget for individual AutoML steps can now be controlled by passing a dictionary with the budget for each step to the AutoML time_budget argument.
-
Added support for TreeSHAP, which provides fast local feature importance explanations for tree-based models.
-
Added install options automlx[classic] and automlx[explain] alongside automlx[forecasting], automlx[onnx], automlx[deep-learning], and automlx[viz]. Install options create minimal-sized wheels for the associated task. Install options can be combined if you need their combined functionality, e.g., automlx[forecasting,viz].
-
Added enhancements to speed up adaptive sampling.
-
Made improvements (for example, lazy loading and prolific, intelligent sampling) that enable AutoML to run on very large datasets (for example, one billion rows) for classification and regression.
-
Enhanced local feature importance to compute explanations in parallel when multiple rows are explained together.
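One way to scale probabilities so that the usual decision rule (0.5 for binary) reproduces a tuned threshold is to reweight each class probability and renormalize. The sketch below illustrates that general idea under this assumption; it is not AutoMLx’s implementation, and rescale_probabilities and weights_for_binary_threshold are hypothetical names.

```python
from typing import List, Sequence


def rescale_probabilities(
    proba: Sequence[float], weights: Sequence[float]
) -> List[float]:
    """Multiply each class probability by a per-class weight and renormalize."""
    scaled = [p * w for p, w in zip(proba, weights)]
    total = sum(scaled)
    return [s / total for s in scaled]


def weights_for_binary_threshold(threshold: float) -> List[float]:
    """Weights [negative, positive] that move a binary cut-off from `threshold` to 0.5.

    With w = (1 - t) / t, a raw positive probability of exactly t is rescaled
    to w*t / (w*t + 1 - t) = 0.5, and the mapping is increasing in the raw
    probability, so the predicted labels are unchanged.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    return [1.0, (1.0 - threshold) / threshold]


tuned_threshold = 0.3
weights = weights_for_binary_threshold(tuned_threshold)
raw = [[0.9, 0.1], [0.75, 0.25], [0.65, 0.35], [0.1, 0.9]]  # [P(neg), P(pos)]
scaled = [rescale_probabilities(p, weights) for p in raw]
labels = [int(p[1] > 0.5) for p in scaled]  # same as thresholding raw P(pos) at 0.3
```

Because the rescaled values still sum to one, they can be read as probabilities whose natural cut-off of 0.5 already reflects the tuned threshold.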
Bug fixes
-
Fixed a bug in SVC and LinearSVC models that caused prediction probabilities (but not predicted labels) to change depending on the rows passed to Pipeline.predict_proba.
-
The AutoML Pipeline now raises a warning, instead of automatically dropping slow models for large datasets, if the user explicitly passes them in the model_list argument.
-
Fixed a bug in the local feature importance and counterfactual explainers, ensuring target labels can be passed as strings as well as integers.
-
Addressed a bug related to the rendering of ipywidgets that prevented some explainer visualizations from loading.
Possibly breaking changes
-
The AutoMLx package now needs to be imported as import automlx instead of import automl
-
Removed support for the following deprecated items:
-
Internal (never-documented) attributes of the AutoML pipeline.
-
The dask and spark execution engines and related options.
-
The ModelTune interface.
-
All Pipeline attributes matching *_trials_, which contain information about the trials performed by the AutoML pipeline. These are replaced by two new dataframe attributes completed_trials_summary_ and completed_trials_detailed_.
-
AutoML optimization levels 1 and 2.
-
The Pipeline attribute selected_features_. Instead, users should use selected_features_names_ or selected_features_names_raw_ to access the names of the selected engineered or original features, respectively.
-
-
ONNX conversion:
-
ONNX models produced from Pipeline objects now take as input a dictionary of NumPy arrays instead of a single tensor. Every array is an input column from the prediction dataframe.
-
-
The y argument of the explain_prediction method of the tabular explainer is deprecated.
23.2.3
Possibly breaking changes
-
The automlx package has been renamed to “oracle-automlx”. You can still import the package with import automl; however, you will need to install it as pip install oracle-automlx.
23.2.2
Bug fixes
-
Fixed a bug that was causing logging messages to be written to stderr rather than stdout by default
23.2.1
Features and Improvements
-
Added install options automlx[forecasting], automlx[onnx], and automlx[deep-learning] alongside automlx[viz]. Install options create minimal-sized wheels for the associated task. Install options can be combined if you need their combined functionality, e.g., automlx[forecasting,viz].
Bug fixes
-
Fixed a bug where ETSForecaster could fail the entire pipeline when it fails to converge.
-
Fixed a bug that caused the pipeline to set the forecast horizon to zero when forecasting short time series (fewer than 8 data points).
-
Fixed a bug that could cause model fit failures for some Seasonal Decompose (e.g., STL) models on short series (shorter than 3 times the seasonality period).
-
Fixed a bug where the BoxCox transformer could produce NaNs as the result of the inverse transformation.
-
Fixed a bug that caused the advanced feature importance sampling strategies to raise an exception.
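For reference, the Box-Cox transform is y = (x^λ - 1)/λ for λ ≠ 0 and y = ln x for λ = 0, with inverse x = (λy + 1)^(1/λ). The inverse is only well defined when λy + 1 > 0, which is one way NaNs can appear when values forecast in the transformed space are mapped back. The sketch below shows the transform and one possible guard (clamping); it is an illustration under that assumption, not necessarily how AutoMLx fixed the bug, and boxcox and inv_boxcox are hypothetical helper names.

```python
import math


def boxcox(x: float, lam: float) -> float:
    """Forward Box-Cox transform for a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_boxcox(y: float, lam: float, eps: float = 1e-12) -> float:
    """Inverse Box-Cox transform that returns a real, non-negative value."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    # The forward transform always gives lam * y + 1 = x ** lam > 0, but a
    # forecast made in the transformed space can land where base <= 0.
    # Raising such a base to 1 / lam yields NaN in NumPy (or a complex number
    # in pure Python) for fractional exponents, so clamp it first.
    return max(base, eps) ** (1.0 / lam)
```

Clamping maps out-of-range forecasts to a value near zero for λ > 0 rather than failing; other choices, such as clipping forecasts before the inverse, are equally possible.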
Possibly breaking changes
-
Deep-learning models for classification (TorchMLPClassifier, CatboostClassifier, TabNetClassifier), regression (TorchMLPRegressor) and anomaly detection (AutoEncoderOD) now require install option automlx[deep-learning].
-
If a logger is not pre-initialized or a log level is not explicitly stated in init(), AutoMLx logs to stderr, which is the default behavior of the logging module in the Python standard library.
-
Changed the initialization of the logging module to:
-
no longer log to file by default;
-
not overwrite the global logging configuration if it was already set up.
-
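Python’s standard logging.basicConfig already behaves this way: unless called with force=True, it does nothing if the root logger has handlers. The sketch below spells out that "configure only if not yet configured" check with a stream handler and no file handler; init_logging is a hypothetical name, not the AutoMLx init() function, and the stream is a parameter because the default output stream changed again in 23.2.2.

```python
import logging
import sys
from typing import TextIO


def init_logging(level: int = logging.INFO, stream: TextIO = sys.stderr) -> bool:
    """Attach a stream handler to the root logger only if none is configured.

    Returns True if this call configured logging and False if it left an
    existing configuration untouched. No file handler is ever created.
    """
    root = logging.getLogger()
    if root.handlers:
        return False  # the application already set up logging; respect it
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    root.addHandler(handler)
    root.setLevel(level)
    return True
```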
23.2.0
Features and Improvements
-
Added support for TabNet classifier.
-
Training TabNet with CPUs is slow, so it is disabled by default until GPU support is added.
-
To enable TabNet, add ‘TabNetClassifier’ to the model_list when initializing the AutoML Pipeline.
-
-
New counterfactual Explainer (ACE)
-
Added the AutoMLx Counterfactual Explainer (ACE) for classification and anomaly detection tasks.
-
ACE is faster and finds more valid counterfactuals than DiCE.
-
It is guaranteed to find a counterfactual for each query instance if the reference dataset contains an example with the desired class.
-
-
Fairness Feature Importance is now available for tabular datasets! MLExplainer has a new explain_model_fairness() function to compute global feature importance attributions for fairness metrics.
-
Added threshold tuning for binary and multi-class classification tasks. Threshold tuning can be enabled by passing threshold_tuning=True to the Pipeline object when it is created.
-
Python 3.10 support added.
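A common way to tune a binary threshold is to score candidate cut-offs on held-out data and keep the best one. The sketch below assumes F1 as the objective and exhaustively checks every distinct predicted probability; the metric AutoMLx optimizes and its multi-class strategy are not described here, and f1_at and tune_threshold are hypothetical names.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, proba: Sequence[float], y_true: Sequence[int]) -> float:
    """F1 score when predicting the positive class for proba >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(proba, y_true):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def tune_threshold(
    proba: Sequence[float], y_true: Sequence[int]
) -> Tuple[float, float]:
    """Return the (threshold, F1) pair that maximizes F1 on validation data."""
    # Only the distinct scores can change the predicted labels, so they are
    # the only candidate cut-offs worth checking.
    candidates = sorted(set(proba))
    best = max(candidates, key=lambda t: f1_at(t, proba, y_true))
    return best, f1_at(best, proba, y_true)


val_proba = [0.1, 0.2, 0.3, 0.35, 0.6, 0.8]
val_labels = [0, 0, 1, 1, 1, 1]
threshold, score = tune_threshold(val_proba, val_labels)
```

On this toy validation set the default cut-off of 0.5 gives an F1 of about 0.67, while a threshold of 0.3 separates the classes perfectly.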
Deprecations
-
Removed support for the Uber Orbit forecaster due to instability in its built-in Bayesian inference engine.
-
Added deprecation warnings to objects that will be removed or replaced in 23.4.0.
-
Deprecations include:
-
Internal (never-documented) attributes of the AutoML pipeline.
-
The dask and spark execution engines and related options.
-
The ModelTune interface. Similar functionality can be achieved by using the AutoML pipeline and disabling all stages except the tuning stage.
-
All Pipeline attributes matching *_trials_, which contain information about the trials performed by the AutoML pipeline. These will be replaced by two new dataframe attributes completed_trials_summary_ and completed_trials_detailed_.
-
AutoML optimization levels 1 and 2.
-
The Pipeline attribute selected_features_. Instead, users should use selected_features_names_ or selected_features_names_raw_ to access the names of the selected engineered or raw features, respectively.
-
-
-
Deprecation warnings can be suppressed using from automl import init; init(check_deprecation_warnings=False)
Miscellaneous
-
Bumped packages
-
fbprophet==0.7.1 to prophet==1.1.2
-
torch to 1.13.1
-
onnx to 1.12.0
-
onnxruntime to 1.12.1
-
Possibly breaking changes
-
score_metric is no longer accepted in the MLExplainer factory function. It is now an optional argument to the TabularExplainer’s explain_model and explain_model_fairness methods.
23.1.1
Features and Improvements
-
Unsupervised anomaly detection
-
Implemented N-1 experts for hyperparameter tuning
-
Added N-1 experts-based contamination factor identification
-
-
Overhauled package documentation
Bug fixes
-
Fixed a bug in feature importance explainers for when the dataset contains feature names that are numpy integers and an AutoML pipeline is being explained.
23.1.0
Features and Improvements
-
Fairness metrics are now available to measure bias in both datasets and trained models. Fairness metrics can be imported from automl.fairness.metrics.
-
Explanations can now be computed from custom user-defined metrics.
-
Introduced the max_tuning_trials option, which controls the maximum number of HPO trials per algorithm.
-
New explainer (Counterfactual)
-
Added a model-agnostic counterfactual explainer for classification, regression, and anomaly detection tasks.
-
The explainer can find diverse counterfactuals for the desired prediction, while the user is able to choose which features to vary and their permitted range.
-
Counterfactual explanations can be visualized either with the What-if explainer or as a dataframe.
-
-
Added support for surrogate explainers for local text explanations.
-
Code updated to comply with security checks with Python Bandit.
-
Added catboost as a new classification model.
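As an example of what such a metric measures, statistical parity (also called demographic parity) compares the rate of positive outcomes across the groups defined by a protected attribute. The sketch below computes it from scratch; it is not the automl.fairness.metrics API, and positive_rates and statistical_parity_difference are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def positive_rates(
    y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Fraction of positive outcomes within each protected group."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def statistical_parity_difference(
    y_pred: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Largest gap in positive-outcome rate between any two groups (0 = parity)."""
    rates = positive_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())
```

Passing a model’s predictions measures the bias of the trained model; passing the dataset’s true labels instead gives the same measure for the dataset itself.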
Bug fixes
-
Fixed a bug in LIME’s explanation bar chart where annotations were misplaced for datasets whose feature names are stringified integers.
-
Fixed a bug where features would be placed incorrectly on plot axes when visualizing explanations for categorical features.
-
Deleted internal state to reduce memory consumption in explanations
-
Fixed a bug where dataset downcasting to int32 and float32 was only applied during training and not when doing the final fit or collecting predictions.
-
Preprocessing of datetime columns is now much faster.
-
Fixed a bug where dependencies of automl would initialize a root logger on import, preventing subsequent applications from using logging.basicConfig().
-
Fixed a bug where the AutoTune step would override default params even if it did not find any better params than the default ones.
-
Propagated dataset downcasting to all relevant pipeline stages, potentially reducing memory consumption for very large datasets.
-
Changed AutoTune to consider using the default hyperparameters scored at the end of the feature selection step if they performed better than those AutoTune tried within the time budget.
Deprecations
-
Added deprecation warnings for the following:
-
Some attributes in the pipeline that are not publicly documented.
-
Attributes of the pipeline containing trial information, which were renamed to completed_trials_summary_ and completed_trials_detailed_. The stage column is renamed to step.
-
Optimization levels of 1 and 2.
-
Dask and spark engines and engine options.
-
The ModelTune class.
-
-
To disable the warnings:
-
In the initialization, set the argument check_deprecation_warnings to False.
-
22.4.2
Features and Improvements
-
Added support for explaining selected features in local and global permutation importance, as well as automatically detecting which features were selected by an AutoML model.
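Global permutation importance measures how much a model’s score drops when the values of one feature are shuffled across rows, breaking that feature’s relationship with the target. The sketch below is a generic, from-scratch illustration of the technique, not the AutoMLx explainer API; permutation_importance, neg_mse and the toy model are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(
    model: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    metric: Callable[[Sequence[float], Sequence[float]], float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Average drop in `metric` (higher is better) when one column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [model(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


def neg_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def model(row: Row) -> float:
    return 3.0 * row[0]  # ignores the second feature entirely


X = [[float(i), float(i % 3)] for i in range(10)]
y = [3.0 * row[0] for row in X]
importances = permutation_importance(model, X, y, neg_mse)
```

A feature the model never uses, such as the second column here, gets an importance of exactly zero, which is why restricting explanations to the selected features loses no information.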
Bug fixes
-
Fixed a bug in local perturbation-based feature attribution explainers for the n_iter='auto' option that caused the number of iterations to be set too high.
-
Enhanced performance of local feature importance explainers to improve running times by batching inference calls together.
22.4.1
Features and Improvements
-
Pipeline now accepts a min_class_instances input argument to manually specify the number of examples every class must have when doing classification. The value for min_class_instances must be at least 2.
Bug fixes
-
Fixed a bug where IPython and ipywidgets were not properly guarded as optional dependencies, which made them required.
-
Fixed a bug introduced by the last dependency update that caused fbprophet to not produce forecasts with the correct index type when fbprophet was installed manually.
22.4.0
Features and Improvements
-
New feature dependence explainers
-
Added an Accumulated Local Effects (ALE) explainer
-
ALE explanations can be computed for up to two features if at least one is not categorical.
-
-
New explainer (What-IF)
-
Added a What-IF explainer for classification and regression tasks
-
What-IF explanations include exploration of the behavior of an ML model on a single sample as well as on the entire dataset.
-
Sample exploration (edit a sample value and see how the model’s prediction changes) and relationship visualization (how a feature is related to predictions or other features) are supported.
-
-
New feature importance aggregators
-
Added ALFI (Aggregate Local Feature Importance) that gives a visual summary of multiple local explanations.
-
-
New local feature importance explainer
-
Added support for surrogate-based (LIME+) local feature importance explainers
-
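First-order ALE for a single numeric feature splits the feature’s range into quantile bins, averages how much the prediction changes when that feature moves from the lower to the upper edge of each bin (keeping each row’s other features as they are), and accumulates those average changes. Because each row is only moved within its own bin, ALE stays closer to the observed data than a partial dependence plot when features are correlated. The sketch below covers only the one-feature case (the explainer above also supports two features); it is a generic illustration, and ale_1d is a hypothetical name rather than the AutoMLx API.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(
    model: Callable[[Row], float],
    X: Sequence[Row],
    feature: int,
    n_bins: int = 4,
) -> Tuple[List[float], List[float]]:
    """First-order ALE of one numeric feature, evaluated at the bin edges."""
    values = sorted(row[feature] for row in X)
    # Quantile-based bin edges; duplicates are dropped so every bin has width > 0.
    idx = [round(q * (len(values) - 1) / n_bins) for q in range(n_bins + 1)]
    edges = sorted({values[i] for i in idx})
    if len(edges) < 2:
        raise ValueError("feature must take at least two distinct values")
    n_intervals = len(edges) - 1
    effect_sums = [0.0] * n_intervals
    counts = [0] * n_intervals
    for row in X:
        # Bin k covers (edges[k], edges[k + 1]]; the minimum goes into bin 0.
        k = max(bisect_left(edges, row[feature]) - 1, 0)
        low, high = list(row), list(row)
        low[feature], high[feature] = edges[k], edges[k + 1]
        effect_sums[k] += model(high) - model(low)  # local effect within the bin
        counts[k] += 1
    # Accumulate the average local effect of each bin.
    ale = [0.0]
    for k in range(n_intervals):
        ale.append(ale[-1] + (effect_sums[k] / counts[k] if counts[k] else 0.0))
    # Center so the curve averages to zero over the data (count-weighted bin midpoints).
    weighted = sum(counts[k] * (ale[k] + ale[k + 1]) / 2 for k in range(n_intervals))
    mean_effect = weighted / sum(counts)
    return edges, [a - mean_effect for a in ale]


def linear_model(row: Row) -> float:
    return 2.0 * row[0] + row[1]


X = [[float(i), float(i % 2)] for i in range(9)]
edges, ale = ale_1d(linear_model, X, feature=0)
```

For this linear model the ALE curve rises by 2 per unit of the feature, matching its coefficient.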
Bug fixes
-
Import failure due to CUDA: The package no longer crashes when imported on a machine with CUDA installed.
-
Fixed a bug where TorchMLPClassifier would fail when trying to predict a single instance.
-
Fixed a bug where OracleAutoMLx_Forecasting.ipynb would fail if visualization packages were not already installed.
-
Fixed a bug that caused the pipeline.transform to raise an exception if a single row was passed.
-
Explanation documentation
-
Our documentation website ( http://automl.oraclecorp.com/ ) now includes documentation for the explanation objects returned by our explainers.
-
-
Enhanced performance of local feature importance explainers to address long running times.
-
Improved the facet visualization for columns with cardinality equal to 1 by choosing appropriate bar widths and padding.
22.3.0
Features and Improvements
-
New Explainer
-
Added support for KernelSHAP (a new feature importance tabulator), which provides fast approximations for the Shapley feature importance method.
-
-
Support for ARM architecture (aarch64)
-
Released platform-specific wheel file for ARM machines.
-
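KernelSHAP estimates Shapley values by sampling feature coalitions, because computing them exactly requires evaluating the model on every subset of features. For a handful of features the exact values can be enumerated directly, which makes a useful reference when checking an approximation. The sketch below uses a single baseline row to stand in for "absent" features, which is a simplifying assumption (SHAP-style explainers typically average over a background dataset); exact_shapley is a hypothetical name, not an AutoMLx function.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(
    model: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature of `x`, relative to `baseline`.

    A coalition S is evaluated by taking features in S from `x` and all other
    features from `baseline`. Cost grows as 2^n, so this only suits a few features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def model(z: List[float]) -> float:
    return z[0] * z[1] + 3.0 * z[2]


phi = exact_shapley(model, x=[2.0, 3.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The interaction term z[0] * z[1] contributes 6 and is split evenly between the first two features, so every feature receives 3, and the values sum to f(x) - f(baseline) = 9.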
Miscellaneous
-
Clarified documentation on the accepted data formats for input datasets and added a more meaningful corresponding error message.
22.2.0
Features and Improvements
-
New profiler
-
Profiler tracks CPU and memory utilization
-
-
Timeseries forecasting pipeline
-
Added support for multivariate datasets
-
Added support for exogenous variables
-
Enhanced heteroskedasticity detection technique
-
Applied a Box-Cox transform (and its inverse), with parameters determined via maximum likelihood estimation (MLE), to handle heteroskedasticity
-
-
Explainers / MLX integration
-
New global text explainer
-
Added support
-
-
New feature importance attribution explainers
-
Added several local and global feature importance explainers, including permutation importance, exact Shapley, and SHAP-PI.
-
The explainers support classification, regression and anomaly detection tasks.
-
The explainers can also be configured to explain the importance of features to any model (explanation_type='observational') as well as for a particular model (explanation_type='interventional').
-
Observational explanations are supported for all tasks; interventional explanations are only supported for classification and regression.
-
-
New feature dependence explainers
-
Added a partial dependence plot (PDP) and individual conditional expectations (ICE) explainer
-
PDP explanations include visualization support for up to 4 dimensions. PDPs in higher dimensions can be returned as dataframes.
-
-
-
Unsupervised Anomaly Detection
-
Added N-1 Experts: a new experimental metric for UAD Model Selection
-
-
Documentation
-
Added a description of the automl init function to the documentation
-
Cleaned up documentation for more consistency among different sections and added cross-references
-
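An ICE curve shows how one row’s prediction changes as a single feature is swept over a grid of values with the others held fixed; the PDP is the average of those curves over the dataset. The one-feature sketch below is a generic, from-scratch illustration of that standard definition; ice_curves and partial_dependence are hypothetical names, not the AutoMLx explainer API.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(
    model: Callable[[Row], float], X: Sequence[Row], feature: int, grid: Sequence[float]
) -> List[List[float]]:
    """One curve per row: the prediction as `feature` sweeps over `grid`."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # vary one feature, keep the rest fixed
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(
    model: Callable[[Row], float], X: Sequence[Row], feature: int, grid: Sequence[float]
) -> List[float]:
    """PDP: the average of the ICE curves at each grid value."""
    curves = ice_curves(model, X, feature, grid)
    return [sum(curve[k] for curve in curves) / len(curves) for k in range(len(grid))]


def model(row: Row) -> float:
    return row[0] * row[1]


X = [[0.0, 1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]
ice = ice_curves(model, X, 0, grid)
pdp = partial_dependence(model, X, 0, grid)
```

The two ICE curves have slopes 1 and 3, a difference that the PDP (slope 2) averages away; this is why the two views are usually shown together.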
Bug fixes
-
Timeseries forecasting pipeline
-
Fixed a statsmodels exception for some frequencies; users are now able to pass in timeperiod as a parameter
-
-
Preprocessing
-
Datetime preprocessor
-
Fixed the bug regarding column expansion and None/Null/NaN values
-
-
Standard preprocessor refitting
-
The standard preprocessor was previously fit first on a subsample of the training set and then re-fit at the very end of the pipeline on the full training set. This occasionally produced a different number of engineered features, so features identified during the model selection module could no longer exist. The standard preprocessor is now fit only once.
-
-
-
ONNX predictions inconsistency
-
Changed the ONNX conversion function to reduce the difference between the predictions of the exported ONNX model and the original pipeline object
-
Improved ONNX conversion runtime
-
ONNX conversion now only requires a sample from the training or test set as input. This sample is used to infer the final types and shapes
-
Possibly breaking changes
-
Removed matplotlib as a dependency of the AutoMLx package
-
Forecasting predictions can now be visualized only with plotly, through the same interface as before, automl.utils.plot_forecast. The alternate plotly visualizations provided by automl.utils.plot_forecast_interactive have been removed.
-
-
Updated the AutoMLx package dependencies
-
All dependency versions have been reviewed and updated to address all known CVEs
-
A few unneeded dependencies have also been removed.
-