Initialization
- init(engine='ray', engine_opts=None, logger=None, loglevel=30, cache_directory=None, dataset_cache_enabled=True)
-
Initialize the AutoMLx framework’s execution engine. AutoMLx can work with a variety of parallelization platforms.
- Parameters :
-
-
engine (str, default='ray') –
Name of the parallelization framework. Can be one of:
- 'ray': Use the Ray multiprocessing framework.
- 'local': Use Python’s built-in multiprocessing framework.
- 'threading': Use Python’s built-in multithreading framework.
-
engine_opts (dict or None, default=None) –
Options for the parallelization framework. When engine is:
- 'ray': engine_opts is a dictionary with the following keys:
  * "n_jobs" (int), the degree of inter-model parallelism.
  * "model_n_jobs" (int), the degree of intra-model parallelism.
  * "ray_setup" (dict), the arguments to pass to ray.init.
  * "cluster_mode" (bool), specifies whether Ray should detect a running cluster on the node and connect to it. Needs to be set for both head and worker nodes.
  * "enable_object_spilling" (bool, by default False), determines whether Ray object spilling is enabled. If object spilling is enabled and no further object spilling configuration is provided in ray_setup, the object spilling directory is automatically set to the secure AutoMLx caching directory.
- 'local': engine_opts is of the form {'n_jobs' : val1, 'model_n_jobs' : val2}, where val1 is the degree of inter-model parallelism and val2 is the degree of intra-model parallelism.
- 'threading': engine_opts is of the form {'n_jobs' : val}, where val is the degree of parallelism.
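The documented engine_opts shapes can be sketched as a small builder. This is only an illustration: build_engine_opts and init_automlx are hypothetical helpers, not part of AutoMLx, and the sketch assumes that both keys are supplied for 'ray' and 'local' as in the forms shown above.

```python
def build_engine_opts(engine, n_jobs, model_n_jobs=1):
    """Return an engine_opts dict in the documented shape for `engine`.

    Hypothetical helper for illustration; it is not part of AutoMLx.
    """
    if engine == "threading":
        # 'threading' documents a single degree of parallelism.
        return {"n_jobs": n_jobs}
    if engine in ("ray", "local"):
        # 'ray' and 'local' document inter-model and intra-model parallelism.
        return {"n_jobs": n_jobs, "model_n_jobs": model_n_jobs}
    raise ValueError(f"unknown engine: {engine!r}")


def init_automlx(engine, engine_opts):
    """Pass the options to automlx.init; requires automlx to be installed."""
    import automlx  # imported lazily so the builder works without the package

    automlx.init(engine=engine, engine_opts=engine_opts)


local_opts = build_engine_opts("local", n_jobs=2, model_n_jobs=1)
threading_opts = build_engine_opts("threading", n_jobs=4)
print(local_opts)
print(threading_opts)
```

With AutoMLx installed, init_automlx("local", local_opts) would forward the dictionary to init. For 'ray', additional keys such as "ray_setup" can be added to the returned dictionary.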
-
logger (logging.Logger, str or None, default=None) –
Logging mode. One of:
- None: Log to console with the specified loglevel (by default logging.WARNING).
- str: Log to the provided file path and console.
- logging.Logger: Use an existing Logger object.
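As a rough sketch of the third mode, an existing Logger can be prepared with the standard logging module and then handed to init. The logger name "my_app.automl" and the helper init_with_logger are assumptions made for illustration, and the in-memory stream is used only to keep the example free of side effects.

```python
import io
import logging

# Build a Logger object that could be passed as init(logger=...).
stream = io.StringIO()  # in-memory target; a real setup might use a file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

app_logger = logging.getLogger("my_app.automl")  # hypothetical logger name
app_logger.setLevel(logging.INFO)
app_logger.addHandler(handler)
app_logger.propagate = False  # keep records out of the root logger

app_logger.info("engine starting")
print(stream.getvalue().strip())


def init_with_logger(logger):
    """Pass an existing Logger to AutoMLx; requires automlx to be installed."""
    import automlx  # imported lazily so the logging setup runs without it

    automlx.init(logger=logger)
```

Passing a file path string instead, for example a path of your choosing, selects the second mode, which logs to that file and to the console.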
-
loglevel (int or None, default=logging.WARNING) –
Log level is derived from the Python logging module, and adjusts the logging verbosity in the following increasing order: logging.CRITICAL < logging.WARNING < logging.INFO < logging.DEBUG.
- Set to None to avoid any logging initialization and use the current logging module configuration.
- Setting the loglevel here does nothing if the root logger already has handlers configured. The parameter is also ignored if a logging.Logger object is passed to the logger parameter, or the AutoMLx package has already been configured with a different loglevel.
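The verbosity ordering refers to how much is logged, not to the numeric values, which run the other way in the standard logging module. The short sketch below prints the levels from least to most verbose and checks whether the root logger already has handlers, the condition under which loglevel is described as having no effect. The function root_logger_is_configured is a hypothetical helper.

```python
import logging

levels = {
    "CRITICAL": logging.CRITICAL,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}

# Higher numeric level means fewer messages, so sort in descending order
# to go from least verbose to most verbose.
by_verbosity = sorted(levels, key=levels.get, reverse=True)
for name in by_verbosity:
    print(name, levels[name])


def root_logger_is_configured():
    """Return True if the root logger already has at least one handler."""
    return len(logging.getLogger().handlers) > 0


print(root_logger_is_configured())
```

The default loglevel=30 in the signature is the numeric value of logging.WARNING.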
-
cache_directory (str or None, default=None) –
Cache directory used to store intermediate results of AutoMLx.
- If a path is provided here, the user is responsible for managing the directory.
- If cache_directory is None, the cache is created as a temporary directory and cleaned up by AutoMLx.
- The caching directory location may also be controlled by setting the TMPDIR environment variable, which will serve as a parent directory to the AutoMLx cache (please ensure the environment variable is set before AutoMLx is imported, for example by running your Python script as TMPDIR=/path/to/dir python3 run_automlx.py).
- The caching directory is cleared at the end of the execution of the Python process or when the AutoMLx engine is explicitly shut down via automlx.shutdown(). The cache may not be cleared if the process is terminated abruptly (for example, by a SIGTERM event).
- If guaranteed cleanup of the temporary files and directories is desired, a cleanup EXIT trap may be utilized. For example, if the AutoMLx cache_directory is set to /tmp/mydir, a cleanup EXIT trap can be defined at the top of a shell script running the AutoMLx Python scripts as trap "rm -rf /tmp/mydir" EXIT.
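When a path is supplied, managing the directory falls to the caller. One possible way to handle that from Python, sketched here under the assumption that a per-run temporary directory is acceptable, is to create the directory with tempfile and register its removal with atexit. The prefix "automlx_cache_" and the helpers remove_cache_dir and init_with_cache are hypothetical names, not part of AutoMLx.

```python
import atexit
import os
import shutil
import tempfile

# Create a cache directory that this process owns. mkdtemp creates it with
# permissions restricted to the current user.
cache_dir = tempfile.mkdtemp(prefix="automlx_cache_")  # hypothetical prefix


def remove_cache_dir(path=cache_dir):
    """Delete the cache directory and its contents, ignoring errors."""
    shutil.rmtree(path, ignore_errors=True)


# Run the cleanup at normal interpreter exit.
atexit.register(remove_cache_dir)


def init_with_cache(path):
    """Point AutoMLx at the directory; requires automlx to be installed."""
    import automlx  # imported lazily so the directory handling runs without it

    automlx.init(cache_directory=path)


print(os.path.isdir(cache_dir))
```

Note that atexit handlers do not run when the process is killed by a signal that Python does not handle, so this does not replace the shell-level EXIT trap for guaranteed cleanup.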
-
-
dataset_cache_enabled (bool, default=True) – If the dataset cache is enabled, transformed versions of the data may be stored on disk (in the AutoMLx cache directory) to speed up subsequent transformations of the same data.
-