cv | Determines the cross-validation splitting strategy
Possible inputs for cv are:
- None, to use the default 3-fold cross validation,
- integer, to specify the number of folds in a `(Stratified)KFold`,
- An object to be used as a cross-validation generator
- An iterable yielding train, test splits
For integer/None inputs, if the estimator is a classifier and ``y`` is
either binary or multiclass, :class:`StratifiedKFold` is used. In all
other cases, :class:`KFold` is used
Refer to the User Guide for the various
cross-validation strategies that can be used here | default: {"oml-python:serialized_object": "cv_object", "value": {"name": "sklearn.model_selection._split.StratifiedKFold", "parameters": {"n_splits": "2", "random_state": "null", "shuffle": "true"}}} |
error_score | Value to assign to the score if an error occurs in estimator fitting
If set to 'raise', the error is raised. If a numeric value is given,
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error | default: "raise" |
estimator | An object of that type is instantiated for each grid point
This is assumed to implement the scikit-learn estimator interface
Either estimator needs to provide a ``score`` function,
or ``scoring`` must be passed | default: {"oml-python:serialized_object": "component_reference", "value": {"key": "estimator", "step_name": null}} |
fit_params | Parameters to pass to the fit method
.. deprecated:: 0.19
``fit_params`` as a constructor argument was deprecated in version
0.19 and will be removed in version 0.21. Pass fit parameters to
the ``fit`` method instead | default: null |
iid | If True, the data is assumed to be identically distributed across
the folds, and the loss minimized is the total loss per sample,
and not the mean loss across the folds | default: true |
n_iter | Number of parameter settings that are sampled. n_iter trades
off runtime vs quality of the solution | default: 5 |
n_jobs | Number of jobs to run in parallel | default: 1 |
param_distributions | Dictionary with parameter names (strings) as keys and distributions
or lists of parameters to try. Distributions must provide a ``rvs``
method for sampling (such as those from scipy.stats.distributions)
If a list is given, it is sampled uniformly | default: {"bootstrap": [true, false], "criterion": ["gini", "entropy"], "max_depth": [3, null], "max_features": [1, 2, 3, 4], "min_samples_leaf": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9, 10]} |
pre_dispatch | Controls the number of jobs that get dispatched during parallel
execution. Reducing this number can be useful to avoid an
explosion of memory consumption when more jobs get dispatched
than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately
created and spawned. Use this for lightweight and
fast-running jobs, to avoid delays due to on-demand
spawning of the jobs
- An int, giving the exact number of total jobs that are
spawned
- A string, giving an expression as a function of n_jobs,
as in '2*n_jobs' | default: "2*n_jobs" |
random_state | Pseudo random number generator state used for random uniform sampling
from lists of possible values instead of scipy.stats distributions
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random` | default: null |
refit | Refit an estimator using the best found parameters on the whole
dataset
For multiple metric evaluation, this needs to be a string denoting the
scorer that would be used to find the best parameters for refitting
the estimator at the end
The refitted estimator is made available at the ``best_estimator_``
attribute and permits using ``predict`` directly on this
``RandomizedSearchCV`` instance
Also for multiple metric evaluation, the attributes ``best_index_``,
``best_score_`` and ``best_params_`` will only be available if
``refit`` is set and all of them will be determined w.r.t. this specific
scorer
See ``scoring`` parameter to know more about multiple metric
evaluation | default: true |
return_train_score | If ``False``, the ``cv_results_`` attribute will not include training
scores
Current default is ``'warn'``, which behaves as ``True`` in addition
to raising a warning when a training score is looked up
That default will be changed to ``False`` in 0.21
Computing training scores is used to get insights on how different
parameter settings impact the overfitting/underfitting trade-off
However, computing the scores on the training set can be computationally
expensive and is not strictly required to select the parameters that
yield the best generalization performance. | default: "warn" |
scoring | A single string (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set
For evaluating multiple metrics, either give a list of (unique) strings
or a dict with names as keys and callables as values
NOTE that when using custom scorers, each scorer should return a single
value. Metric functions returning a list/array of values can be wrapped
into multiple scorers that return one value each
See :ref:`multimetric_grid_search` for an example
If None, the estimator's default scorer (if available) is used | default: null |
verbose | Controls the verbosity: the higher, the more messages | default: 0 |
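
The defaults listed above can be put together in a short sketch. Several details here are assumptions rather than part of the table: the estimator is taken to be a ``RandomForestClassifier`` (inferred only from the parameter names in ``param_distributions``), the iris dataset is used purely for illustration, and ``random_state`` is fixed to 0 for reproducibility where the table lists ``null``. The deprecated ``fit_params`` and ``iid`` arguments are left out because recent scikit-learn releases no longer accept them, and ``return_train_score`` is set to ``False`` instead of ``"warn"`` for the same reason.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Illustrative data only (assumption): iris has 4 features, so every
# candidate value of max_features below is valid.
X, y = load_iris(return_X_y=True)

# Lists are sampled uniformly; a scipy.stats distribution (anything with
# an ``rvs`` method) could be used in place of any list.
param_distributions = {
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
    "max_depth": [3, None],
    "max_features": [1, 2, 3, 4],
    "min_samples_leaf": list(range(1, 11)),
    "min_samples_split": list(range(2, 11)),
}

# Matches the cv default in the table: 2 stratified, shuffled folds.
# random_state=0 is an assumption added for reproducibility.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    # Assumed estimator; the table only gives a component reference.
    estimator=RandomForestClassifier(n_estimators=10, random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                  # number of sampled parameter settings
    scoring=None,              # fall back to the estimator's own score
    n_jobs=1,
    refit=True,                # refit the best setting on the whole dataset
    cv=cv,
    verbose=0,
    pre_dispatch="2*n_jobs",
    random_state=0,            # assumption; the table lists null
    error_score="raise",
    return_train_score=False,  # skip the cost of scoring the training folds
)
search.fit(X, y)

# Because refit=True, the search object exposes the best setting and can
# predict directly through the refitted best_estimator_.
print(search.best_params_)
print(search.best_score_)
predictions = search.predict(X[:5])
```

With ``n_iter=5`` and two folds, this fits ten candidate models plus one final refit. Raising ``n_iter`` samples more of the search space at a proportional cost in runtime.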