cv | Determines the cross-validation splitting strategy
Possible inputs for cv are:
- None, to use the default 5-fold cross validation,
- integer, to specify the number of folds in a `(Stratified)KFold`,
- :term:`CV splitter`,
- An iterable yielding (train, test) splits as arrays of indices
For integer/None inputs, if the estimator is a classifier and ``y`` is
either binary or multiclass, :class:`StratifiedKFold` is used. In all
other cases, :class:`KFold` is used
Refer :ref:`User Guide ` for the various
cross-validation strategies that can be used here
.. versionchanged:: 0.22
``cv`` default value if None changed from 3-fold to 5-fold | default: {"oml-python:serialized_object": "cv_object", "value": {"name": "sklearn.model_selection._split.StratifiedKFold", "parameters": {"n_splits": "2", "random_state": "null", "shuffle": "true"}}} |
error_score | Value to assign to the score if an error occurs in estimator fitting
If set to 'raise', the error is raised. If a numeric value is given,
FitFailedWarning is raised. This parameter does not affect the refit
step, which will always raise the error | default: NaN |
estimator | This is assumed to implement the scikit-learn estimator interface
Either estimator needs to provide a ``score`` function,
or ``scoring`` must be passed | default: {"oml-python:serialized_object": "component_reference", "value": {"key": "estimator", "step_name": null}} |
iid | If True, return the average score across folds, weighted by the number
of samples in each test set. In this case, the data is assumed to be
identically distributed across the folds, and the loss minimized is
the total loss per sample, and not the mean loss across the folds
.. deprecated:: 0.22
Parameter ``iid`` is deprecated in 0.22 and will be removed in 0.24 | default: "deprecated" |
n_jobs | Number of jobs to run in parallel
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context
``-1`` means using all processors. See :term:`Glossary `
for more details
.. versionchanged:: v0.20
`n_jobs` default changed from 1 to None | default: null |
param_grid | Dictionary with parameters names (`str`) as keys and lists of
parameter settings to try as values, or a list of such
dictionaries, in which case the grids spanned by each dictionary
in the list are explored. This enables searching over any sequence
of parameter settings | default: [{"max_features": [2, 4]}, {"min_samples_leaf": [1, 10]}] |
pre_dispatch | Controls the number of jobs that get dispatched during parallel
execution. Reducing this number can be useful to avoid an
explosion of memory consumption when more jobs get dispatched
than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately
created and spawned. Use this for lightweight and
fast-running jobs, to avoid delays due to on-demand
spawning of the jobs
- An int, giving the exact number of total jobs that are
spawned
- A str, giving an expression as a function of n_jobs,
as in '2*n_jobs' | default: "2*n_jobs" |
refit | Refit an estimator using the best found parameters on the whole
dataset
For multiple metric evaluation, this needs to be a `str` denoting the
scorer that would be used to find the best parameters for refitting
the estimator at the end
Where there are considerations other than maximum score in
choosing a best estimator, ``refit`` can be set to a function which
returns the selected ``best_index_`` given ``cv_results_``. In that
case, the ``best_estimator_`` and ``best_params_`` will be set
according to the returned ``best_index_`` while the ``best_score_``
attribute will not be available
The refitted estimator is made available at the ``best_estimator_``
attribute and permits using ``predict`` directly on this
``GridSearchCV`` instance
Also for multiple metric evaluation, the attributes ``best_index_``,
``best_score_`` and ``best_params_`` will only be available if
``refit`` is set and all of them will be determined w.r.t this specific
s... | default: true |
return_train_score | If ``False``, the ``cv_results_`` attribute will not include training
scores
Computing training scores is used to get insights on how different
parameter settings impact the overfitting/underfitting trade-off
However computing the scores on the training set can be computationally
expensive and is not strictly required to select the parameters that
yield the best generalization performance
.. versionadded:: 0.19
.. versionchanged:: 0.21
Default value was changed from ``True`` to ``False``
Examples
--------
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters)
>>> clf.fit(iris.data, iris.target)
GridSearchCV(estimator=SVC(),
param_grid={'C': [1, 10], 'kernel': ('linear', 'rbf')})
>>> sorted(clf.cv_results_.keys())
['mean_fit_time', 'mean_score_time', 'mean_test_score'... | default: false |
scoring | A single str (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set
For evaluating multiple metrics, either give a list of (unique) strings
or a dict with names as keys and callables as values
NOTE that when using custom scorers, each scorer should return a single
value. Metric functions returning a list/array of values can be wrapped
into multiple scorers that return one value each
See :ref:`multimetric_grid_search` for an example
If None, the estimator's score method is used | default: null |
verbose | Controls the verbosity: the higher, the more messages | default: 0 |