Commit a1030d3

Pushing the docs to _pst_preview/ for branch: new_web_theme, commit c1adbac2b1cb4ef76c2d4937dc6c5588b67b0e27
1 parent b5e6e46 commit a1030d3

1,569 files changed, with 15,712 additions and 17,125 deletions.


_pst_preview/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 4aa0097b2b97714bf707e4da06a20a2d
+config: 5f8529899ed1da684e9b1bb6c892cca4
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.

_pst_preview/_downloads/1b8827af01c9a70017a4739bcf2e21a8/plot_gpr_co2.py

Lines changed: 5 additions & 6 deletions
@@ -4,20 +4,19 @@
 ====================================================================================
 
 This example is based on Section 5.4.3 of "Gaussian Processes for Machine
-Learning" [RW2006]_. It illustrates an example of complex kernel engineering
+Learning" [1]_. It illustrates an example of complex kernel engineering
 and hyperparameter optimization using gradient ascent on the
 log-marginal-likelihood. The data consists of the monthly average atmospheric
 CO2 concentrations (in parts per million by volume (ppm)) collected at the
 Mauna Loa Observatory in Hawaii, between 1958 and 2001. The objective is to
 model the CO2 concentration as a function of the time :math:`t` and extrapolate
 for years after 2001.
 
-.. topic: References
+.. rubric:: References
 
-.. [RW2006] `Rasmussen, Carl Edward.
-    "Gaussian processes in machine learning."
-    Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
-    <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
+.. [1] `Rasmussen, Carl Edward. "Gaussian processes in machine learning."
+   Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
+   <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
 """
 
 print(__doc__)
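
For orientation, the composite kernel this example engineers can be sketched with the standard scikit-learn kernel API. A minimal sketch, assuming the structure the docstring describes (long-term trend, yearly seasonality, medium-term irregularities, noise); the hyperparameter values here are illustrative, not the tuned ones:

# Hedged sketch of a CO2-style composite kernel; values are illustrative.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF,
    ExpSineSquared,
    RationalQuadratic,
    WhiteKernel,
)

kernel = (
    50.0**2 * RBF(length_scale=50.0)  # long-term rising trend
    + 2.0**2 * RBF(length_scale=100.0)
    * ExpSineSquared(length_scale=1.0, periodicity=1.0)  # yearly seasonality
    + 0.5**2 * RationalQuadratic(alpha=1.0, length_scale=1.0)  # irregularities
    + WhiteKernel(noise_level=0.1)  # observation noise
)

# fit() maximizes the log-marginal-likelihood over the kernel
# hyperparameters by gradient ascent (L-BFGS-B by default).
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
# gpr.fit(X_train, y_train)
# y_mean, y_std = gpr.predict(X_future, return_std=True)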

_pst_preview/_downloads/23614d75e8327ef369659da7d2ed62db/plot_nested_cross_validation_iris.py

Lines changed: 6 additions & 6 deletions
@@ -30,17 +30,17 @@
 performance of non-nested and nested CV strategies by taking the difference
 between their scores.
 
-.. topic:: See Also:
+.. seealso::
 
    - :ref:`cross_validation`
    - :ref:`grid_search`
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
-       subsequent selection bias in performance evaluation.
-       J. Mach. Learn. Res 2010,11, 2079-2107.
-       <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
+.. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
+   subsequent selection bias in performance evaluation.
+   J. Mach. Learn. Res 2010,11, 2079-2107.
+   <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
 
 """
 
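
In code, the inner/outer loop this docstring describes reduces to composing the two utilities it names. A minimal sketch, assuming the iris data and an illustrative SVC parameter grid:

# Minimal nested-CV sketch: GridSearchCV is the inner loop,
# cross_val_score the outer loop. Grid values are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# Inner loop: (hyper)parameter search on each outer training split.
clf = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=inner_cv)

# Outer loop: generalization error of the whole search procedure.
nested_scores = cross_val_score(clf, X=X, y=y, cv=outer_cv)
print(nested_scores.mean(), nested_scores.std())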

_pst_preview/_downloads/2402de18d671ce5087e3760b2540184f/plot_grid_search_stats.ipynb

Lines changed: 1 addition & 1 deletion
@@ -330,7 +330,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- ".. topic:: References\n\n    .. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n   comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n   Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n   error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n   In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n   of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n   In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n   for a change: a tutorial for comparing multiple classifiers through\n   Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n   The Journal of Machine Learning Research, 18(1). See the Python\n   library that accompanies this paper [here](https://github.com/janezd/baycomp).\n .. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n   Journal of Business & economic statistics, 20(1), 134-144.\n\n"
+ ".. rubric:: References\n\n.. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n   comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n   Neural computation, 10(7).\n.. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n   error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n   In Advances in neural information processing systems.\n.. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n   of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n   In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n.. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n   for a change: a tutorial for comparing multiple classifiers through\n   Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n   The Journal of Machine Learning Research, 18(1). See the Python\n   library that accompanies this paper [here](https://github.com/janezd/baycomp).\n.. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n   Journal of Business & economic statistics, 20(1), 134-144.\n\n"
  ]
 }
 ],
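
The notebook these references support compares cross-validated models with the corrected resampled t-test of Nadeau & Bengio ([2] above). A hedged sketch of that correction; the function name and inputs are assumptions for illustration:

# Sketch of the Nadeau & Bengio (2000) corrected paired t-test over
# per-split score differences; names and inputs are illustrative.
import numpy as np
from scipy import stats

def corrected_paired_ttest(score_diffs, n_train, n_test):
    """Two-sided corrected test; score_diffs has one entry per CV split."""
    k = len(score_diffs)
    mean_diff = np.mean(score_diffs)
    # Variance inflated to account for overlapping training sets across splits.
    corrected_var = (1.0 / k + n_test / n_train) * np.var(score_diffs, ddof=1)
    t_stat = mean_diff / np.sqrt(corrected_var)
    p_value = 2.0 * stats.t.sf(np.abs(t_stat), df=k - 1)
    return t_stat, p_value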

_pst_preview/_downloads/32173eb704d697c23dffbbf3fd74942a/plot_digits_denoising.py

Lines changed: 5 additions & 5 deletions
@@ -12,12 +12,12 @@
 
 We will use USPS digits dataset to reproduce presented in Sect. 4 of [1]_.
 
-.. topic:: References
+.. rubric:: References
 
-    .. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
-       "Learning to find pre-images."
-       Advances in neural information processing systems 16 (2004): 449-456.
-       <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
+.. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
+   "Learning to find pre-images."
+   Advances in neural information processing systems 16 (2004): 449-456.
+   <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
 
 """
 
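
The pre-image learning of [1] is what this example exercises through kernel PCA. A minimal sketch, where the RBF parameters and the noisy digit arrays are illustrative assumptions:

# Sketch of kernel-PCA denoising via learned pre-images; parameters
# and the X_*_noisy arrays (flattened digit images) are assumptions.
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(
    n_components=32,
    kernel="rbf",
    gamma=1e-3,
    fit_inverse_transform=True,  # learn the pre-image (inverse) map
    alpha=5e-3,                  # ridge penalty of that inverse map
)
kpca.fit(X_train_noisy)

# Project onto the leading components, then map back to pixel space:
X_denoised = kpca.inverse_transform(kpca.transform(X_test_noisy))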

_pst_preview/_downloads/3c3c738275484acc54821615bf72894a/plot_permutation_importance.py

Lines changed: 3 additions & 3 deletions
@@ -18,10 +18,10 @@
 This example shows how to use Permutation Importances as an alternative that
 can mitigate those limitations.
 
-.. topic:: References:
+.. rubric:: References
 
-    * :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
-      2001. <10.1023/A:1010933404324>`
+* :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
+  2001. <10.1023/A:1010933404324>`
 
 """
 
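
The alternative this docstring points to ships as sklearn.inspection.permutation_importance. A minimal sketch, assuming a fitted model and a held-out validation split:

# Sketch of permutation importances on held-out data; `model`,
# X_val and y_val are assumed to come from an earlier fit/split.
from sklearn.inspection import permutation_importance

result = permutation_importance(
    model, X_val, y_val, n_repeats=10, random_state=42
)

# Mean score drop (and spread) when each feature is shuffled:
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}"
          f" +/- {result.importances_std[i]:.3f}")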

_pst_preview/_downloads/45916745bb89ca49be3a50aa80e65e3f/plot_nested_cross_validation_iris.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. topic:: See Also:\n\n - `cross_validation`\n - `grid_search`\n\n.. topic:: References:\n\n .. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
+ "\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. seealso::\n\n - `cross_validation`\n - `grid_search`\n\n.. rubric:: References\n\n.. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
  ]
 },
 {

_pst_preview/_downloads/4e46f015ab8300f262e6e8775bcdcf8a/plot_adaboost_multiclass.py

Lines changed: 11 additions & 11 deletions
@@ -17,11 +17,11 @@
 be selected. This ensures that subsequent iterations of the algorithm focus on
 the difficult-to-classify samples.
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
-       Statistics and its Interface 2.3 (2009): 349-360.
-       <10.4310/SII.2009.v2.n3.a8>`
+.. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
+   Statistics and its Interface 2.3 (2009): 349-360.
+   <10.4310/SII.2009.v2.n3.a8>`
 
 """
 

@@ -231,16 +231,16 @@ def misclassification_error(y_true, y_pred):
 # decision. Indeed, this exactly is the formulation of updating the base
 # estimators' weights after each iteration in AdaBoost.
 #
-# |details-start| Mathematical details |details-split|
+# .. dropdown:: Mathematical details
 #
-# The weight associated with a weak learner trained at the stage :math:`m` is
-# inversely associated with its misclassification error such that:
+#   The weight associated with a weak learner trained at the stage :math:`m` is
+#   inversely associated with its misclassification error such that:
 #
-# .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
+#   .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
 #
-# where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
-# of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
-# classes in our classification problem. |details-end|
+#   where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
+#   of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
+#   classes in our classification problem.
 #
 # Another interesting observation boils down to the fact that the first weak
 # learners of the model make fewer errors than later weak learners of the
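
The dropdown's weight formula transcribes directly to code. A minimal sketch; the function name is illustrative, not part of the example:

# alpha^(m) = log((1 - err^(m)) / err^(m)) + log(K - 1), as in the
# dropdown above; the function name is illustrative.
import numpy as np

def samme_learner_weight(err_m, n_classes):
    return np.log((1.0 - err_m) / err_m) + np.log(n_classes - 1)

# A weak learner with 40% error on a 3-class problem:
# samme_learner_weight(0.4, 3) ~= 0.405 + 0.693 = 1.099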

_pst_preview/_downloads/51833337bfc73d152b44902e5baa50ff/plot_lasso_lars_ic.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "\n# Lasso model selection via information criteria\n\nThis example reproduces the example of Fig. 2 of [ZHT2007]_. A\n:class:`~sklearn.linear_model.LassoLarsIC` estimator is fit on a\ndiabetes dataset and the AIC and the BIC criteria are used to select\nthe best model.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>It is important to note that the optimization to find `alpha` with\n :class:`~sklearn.linear_model.LassoLarsIC` relies on the AIC or BIC\n criteria that are computed in-sample, thus on the training set directly.\n This approach differs from the cross-validation procedure. For a comparison\n of the two approaches, you can refer to the following example:\n `sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py`.</p></div>\n\n.. topic:: References\n\n .. [ZHT2007] :arxiv:`Zou, Hui, Trevor Hastie, and Robert Tibshirani.\n \"On the degrees of freedom of the lasso.\"\n The Annals of Statistics 35.5 (2007): 2173-2192.\n <0712.0881>`\n"
+ "\n# Lasso model selection via information criteria\n\nThis example reproduces the example of Fig. 2 of [ZHT2007]_. A\n:class:`~sklearn.linear_model.LassoLarsIC` estimator is fit on a\ndiabetes dataset and the AIC and the BIC criteria are used to select\nthe best model.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>It is important to note that the optimization to find `alpha` with\n :class:`~sklearn.linear_model.LassoLarsIC` relies on the AIC or BIC\n criteria that are computed in-sample, thus on the training set directly.\n This approach differs from the cross-validation procedure. For a comparison\n of the two approaches, you can refer to the following example:\n `sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py`.</p></div>\n\n.. rubric:: References\n\n.. [ZHT2007] :arxiv:`Zou, Hui, Trevor Hastie, and Robert Tibshirani.\n \"On the degrees of freedom of the lasso.\"\n The Annals of Statistics 35.5 (2007): 2173-2192.\n <0712.0881>`\n"
  ]
 },
 {
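
The in-sample criterion selection this notebook describes comes down to a short fit with LassoLarsIC. A minimal sketch on the same diabetes data; any preprocessing the full example applies is omitted here:

# Sketch of alpha selection by BIC, computed in-sample on the
# training data (no cross-validation), as the note above stresses.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsIC

X, y = load_diabetes(return_X_y=True)
lasso_bic = LassoLarsIC(criterion="bic").fit(X, y)
print(lasso_bic.alpha_)  # regularization strength minimizing BIC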
