Commit 13b1db7

Pushing the docs to dev/ for branch: main, commit ec74b2a78a3365fb49b70c12dd4e305cb5ab6be0
1 parent a375935 commit 13b1db7

1,528 files changed: +6218 / -6037 lines changed


dev/_downloads/592b2521e44501266ca5339d1fb123cb/plot_rfe_with_cross_validation.py

Lines changed: 24 additions & 2 deletions
```diff
@@ -22,9 +22,12 @@
 
 from sklearn.datasets import make_classification
 
+n_features = 15
+feat_names = [f"feature_{i}" for i in range(15)]
+
 X, y = make_classification(
     n_samples=500,
-    n_features=15,
+    n_features=n_features,
     n_informative=3,
     n_redundant=2,
     n_repeated=0,
```
```diff
@@ -71,7 +74,12 @@
 import matplotlib.pyplot as plt
 import pandas as pd
 
-cv_results = pd.DataFrame(rfecv.cv_results_)
+data = {
+    key: value
+    for key, value in rfecv.cv_results_.items()
+    if key in ["n_features", "mean_test_score", "std_test_score"]
+}
+cv_results = pd.DataFrame(data)
 plt.figure()
 plt.xlabel("Number of features selected")
 plt.ylabel("Mean test accuracy")
```
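The updated plotting cell no longer passes the whole `rfecv.cv_results_` dict to `pd.DataFrame`; it keeps only the three keys that hold one value per candidate number of features. A likely reason, given the per-split `split{i}_support` masks used later in this commit, is that such entries hold one row per candidate and therefore do not fit as flat DataFrame columns. The stdlib-only sketch below reproduces the filter on a small hand-written dict; its keys mirror the names in the diff, but all values are made up for illustration and are not real `RFECV` output.

```python
# Toy stand-in for rfecv.cv_results_ (hypothetical values): three candidate
# subset sizes, 4 input features, a single CV split shown.
cv_results = {
    "n_features": [1, 2, 3],
    "mean_test_score": [0.50, 0.71, 0.69],
    "std_test_score": [0.05, 0.04, 0.06],
    "split0_test_score": [0.52, 0.70, 0.68],
    # One boolean support mask per candidate size: a nested (2-D) value.
    "split0_support": [
        [True, False, False, False],
        [True, True, False, False],
        [True, True, True, False],
    ],
}

# Keep only the per-candidate summary columns, as the updated example does.
wanted = ["n_features", "mean_test_score", "std_test_score"]
data = {key: value for key, value in cv_results.items() if key in wanted}

# Every kept column is flat and has one entry per candidate subset size.
for key, column in data.items():
    print(key, column)
```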
```diff
@@ -91,3 +99,17 @@
 # cross-validation technique. The test accuracy decreases above 5 selected
 # features, that is, keeping non-informative features leads to over-fitting and
 # is therefore detrimental for the statistical performance of the models.
+
+# %%
+import numpy as np
+
+for i in range(cv.n_splits):
+    mask = rfecv.cv_results_[f"split{i}_support"][
+        rfecv.n_features_
+    ]  # mask of features selected by the RFE
+    features_selected = np.ma.compressed(np.ma.masked_array(feat_names, mask=1 - mask))
+    print(f"Features selected in fold {i}: {features_selected}")
+# %%
+# Across the five folds, the selected features are consistent. This is good news:
+# it means that the selection is stable across folds, and it suggests that
+# these features are the most informative ones.
```
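The loop above takes, for each fold, one row of the `split{i}_support` array and turns that boolean mask into feature names. The sketch below shows the same idea in plain Python: `itertools.compress` keeps the names whose mask entry is `True`, which is what `np.ma.compressed(np.ma.masked_array(feat_names, mask=1 - mask))` achieves. It assumes that the rows follow the same order as `cv_results_["n_features"]`, so the row is found by looking up the selected count in that grid rather than using it as a raw index. Under that assumption, if the grid starts at 1 feature, row `k` describes `k + 1` features, and indexing directly with `rfecv.n_features_` would read the row for one feature more than was selected. The masks, the grid, and `n_selected` below are hypothetical stand-ins, and the final intersection is one simple way to check the cross-fold stability the text describes.

```python
from itertools import compress

feat_names = [f"feature_{i}" for i in range(6)]

# Hypothetical per-fold support masks, one row per candidate size
# n_features = 1, 2, 3 (assumed to match cv_results_["n_features"] order).
n_features_grid = [1, 2, 3]
split_support = {
    0: [
        [True, False, False, False, False, False],
        [True, False, True, False, False, False],
        [True, False, True, False, True, False],
    ],
    1: [
        [False, False, True, False, False, False],
        [True, False, True, False, False, False],
        [True, False, True, True, False, False],
    ],
}
n_selected = 2  # plays the role of rfecv.n_features_ (hypothetical value)

# Look the row up by value, so the code does not depend on where the grid starts.
row = n_features_grid.index(n_selected)

selected_per_fold = {}
for fold, support in split_support.items():
    mask = support[row]
    # compress keeps the names whose mask entry is True.
    selected_per_fold[fold] = list(compress(feat_names, mask))
    print(f"Features selected in fold {fold}: {selected_per_fold[fold]}")

# Features chosen in every fold: a quick stability check.
common = set.intersection(*(set(names) for names in selected_per_fold.values()))
print("Selected in all folds:", sorted(common))
```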

dev/_downloads/949ed208b2147ed2b3e348e81fef52be/plot_rfe_with_cross_validation.ipynb

Lines changed: 20 additions & 2 deletions
```diff
@@ -33,7 +33,7 @@
    },
    "outputs": [],
    "source": [
-    "from sklearn.datasets import make_classification\n\nX, y = make_classification(\n    n_samples=500,\n    n_features=15,\n    n_informative=3,\n    n_redundant=2,\n    n_repeated=0,\n    n_classes=8,\n    n_clusters_per_class=1,\n    class_sep=0.8,\n    random_state=0,\n)"
+    "from sklearn.datasets import make_classification\n\nn_features = 15\nfeat_names = [f\"feature_{i}\" for i in range(15)]\n\nX, y = make_classification(\n    n_samples=500,\n    n_features=n_features,\n    n_informative=3,\n    n_redundant=2,\n    n_repeated=0,\n    n_classes=8,\n    n_clusters_per_class=1,\n    class_sep=0.8,\n    random_state=0,\n)"
    ]
   },
   {
```
```diff
@@ -69,7 +69,7 @@
    },
    "outputs": [],
    "source": [
-    "import matplotlib.pyplot as plt\nimport pandas as pd\n\ncv_results = pd.DataFrame(rfecv.cv_results_)\nplt.figure()\nplt.xlabel(\"Number of features selected\")\nplt.ylabel(\"Mean test accuracy\")\nplt.errorbar(\n    x=cv_results[\"n_features\"],\n    y=cv_results[\"mean_test_score\"],\n    yerr=cv_results[\"std_test_score\"],\n)\nplt.title(\"Recursive Feature Elimination \\nwith correlated features\")\nplt.show()"
+    "import matplotlib.pyplot as plt\nimport pandas as pd\n\ndata = {\n    key: value\n    for key, value in rfecv.cv_results_.items()\n    if key in [\"n_features\", \"mean_test_score\", \"std_test_score\"]\n}\ncv_results = pd.DataFrame(data)\nplt.figure()\nplt.xlabel(\"Number of features selected\")\nplt.ylabel(\"Mean test accuracy\")\nplt.errorbar(\n    x=cv_results[\"n_features\"],\n    y=cv_results[\"mean_test_score\"],\n    yerr=cv_results[\"std_test_score\"],\n)\nplt.title(\"Recursive Feature Elimination \\nwith correlated features\")\nplt.show()"
    ]
   },
   {
```
```diff
@@ -78,6 +78,24 @@
    "source": [
     "From the plot above one can further notice a plateau of equivalent scores\n(similar mean value and overlapping errorbars) for 3 to 5 selected features.\nThis is the result of introducing correlated features. Indeed, the optimal\nmodel selected by the RFE can lie within this range, depending on the\ncross-validation technique. The test accuracy decreases above 5 selected\nfeatures, that is, keeping non-informative features leads to over-fitting and\nis therefore detrimental for the statistical performance of the models.\n\n"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n\nfor i in range(cv.n_splits):\n    mask = rfecv.cv_results_[f\"split{i}_support\"][\n        rfecv.n_features_\n    ]  # mask of features selected by the RFE\n    features_selected = np.ma.compressed(np.ma.masked_array(feat_names, mask=1 - mask))\n    print(f\"Features selected in fold {i}: {features_selected}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Across the five folds, the selected features are consistent. This is good news:\nit means that the selection is stable across folds, and it suggests that\nthese features are the most informative ones.\n\n"
+   ]
   }
  ],
  "metadata": {
```
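In the notebook version, the new step appears as two cells appended to the `cells` list: a code cell with `cell_type`, `execution_count`, `metadata`, `outputs` and `source`, followed by a markdown cell with only `cell_type`, `metadata` and `source`. The sketch below builds cells of that shape with the standard `json` module; the helpers `code_cell` and `markdown_cell` are hypothetical and are not the sphinx-gallery or nbformat API that actually generated this file.

```python
import json


def code_cell(source):
    # Same keys as the code cell added in the notebook diff.
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"collapsed": False},
        "outputs": [],
        "source": [source],
    }


def markdown_cell(source):
    # Markdown cells carry no outputs or execution count.
    return {"cell_type": "markdown", "metadata": {}, "source": [source]}


cells = [
    code_cell('print("hello")'),
    markdown_cell("Some explanatory text.\n\n"),
]

# json writes None as null and False as false, matching the diff.
text = json.dumps({"cells": cells}, indent=1)
print(text)
```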

dev/_downloads/scikit-learn-docs.zip

738 Bytes
Binary file not shown.
