scikit-learn-contrib
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml (+4)
diff --git a/README.rst b/README.rst (+4 -89)
diff --git a/azure-pipelines.yml b/azure-pipelines.yml (+26 -14)
diff --git a/build_tools/azure/install.sh b/build_tools/azure/install.sh (+2 -1)
diff --git a/build_tools/azure/linting.sh b/build_tools/azure/linting.sh (+43)
diff --git a/build_tools/azure/posix-docker.yml b/build_tools/azure/posix-docker.yml (+1)
diff --git a/build_tools/azure/posix.yml b/build_tools/azure/posix.yml (+1)
diff --git a/build_tools/azure/test_script.sh b/build_tools/azure/test_script.sh (+1 -1)
diff --git a/build_tools/azure/windows.yml b/build_tools/azure/windows.yml (+1)
diff --git a/conftest.py b/conftest.py (+1)
diff --git a/doc/common_pitfalls.rst b/doc/common_pitfalls.rst (+17 -1)
diff --git a/doc/conf.py b/doc/conf.py (+1 -6)
@@ -20,3 +20,7 @@ repos:
      -  id: mypy
         files: sklearn/
         additional_dependencies: [pytest==6.2.4]
+-   repo: https://github.com/PyCQA/isort
+    rev: 5.10.1
+    hooks:
+    -   id: isort
@@ -30,7 +30,7 @@
 .. |PythonMinVersion| replace:: 3.8
 .. |NumPyMinVersion| replace:: 1.17.3
 .. |SciPyMinVersion| replace:: 1.3.2
-.. |ScikitLearnMinVersion| replace:: 1.1.0
+.. |ScikitLearnMinVersion| replace:: 1.0.2
 .. |MatplotlibMinVersion| replace:: 3.1.2
 .. |PandasMinVersion| replace:: 1.0.5
 .. |TensorflowMinVersion| replace:: 2.4.3
@@ -154,92 +154,7 @@ One way of addressing this issue is by re-sampling the dataset as to offset this
 imbalance with the hope of arriving at a more robust and fair decision boundary
 than you would otherwise.
 
-Re-sampling techniques are divided in two categories:
-    1. Under-sampling the majority class(es).
-    2. Over-sampling the minority class.
-    3. Combining over- and under-sampling.
-    4. Create ensemble balanced sets.
-
-Below is a list of the methods currently implemented in this module.
-
-* Under-sampling
-    1. Random majority under-sampling with replacement
-    2. Extraction of majority-minority Tomek links [1]_
-    3. Under-sampling with Cluster Centroids
-    4. NearMiss-(1 & 2 & 3) [2]_
-    5. Condensed Nearest Neighbour [3]_
-    6. One-Sided Selection [4]_
-    7. Neighboorhood Cleaning Rule [5]_
-    8. Edited Nearest Neighbours [6]_
-    9. Instance Hardness Threshold [7]_
-    10. Repeated Edited Nearest Neighbours [14]_
-    11. AllKNN [14]_
-
-* Over-sampling
-    1. Random minority over-sampling with replacement
-    2. SMOTE - Synthetic Minority Over-sampling Technique [8]_
-    3. SMOTENC - SMOTE for Nominal and Continuous [8]_
-    4. SMOTEN - SMOTE for Nominal [8]_
-    5. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2 [9]_
-    6. SVM SMOTE - Support Vectors SMOTE [10]_
-    7. ADASYN - Adaptive synthetic sampling approach for imbalanced learning [15]_
-    8. KMeans-SMOTE [17]_
-    9. ROSE - Random OverSampling Examples [19]_
-
-* Over-sampling followed by under-sampling
-    1. SMOTE + Tomek links [12]_
-    2. SMOTE + ENN [11]_
-
-* Ensemble classifier using samplers internally
-    1. Easy Ensemble classifier [13]_
-    2. Balanced Random Forest [16]_
-    3. Balanced Bagging
-    4. RUSBoost [18]_
-
-* Mini-batch resampling for Keras and Tensorflow
-
-The different algorithms are presented in the sphinx-gallery_.
-
-.. _sphinx-gallery: https://imbalanced-learn.readthedocs.io/en/stable/auto_examples/index.html
-
-
-References:
------------
-
-.. [1] : I. Tomek, “Two modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 769-772, 1976.
-
-.. [2] : I. Mani, J. Zhang. “kNN approach to unbalanced data distributions: A case study involving information extraction,” In Proceedings of the Workshop on Learning from Imbalanced Data Sets, pp. 1-7, 2003.
-
-.. [3] : P. E. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. 14(3), pp. 515-516, 1968.
-
-.. [4] : M. Kubat, S. Matwin, “Addressing the curse of imbalanced training sets: One-sided selection,” In Proceedings of the 14th International Conference on Machine Learning, vol. 97, pp. 179-186, 1997.
-
-.. [5] : J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” Proceedings of the 8th Conference on Artificial Intelligence in Medicine in Europe, pp. 63-66, 2001.
-
-.. [6] : D. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” IEEE Transactions on Systems, Man, and Cybernetrics, vol. 2(3), pp. 408-421, 1972.
-
-.. [7] : M. R. Smith, T. Martinez, C. Giraud-Carrier, “An instance level analysis of data complexity,” Machine learning, vol. 95(2), pp. 225-256, 2014.
-
-.. [8] : N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
-
-.. [9] : H. Han, W.-Y. Wang, B.-H. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” In Proceedings of the 1st International Conference on Intelligent Computing, pp. 878-887, 2005.
-
-.. [10] : H. M. Nguyen, E. W. Cooper, K. Kamei, “Borderline over-sampling for imbalanced data classification,” In Proceedings of the 5th International Workshop on computational Intelligence and Applications, pp. 24-29, 2009.
-
-.. [11] : G. E. A. P. A. Batista, R. C. Prati, M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM Sigkdd Explorations Newsletter, vol. 6(1), pp. 20-29, 2004.
-
-.. [12] : G. E. A. P. A. Batista, A. L. C. Bazzan, M. C. Monard, “Balancing training data for automated annotation of keywords: A case study,” In Proceedings of the 2nd Brazilian Workshop on Bioinformatics, pp. 10-18, 2003.
-
-.. [13] : X.-Y. Liu, J. Wu and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 39(2), pp. 539-550, 2009.
-
-.. [14] : I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, 1976.
-
-.. [15] : H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp. 1322-1328, 2008.
-
-.. [16] : C. Chao, A. Liaw, and L. Breiman. "Using random forest to learn imbalanced data." University of California, Berkeley 110 (2004): 1-12.
-
-.. [17] : Felix Last, Georgios Douzas, Fernando Bacao, "Oversampling for Imbalanced Learning Based on K-Means and SMOTE"
-
-.. [18] : Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., & Napolitano, A. "RUSBoost: A hybrid approach to alleviating class imbalance." IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 40.1 (2010): 185-197.
+You can refer to the `imbalanced-learn`_ documentation to find details about
+the implemented algorithms.
 
-.. [19] : Menardi, G., Torelli, N.: "Training and assessing classification rules with unbalanced data", Data Mining and Knowledge Discovery,  28, (2014): 92–122
+.. _imbalanced-learn: https://imbalanced-learn.org/stable/user_guide.html
@@ -45,13 +45,16 @@ jobs:
         versionSpec: '3.9'
     - bash: |
         # Include pytest compatibility with mypy
-        pip install pytest flake8 mypy==0.782 black==22.3
+        pip install pytest flake8 mypy==0.782 black==22.3 isort
       displayName: Install linters
     - bash: |
         black --check --diff .
       displayName: Run black
     - bash: |
-        ./build_tools/circle/linting.sh
+        isort --check --diff .
+      displayName: Run isort
+    - bash: |
+        ./build_tools/azure/linting.sh
       displayName: Run linting
     - bash: |
         mypy imblearn/
@@ -102,8 +105,8 @@ jobs:
 # Check compilation with Ubuntu bionic 18.04 LTS and scipy from conda-forge
 - template: build_tools/azure/posix.yml
   parameters:
-    name: Ubuntu_Bionic
-    vmImage: ubuntu-18.04
+    name: Ubuntu_Jammy_Jellyfish
+    vmImage: ubuntu-22.04
     dependsOn: [git_commit, linting]
     condition: |
       and(
@@ -112,7 +115,7 @@ jobs:
         ne(variables['Build.Reason'], 'Schedule')
       )
     matrix:
-      py37_conda_forge_openblas_ubuntu_1804:
+      py38_conda_forge_openblas_ubuntu_1804:
         DISTRIB: 'conda'
         CONDA_CHANNEL: 'conda-forge'
         PYTHON_VERSION: '3.8'
@@ -141,12 +144,12 @@ jobs:
         THREADPOOLCTL_VERSION: 'min'
         COVERAGE: 'false'
       # Linux + Python 3.8 build with OpenBLAS and without SITE_JOBLIB
-      py37_conda_defaults_openblas:
+      py38_conda_defaults_openblas:
         DISTRIB: 'conda'
         CONDA_CHANNEL: 'conda-forge'
         PYTHON_VERSION: '3.8'
         BLAS: 'openblas'
-        NUMPY_VERSION: '1.19.5'  # we cannot get an older version of the dependencies resolution
+        NUMPY_VERSION: '1.21.0'  # we cannot get an older version of the dependencies resolution
         SCIPY_VERSION: 'min'
         SKLEARN_VERSION: 'min'
         MATPLOTLIB_VERSION: 'none'
@@ -155,10 +158,18 @@ jobs:
       # Linux environment to test the latest available dependencies and MKL.
       pylatest_pip_openblas_pandas:
         DISTRIB: 'conda-pip-latest'
-        PYTHON_VERSION: '3.9'
+        PYTHON_VERSION: '*'
         TEST_DOCS: 'true'
         TEST_DOCSTRINGS: 'true'
         CHECK_WARNINGS: 'true'
+      # Test the intermediate version of scikit-learn
+      pylatest_pip_openblas_sklearn_intermediate:
+        DISTRIB: 'conda-pip-latest'
+        PYTHON_VERSION: '3.10'
+        TEST_DOCS: 'true'
+        TEST_DOCSTRINGS: 'true'
+        CHECK_WARNINGS: 'false'
+        SKLEARN_VERSION: '1.1.3'
       pylatest_pip_tensorflow:
         DISTRIB: 'conda-pip-latest-tensorflow'
         CONDA_CHANNEL: 'conda-forge'
@@ -178,11 +189,13 @@ jobs:
         DISTRIB: 'conda-minimum-tensorflow'
         CONDA_CHANNEL: 'conda-forge'
         PYTHON_VERSION: '3.8'
+        NUMPY_VERSION: '1.19.5'  # This version is the minimum required by tensorflow
+        SCIPY_VERSION: 'min'
         SKLEARN_VERSION: 'min'
         TENSORFLOW_VERSION: 'min'
         TEST_DOCS: 'true'
         TEST_DOCSTRINGS: 'false'  # it is going to fail because of scikit-learn inheritance
-        CHECK_WARNINGS: 'true'
+        CHECK_WARNINGS: 'false'  # in case the older versions raise some FutureWarnings
       pylatest_pip_keras:
         DISTRIB: 'conda-pip-latest-keras'
         CONDA_CHANNEL: 'conda-forge'
@@ -202,11 +215,13 @@ jobs:
         DISTRIB: 'conda-minimum-keras'
         CONDA_CHANNEL: 'conda-forge'
         PYTHON_VERSION: '3.8'
+        NUMPY_VERSION: '1.19.5'  # This version is the minimum required by tensorflow
+        SCIPY_VERSION: 'min'
         SKLEARN_VERSION: 'min'
         KERAS_VERSION: 'min'
         TEST_DOCS: 'true'
         TEST_DOCSTRINGS: 'false'  # it is going to fail because of scikit-learn inheritance
-        CHECK_WARNINGS: 'true'
+        CHECK_WARNINGS: 'false'  # in case the older versions raise some FutureWarnings
 
 # Currently runs on Python 3.8 while only Python 3.7 available
 # - template: build_tools/azure/posix-docker.yml
@@ -233,7 +248,7 @@ jobs:
 - template: build_tools/azure/posix.yml
   parameters:
     name: macOS
-    vmImage: macOS-10.15
+    vmImage: macOS-11
     dependsOn: [linting, git_commit]
     condition: |
       and(
@@ -275,6 +290,3 @@ jobs:
         PYTHON_ARCH: '64'
         PYTEST_VERSION: '*'
         COVERAGE: 'true'
-      py38_pip_openblas_32bit:
-        PYTHON_VERSION: '3.8'
-        PYTHON_ARCH: '32'
 
@@ -67,7 +67,8 @@ elif [[ "$DISTRIB" == "conda-pip-latest" ]]; then
     make_conda "python=$PYTHON_VERSION"
     python -m pip install -U pip
 
-    python -m pip install scikit-learn pandas matplotlib
+    python -m pip install pandas matplotlib
+    python -m pip install scikit-learn
 
 elif [[ "$DISTRIB" == "conda-pip-latest-tensorflow" ]]; then
     make_conda "python=$PYTHON_VERSION"
 
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+set -e
+# pipefail is necessary to propagate exit codes
+set -o pipefail
+
+flake8 --show-source .
+echo -e "No problem detected by flake8\n"
+
+# For docstrings and warnings of deprecated attributes to be rendered
+# properly, the property decorator must come before the deprecated decorator
+# (else they are treated as functions)
+
+# do not error when grep -B1 "@deprecated" finds nothing
+set +e
+bad_deprecation_property_order=`git grep -A 10 "@property"  -- "*.py" | awk '/@property/,/def /' | grep -B1 "@deprecated"`
+
+if [ ! -z "$bad_deprecation_property_order" ]
+then
+    echo "property decorator should come before deprecated decorator"
+    echo "found the following occurrences:"
+    echo "$bad_deprecation_property_order"
+    exit 1
+fi
+
+# Check for default doctest directives ELLIPSIS and NORMALIZE_WHITESPACE
+
+doctest_directive="$(git grep -nw -E "# doctest\: \+(ELLIPSIS|NORMALIZE_WHITESPACE)")"
+
+if [ ! -z "$doctest_directive" ]
+then
+    echo "ELLIPSIS and NORMALIZE_WHITESPACE doctest directives are enabled by default, but were found in:"
+    echo "$doctest_directive"
+    exit 1
+fi
+
+joblib_import="$(git grep -l -A 10 -E "joblib import.+delayed" -- "*.py" ":!sklearn/utils/_joblib.py" ":!sklearn/utils/fixes.py")"
+
+if [ ! -z "$joblib_import" ]; then
+    echo "Use from sklearn.utils.fixes import delayed instead of joblib delayed. The following files contain imports to joblib.delayed:"
+    echo "$joblib_import"
+    exit 1
+fi
@@ -30,6 +30,7 @@ jobs:
     THREADPOOLCTL_VERSION: 'latest'
     COVERAGE: 'false'
     TEST_DOCSTRINGS: 'false'
+    CHECK_WARNINGS: 'false'
     BLAS: 'openblas'
     # Set in azure-pipelines.yml
     DISTRIB: ''
 
@@ -36,6 +36,7 @@ jobs:
     COVERAGE: 'true'
     TEST_DOCS: 'false'
     TEST_DOCSTRINGS: 'false'
+    CHECK_WARNINGS: 'false'
     SHOW_SHORT_SUMMARY: 'false'
   strategy:
     matrix:
 
@@ -34,7 +34,7 @@ if [[ "$COVERAGE" == "true" ]]; then
     TEST_CMD="$TEST_CMD --cov-config='$COVERAGE_PROCESS_START' --cov imblearn --cov-report="
 fi
 
-if [[ -n "$CHECK_WARNINGS" ]]; then
+if [[ "$CHECK_WARNINGS" == "true" ]]; then
     # numpy's 1.19.0's tostring() deprecation is ignored until scipy and joblib removes its usage
     TEST_CMD="$TEST_CMD -Werror::DeprecationWarning -Werror::FutureWarning -Wignore:tostring:DeprecationWarning"
 
 
@@ -21,6 +21,7 @@ jobs:
     PYTEST_XDIST_VERSION: 'latest'
     TEST_DIR: '$(Agent.WorkFolder)/tmp_folder'
     CPU_COUNT: '2'
+    CHECK_WARNINGS: 'false'
   strategy:
     matrix:
       ${{ insert }}: ${{ parameters.matrix }}
 
@@ -6,6 +6,7 @@
 # rather than the one from site-packages.
 
 import os
+
 import pytest
 
 
 
@@ -130,8 +130,24 @@ cross-validation::
   ...     f"{cv_results['test_score'].std():.3f}"
   ... )
   Balanced accuracy mean +/- std. dev.: 0.724 +/- 0.042
+
+The cross-validation performance looks good, but evaluating the classifiers
+on the left-out data shows a different picture::
 
-We see that the statistical performance are worse than in the previous case.
+  >>> scores = []
+  >>> for fold_id, cv_model in enumerate(cv_results["estimator"]):
+  ...     scores.append(
+  ...         balanced_accuracy_score(
+  ...             y_left_out, cv_model.predict(X_left_out)
+  ...         )
+  ...     )
+  >>> print(
+  ...     f"Balanced accuracy mean +/- std. dev.: "
+  ...     f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}"
+  ... )
+  Balanced accuracy mean +/- std. dev.: 0.698 +/- 0.014
+
+We see that the performance is now worse than the cross-validated performance.
 Indeed, the data leakage gave us too optimistic results due to the reason
 stated earlier in this section.
 
 
@@ -15,8 +15,8 @@
 import os
 import sys
 from datetime import datetime
-from pathlib import Path
 from io import StringIO
+from pathlib import Path
 
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
@@ -82,11 +82,6 @@
 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = "sphinx"
 
-# -- Options for math equations -----------------------------------------------
-
-extensions.append("sphinx.ext.imgmath")
-imgmath_image_format = "svg"
-
 # -- Options for HTML output ----------------------------------------------
 
 # The theme to use for HTML and HTML Help pages.  See the documentation for