@@ -30,7 +30,7 @@
 .. |PythonMinVersion| replace:: 3.8
 .. |NumPyMinVersion| replace:: 1.17.3
 .. |SciPyMinVersion| replace:: 1.3.2
-.. |ScikitLearnMinVersion| replace:: 1.1.0
+.. |ScikitLearnMinVersion| replace:: 1.0.2
 .. |MatplotlibMinVersion| replace:: 3.1.2
 .. |PandasMinVersion| replace:: 1.0.5
 .. |TensorflowMinVersion| replace:: 2.4.3
@@ -154,92 +154,7 @@ One way of addressing this issue is by re-sampling the dataset as to offset this
 imbalance with the hope of arriving at a more robust and fair decision boundary
 than you would otherwise.
 
-Re-sampling techniques are divided into four categories:
-    1. Under-sampling the majority class(es).
-    2. Over-sampling the minority class.
-    3. Combining over- and under-sampling.
-    4. Creating ensemble balanced sets.
-
-Below is a list of the methods currently implemented in this module.
-
-* Under-sampling
-    1. Random majority under-sampling with replacement
-    2. Extraction of majority-minority Tomek links [1]_
-    3. Under-sampling with Cluster Centroids
-    4. NearMiss-(1 & 2 & 3) [2]_
-    5. Condensed Nearest Neighbour [3]_
-    6. One-Sided Selection [4]_
-    7. Neighbourhood Cleaning Rule [5]_
-    8. Edited Nearest Neighbours [6]_
-    9. Instance Hardness Threshold [7]_
-    10. Repeated Edited Nearest Neighbours [14]_
-    11. AllKNN [14]_
-
-* Over-sampling
-    1. Random minority over-sampling with replacement
-    2. SMOTE - Synthetic Minority Over-sampling Technique [8]_
-    3. SMOTENC - SMOTE for Nominal and Continuous [8]_
-    4. SMOTEN - SMOTE for Nominal [8]_
-    5. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2 [9]_
-    6. SVM SMOTE - Support Vectors SMOTE [10]_
-    7. ADASYN - Adaptive synthetic sampling approach for imbalanced learning [15]_
-    8. KMeans-SMOTE [17]_
-    9. ROSE - Random OverSampling Examples [19]_
-
-* Over-sampling followed by under-sampling
-    1. SMOTE + Tomek links [12]_
-    2. SMOTE + ENN [11]_
-
-* Ensemble classifier using samplers internally
-    1. Easy Ensemble classifier [13]_
-    2. Balanced Random Forest [16]_
-    3. Balanced Bagging
-    4. RUSBoost [18]_
-
-* Mini-batch resampling for Keras and Tensorflow
-
-The different algorithms are presented in the sphinx-gallery_.
-
-.. _sphinx-gallery: https://imbalanced-learn.readthedocs.io/en/stable/auto_examples/index.html
-
-
-References:
------------
-
-.. [1] : I. Tomek, “Two modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 769-772, 1976.
-
-.. [2] : I. Mani, J. Zhang, “kNN approach to unbalanced data distributions: A case study involving information extraction,” In Proceedings of the Workshop on Learning from Imbalanced Data Sets, pp. 1-7, 2003.
-
-.. [3] : P. E. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. 14(3), pp. 515-516, 1968.
-
-.. [4] : M. Kubat, S. Matwin, “Addressing the curse of imbalanced training sets: One-sided selection,” In Proceedings of the 14th International Conference on Machine Learning, vol. 97, pp. 179-186, 1997.
-
-.. [5] : J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” In Proceedings of the 8th Conference on Artificial Intelligence in Medicine in Europe, pp. 63-66, 2001.
-
-.. [6] : D. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 2(3), pp. 408-421, 1972.
-
-.. [7] : M. R. Smith, T. Martinez, C. Giraud-Carrier, “An instance level analysis of data complexity,” Machine Learning, vol. 95(2), pp. 225-256, 2014.
-
-.. [8] : N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
-
-.. [9] : H. Han, W.-Y. Wang, B.-H. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” In Proceedings of the 1st International Conference on Intelligent Computing, pp. 878-887, 2005.
-
-.. [10] : H. M. Nguyen, E. W. Cooper, K. Kamei, “Borderline over-sampling for imbalanced data classification,” In Proceedings of the 5th International Workshop on Computational Intelligence and Applications, pp. 24-29, 2009.
-
-.. [11] : G. E. A. P. A. Batista, R. C. Prati, M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6(1), pp. 20-29, 2004.
-
-.. [12] : G. E. A. P. A. Batista, A. L. C. Bazzan, M. C. Monard, “Balancing training data for automated annotation of keywords: A case study,” In Proceedings of the 2nd Brazilian Workshop on Bioinformatics, pp. 10-18, 2003.
-
-.. [13] : X.-Y. Liu, J. Wu, Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 39(2), pp. 539-550, 2009.
-
-.. [14] : I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, 1976.
-
-.. [15] : H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp. 1322-1328, 2008.
-
-.. [16] : C. Chen, A. Liaw, L. Breiman, “Using random forest to learn imbalanced data,” University of California, Berkeley, vol. 110, pp. 1-12, 2004.
-
-.. [17] : F. Last, G. Douzas, F. Bacao, “Oversampling for Imbalanced Learning Based on K-Means and SMOTE.”
-
-.. [18] : C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, A. Napolitano, “RUSBoost: A hybrid approach to alleviating class imbalance,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 40(1), pp. 185-197, 2010.
+You can refer to the `imbalanced-learn`_ documentation to find details about
+the implemented algorithms.
 
-.. [19] : G. Menardi, N. Torelli, “Training and assessing classification rules with unbalanced data,” Data Mining and Knowledge Discovery, vol. 28, pp. 92-122, 2014.
+.. _imbalanced-learn: https://imbalanced-learn.org/stable/user_guide.html
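
For reference, the samplers documented in the `imbalanced-learn`_ user guide
all share a single ``fit_resample(X, y)`` interface. Below is a minimal sketch
of the two basic strategies, assuming imbalanced-learn and scikit-learn are
installed; the dataset is synthetic and the 9:1 class ratio is arbitrary.

.. code-block:: python

    from collections import Counter

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    # Synthetic two-class dataset with a 9:1 class imbalance.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    print(Counter(y))  # roughly 900 majority / 100 minority samples

    # Over-sample the minority class by synthesizing new examples (SMOTE) ...
    X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
    print(Counter(y_over))  # both classes now at the majority count

    # ... or under-sample the majority class by discarding examples.
    X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y_under))  # both classes now at the minority count

Because every sampler exposes the same ``fit_resample`` method, combined
strategies such as SMOTE + ENN (``imblearn.combine.SMOTEENN``) can be swapped
in without changing the surrounding code.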
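
The mini-batch resampling for Keras/Tensorflow mentioned in the removed list
takes a different route: instead of materializing a resampled dataset up
front, each training batch is re-balanced on the fly. A minimal sketch,
assuming a recent imbalanced-learn release that ships the ``imblearn.keras``
module and that TensorFlow is installed:

.. code-block:: python

    from sklearn.datasets import make_classification
    from tensorflow import keras

    from imblearn.keras import BalancedBatchGenerator
    from imblearn.under_sampling import RandomUnderSampler

    # Synthetic two-class dataset with a 9:1 class imbalance.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # A small binary classifier.
    model = keras.Sequential([
        keras.Input(shape=(X.shape[1],)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Each mini-batch is re-balanced by the sampler before reaching fit().
    training_generator = BalancedBatchGenerator(
        X, y, sampler=RandomUnderSampler(), batch_size=32, random_state=0
    )
    model.fit(training_generator, epochs=5)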