Describe the bug
When SVMSMOTE is applied to a dataset whose minority class has very few samples (e.g. fewer than 10), it raises ValueError: Found array with 0 sample(s) (shape=(0, 600)) while a minimum of 1 is required. The second dimension of the shape is the number of features: 600 in my own data, 3 in the reproduction below.
Steps/Code to Reproduce
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SVMSMOTE

# Three classes; class 0 gets only 0.4% of the 1000 samples (5 samples here).
X, y = make_classification(
    n_classes=3, class_sep=0, weights=[0.004, 0.451, 0.545],
    n_informative=3, n_redundant=0, flip_y=0, n_features=3,
    n_clusters_per_class=2, n_samples=1000, random_state=10)
print('Original dataset shape %s' % Counter(y))

sm = SVMSMOTE(random_state=42, k_neighbors=4)
X_res, y_res = sm.fit_resample(X, y)  # raises ValueError
print('Resampled dataset shape %s' % Counter(y_res))
Expected Results
The code runs without error and prints the resampled dataset shape.
Actual Results
Original dataset shape Counter({2: 544, 1: 451, 0: 5})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-78-8f5d2308c2bd> in <module>()
10
11 sm = SVMSMOTE(random_state=42, k_neighbors=4)
---> 12 X_res, y_res = sm.fit_resample(X, y)
13 print('Resampled dataset shape %s' % Counter(y_res))
~/anaconda3/lib/python3.6/site-packages/imblearn/base.py in fit_resample(self, X, y)
82 self.sampling_strategy, y, self._sampling_type)
83
---> 84 output = self._fit_resample(X, y)
85
86 if binarize_y:
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _fit_resample(self, X, y)
530 def _fit_resample(self, X, y):
531 # print("_fit_resample X shape", X.shape)
--> 532 return self._sample(X, y)
533
534 def _sample(self, X, y):
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _sample(self, X, y)
569
570 danger_bool = self._in_danger_noise(
--> 571 self.nn_m_, support_vector, class_sample, y, kind='danger')
572 safety_bool = np.logical_not(danger_bool)
573
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _in_danger_noise(self, nn_estimator, samples, target_class, y, kind)
213 # print("kind", kind)
214 # print("_in_danger_noise samples shape", samples.shape)
--> 215 x = nn_estimator.kneighbors(samples, return_distance=False)[:, 1:]
216 # print("x", x)
217 nn_label = (y[x] != target_class).astype(int)
~/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in kneighbors(self, X, n_neighbors, return_distance)
400 if X is not None:
401 query_is_train = False
--> 402 X = check_array(X, accept_sparse='csr')
403 else:
404 query_is_train = True
~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
548 " minimum of %d is required%s."
549 % (n_samples, array.shape, ensure_min_samples,
--> 550 context))
551
552 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required.
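A possible reading of the traceback (my assumption, not something confirmed here): the `samples` array passed to `_in_danger_noise(..., kind='danger')` is already empty, which would happen if an earlier filtering step had discarded every minority-class support vector. The pure-Python sketch below imitates the neighbor-label count visible in the traceback (`y[x] != target_class`) using brute-force search. The function name `foreign_neighbor_counts`, the toy data, and the rule "a sample is noise when all of its m neighbors belong to other classes" are illustrative assumptions, not imblearn API.

```python
from math import dist  # available in Python 3.8+


def foreign_neighbor_counts(X, y, target_class, m_neighbors):
    """For each sample of target_class, count how many of its m nearest
    neighbors (excluding the sample itself) carry a different label."""
    counts = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != target_class:
            continue
        # Brute-force distances from sample i to every other sample.
        others = sorted((dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        nearest = [j for _, j in others[:m_neighbors]]
        counts.append(sum(y[j] != target_class for j in nearest))
    return counts


# Hypothetical toy data: two class-0 points surrounded by class-1 points.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
y = [0, 1, 1, 1, 1, 0]
m = 2

counts = foreign_neighbor_counts(X, y, target_class=0, m_neighbors=m)
# Assumed noise rule: drop a sample when all m neighbors are foreign.
survivors = [c for c in counts if c < m]
print(counts, survivors)
```

With this toy data every minority sample has only foreign neighbors, so nothing survives the assumed filter. An empty array of that kind, passed on to `kneighbors`, would produce the "Found array with 0 sample(s)" error shown above.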
Versions
System:
python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
executable: /home/allenyl/anaconda3/bin/python
machine: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
Python deps:
pip: 19.2.2
setuptools: 41.0.1
sklearn: 0.21.3
numpy: 1.15.1
scipy: 1.4.1
Cython: 0.28.2
pandas: 0.24.1