regex filter on non-string columns raises an exception #5798

Closed
@jseabold

I'm not sure what the right behavior is here. At the very least, the error message should be better. Other options would be to skip non-string columns, or to convert the labels to strings before matching, e.g. make match("\d", 123) work. The conversion approach could obviously run into problems for labels that can't be converted to string/unicode, or that do "unexpected" things when converted.

>>> pd.version.version
'0.12.0-1149-g141e93a'

(If the columns are all numeric, it also fails, but with a different message.)
>>> pd.DataFrame(np.random.random((3,2)), columns=['STRING', 123]).filter(regex='STRING')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-427-e5acd93a79d6> in <module>()
----> 1 pd.DataFrame(np.random.random((3,2)), columns=['STRING', 123]).filter(regex='STRING')

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0_1149_g141e93a-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in filter(self, items, like, regex, axis)
   1522             matcher = re.compile(regex)
   1523             return self.select(lambda x: matcher.search(x) is not None,
-> 1524                                axis=axis_name)
   1525         else:
   1526             raise TypeError('Must pass either `items`, `like`, or `regex`')

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0_1149_g141e93a-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in select(self, crit, axis)
   1116         if len(axis_values) > 0:
   1117             new_axis = axis_values[
-> 1118                 np.asarray([bool(crit(label)) for label in axis_values])]
   1119         else:
   1120             new_axis = axis_values

/usr/local/lib/python2.7/dist-packages/pandas-0.12.0_1149_g141e93a-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in <lambda>(x)
   1521         elif regex:
   1522             matcher = re.compile(regex)
-> 1523             return self.select(lambda x: matcher.search(x) is not None,
   1524                                axis=axis_name)
   1525         else:

TypeError: expected string or buffer
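
To make the two options above concrete, here is a minimal sketch of a standalone helper that either coerces non-string labels with str() before matching or skips them. The name filter_columns_by_regex and its coerce flag are hypothetical and made up for illustration; this is not pandas API and not necessarily how the library should fix it.

```python
import re

import numpy as np
import pandas as pd


def filter_columns_by_regex(df, pattern, coerce=True):
    """Hypothetical helper (not part of pandas): select columns by regex.

    coerce=True  -> non-string labels are converted with str() before matching
    coerce=False -> non-string labels are silently skipped
    """
    matcher = re.compile(pattern)
    positions = []
    for pos, label in enumerate(df.columns):
        if isinstance(label, str):
            text = label
        elif coerce:
            text = str(label)
        else:
            continue  # skip labels that are not strings
        if matcher.search(text) is not None:
            positions.append(pos)
    # Select by position so duplicate or mixed-type labels are not an issue.
    return df.iloc[:, positions]


df = pd.DataFrame(np.random.random((3, 2)), columns=['STRING', 123])
by_name = filter_columns_by_regex(df, 'STRING')                # keeps 'STRING'
by_digit = filter_columns_by_regex(df, r'\d')                  # keeps 123 via str(123)
skipped = filter_columns_by_regex(df, r'\d', coerce=False)     # keeps nothing
```

Coercing is the more permissive choice, but as noted it can surprise: str() of a float or tuple label may match patterns the user never intended, which is the argument for skipping instead.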
