For a Series S, I find the S.str.extract method very useful. It is great how you implemented naming the resulting DataFrame columns according to the names specified in the capturing groups of the regular expression.
However, there seems to be a bug when a capture group is named "name". For example:
>>> import re
>>> import pandas as pd
>>> import numpy as np
>>> data = {
... 'Dave': '[email protected]',
... 'multiple': '[email protected] some text [email protected]',
... 'none': np.nan,
... }
>>> pattern = r'''
... (?P<name>[a-z]+)
... @
... (?P<domain>[a-z]+)
... \.
... (?P<tld>[a-z]{2,4})
... '''
>>> S = pd.Series(data)
>>> result = S.str.extract(pattern, re.VERBOSE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/strings.py", line 1370, in extract
    return self._wrap_result(result, name=name)
  File "pandas/core/strings.py", line 1088, in _wrap_result
    name = kwargs.get('name') or getattr(result, 'name', None) or self.series.name
  File "pandas/core/generic.py", line 730, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> from pandas.util.print_versions import show_versions
>>> show_versions()
INSTALLED VERSIONS
------------------
commit: 5d953e3fba420b6721c7f1c5d53e5812fe113bbc
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.0-44-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
pandas: 0.17.0+73.g5d953e3
nose: 1.1.2
pip: None
setuptools: 0.6
Cython: 0.20.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: None
IPython: 0.12.1
sphinx: None
patsy: None
dateutil: 1.5
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.7.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
>>>
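My reading of the traceback, which may not be the whole story: the `or` chain in `_wrap_result` calls `getattr(result, 'name', None)` on a DataFrame that has a column called "name", so it gets that column back as a Series instead of None, and `or` then tries to truth-test it. The sketch below reproduces just that step without going through `str.extract`. The `pick_name` helper is hypothetical, written only to illustrate one way of avoiding the truth test; it is not the pandas code and not necessarily how this should be fixed.

```python
import pandas as pd

# A frame shaped like the one extract() builds from the named groups:
# one of its columns happens to be called "name".
result = pd.DataFrame({"name": ["dave"], "domain": ["google"], "tld": ["com"]})

# Attribute access falls through to the column, so this is a Series, not None.
attr = getattr(result, "name", None)
print(type(attr).__name__)  # prints "Series"

# `or` has to evaluate the Series as a boolean, which raises ValueError.
raised = False
try:
    name = None or attr or "fallback"
except ValueError as exc:
    raised = True
    print("ValueError:", exc)


def pick_name(explicit, result, fallback):
    """Hypothetical helper: choose a name without truth-testing a Series."""
    if explicit is not None:
        return explicit
    # Only a Series carries a meaningful .name; skip the lookup for frames.
    if isinstance(result, pd.Series) and result.name is not None:
        return result.name
    return fallback


print(pick_name(None, result, "series_name"))
```

Note that comparing against None is slightly stricter than the original `or` chain, which also skips falsy names such as an empty string.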
The result I expected was:
>>> exp_list = [
... ("dave", "google", "com"),
... ("rob", "gmail", "com"),
... (np.nan, np.nan, np.nan),
... ]
>>> exp = pd.DataFrame(
... exp_list,
... ["Dave", "multiple", "none"],
... ["name", "domain", "tld"])
>>> print exp
          name  domain  tld
Dave      dave  google  com
multiple   rob   gmail  com
none       NaN     NaN  NaN
>>>
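A possible workaround in the meantime, as a sketch I have only reasoned through rather than run on the affected build: give the first group a different name so that no extracted column is called "name", then rename the column afterwards. The group name "user" and the example.com-style addresses below are my own placeholders, not the data from the session above.

```python
import re

import numpy as np
import pandas as pd

# Placeholder addresses (assumed for illustration only).
data = {
    "Dave": "dave@example.com",
    "multiple": "rob@example.org some text alice@example.net",
    "none": np.nan,
}

# Same pattern as in the report, except the first group is "user", not "name".
pattern = r"""
    (?P<user>[a-z]+)
    @
    (?P<domain>[a-z]+)
    \.
    (?P<tld>[a-z]{2,4})
"""

S = pd.Series(data)

# extract() keeps only the first match in each string.
result = S.str.extract(pattern, flags=re.VERBOSE)

# Restore the column name that was wanted in the first place.
result = result.rename(columns={"user": "name"})
print(result)
```

The rename happens after `extract` has returned, so the name lookup that fails in the traceback should never see a column called "name".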