str accessor functions return float(NaN) instead of pd.NA #30966

Closed
@tsvikas

Code Sample

import numpy as np
import pandas as pd
# Index 5 holds the missing value; with dtype="string" it is stored as pd.NA.
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'], dtype="string")
type(s.str.lower()[5])  # returns <class 'float'>

Problem description

The str accessor, when working on a string-typed Series, should return a string-typed Series whose elements are only strings or pd.NA. However, some functions (see the list below) return a Series that contains float('nan').
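One way to check which object is used for missing entries is to collect the types of the missing elements before and after a str call. This is a minimal sketch: `missing_value_types` is a hypothetical helper, not part of pandas, and what it prints for the result of `lower()` depends on the installed pandas version (the float result described here was observed on 1.0.0rc0).

```python
import numpy as np
import pandas as pd


def missing_value_types(series):
    """Return the set of Python types used for the missing entries of `series`.

    Hypothetical helper for illustration; not a pandas API.
    """
    return {type(value) for value in series[series.isna()]}


s = pd.Series(["A", "Aaba", np.nan, "dog"], dtype="string")

# The input is string-typed, so its missing entry is stored as pd.NA.
print(missing_value_types(s))

# On an affected version this would report float rather than NAType.
print(missing_value_types(s.str.lower()))
```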

Affected functions

As of now, I have found these str accessor functions to be affected: upper, lower, replace.
Also, extract(expand=False) on a string-typed Series returns an object-typed Series, which seems unintended as well.
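The functions listed above can be compared side by side by printing the dtype of each result and whether every missing entry is pd.NA. This is only a sketch: the sample data, the replacement arguments, and the regular expression are assumptions chosen for illustration, and the printed values depend on the pandas version in use.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Aaba", np.nan, "dog"], dtype="string")

# Arguments to replace() and extract() are illustrative, not from the report.
results = {
    "upper": s.str.upper(),
    "lower": s.str.lower(),
    "replace": s.str.replace("a", "x", regex=False),
    "extract": s.str.extract(r"([a-z])", expand=False),
}

for name, result in results.items():
    # True only if every missing entry in the result is the pd.NA singleton.
    all_na = all(value is pd.NA for value in result[result.isna()])
    print(f"{name}: dtype={result.dtype}, missing entries are pd.NA: {all_na}")
```

On a version without the problem, each line should show a string dtype and `True`.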

Expected Output

<class 'pandas._libs.missing.NAType'>

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.0.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.0.0-38-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_IL
LOCALE           : en_IL.UTF-8

pandas           : 1.0.0rc0
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 18.1
setuptools       : 40.8.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

    Labels

    ExtensionArray, Missing-data, Strings
