BUG: astype does not handle conversion of null[pyarrow] #52443

Closed
@MCRE-BE

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
a = pd.Series(
    data = [pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT],
    name = 'ArticleID',
    dtype = "null[pyarrow]",
)
a.astype(np.float32)

Issue Description

Issue 1:

Crashes the interactive window in VS Code

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

info 14:28:23.983: Got new session 24c69bde-a9ef-4394-a80c-73b8887ee22c
info 14:28:23.983: Started new restart session
info 14:29:21.252: Cancel all remaining cells true || Error || undefined
info 14:29:44.367: Cancel all remaining cells true || Error || undefined
warn 14:29:52.401: StdErr from Kernel Process D:/bld/apache-arrow_1679171173706/work/cpp/src/arrow/result.cc:28: Constructed with a non-error status: OK

error 14:29:52.614: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\ProgramData\Miniconda3\envs\DLL_ETL\lib\site-packages\traitlets\traitlets.py:2548: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
warn(
c:\ProgramData\Miniconda3\envs\DLL_ETL\lib\site-packages\traitlets\traitlets.py:2499: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '0d8a058a-cb34-455a-a009-7d177f5771e1' instead of 'b"0d8a058a-cb34-455a-a009-7d177f5771e1"'.
warn(
D:/bld/apache-arrow_1679171173706/work/cpp/src/arrow/result.cc:28: Constructed with a non-error status: OK

info 14:29:52.615: Dispose Kernel process 5360.
error 14:29:52.615: Raw kernel process exited code: 3221226505
error 14:29:52.619: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:32419)
at c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:51471
at Map.forEach ()
at y._clearKernelState (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:51456)
at y.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:44938)
at c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:96826
at ee (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:1589492)
at jh.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:96802)
at Lh.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:104079)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 14:29:52.619: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
info 14:29:52.621: Cancel all remaining cells true || Error || undefined

Issue 2:

Raises a TypeError when run as a Python script

  File "c:\Users\MCRE\OneDrive - AholdDelhaize.com\9 - Code\-- not synced\03. DLL_ETL\02. In Development\01. DLX12\bug.py", line 9, in <module>
    a.astype(np.float32)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\dtypes\astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\dtypes\astype.py", line 227, in astype_array
    values = values.astype(dtype, copy=copy)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\arrays\base.py", line 610, in astype
    return np.array(self, dtype=dtype, copy=copy)
TypeError: float() argument must be a string or a real number, not 'NAType'
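
For context, the failure in the last frame can probably be reproduced without pyarrow at all. The snippet below is only a minimal sketch, not the pandas code path itself; the helper name `try_float32` is made up for illustration. It suggests that NumPy cannot coerce `pd.NA` to a float, while `np.nan` converts without trouble.

```python
import numpy as np
import pandas as pd


def try_float32(values):
    """Hypothetical helper: return the float32 array, or the TypeError NumPy raised."""
    try:
        return np.array(values, dtype=np.float32)
    except TypeError as exc:
        return exc


# pd.NA has no float representation, so NumPy's element-wise float() call fails.
na_result = try_float32([pd.NA, pd.NA])

# np.nan is an ordinary float, so the same conversion succeeds.
nan_result = try_float32([np.nan, np.nan])

print(type(na_result).__name__)
print(nan_result.dtype)
```

If this holds, the missing values would need to become `np.nan` before the array is handed to NumPy.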

Expected Behavior

Either or both of:

  • Not crash the VS Code interactive window.

  • Convert the pd.Series to a pd.Series with a float or object dtype (I think), or any other dtype that can be converted afterwards. Currently I have to:

    1. Convert null[pyarrow] to float[pyarrow]
    2. Convert float[pyarrow] to np.float32

Installed Versions

INSTALLED VERSIONS
------------------
commit : 478d340
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Netherlands.1252

pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.0
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None

Labels: Arrow (pyarrow functionality), Bug, Segfault (Non-Recoverable Error)