
Pandas version 0.23.1 failed to save dataframe onto HDFS with hdfs cli #21560

Closed
@jiyuan312986471


Problem description

I'm using pandas for data transformation, with HDFS as the I/O layer (via HdfsCLI).

With version 0.23.1, the to_csv function raises an AttributeError:

Traceback (most recent call last):
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 450, in <module>
    main(args)
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 402, in main
    encoding=encoding)
  File "GEOCODAGE_REP_AGREE_STEP_2_V2.py", line 367, in save_csv
    header=False)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/usr/local/lib/python3.5/site-packages/pandas/io/formats/csvs.py", line 167, in save
    f.close()
AttributeError: 'AsyncWriter' object has no attribute 'close'

Here, AsyncWriter is a writer class from HdfsCLI.

Here's the code:

with cli_hdfs.write(output_path, encoding=encoding, overwrite=True) as writer:
    df.to_csv(writer, sep=sep, index=False, encoding=encoding, header=False)

Note: When I downgrade my pandas to 0.23.0 the problem is solved.
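
Besides downgrading, one possible workaround is to keep pandas away from the writer entirely: render the CSV to a string first, then pass that string to the writer's own write() method. The sketch below is only an illustration under assumptions. `write_csv_via_string` and `WriterWithoutClose` are hypothetical names, and the stand-in class only mimics an object that has write() but no close(); it is not the real HdfsCLI class. The whole CSV is built in memory, so this may not suit very large frames.

```python
class WriterWithoutClose:
    """Hypothetical stand-in for a writer that has write() but no close()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        # Collect whatever is written so it can be inspected afterwards.
        self.chunks.append(text)


def write_csv_via_string(df, writer, **to_csv_kwargs):
    """Render df to CSV text, then hand the text to writer.write().

    Hypothetical helper: df is expected to be a pandas DataFrame (or any
    object with a compatible to_csv method).
    """
    # With no path or buffer given, DataFrame.to_csv returns the CSV as a
    # string, so pandas never opens or closes the writer itself.
    csv_text = df.to_csv(None, **to_csv_kwargs)
    writer.write(csv_text)
    return len(csv_text)
```

Inside the `with cli_hdfs.write(...) as writer:` block, the call would then look like `write_csv_via_string(df, writer, sep=sep, index=False, header=False)`, with the encoding left to the HdfsCLI writer.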

I have searched extensively without finding anything useful, so I'm opening this issue because it looks like a bug to me.

Labels: IO CSV (read_csv, to_csv), IO HDF5 (read_hdf, HDFStore)
