
Feature request: write and read dtypes in csv #19378

Closed
@dkapitan


So here's something I would like: as an avid pandas user, I'd like to be able to write a DataFrame to a CSV file and read it back with the dtype of each column preserved.

After reading up on pandas, I came up with the following, which I think is the most Pythonic way to do it:

import ast

import pandas as pd

# Example dataframe with int, float, bool and datetime columns
df = pd.DataFrame(data={'int': [1, 2, 3],
                        'float': [1.0, 2.0, 3.0],
                        'bool': [True, False, True],
                        'date': ['2018-03-01', '1973-09-09', '2009-05-20']})
df['date'] = pd.to_datetime(df['date'])

# Write the .csv with a leading comment line that lists the dtypes,
# e.g. #{'int': 'int64', 'float': 'float64', 'bool': 'bool', 'date': 'datetime64[ns]'}
with open('test.csv', 'w', newline='') as f:
    f.write('#' + str(df.dtypes.astype(str).to_dict()) + '\n')
    df.to_csv(f, index=False)

# Read the .csv: consume the dtype comment line first, then pass the rest to read_csv
with open('test.csv', 'r', newline='') as f:
    type_header = f.readline()
    dtypes = ast.literal_eval(type_header.lstrip('#').strip())
    # read_csv does not accept datetime dtypes via dtype=, so those columns go through
    # parse_dates; timedelta columns are converted after reading
    parse_dates = [k for k, v in dtypes.items() if v.startswith('datetime64')]
    timedeltas = [k for k, v in dtypes.items() if v.startswith('timedelta64')]
    dtypes = {k: v for k, v in dtypes.items() if k not in parse_dates + timedeltas}
    foo = pd.read_csv(f, dtype=dtypes, parse_dates=parse_dates)
    for col in timedeltas:
        foo[col] = pd.to_timedelta(foo[col])

# Compare the dtypes column by column (df.dtypes.all() on both sides does not do this)
print((foo.dtypes == df.dtypes).all())
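The non-obvious step in the read path is routing datetime columns through `parse_dates` instead of `dtype=`. A minimal check of why, assuming the default C parser: as far as I know, recent pandas versions raise a `TypeError` when a datetime dtype is passed via `dtype=`, while `parse_dates` yields a proper datetime column. The in-memory `io.StringIO` buffer is only a stand-in for `test.csv`.

```python
import io

import pandas as pd

# A tiny in-memory CSV with a single date column (stands in for a file on disk)
csv_text = "date\n2018-03-01\n1973-09-09\n"

# Passing a datetime dtype through dtype= is rejected by read_csv's default parser
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"date": "datetime64[ns]"})
    dtype_rejected = False
except TypeError as exc:
    dtype_rejected = True
    print("dtype= failed:", exc)

# Routing the same column through parse_dates gives a datetime64 column instead
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(parsed.dtypes["date"])
```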

Is this worth including in pandas, or is it not generic enough, in which case I should just hack together my own extension of the DataFrame class?
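On the "own extension" route: pandas' documented way to add methods to DataFrame without subclassing is `pd.api.extensions.register_dataframe_accessor`. A rough sketch, assuming the same `#`-prefixed dtype line as above; the accessor name `dtyped` and the function `read_csv_with_dtypes` are hypothetical names chosen for illustration, not pandas API. Reading is a plain function because there is no DataFrame to hang an accessor on until the file has been parsed.

```python
import ast
import io

import pandas as pd


# "dtyped" is a hypothetical accessor name; register_dataframe_accessor is real pandas API
@pd.api.extensions.register_dataframe_accessor("dtyped")
class DtypedCSVAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._obj = pandas_obj

    def to_csv(self, buf) -> None:
        """Write a '#'-prefixed line listing the dtypes, then the plain CSV body."""
        buf.write("#" + str(self._obj.dtypes.astype(str).to_dict()) + "\n")
        self._obj.to_csv(buf, index=False)


def read_csv_with_dtypes(buf) -> pd.DataFrame:
    """Hypothetical reader: parse the dtype line, then let read_csv do the rest."""
    dtypes = ast.literal_eval(buf.readline().lstrip("#").strip())
    # Datetime columns go through parse_dates; timedeltas are converted after reading
    parse_dates = [k for k, v in dtypes.items() if v.startswith("datetime64")]
    timedeltas = [k for k, v in dtypes.items() if v.startswith("timedelta64")]
    plain = {k: v for k, v in dtypes.items() if k not in parse_dates + timedeltas}
    df = pd.read_csv(buf, dtype=plain, parse_dates=parse_dates)
    for col in timedeltas:
        df[col] = pd.to_timedelta(df[col])
    return df


# Usage: round-trip through an in-memory buffer instead of a file on disk
df = pd.DataFrame({
    "int": [1, 2, 3],
    "float": [1.0, 2.0, 3.0],
    "bool": [True, False, True],
    "date": pd.to_datetime(["2018-03-01", "1973-09-09", "2009-05-20"]),
    "delta": pd.to_timedelta([1, 2, 3], unit="D"),
})
buf = io.StringIO()
df.dtyped.to_csv(buf)
buf.seek(0)
roundtrip = read_csv_with_dtypes(buf)
print(roundtrip.dtypes)
```

An accessor keeps the extra method namespaced under `df.dtyped`, so it cannot collide with pandas' own `DataFrame.to_csv`.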
