Closed
Problem description
In pandas 0.21 the top-level function read_parquet() was introduced. Both available engines, fastparquet and pyarrow, support specifying which columns to read. If you are only interested in certain columns of a dataframe, this reduces the I/O.
- Fastparquet: https://github.com/dask/fastparquet/blob/master/docs/source/quickstart.rst#reading
- PyArrow: http://pyarrow-xhochy.readthedocs.io/en/latest/pyarrow.parquet.html#pyarrow.parquet.read_table
It should also be possible to specify the columns in pandas.read_parquet().