Get number of rows and columns #20

Closed
@datapythonista

This issue is to discuss how to obtain the size of a dataframe. I'll illustrate with an example, based on the pandas API.

Given a dataframe:

import pandas

data = {'col1': [1, 2, 3, 4],
        'col2': [5, 6, 7, 8]}

df = pandas.DataFrame(data)

I think the simplest and most Pythonic way to get the number of rows and columns is to just use Python's len, which is what pandas does:

>>> len(df)  # number of rows
4
>>> len(df.columns)  # number of columns
2

I guess an alternative could be to use df.num_rows and df.num_columns, but IMHO they don't add much value, and just make the API more complex.

One thing to note is that pandas mostly implements the dict API for a dataframe (as if it were a dictionary of lists, like data in the example). But returning the number of rows from len(df) is inconsistent with the dict API, where len would return the number of columns (keys). So, with the proposed API, len(data) != len(df). I think being fully consistent with the dict API would be misleading, but it is worth considering.
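To make the inconsistency concrete, here is a small sketch using the same example data. The variable names n_keys and n_rows are just for illustration; it only relies on len and on iterating over a pandas dataframe, which yields the column labels like a dict yields its keys.

```python
import pandas

data = {'col1': [1, 2, 3, 4],
        'col2': [5, 6, 7, 8]}
df = pandas.DataFrame(data)

# dict API: len counts the keys, which correspond to the columns
n_keys = len(data)

# pandas: len counts the rows instead
n_rows = len(df)

# Iteration and membership still follow the dict API (column labels)
same_keys = list(df) == list(data)
has_col1 = 'col1' in df

print(n_keys, n_rows, same_keys, has_col1)
```

So len is the one place in this example where the dataframe stops behaving like the dictionary it was built from.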

Then, pandas offers some extra properties:

df.ndim == 2

df.shape == (len(df), len(df.columns))

df.size == len(df) * len(df.columns)

I guess the reason for the first two is that pandas originally implemented Panel, a three-dimensional data structure, and ndim and shape made sense with it. But I don't think they add much value now.

I don't think size is that commonly used (I will check once we have the data from analyzing pandas usage), and it's trivial for users to implement, so I wouldn't add it to the API.
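As a quick check of how little these properties add, the sketch below derives shape and size from the two len calls and compares them with what pandas reports. The local names n_rows, n_cols, shape and size are only illustrative.

```python
import pandas

df = pandas.DataFrame({'col1': [1, 2, 3, 4],
                       'col2': [5, 6, 7, 8]})

n_rows = len(df)
n_cols = len(df.columns)

# Both values can be rebuilt by the user from the two len calls
shape = (n_rows, n_cols)
size = n_rows * n_cols

# Compare against the properties pandas exposes
shape_matches = df.shape == shape
size_matches = df.size == size
ndim_is_two = df.ndim == 2

print(shape, size, shape_matches, size_matches, ndim_is_two)
```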

Proposal

  • len(df) returning the number of rows
  • len(df.columns) returning the number of columns

And nothing else regarding the shape of a dataframe.
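To show what the proposal would look like from the implementer's side, here is a minimal sketch of a dataframe-like container. MinimalFrame and its internals are hypothetical names made up for this example, not part of pandas or of any existing library; the only things it is meant to demonstrate are len(df) and len(df.columns).

```python
class MinimalFrame:
    """Hypothetical dataframe-like container, for illustration only."""

    def __init__(self, data):
        # data: mapping of column name -> sequence of values.
        # Assumption for this sketch: all columns must have equal length.
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._data = {name: list(values) for name, values in data.items()}
        self.columns = list(self._data)
        self._num_rows = lengths.pop() if lengths else 0

    def __len__(self):
        # Proposed behaviour: len(df) is the number of rows
        return self._num_rows


frame = MinimalFrame({'col1': [1, 2, 3, 4],
                      'col2': [5, 6, 7, 8]})
print(len(frame))          # number of rows
print(len(frame.columns))  # number of columns
```

Nothing else about the shape is exposed, so a user who wants something like size would compute len(frame) * len(frame.columns) themselves.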
