Suggestion: documentation examples should use meaningful data where possible #16709

Closed
@colinmorris

Something like this is a pretty good example. I know something about animals, cats, dogs, and hair, so I can sort of keep the structure of the data in my head and follow along with the transformations without having to scroll up and down to check against the original dataframe.

But a lot of examples just use meaningless column names like A, B, C, D... or foo, bar, baz..., which makes it a lot harder to gain an intuition about what's going on. For example, if you don't know what groupby does, this example:

In [13]: df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A'], 'Y' : [1, 2, 3, 4]})

In [14]: df2.groupby(['X']).sum()
Out[14]: 
   Y
X   
A  7
B  3

might be less useful than this version:

In [13]: pets = pd.DataFrame({'animal' : ['dog', 'dog', 'cat', 'cat'], 'weight' : [10, 20, 8, 9]})

In [14]: pets.groupby(['animal']).mean()
Out[14]: 
        weight
animal        
cat        8.5
dog       15.0
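
For anyone unsure what the grouped mean is doing, here is a rough plain-Python sketch of the same split-then-aggregate step on the pets data. It is only an illustration of the idea, not how pandas implements groupby, and the names `groups` and `mean_weight` are made up for this sketch.

```python
from collections import defaultdict
from statistics import mean

# Same illustrative rows as the pets example.
animals = ['dog', 'dog', 'cat', 'cat']
weights = [10, 20, 8, 9]

# Split: collect the weights that belong to each animal.
groups = defaultdict(list)
for animal, weight in zip(animals, weights):
    groups[animal].append(weight)

# Apply and combine: one mean per group, with keys sorted
# (pandas also sorts group keys by default).
mean_weight = {animal: mean(values) for animal, values in sorted(groups.items())}
print(mean_weight)
```

Because the keys are words like "dog" and "cat", it is easy to check each group by eye, which is the point of using meaningful data.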

I realize redoing all the examples like this would be a significant amount of work, but if there's agreement that this is desirable, I'd be happy to kick things off with a small PR.

Also, I think it would be good to add this as a guideline to the documentation section of the contributing doc. (Again, if people agree this is worthwhile and not misguided.)
