
DOC: consistent imports ('import pandas as pd' et al) #9886

Closed
@jorisvandenbossche

Description


Throughout the docs, many different imports are used, and they are not always visible to the reader. Ideally, this should be more uniform. I think we need:

  • consistent imports
  • visible imports

That way, the code snippets can be run without having to know what should be imported, apart from the default imports.
But that means we first have to agree on which imports to use.


I think import pandas as pd is the recommended convention? Yet it is not used much in the docs. So I was wondering: does everybody agree to use it in all docs (and is this just a case of "it is the intention, but no one has done it yet")?
So should we write pd.DataFrame and pd.Series everywhere, or do we keep the convenience of writing DataFrame(..)/Series(..) without the pd. prefix (i.e. only from pandas import DataFrame, Series, and use pd. for everything else)?

So: Question 1: do we use import pandas as pd for everything or not?
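For illustration, here is a minimal sketch of the two styles in Question 1 (the variable names are arbitrary). Both spellings resolve to the very same classes, so the choice is only about readability and consistency in the docs.

```python
import pandas as pd
from pandas import DataFrame, Series

# Option A: everything goes through the pd namespace
df_a = pd.DataFrame({"a": [1, 2, 3]})
s_a = pd.Series([1.0, 2.0, 3.0])

# Option B: DataFrame/Series imported directly, pd. used for everything else
df_b = DataFrame({"a": [1, 2, 3]})
s_b = Series([1.0, 2.0, 3.0])
idx = pd.date_range("2015-01-01", periods=3)

# Both spellings refer to the same class objects
assert DataFrame is pd.DataFrame
assert Series is pd.Series
```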


Then there are the frequently used imports from other packages. I think these are standard and need no discussion:

import numpy as np
import matplotlib.pyplot as plt

But for the datetime package, the imports are not uniform. The following two imports are both used throughout the docs, but they are not compatible with each other, since the second one rebinds the name datetime from the module to the class:

import datetime
from datetime import datetime

So: Question 2: How do we import datetime?
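To make the incompatibility concrete, a small sketch using only the standard library: once from datetime import datetime has run, any snippet written for the module-style import (datetime.timedelta, datetime.datetime, ...) fails with an AttributeError.

```python
# Style 1: import the module; classes are reached as datetime.datetime, datetime.timedelta, ...
import datetime

d1 = datetime.datetime(2015, 4, 15)
delta = datetime.timedelta(days=1)

# Style 2: import the class; the name "datetime" now refers to the class,
# shadowing the module imported above
from datetime import datetime

d2 = datetime(2015, 4, 15)

# A snippet written for style 1 no longer works after style 2 has run
try:
    datetime.timedelta(days=1)
except AttributeError:
    style1_broken = True
else:
    style1_broken = False
```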

Further, there are some other imports used a lot, like from numpy.random import randn or from numpy import nan.

For other non-pandas imports, I propose that they should always either use the standard import (e.g. np) or be done explicitly where used, and never be hidden (as is sometimes the case now, e.g. from dateutil.relativedelta import relativedelta).

So: Question 3: Do we agree that all other non-pandas imports should be done explicitly?

This means that imports like the following, which are currently done in suppressed code blocks (not rendered in the docs), will be removed:

from numpy import nan
randn = np.random.randn
randint = np.random.randint
from dateutil.relativedelta import relativedelta
import random
import os
import csv
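As a sketch of what Question 3 would mean in practice (the data and seed are arbitrary), the hidden helpers can be replaced by the standard np. prefix, and the one remaining third-party import is done visibly, right where it is used. This assumes python-dateutil is installed, which it is whenever pandas is.

```python
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta  # imported visibly, where it is needed

# Instead of a hidden "from numpy import nan": use np.nan
s = pd.Series([1.0, np.nan, 3.0])

# Instead of hidden "randn = np.random.randn" / "randint = np.random.randint"
np.random.seed(123)  # arbitrary seed, only to make the example reproducible
df = pd.DataFrame(np.random.randn(4, 2), columns=["A", "B"])
ints = np.random.randint(0, 10, size=5)

# relativedelta clips to the end of the month: Jan 31 + 1 month -> Feb 28
ts = pd.Timestamp("2015-01-31") + relativedelta(months=1)
```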

Third issue: some pandas objects are imported from submodules instead of from the top-level namespace.

For pandas submodules, imports like the following are used:

from pandas.tseries.api import *
from pandas.tseries.offsets import *

from pandas.core.reshape import *
from pandas.tools.tile import *

These are often not visible to users, which I think is a bad idea. First, many of these imports should never have been done, since the functions used from them are also available in the top-level pandas namespace.
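A minimal sketch, assuming a current pandas version: functions such as cut, melt, get_dummies and date_range, which these wildcard imports have been used to bring in, can all be called through pd. without touching any submodule. The sample data is made up for illustration.

```python
import pandas as pd

ages = pd.Series([5, 17, 35, 70])

# Binning: pd.cut is top-level, no submodule import needed
groups = pd.cut(ages, bins=[0, 18, 65, 100])

# Reshaping: pd.melt and pd.get_dummies are top-level as well
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})
long = pd.melt(wide, id_vars="id")
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))

# Time series helpers too
idx = pd.date_range("2015-01-01", periods=3, freq="D")
```

Using these public names also avoids depending on internal module paths, which can move between pandas versions.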

For functions that are only available from the submodules, we have to decide how to import them. At the least, the imports should be done explicitly in the visible docs. But there are several possible forms:

from pandas.tseries.offsets import *
... BMonthEnd()
... Day()

from pandas.tseries.offsets import BMonthEnd, Day
... BMonthEnd()
... Day()

import pandas.tseries.offsets as offsets (or from pandas.tseries import offsets)
... offsets.BMonthEnd()
... offsets.Day()
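The two explicit forms above can be compared in a short runnable sketch (the date is arbitrary). The wildcard form is left out because it hides where BMonthEnd and Day come from. Both remaining forms give the same result; they differ only in whether the reader sees the offsets. prefix at the call site.

```python
import pandas as pd

# Explicit names: short call sites, origin visible in the import line
from pandas.tseries.offsets import BMonthEnd, Day

# Module import: every call site shows where the class comes from
from pandas.tseries import offsets

ts = pd.Timestamp("2015-04-15")

end_named = ts + BMonthEnd()              # last business day of April 2015
end_qualified = ts + offsets.BMonthEnd()  # same offset, qualified spelling
next_day = ts + offsets.Day()
```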

So: Question 4: How do we import from pandas submodules?

TO DO:

  • Decide on the imports (see questions above)
  • Clearly document this in the contributor guidelines
  • Adapt this in the documentation
