ENH: Export (a subset of?) pandas._typing for type checking #55231

Open
@caneff

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

There are public functions and methods whose arguments are annotated with types that are not actually exported. This makes it hard to propagate those types to other functions that then call the pandas ones.

For instance, merge has a how argument typed as _typing.MergeHow = Literal['cross', 'inner', 'left', 'outer', 'right']. Because _typing is protected, there is no good way to accept that type as an argument of my own function, so for my type checker to be OK with it I instead have to write

def foo(df: pd.DataFrame, ..., how: str):
  ...
  assert how in ['cross', 'inner', 'left', 'outer', 'right']
  pd.merge(..., how=how)

This is both annoyingly verbose and fragile to updates of pandas.
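As a stopgap that avoids both the private module and the repeated list of strings, the allowed values can be kept in one local alias and the runtime check derived from it. This is only a minimal sketch: LocalMergeHow and check_merge_how are hypothetical names, not part of pandas, and the alias is a hand-maintained copy of the values quoted above, so it can still drift from the pandas definition.

```python
from typing import Literal, cast, get_args

# Assumption: a hand-maintained copy of the values quoted above for `how`.
# It can drift from pandas' own definition, which is the fragility at issue.
LocalMergeHow = Literal["cross", "inner", "left", "outer", "right"]


def check_merge_how(how: str) -> LocalMergeHow:
    """Return `how` typed as LocalMergeHow, or raise ValueError if not allowed."""
    allowed = get_args(LocalMergeHow)  # tuple of the literal strings
    if how not in allowed:
        raise ValueError(f"how must be one of {allowed}, got {how!r}")
    # cast() only informs the type checker; the check above does the real work.
    return cast(LocalMergeHow, how)
```

A caller could then pass check_merge_how(how) on to pd.merge, but the list of values is still duplicated once outside pandas, which is what a public typing module would remove.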

Feature Description

Add a public typing module that exposes (possibly a subset of) _typing.

I say possibly a subset because, from looking at the _typing module, there are clearly types that are for internal use only, and I'm guessing we don't want to make them public so that they stay easy to change.

I would propose that the subset be all types that are used as arguments of public functions and methods.

This way my function above could be written as:

import pandas as pd
import pandas.typing as pd_typing  # proposed module, does not exist yet

def foo(df: pd.DataFrame, ..., how: pd_typing.MergeHow):
  ...
  pd.merge(..., how=how)

and have everything work.

Alternative Solutions

Technically these types are "available" wherever other modules import them, so you can access MergeHow via pandas.core.reshape.merge.MergeHow or pandas.core.frame.MergeHow. But those are just names imported from _typing for those modules' own use, not something users should rely on.
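A related stopgap, sketched here, is to import the alias only for the type checker. It still depends on the private pandas._typing module, so it has the same stability problem as the re-exported names; describe_join is a hypothetical helper used purely for illustration.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Private module: seen only by the type checker, never imported at runtime.
    # Assumes the installed pandas defines MergeHow in pandas._typing.
    from pandas._typing import MergeHow


def describe_join(how: MergeHow) -> str:
    """Hypothetical helper: accepts the same literals as the `how` argument."""
    return f"merge using how={how!r}"
```

At runtime nothing from pandas._typing is loaded, so the function behaves as if how were a plain string; only the type checker sees the Literal.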

Other alternatives

A) Split the public types out of _typing into a new typing module. _typing could then do from typing import * (meaning the new pandas typing module) if we don't want to rewrite every place where the newly public types are used.

B) Just make all of _typing public. As someone who is not deep into pandas internals I have no strong opinion here, but my guess is that there are internal types that we don't want public.

Additional Context

I'm more than willing to take on this PR myself; I just want feedback on whether it would be accepted.

Labels: Enhancement, Typing (type annotations, mypy/pyright type checking)