Description
This issue describes a procedure for using the command-line tool pyright (https://github.com/microsoft/pyright/blob/master/docs/command-line.md) to identify places in the pandas code that are missing type declarations. xref #28142
- Install pyright: See https://github.com/microsoft/pyright#command-line
- In your pandas development folder, create an empty file `py.typed` in the same folder as `pandas\__init__.py`
- To get the complete analysis as a text file, in your shell, `cd` to the folder containing `README.md` from pandas, and type `pyright --verifytypes pandas! > pyright.out`
- To determine the modules that need the most work, use the script shown below named `verifytypes.py`, which can be run from the command line as `python verifytypes.py` and will print the top 20 modules that need fixing.
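The marker-file step can also be scripted. The following is a minimal sketch, assuming it is pointed at the folder that contains `README.md`; the helper name `add_py_typed_marker` is hypothetical and not part of pandas or pyright.

```python
from pathlib import Path


def add_py_typed_marker(repo_root: Path) -> Path:
    """Create an empty py.typed file next to pandas/__init__.py.

    repo_root is assumed to be the folder that contains README.md.
    """
    package_dir = repo_root / "pandas"
    if not (package_dir / "__init__.py").is_file():
        raise FileNotFoundError(f"{package_dir} does not look like the pandas package")
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)  # an existing marker is left as it is
    return marker
```

Calling `add_py_typed_marker(Path.cwd())` from the repository root should leave an empty `pandas/py.typed` in place, after which the `pyright --verifytypes` command from the steps above can be run.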
Open issues for adding types:
- We will need to systematically bring over the typing work done by Microsoft in https://github.com/microsoft/python-type-stubs/tree/main/pandas to help enhance our type declarations.
- Using `pyright` to determine where things are missing will not determine if we are missing appropriate overloads. See example below.
- Most likely, the best way to test if we have all the overloads correct is by fully typing our `tests` code, and adding `# ignore` comments when we are specifically testing for incorrect types.
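To illustrate the last two points, here is a minimal sketch that uses a hypothetical toy class `Frame` (not pandas code) with an `inplace` overload pair, plus fully typed tests. The annotated assignments only pass a type checker if the overloads are right, and the deliberately wrong call carries a `# type: ignore` comment, which is one common spelling of the ignore comment.

```python
from typing import List, Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for a DataFrame-like class."""

    def __init__(self, values: List[Optional[float]]) -> None:
        self.values = list(values)

    @overload
    def fillna(self, value: float, *, inplace: Literal[True]) -> None: ...
    @overload
    def fillna(self, value: float, *, inplace: Literal[False] = ...) -> "Frame": ...
    def fillna(self, value: float, *, inplace: bool = False) -> Optional["Frame"]:
        if not isinstance(value, (int, float)):
            raise TypeError("value must be a number")
        filled = [value if v is None else v for v in self.values]
        if inplace:
            self.values = filled
            return None
        return Frame(filled)


def test_fillna_returns_new_frame() -> None:
    frame = Frame([1.0, None])
    # Without the overloads this annotation would be flagged,
    # because the declared return type would be Optional[Frame].
    result: Frame = frame.fillna(0.0)
    assert result.values == [1.0, 0.0]
    assert frame.values == [1.0, None]


def test_fillna_inplace_returns_none() -> None:
    frame = Frame([1.0, None])
    result: None = frame.fillna(0.0, inplace=True)
    assert result is None
    assert frame.values == [1.0, 0.0]


def test_fillna_rejects_wrong_type() -> None:
    frame = Frame([1.0, None])
    try:
        frame.fillna("zero")  # type: ignore  # deliberately wrong type
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError")
```

A check that only looks for missing declarations would accept a single `fillna` annotated as returning `Optional[Frame]`; it is the typed tests that expose the missing overloads.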
verifytypes.py utility
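import json
import shutil
import subprocess

import pandas as pd


def getpyrightout() -> bytes:
    # Resolve the full path to pyright so the argument list can be passed
    # directly, without a shell.
    pyright = shutil.which("pyright")
    if pyright is None:
        raise FileNotFoundError("pyright was not found on PATH")
    pyrightout = subprocess.run(
        [pyright, "--outputjson", "--verifytypes", "pandas!"],
        capture_output=True,
    )
    return pyrightout.stdout


def processjson(jsonstr: bytes) -> None:
    # Assumes the JSON has a top-level "diagnostics" list; the layout may
    # differ between pyright versions.
    d = json.loads(jsonstr)
    msgsSeries = pd.Series([k["message"] for k in d["diagnostics"]])
    # Split each message at its first two double quotes: the text before the
    # quoted name, the quoted name itself, and whatever follows.
    msgsdf = msgsSeries.str.split('"', n=2, expand=True)
    msgsdf.columns = ["primary", "element", "extra"]
    typemsgs = msgsdf[msgsdf.primary.str.startswith("Type")].copy()
    # Drop the trailing part that starts at a capitalized (class) name,
    # leaving the module path.
    typemsgs["module"] = typemsgs["element"].str.replace(
        r"\.[A-Z][a-z_A-Z\.]*$", "", regex=True
    )
    notest = typemsgs[~typemsgs.module.str.startswith("pandas.tests")]
    print(
        notest.groupby(["module", "primary"])
        .size()
        .sort_values(ascending=False)
        .head(20)
    )


if __name__ == "__main__":
    processjson(getpyrightout())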
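To see how the script derives a module name from an element name, the regular expression can be tried on its own. This is a small sketch using the standard `re` module; the helper `module_of` is hypothetical and the symbol names are illustrative, not taken from real pyright output.

```python
import re

# Same pattern that verifytypes.py passes to str.replace.
CLASS_SUFFIX = re.compile(r"\.[A-Z][a-z_A-Z\.]*$")


def module_of(element: str) -> str:
    """Return the element name with any trailing '.ClassName...' part removed."""
    return CLASS_SUFFIX.sub("", element)


# Illustrative symbol names only.
examples = [
    "pandas.core.frame.DataFrame.fillna",
    "pandas.core.series.Series",
    "pandas.io.parsers.read_csv",
]
for name in examples:
    print(f"{name} -> {module_of(name)}")
```

Because the pattern only starts matching at a capitalized component, a module-level function such as the third example keeps its full name and is grouped under that name rather than under its module.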
Example using DataFrame.fillna() where overloads are needed
This is taken from https://github.com/microsoft/python-type-stubs/blob/main/pandas/core/frame.pyi
@overload
def fillna(
    self,
    value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
    method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
    axis: Optional[AxisType] = ...,
    limit: int = ...,
    downcast: Optional[Dict] = ...,
    *,
    inplace: Literal[True]
) -> None: ...
@overload
def fillna(
    self,
    value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
    method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
    axis: Optional[AxisType] = ...,
    limit: int = ...,
    downcast: Optional[Dict] = ...,
    *,
    inplace: Literal[False] = ...
) -> DataFrame: ...
@overload
def fillna(
    self,
    value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
    method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
    axis: Optional[AxisType] = ...,
    *,
    limit: int = ...,
    downcast: Optional[Dict] = ...,
) -> Union[None, DataFrame]: ...
@overload
def fillna(
    self,
    value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
    method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
    axis: Optional[AxisType] = ...,
    inplace: Optional[_bool] = ...,
    limit: int = ...,
    downcast: Optional[Dict] = ...,
) -> Union[None, DataFrame]: ...