ENH: Using pyright to analyze missing type declarations #39813

This describes a procedure for using the command line tool `pyright` (https://github.com/microsoft/pyright/blob/master/docs/command-line.md) to identify places in the pandas code that are missing type declarations. xref #28142

1. Install pyright: see https://github.com/microsoft/pyright#command-line
2. In your pandas development folder, create an empty file `py.typed` in the same folder as `pandas/__init__.py`. This is the PEP 561 marker that tells pyright to treat `pandas` as a typed package.
3. To get the complete analysis as a text file, `cd` to the root of your pandas checkout (the folder containing `README.md`) and run `pyright --verifytypes pandas! > pyright.out`
4. To determine the modules that need the most work, use the script below, `verifytypes.py`. Run it from the same folder as `python verifytypes.py`; it prints the 20 (module, message) combinations with the most diagnostics, excluding `pandas.tests`.
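Steps 2 and 3 can also be scripted. The sketch below assumes it is run from the root of a pandas checkout and that `pyright` is on `PATH`; the helper names `ensure_py_typed`, `verifytypes_command` and `run_verifytypes` are hypothetical and not part of pandas or pyright.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ensure_py_typed(package_dir: Path) -> Path:
    """Step 2: create an empty py.typed marker next to the package's __init__.py."""
    init_file = package_dir / "__init__.py"
    if not init_file.is_file():
        raise FileNotFoundError(f"{init_file} does not exist")
    marker = package_dir / "py.typed"
    marker.touch(exist_ok=True)  # does not fail if the marker already exists
    return marker


def verifytypes_command(package: str = "pandas!") -> List[str]:
    """Build the step-3 command line, resolving pyright on PATH when possible."""
    exe = shutil.which("pyright") or "pyright"
    return [exe, "--verifytypes", package]


def run_verifytypes(repo_root: Path, out_name: str = "pyright.out") -> int:
    """Step 3: run pyright from repo_root and save its text report to out_name."""
    with (repo_root / out_name).open("w", encoding="utf-8") as out:
        # pyright exits non-zero when it reports problems, so check=True is not used
        result = subprocess.run(verifytypes_command(), cwd=repo_root, stdout=out)
    return result.returncode
```

For example, `ensure_py_typed(Path("pandas"))` followed by `run_verifytypes(Path("."))` reproduces step 3, writing the report to `pyright.out` instead of relying on shell redirection.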

Open issues for adding types:
1. We will need to systematically bring over the typing work done by Microsoft in https://github.com/microsoft/python-type-stubs/tree/main/pandas to help enhance our type declarations.
2. Using `pyright` to find where types are missing will not tell us whether we are missing appropriate overloads. See the example below.
3. Most likely, the best way to test whether we have all the overloads correct is to fully type our `tests` code, adding `# type: ignore` comments where we are deliberately testing incorrect types.
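Item 2 can be illustrated with a small, hypothetical `Frame` class (a stand-in, not pandas' `DataFrame`) whose `fillna` uses the same `inplace: Literal[True]` / `Literal[False]` overload pattern as the Microsoft stubs. If only the implementation signature returning `Optional[Frame]` were present, every symbol would still be fully annotated and `--verifytypes` would have nothing to report, yet a type checker could no longer tell that `fillna(0.0)` returns a `Frame`. This is a sketch under those assumptions, not the pandas implementation.

```python
from __future__ import annotations

from typing import List, Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for DataFrame, used only to illustrate overloads."""

    def __init__(self, values: List[Optional[float]]) -> None:
        self.values = values

    # inplace=True: the object is modified and None is returned
    @overload
    def fillna(self, value: float, *, inplace: Literal[True]) -> None: ...

    # inplace omitted or False: a new Frame is returned
    @overload
    def fillna(self, value: float, *, inplace: Literal[False] = ...) -> Frame: ...

    # inplace only known to be a bool: either result is possible
    @overload
    def fillna(self, value: float, *, inplace: bool) -> Optional[Frame]: ...

    def fillna(self, value: float, *, inplace: bool = False) -> Optional[Frame]:
        filled = [value if v is None else v for v in self.values]
        if inplace:
            self.values = filled
            return None
        return Frame(filled)
```

With the overloads in place, a checker such as mypy or pyright should infer `Frame` for `Frame([None]).fillna(0.0)` and `None` for `Frame([None]).fillna(0.0, inplace=True)`.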

<details>
<summary>verifytypes.py utility</summary>

```python
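import json
import shutil
import subprocess

import pandas as pd


def getpyrightout() -> bytes:
    # Resolve pyright on PATH (shutil.which honors PATHEXT on Windows) instead of
    # using shell=True: on POSIX, shell=True with a list runs only "pyright" and
    # passes the remaining items to the shell, not to pyright.
    exe = shutil.which("pyright")
    if exe is None:
        raise FileNotFoundError("pyright not found on PATH")
    # pyright exits non-zero when it reports problems, so the return code is
    # not checked; the JSON report is read from stdout.
    pyrightout = subprocess.run(
        [exe, "--outputjson", "--verifytypes", "pandas!"],
        capture_output=True,
    )
    return pyrightout.stdout


def processjson(jsonstr: bytes) -> None:
    d = json.loads(jsonstr)
    msgsSeries = pd.Series([k["message"] for k in d["diagnostics"]])
    # Split each message at the first two double quotes: the text before the
    # quoted symbol, the fully qualified symbol name, and the rest
    msgsdf = msgsSeries.str.split('"', n=2, expand=True)
    msgsdf.columns = ["primary", "element", "extra"]
    typemsgs = msgsdf[msgsdf.primary.str.startswith("Type")].copy()
    # Strip a trailing ".ClassName..." suffix to get the module name;
    # regex=True is required because pandas 2.0 defaults to literal replacement
    typemsgs["module"] = typemsgs["element"].str.replace(
        r"\.[A-Z][a-z_A-Z\.]*$", "", regex=True
    )
    # Skip the test suite; na=False keeps the mask boolean for messages
    # without a quoted symbol (groupby drops those NaN keys anyway)
    notest = typemsgs[~typemsgs.module.str.startswith("pandas.tests", na=False)]
    print(
        notest.groupby(["module", "primary"])
        .size()
        .sort_values(ascending=False)
        .head(20)
    )


if __name__ == "__main__":
    processjson(getpyrightout())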
```
</details>
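The filtering and grouping in `verifytypes.py` can be checked without running pyright. The sketch below re-expresses the same steps with the standard library (`re` and `collections.Counter` in place of pandas string methods and `groupby`); the function names `split_message`, `module_of` and `top_modules` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Same pattern as verifytypes.py: a trailing ".ClassName..." suffix
CLASS_SUFFIX = re.compile(r"\.[A-Z][a-z_A-Z\.]*$")


def split_message(message: str) -> Tuple[str, str]:
    """Return (text before the first quote, first quoted symbol)."""
    parts = message.split('"', 2)
    return parts[0], (parts[1] if len(parts) > 1 else "")


def module_of(element: str) -> str:
    """Drop a trailing class-qualified part of a fully qualified name."""
    return CLASS_SUFFIX.sub("", element)


def top_modules(messages: Iterable[str], n: int = 20) -> List[Tuple[Tuple[str, str], int]]:
    """Count 'Type' diagnostics per (module, primary text), skipping pandas.tests."""
    counts: Counter = Counter()
    for message in messages:
        primary, element = split_message(message)
        # The pandas version drops messages without a quoted symbol in groupby
        if not primary.startswith("Type") or not element:
            continue
        module = module_of(element)
        if module.startswith("pandas.tests"):
            continue
        counts[(module, primary)] += 1
    return counts.most_common(n)
```

For example, `module_of("pandas.core.frame.DataFrame.fillna")` returns `"pandas.core.frame"`, while a name with no capitalized component, such as `"pandas.io.parsers.read_csv"`, is returned unchanged. So module-level functions are counted under their own name rather than their module.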

<details>
<summary>Example using DataFrame.fillna() where overloads are needed</summary>

This is taken from https://github.com/microsoft/python-type-stubs/blob/main/pandas/core/frame.pyi

```python
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
        axis: Optional[AxisType] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
        *,
        inplace: Literal[True]
    ) -> None: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Literal["backfill", "bfill", "ffill", "pad"]] = ...,
        axis: Optional[AxisType] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
        *,
        inplace: Literal[False] = ...
    ) -> DataFrame: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
        axis: Optional[AxisType] = ...,
        *,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
    ) -> Union[None, DataFrame]: ...
    @overload
    def fillna(
        self,
        value: Optional[Union[Scalar, Dict, Series, DataFrame]] = ...,
        method: Optional[Union[_str, Literal["backfill", "bfill", "ffill", "pad"]]] = ...,
        axis: Optional[AxisType] = ...,
        inplace: Optional[_bool] = ...,
        limit: int = ...,
        downcast: Optional[Dict] = ...,
    ) -> Union[None, DataFrame]: ...
```
</details>
