Skip to content

TYPING: some type hints for pandas\io\common.py #27598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Aug 2, 2019

Conversation

simonjayhawkins
Copy link
Member

No description provided.

@simonjayhawkins simonjayhawkins added Output-Formatting __repr__ of pandas objects, to_string Typing type annotations, mypy/pyright type checking labels Jul 25, 2019
def __init__(self, file, mode, compression=zipfile.ZIP_DEFLATED, **kwargs):
def __init__(
self,
file: Union[BinaryIO, str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm did FilePathOrBuffer not work here? Does ZipFile not accept pathlib.Path objects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works. will change.

@@ -511,5 +526,7 @@ def UnicodeReader(f, dialect=csv.excel, encoding="utf-8", **kwds):
return csv.reader(f, dialect=dialect, **kwds)


def UnicodeWriter(f, dialect=csv.excel, encoding="utf-8", **kwds):
def UnicodeWriter(
f: TextIO, dialect: Type[csv.excel] = csv.excel, encoding: str = "utf-8", **kwds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f: TextIO, dialect: Type[csv.excel] = csv.excel, encoding: str = "utf-8", **kwds
f: TextIO, dialect: Type[csv.Dialect] = csv.excel, encoding: str = "utf-8", **kwds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -730,6 +731,11 @@ def to_string(self) -> None:
"""
Render a DataFrame to a console-friendly tabular output.
"""
# Note: the to_string method only accepts IO whereas to_html and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the notes - are these bugs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the sort of inconsistencies between to_html, to_string and to_latex that I was hoping to find by adding type hints.

This should help with refactoring the three methods to share buffer handling code.

Can't really call them bugs when the documentation doesn't say that filenames are accepted (even though they are for to_html and to_latex).

buf : StringIO-like, optional
Buffer to write to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what the lifecycle of self.buf = cast(...) is? Would that cast still just stay local to the function?

@@ -144,21 +147,22 @@ def _stringify_path(filepath_or_buffer):
strings, buffers, or anything else that's not even path-like.
"""
if hasattr(filepath_or_buffer, "__fspath__"):
return filepath_or_buffer.__fspath__()
# https://github.com/python/mypy/issues/1424
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we drop 3.5 support do you think we can just do `isinstance(filepath_or_buffer, os.PathLike) here instead?

@@ -730,6 +731,11 @@ def to_string(self) -> None:
"""
Render a DataFrame to a console-friendly tabular output.
"""
# Note: the to_string method only accepts IO whereas to_html and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what the lifecycle of self.buf = cast(...) is? Would that cast still just stay local to the function?

@simonjayhawkins
Copy link
Member Author

On hindsight, i shouldn't have used cast... i was just avoiding adding a ignore.

I think in this case, since a TypeError is raised at runtime, an ignore should have been used for the static check.

@simonjayhawkins
Copy link
Member Author

some changes so that one of the errors is now
pandas\io\formats\format.py:911: error: Argument 1 to "buffer_put_lines" has incompatible type "Union[str, Path, IO[str]]"; expected "IO[str]"

instead of
pandas\io\formats\format.py:905: error: Argument 1 to "buffer_put_lines" has incompatible type "Union[str, Path, IO[Any]]"; expected "TextIO"

still needs a # type: ignore anyway because of the usage of hasattr

if hasattr(buf, "write"):
yield buf
elif isinstance(buf, str):
import codecs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import at the top

@jreback jreback added this to the 1.0 milestone Aug 2, 2019

def get_result(
self,
buf: Optional[FilePathOrBuffer[str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you followup with some doc-strings for these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@jreback
Copy link
Contributor

jreback commented Aug 2, 2019

small comments, @WillAyd

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change / refactor one minor question

@@ -93,7 +96,9 @@ def _is_url(url):
return False


def _expand_user(filepath_or_buffer):
def _expand_user(
filepath_or_buffer: FilePathOrBuffer[AnyStr]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does subscripting with AnyStr do here? Add to Union?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FilePathOrBuffer is a just a Union. FilePathOrBuffer[AnyStr] with subscription is effectively a TypeVar in IO. so IO[str] can't become IO[bytes].

i think https://mypy.readthedocs.io/en/latest/generics.html#generic-type-aliases explains it. will send a different link if i come across a better explanation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty cool - thanks for sharing! Does this really change anything though? We already have IO[AnyStr] in FilePathOrBuffer so this just restates that (?)

Unrelated note - IO[AnyStr] might itself be wrong as AnyStr is a TypeVar and I think we need to parametrize IO with the actual type. I find the Python docs rather confusing on that so maybe we remove AnyStr altogether but can level that as a separate exercise (unless it helps simplify annotation here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated note - IO[AnyStr] might itself be wrong as AnyStr is a TypeVar and I think we need to parametrize IO with the actual type.

The alias is a Union, the Union has one and only one Generic. so parametrising the alias is parametrising the only Generic in the Union, i.e. IO.

We already have IO[AnyStr] in FilePathOrBuffer so this just restates that (?)

I don't think AnyStr is treated as a TypeVar inside the union when the alias is defined. so that's why it's needed in use

Does this really change anything though?

yes. mypy will fail without it.

in to_html etc only string buffers are supported. hence Optional[FilePathOrBuffer[str]] is used (note the parametrisation of FilePathOrBuffer here) and the TypeVar then becomes necessary otherwise a bytes buffer could be returned.

buffer_put_lines(buf: IO[str], lines: List[str]) -> None only supports string buffers. so mypy will raise if we don't use TypeVars to maintain the FilePathOrBuffer type in and out of the common functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I think the std documentation isn't very clear but just defining IO creates a generic accepting a type of AnyStr which is str / bytes. This is in contrast to other generics that really accept type T (essentially anything). You can see this if you try to inject a non-str or bytes type

from typing import IO

foo: IO[int]

yields

error: Value of type variable "AnyStr" of "IO" cannot be "int"

So I think an error to keep re-parametrizing IO with AnyStr in the _typing module and here.

Let's leave to a follow up to clean up though

@WillAyd WillAyd merged commit 15b11c8 into pandas-dev:master Aug 2, 2019
@WillAyd
Copy link
Member

WillAyd commented Aug 2, 2019

Thanks @simonjayhawkins

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants