-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TYPING: some type hints for pandas\io\common.py #27598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYPING: some type hints for pandas\io\common.py #27598
Conversation
pandas/io/common.py
Outdated
def __init__(self, file, mode, compression=zipfile.ZIP_DEFLATED, **kwargs): | ||
def __init__( | ||
self, | ||
file: Union[BinaryIO, str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm did FilePathOrBuffer not work here? Does ZipFile not accept pathlib.Path objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works. will change.
pandas/io/common.py
Outdated
@@ -511,5 +526,7 @@ def UnicodeReader(f, dialect=csv.excel, encoding="utf-8", **kwds): | |||
return csv.reader(f, dialect=dialect, **kwds) | |||
|
|||
|
|||
def UnicodeWriter(f, dialect=csv.excel, encoding="utf-8", **kwds): | |||
def UnicodeWriter( | |||
f: TextIO, dialect: Type[csv.excel] = csv.excel, encoding: str = "utf-8", **kwds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f: TextIO, dialect: Type[csv.excel] = csv.excel, encoding: str = "utf-8", **kwds | |
f: TextIO, dialect: Type[csv.Dialect] = csv.excel, encoding: str = "utf-8", **kwds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
pandas/io/formats/format.py
Outdated
@@ -730,6 +731,11 @@ def to_string(self) -> None: | |||
""" | |||
Render a DataFrame to a console-friendly tabular output. | |||
""" | |||
# Note: the to_string method only accepts IO whereas to_html and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the notes - are these bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the sort of inconsistencies between to_html, to_string and to_latex that I was hoping to find by adding type hints.
This should help with refactoring the three methods to share buffer handling code.
Can't really call them bugs when the documentation doesn't say that filenames are accepted (even though they are for to_html and to_latex).
buf : StringIO-like, optional
Buffer to write to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what the lifecycle of self.buf = cast(...)
is? Would that cast still just stay local to the function?
@@ -144,21 +147,22 @@ def _stringify_path(filepath_or_buffer): | |||
strings, buffers, or anything else that's not even path-like. | |||
""" | |||
if hasattr(filepath_or_buffer, "__fspath__"): | |||
return filepath_or_buffer.__fspath__() | |||
# https://github.com/python/mypy/issues/1424 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we drop 3.5 support do you think we can just do `isinstance(filepath_or_buffer, os.PathLike) here instead?
pandas/io/formats/format.py
Outdated
@@ -730,6 +731,11 @@ def to_string(self) -> None: | |||
""" | |||
Render a DataFrame to a console-friendly tabular output. | |||
""" | |||
# Note: the to_string method only accepts IO whereas to_html and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what the lifecycle of self.buf = cast(...)
is? Would that cast still just stay local to the function?
On hindsight, i shouldn't have used cast... i was just avoiding adding a ignore. I think in this case, since a TypeError is raised at runtime, an ignore should have been used for the static check. |
some changes so that one of the errors is now instead of still needs a |
pandas/io/formats/format.py
Outdated
if hasattr(buf, "write"): | ||
yield buf | ||
elif isinstance(buf, str): | ||
import codecs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import at the top
|
||
def get_result( | ||
self, | ||
buf: Optional[FilePathOrBuffer[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you followup with some doc-strings for these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
small comments, @WillAyd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change / refactor one minor question
@@ -93,7 +96,9 @@ def _is_url(url): | |||
return False | |||
|
|||
|
|||
def _expand_user(filepath_or_buffer): | |||
def _expand_user( | |||
filepath_or_buffer: FilePathOrBuffer[AnyStr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does subscripting with AnyStr
do here? Add to Union?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FilePathOrBuffer is a just a Union. FilePathOrBuffer[AnyStr] with subscription is effectively a TypeVar in IO. so IO[str] can't become IO[bytes].
i think https://mypy.readthedocs.io/en/latest/generics.html#generic-type-aliases explains it. will send a different link if i come across a better explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty cool - thanks for sharing! Does this really change anything though? We already have IO[AnyStr]
in FilePathOrBuffer
so this just restates that (?)
Unrelated note - IO[AnyStr]
might itself be wrong as AnyStr
is a TypeVar and I think we need to parametrize IO
with the actual type. I find the Python docs rather confusing on that so maybe we remove AnyStr
altogether but can level that as a separate exercise (unless it helps simplify annotation here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated note -
IO[AnyStr]
might itself be wrong asAnyStr
is a TypeVar and I think we need to parametrizeIO
with the actual type.
The alias is a Union, the Union has one and only one Generic. so parametrising the alias is parametrising the only Generic in the Union, i.e. IO.
We already have
IO[AnyStr]
inFilePathOrBuffer
so this just restates that (?)
I don't think AnyStr is treated as a TypeVar inside the union when the alias is defined. so that's why it's needed in use
Does this really change anything though?
yes. mypy will fail without it.
in to_html etc only string buffers are supported. hence Optional[FilePathOrBuffer[str]]
is used (note the parametrisation of FilePathOrBuffer here) and the TypeVar then becomes necessary otherwise a bytes buffer could be returned.
buffer_put_lines(buf: IO[str], lines: List[str]) -> None
only supports string buffers. so mypy will raise if we don't use TypeVars to maintain the FilePathOrBuffer type in and out of the common functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right I think the std documentation isn't very clear but just defining IO
creates a generic accepting a type of AnyStr
which is str / bytes. This is in contrast to other generics that really accept type T
(essentially anything). You can see this if you try to inject a non-str or bytes type
from typing import IO
foo: IO[int]
yields
error: Value of type variable "AnyStr" of "IO" cannot be "int"
So I think an error to keep re-parametrizing IO with AnyStr
in the _typing module and here.
Let's leave to a follow up to clean up though
Thanks @simonjayhawkins |
No description provided.