TYPING: some type hints for pandas\io\common.py #27598
Merged: WillAyd merged 15 commits into pandas-dev:master from simonjayhawkins:pandas-io-common.py on Aug 2, 2019
Commits
9421bb9 (simonjayhawkins) TYPING: some type hints for pandas\io\common.py
0e4a1d9 (simonjayhawkins) BytesIO -> BinaryIO
3abd219 (simonjayhawkins) Merge remote-tracking branch 'upstream/master' into pandas-io-common.py
0b96604 (simonjayhawkins) address comments (WillAyd)
749237e (simonjayhawkins) Merge remote-tracking branch 'upstream/master' into pandas-io-common.py
dac39b3 (simonjayhawkins) remove ignores and casts
32caa9f (simonjayhawkins) revert addition of return value to get_filepath_or_buffer
7502435 (simonjayhawkins) fix AttributeError: 'str' object has no attribute 'writelines' and ad…
ab3f546 (simonjayhawkins) use FilePathOrBuffer as TypeVar
07980b9 (simonjayhawkins) refactor filepath_or_buffer handling
9886585 (simonjayhawkins) add test for bad arg
0d6af90 (simonjayhawkins) move import to top
2701be9 (simonjayhawkins) Merge remote-tracking branch 'upstream/master' into pandas-io-common.py
3c71335 (simonjayhawkins) resolve merge conflicts
171b03a (simonjayhawkins) Merge remote-tracking branch 'upstream/master' into pandas-io-common.py
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ | |
import mmap | ||
import os | ||
import pathlib | ||
from typing import IO, AnyStr, BinaryIO, Optional, TextIO, Type | ||
from urllib.error import URLError # noqa | ||
from urllib.parse import ( # noqa | ||
urlencode, | ||
|
@@ -32,6 +33,8 @@ | |
|
||
from pandas.core.dtypes.common import is_file_like | ||
|
||
from pandas._typing import FilePathOrBuffer | ||
|
||
# gh-12665: Alias for now and remove later. | ||
CParserError = ParserError | ||
|
||
|
@@ -68,14 +71,14 @@ class BaseIterator: | |
Useful only when the object being iterated is non-reusable (e.g. OK for a | ||
parser, not for an in-memory table, yes for its iterator).""" | ||
|
||
def __iter__(self): | ||
def __iter__(self) -> "BaseIterator": | ||
return self | ||
|
||
def __next__(self): | ||
raise AbstractMethodError(self) | ||
|
||
|
||
def _is_url(url): | ||
def _is_url(url) -> bool: | ||
"""Check to see if a URL has a valid protocol. | ||
|
||
Parameters | ||
|
@@ -93,7 +96,9 @@ def _is_url(url): | |
return False | ||
|
||
|
||
def _expand_user(filepath_or_buffer): | ||
def _expand_user( | ||
filepath_or_buffer: FilePathOrBuffer[AnyStr] | ||
) -> FilePathOrBuffer[AnyStr]: | ||
"""Return the argument with an initial component of ~ or ~user | ||
replaced by that user's home directory. | ||
|
||
|
@@ -111,7 +116,7 @@ def _expand_user(filepath_or_buffer): | |
return filepath_or_buffer | ||
|
||
|
||
def _validate_header_arg(header): | ||
def _validate_header_arg(header) -> None: | ||
if isinstance(header, bool): | ||
raise TypeError( | ||
"Passing a bool to header is invalid. " | ||
|
@@ -121,7 +126,9 @@ def _validate_header_arg(header): | |
) | ||
|
||
|
||
def _stringify_path(filepath_or_buffer): | ||
def _stringify_path( | ||
filepath_or_buffer: FilePathOrBuffer[AnyStr] | ||
) -> FilePathOrBuffer[AnyStr]: | ||
"""Attempt to convert a path-like object to a string. | ||
|
||
Parameters | ||
|
@@ -144,21 +151,22 @@ def _stringify_path(filepath_or_buffer): | |
strings, buffers, or anything else that's not even path-like. | ||
""" | ||
if hasattr(filepath_or_buffer, "__fspath__"): | ||
return filepath_or_buffer.__fspath__() | ||
# https://github.com/python/mypy/issues/1424 | ||
Review comment: When we drop 3.5 support, do you think we can just do `isinstance(filepath_or_buffer, os.PathLike)` here instead? |
||
return filepath_or_buffer.__fspath__() # type: ignore | ||
elif isinstance(filepath_or_buffer, pathlib.Path): | ||
return str(filepath_or_buffer) | ||
return _expand_user(filepath_or_buffer) | ||
|
||
|
||
def is_s3_url(url): | ||
def is_s3_url(url) -> bool: | ||
"""Check for an s3, s3n, or s3a url""" | ||
try: | ||
return parse_url(url).scheme in ["s3", "s3n", "s3a"] | ||
except Exception: | ||
return False | ||
|
||
|
||
def is_gcs_url(url): | ||
def is_gcs_url(url) -> bool: | ||
"""Check for a gcs url""" | ||
try: | ||
return parse_url(url).scheme in ["gcs", "gs"] | ||
|
@@ -167,7 +175,10 @@ def is_gcs_url(url): | |
|
||
|
||
def get_filepath_or_buffer( | ||
filepath_or_buffer, encoding=None, compression=None, mode=None | ||
filepath_or_buffer: FilePathOrBuffer, | ||
encoding: Optional[str] = None, | ||
compression: Optional[str] = None, | ||
mode: Optional[str] = None, | ||
): | ||
""" | ||
If the filepath_or_buffer is a url, translate and return the buffer. | ||
|
@@ -190,7 +201,7 @@ def get_filepath_or_buffer( | |
""" | ||
filepath_or_buffer = _stringify_path(filepath_or_buffer) | ||
|
||
if _is_url(filepath_or_buffer): | ||
if isinstance(filepath_or_buffer, str) and _is_url(filepath_or_buffer): | ||
req = urlopen(filepath_or_buffer) | ||
content_encoding = req.headers.get("Content-Encoding", None) | ||
if content_encoding == "gzip": | ||
|
@@ -224,7 +235,7 @@ def get_filepath_or_buffer( | |
return filepath_or_buffer, None, compression, False | ||
|
||
|
||
def file_path_to_url(path): | ||
def file_path_to_url(path: str) -> str: | ||
""" | ||
converts an absolute native path to a FILE URL. | ||
|
||
|
@@ -242,7 +253,9 @@ def file_path_to_url(path): | |
_compression_to_extension = {"gzip": ".gz", "bz2": ".bz2", "zip": ".zip", "xz": ".xz"} | ||
|
||
|
||
def _infer_compression(filepath_or_buffer, compression): | ||
def _infer_compression( | ||
filepath_or_buffer: FilePathOrBuffer, compression: Optional[str] | ||
) -> Optional[str]: | ||
""" | ||
Get the compression method for filepath_or_buffer. If compression='infer', | ||
the inferred compression method is returned. Otherwise, the input | ||
|
@@ -435,7 +448,13 @@ class BytesZipFile(zipfile.ZipFile, BytesIO): # type: ignore | |
""" | ||
|
||
# GH 17778 | ||
def __init__(self, file, mode, compression=zipfile.ZIP_DEFLATED, **kwargs): | ||
def __init__( | ||
self, | ||
file: FilePathOrBuffer, | ||
mode: str, | ||
compression: int = zipfile.ZIP_DEFLATED, | ||
**kwargs | ||
): | ||
if mode in ["wb", "rb"]: | ||
mode = mode.replace("b", "") | ||
super().__init__(file, mode, compression, **kwargs) | ||
|
@@ -461,16 +480,16 @@ class MMapWrapper(BaseIterator): | |
|
||
""" | ||
|
||
def __init__(self, f): | ||
def __init__(self, f: IO): | ||
self.mmap = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) | ||
|
||
def __getattr__(self, name): | ||
def __getattr__(self, name: str): | ||
return getattr(self.mmap, name) | ||
|
||
def __iter__(self): | ||
def __iter__(self) -> "MMapWrapper": | ||
return self | ||
|
||
def __next__(self): | ||
def __next__(self) -> str: | ||
newline = self.mmap.readline() | ||
|
||
# readline returns bytes, not str, but Python's CSV reader | ||
|
@@ -491,16 +510,16 @@ class UTF8Recoder(BaseIterator): | |
Iterator that reads an encoded stream and re-encodes the input to UTF-8 | ||
""" | ||
|
||
def __init__(self, f, encoding): | ||
def __init__(self, f: BinaryIO, encoding: str): | ||
self.reader = codecs.getreader(encoding)(f) | ||
|
||
def read(self, bytes=-1): | ||
def read(self, bytes: int = -1) -> bytes: | ||
return self.reader.read(bytes).encode("utf-8") | ||
|
||
def readline(self): | ||
def readline(self) -> bytes: | ||
return self.reader.readline().encode("utf-8") | ||
|
||
def next(self): | ||
def next(self) -> bytes: | ||
return next(self.reader).encode("utf-8") | ||
|
||
|
||
|
@@ -511,5 +530,7 @@ def UnicodeReader(f, dialect=csv.excel, encoding="utf-8", **kwds): | |
return csv.reader(f, dialect=dialect, **kwds) | ||
|
||
|
||
def UnicodeWriter(f, dialect=csv.excel, encoding="utf-8", **kwds): | ||
def UnicodeWriter( | ||
f: TextIO, dialect: Type[csv.Dialect] = csv.excel, encoding: str = "utf-8", **kwds | ||
): | ||
return csv.writer(f, dialect=dialect, **kwds) |
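The inline review question on `_stringify_path` asks whether the `hasattr(..., "__fspath__")` check could become an `isinstance` check against `os.PathLike` once Python 3.5 support is dropped. A minimal sketch of what that might look like follows. The function name `stringify_path_sketch` is hypothetical, and the sketch deliberately omits the `_expand_user` call that the real helper makes; it is not the code that was merged.

```python
import os
import pathlib


def stringify_path_sketch(filepath_or_buffer):
    """Hypothetical variant of _stringify_path built on os.PathLike.

    Path-like objects are converted with os.fspath; anything else
    (plain strings, buffers) is returned unchanged.
    """
    if isinstance(filepath_or_buffer, os.PathLike):
        # os.fspath calls __fspath__ on the object for us
        return os.fspath(filepath_or_buffer)
    return filepath_or_buffer


class CustomPath:
    """Illustrative object that implements the path protocol."""

    def __fspath__(self):
        return "custom/location.csv"


as_text = stringify_path_sketch(pathlib.PurePosixPath("data/file.csv"))
custom_text = stringify_path_sketch(CustomPath())
```

Because `os.PathLike` recognises any class that defines `__fspath__`, a single `isinstance` check would cover both `pathlib` objects and third-party path types, which is presumably why the reviewer raised it.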
What does subscripting with `AnyStr` do here? Add to Union?
FilePathOrBuffer is just a Union. Subscripting it as FilePathOrBuffer[AnyStr] effectively puts a TypeVar in IO, so IO[str] can't become IO[bytes].
I think https://mypy.readthedocs.io/en/latest/generics.html#generic-type-aliases explains it. Will send a different link if I come across a better explanation.
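The generic-alias behaviour described in this reply can be sketched as follows. The alias definition here is an assumption about the shape of `pandas._typing.FilePathOrBuffer`, and `passthrough` is a hypothetical function used only for illustration.

```python
from io import StringIO
from pathlib import Path
from typing import IO, AnyStr, Union

# Assumed shape of the alias: a Union whose only generic member is IO.
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

# Subscripting the alias substitutes the type variable inside IO.
StrOnly = FilePathOrBuffer[str]


def passthrough(buf: FilePathOrBuffer[AnyStr]) -> FilePathOrBuffer[AnyStr]:
    """Hypothetical helper: the same AnyStr links the argument and the result,
    so a checker can tell that an IO[str] input is not returned as IO[bytes]."""
    return buf


text_buffer = StringIO("a,b\n1,2\n")
same_buffer = passthrough(text_buffer)
```

The annotations have no effect at runtime; the point is only that a type checker such as mypy is expected to tie the input and output types together through the shared `AnyStr`.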
That's pretty cool - thanks for sharing! Does this really change anything though? We already have `IO[AnyStr]` in `FilePathOrBuffer`, so this just restates that (?)
Unrelated note - `IO[AnyStr]` might itself be wrong, as `AnyStr` is a TypeVar and I think we need to parametrize `IO` with the actual type. I find the Python docs rather confusing on that, so maybe we remove `AnyStr` altogether, but we can leave that as a separate exercise (unless it helps simplify annotation here).
The alias is a Union, and the Union has one and only one Generic, so parametrising the alias is parametrising the only Generic in the Union, i.e. IO.
I don't think AnyStr is treated as a TypeVar inside the Union when the alias is defined, so that's why it's needed in use.
Yes, mypy will fail without it.
In to_html etc. only string buffers are supported, hence `Optional[FilePathOrBuffer[str]]` is used (note the parametrisation of FilePathOrBuffer here), and the TypeVar then becomes necessary, otherwise a bytes buffer could be returned.
`buffer_put_lines(buf: IO[str], lines: List[str]) -> None` only supports string buffers, so mypy will raise if we don't use TypeVars to maintain the FilePathOrBuffer type in and out of the common functions.
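A rough sketch of the situation described here, where a string-only writer passes its target through a shared helper before handing it to a consumer typed as `IO[str]`. The alias definition is an assumption, and `stringify_sketch`, `put_lines_sketch` and `render_sketch` are hypothetical stand-ins rather than the pandas functions themselves.

```python
from io import StringIO
from pathlib import Path
from typing import IO, AnyStr, List, Union

# Assumed shape of the alias discussed in the thread.
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]


def stringify_sketch(
    path_or_buf: FilePathOrBuffer[AnyStr]
) -> FilePathOrBuffer[AnyStr]:
    """Hypothetical common helper: Path objects become str, the rest pass through."""
    if isinstance(path_or_buf, Path):
        return str(path_or_buf)
    return path_or_buf


def put_lines_sketch(buf: IO[str], lines: List[str]) -> None:
    """Hypothetical str-only consumer, modelled on the signature quoted above."""
    buf.writelines(line + "\n" for line in lines)


def render_sketch(target: FilePathOrBuffer[str], lines: List[str]) -> None:
    """Hypothetical string-only writer in the spirit of to_html."""
    resolved = stringify_sketch(target)
    if isinstance(resolved, (str, Path)):
        # A path was given: open it in text mode and write there.
        with open(resolved, "w") as handle:
            put_lines_sketch(handle, lines)
    else:
        # What remains of the Union is the text buffer.
        put_lines_sketch(resolved, lines)


out = StringIO()
render_sketch(out, ["<table>", "</table>"])
```

The intent is that, because the helper's input and output share one `AnyStr`, the value reaching `put_lines_sketch` can still be understood as a text buffer rather than as a buffer of unknown kind.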
Right, I think the std documentation isn't very clear, but just defining `IO` creates a generic accepting a type of `AnyStr`, which is str / bytes. This is in contrast to other generics that really accept type `T` (essentially anything). You can see this if you try to inject a non-str or bytes type
yields
So I think it's an error to keep re-parametrizing IO with `AnyStr` in the _typing module and here. Let's leave it to a follow-up to clean up though.