-
Notifications
You must be signed in to change notification settings - Fork 382
Add typehints for fsspec.caching
and fsspec.utils
#1284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fsspec/utils.py
Outdated
if length is not None: | ||
b = f.read(length) | ||
else: | ||
b = f.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we needed to do this (otherwise if delimiter is None and length is None
I'm pretty sure this crashes), but happy to revert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change the logic in this PR. I am not immediately sure about what you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure -- can revert and add a type: ignore
instead.
@@ -500,7 +554,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru | |||
if not isinstance(starts, list): | |||
starts = [starts] * len(paths) | |||
if not isinstance(ends, list): | |||
ends = [starts] * len(paths) | |||
ends = [ends] * len(paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was just a bug? I can't figure out what this could would do otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly looks like a bug to me. That said, I don't think starts
or ends
ever correspond to anything other than list
anyway. Perhaps we can just drop the int
support since the result would have been "wrong" before anyway?
I am waiting on this one hoping that anyone with more types experience than me can have a look. I don't see any glaring problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are questions to educate myself.
fsspec/utils.py
Outdated
if length is not None: | ||
b = f.read(length) | ||
else: | ||
b = f.read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change the logic in this PR. I am not immediately sure about what you suggest.
@@ -482,11 +530,18 @@ def wrapper(cls): | |||
|
|||
|
|||
@contextmanager | |||
def nullcontext(obj): | |||
def nullcontext(obj: T) -> Iterator[T]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generator before applying the decorator, but not a generator afterwards (it becomes a context). So, I'm not sure which state should be declared here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a little confusing. This is the correct because contextmanager
has a type annotation that turns the iterator into a context manager for you.
@@ -500,7 +554,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru | |||
if not isinstance(starts, list): | |||
starts = [starts] * len(paths) | |||
if not isinstance(ends, list): | |||
ends = [starts] * len(paths) | |||
ends = [ends] * len(paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are questions to educate myself.
@@ -500,7 +552,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru | |||
if not isinstance(starts, list): | |||
starts = [starts] * len(paths) | |||
if not isinstance(ends, list): | |||
ends = [starts] * len(paths) | |||
ends = [ends] * len(paths) | |||
if len(starts) != len(paths) or len(ends) != len(paths): | |||
raise ValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bug, shouldn't we actually instatiate the error, and ideally print an error message about why it's thrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A message would be a reasonable thing to add here, but it's not necessary to instantiate if there is no message - python will do it automatically.
>>> raise ValueError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm impressed, the typing in this PR is actually well/properly done. @martindurant since you wanted someone with more typing experience to take a look, this is actually pretty done well and except for minor comments all the typing annotations look correct.
fsspec/utils.py
Outdated
@@ -266,11 +295,11 @@ def read_block(f, offset, length, delimiter=None, split_before=False): | |||
length = end - start | |||
|
|||
f.seek(offset) | |||
b = f.read(length) | |||
b = f.read(length) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like a typing error here. When can length ever be None and not cause an error (unless delimiter is not None). Instead of adding a type ignore comment, do an assert length is not None
above it (unless f.read can actually take None and it's just a type stubbing problem?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even go further and raise a ValueError if length is None and delimiter is None that they cannot both be None at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an assert
here, but I wonder if we should to what I originally suggested of:
if length is None:
b = f.read()
else:
b = f.read(length)
The docs say that if length is None
we want to read until the end of the string, which I believe this will.
unless f.read can actually take None and it's just a type stubbing problem?
Arguably it is? I did just test with an actual file and confirm that open(file).read(None)
gives reasonable results.
Thanks for having a look, @Skylion007 . @alanhdu , do you expect to have time to respond and fix the conflicts that have appeared since you submitted this? |
This work has gone stale - is there any chance of reviving it? |
Co-authored-by: Martin Durant <[email protected]>
@martindurant @Skylion007 Sorry for the delay here. I rebased and things should look be ok (modulo the discussion on #1284 (comment)). I'm ok adding the |
One type error outstanding
|
Fixed! |
This add typehints to
fsspec.caching
andfsspec.utils
(as best as I can understand them), and then turns oncheck_untyped_defs
for them as well (so that every function in those files will be type-checked, note just those with type annotations).