Skip to content

Add typehints for fsspec.caching and fsspec.utils #1284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 20, 2023

Conversation

alanhdu
Copy link
Contributor

@alanhdu alanhdu commented Jun 1, 2023

This add typehints to fsspec.caching and fsspec.utils (as best as I can understand them), and then turns on check_untyped_defs for them as well (so that every function in those files will be type-checked, note just those with type annotations).

fsspec/utils.py Outdated
Comment on lines 298 to 301
if length is not None:
b = f.read(length)
else:
b = f.read()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we needed to do this (otherwise if delimiter is None and length is None I'm pretty sure this crashes), but happy to revert this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not change the logic in this PR. I am not immediately sure about what you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure -- can revert and add a type: ignore instead.

@@ -500,7 +554,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru
if not isinstance(starts, list):
starts = [starts] * len(paths)
if not isinstance(ends, list):
ends = [starts] * len(paths)
ends = [ends] * len(paths)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was just a bug? I can't figure out what this could would do otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly looks like a bug to me. That said, I don't think starts or ends ever correspond to anything other than list anyway. Perhaps we can just drop the int support since the result would have been "wrong" before anyway?

@martindurant
Copy link
Member

I am waiting on this one hoping that anyone with more types experience than me can have a look. I don't see any glaring problems.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are questions to educate myself.

fsspec/utils.py Outdated
Comment on lines 298 to 301
if length is not None:
b = f.read(length)
else:
b = f.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not change the logic in this PR. I am not immediately sure about what you suggest.

@@ -482,11 +530,18 @@ def wrapper(cls):


@contextmanager
def nullcontext(obj):
def nullcontext(obj: T) -> Iterator[T]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generator before applying the decorator, but not a generator afterwards (it becomes a context). So, I'm not sure which state should be declared here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a little confusing. This is the correct because contextmanager has a type annotation that turns the iterator into a context manager for you.

@@ -500,7 +554,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru
if not isinstance(starts, list):
starts = [starts] * len(paths)
if not isinstance(ends, list):
ends = [starts] * len(paths)
ends = [ends] * len(paths)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these are questions to educate myself.

@@ -500,7 +552,7 @@ def merge_offset_ranges(paths, starts, ends, max_gap=0, max_block=None, sort=Tru
if not isinstance(starts, list):
starts = [starts] * len(paths)
if not isinstance(ends, list):
ends = [starts] * len(paths)
ends = [ends] * len(paths)
if len(starts) != len(paths) or len(ends) != len(paths):
raise ValueError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bug, shouldn't we actually instatiate the error, and ideally print an error message about why it's thrown?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A message would be a reasonable thing to add here, but it's not necessary to instantiate if there is no message - python will do it automatically.

>>> raise ValueError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError

Copy link
Contributor

@Skylion007 Skylion007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm impressed, the typing in this PR is actually well/properly done. @martindurant since you wanted someone with more typing experience to take a look, this is actually pretty done well and except for minor comments all the typing annotations look correct.

fsspec/utils.py Outdated
@@ -266,11 +295,11 @@ def read_block(f, offset, length, delimiter=None, split_before=False):
length = end - start

f.seek(offset)
b = f.read(length)
b = f.read(length) # type: ignore[arg-type]
Copy link
Contributor

@Skylion007 Skylion007 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a typing error here. When can length ever be None and not cause an error (unless delimiter is not None). Instead of adding a type ignore comment, do an assert length is not None above it (unless f.read can actually take None and it's just a type stubbing problem?)

Copy link
Contributor

@Skylion007 Skylion007 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even go further and raise a ValueError if length is None and delimiter is None that they cannot both be None at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an assert here, but I wonder if we should to what I originally suggested of:

if length is None:
    b = f.read()
else:
    b = f.read(length)

The docs say that if length is None we want to read until the end of the string, which I believe this will.

unless f.read can actually take None and it's just a type stubbing problem?

Arguably it is? I did just test with an actual file and confirm that open(file).read(None) gives reasonable results.

@martindurant
Copy link
Member

Thanks for having a look, @Skylion007 . @alanhdu , do you expect to have time to respond and fix the conflicts that have appeared since you submitted this?

@martindurant
Copy link
Member

This work has gone stale - is there any chance of reviving it?

@alanhdu
Copy link
Contributor Author

alanhdu commented Oct 19, 2023

@martindurant @Skylion007 Sorry for the delay here. I rebased and things should look be ok (modulo the discussion on #1284 (comment)).

I'm ok adding the assert (or raise a more informative ValueError), but we could keep it as # type: ignore if we want to guarantee no change in the runtime behavior.

@martindurant
Copy link
Member

One type error outstanding

fsspec/caching.py:101: error: Incompatible return value type (got "bytearray", expected "mmap")  [return-value]

@alanhdu
Copy link
Contributor Author

alanhdu commented Oct 20, 2023

Fixed!

@martindurant martindurant merged commit dbad63f into fsspec:master Oct 20, 2023
@alanhdu alanhdu deleted the alan/mypy branch October 20, 2023 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants