Description
I wanted to get feedback about an idea based on @martindurant's proposal in #41.
Here is his original comment:
Like open/open_files, which find the file-system of interest and use it with parameters, could implement a generic top-level file-system which finds the correct instance and use it depending on the protocol implicit in the path given. This would become the primary user-facing API.
This can also be the place to implement globs or recursive where multiple files make sense.
Should allow copy/move/(sync?) between file-systems.
In short, I am thinking a relatively simple generic fsspec filesystem could be created based on UPath by @andrewfulton9 with @Quansight.
This might provide a clean approach to managing storage options for inter-filesystem operations, since each path object could be instantiated with the appropriate options. I think that might be more straightforward than passing around lots of *_kwargs-style dicts (which is not to say those should or should not also be supported). It might also make it easier to implement requests like #588, which #723 is attempting to address and which currently seems to be somewhat challenging.
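To make the idea a bit more concrete, here is a minimal, stdlib-only sketch of path objects that carry their own storage options and of a copy between two backends chosen by protocol. Everything in it (`MemoryStore`, `OptionedPath`, `BACKENDS`, `copy`, the `mem-a`/`mem-b` protocols and the `token` option) is hypothetical and made up for illustration; it is not the fsspec or UPath API, just the shape of what I have in mind.

```python
from dataclasses import dataclass, field


class MemoryStore:
    """Hypothetical stand-in for a file-system backend; not an fsspec class."""

    def __init__(self, **storage_options):
        self.storage_options = storage_options
        self.files = {}

    def read_bytes(self, path):
        return self.files[path]

    def write_bytes(self, path, data):
        self.files[path] = data


# Hypothetical registry mapping a protocol name to a backend class.
BACKENDS = {"mem-a": MemoryStore, "mem-b": MemoryStore}
_instances = {}


@dataclass
class OptionedPath:
    """Hypothetical path object that carries its own storage options."""

    url: str
    storage_options: dict = field(default_factory=dict)

    @property
    def protocol(self):
        return self.url.split("://", 1)[0]

    @property
    def path(self):
        return self.url.split("://", 1)[1]

    def backend(self):
        # Reuse one backend instance per (protocol, options) pair.
        key = (self.protocol, tuple(sorted(self.storage_options.items())))
        if key not in _instances:
            _instances[key] = BACKENDS[self.protocol](**self.storage_options)
        return _instances[key]


def copy(src, dst):
    """Copy between two paths that may live on different backends."""
    dst.backend().write_bytes(dst.path, src.backend().read_bytes(src.path))


src = OptionedPath("mem-a://data/in.bin", {"token": "source-credentials"})
dst = OptionedPath("mem-b://backup/out.bin", {"token": "target-credentials"})
src.backend().write_bytes(src.path, b"hello")
copy(src, dst)
print(dst.backend().read_bytes(dst.path))
```

Note that `copy` takes no `*_kwargs` arguments at all: each side resolves its own backend from the options stored on the path object.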
Since UPath inherits from pathlib, it is also conceivable that this generic filesystem could at some point optionally work as a replacement for the current local filesystem implementation.
Another possible benefit is that this might provide a simple way to integrate a base test suite as suggested by @TomAugspurger in #650, see UPath's BaseTests. I think running a base set of tests like these, although possibly reorganized a bit as in #651, directly on the upstream file systems could go a long way to achieving consistency and compatibility across the various implementations.
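As a rough sketch of the general pattern (not UPath's actual BaseTests, which I am not reproducing here), a shared suite can be written once as a mixin and then bound to each implementation through a small factory method. `DictFS`, `BaseFSTests`, `make_fs` and `TestDictFS` are hypothetical names; the toy backend only borrows the method names `pipe`, `cat`, `exists` and `rm` from fsspec's filesystem interface.

```python
import unittest


class DictFS:
    """Hypothetical toy backend used only to exercise the shared tests."""

    def __init__(self):
        self._files = {}

    def pipe(self, path, data):
        self._files[path] = data

    def cat(self, path):
        return self._files[path]

    def exists(self, path):
        return path in self._files

    def rm(self, path):
        del self._files[path]


class BaseFSTests:
    """Shared checks; not a TestCase itself, so it is never run on its own."""

    def make_fs(self):
        raise NotImplementedError

    def test_roundtrip(self):
        fs = self.make_fs()
        fs.pipe("a/b.txt", b"data")
        self.assertEqual(fs.cat("a/b.txt"), b"data")

    def test_rm(self):
        fs = self.make_fs()
        fs.pipe("x", b"1")
        fs.rm("x")
        self.assertFalse(fs.exists("x"))


class TestDictFS(BaseFSTests, unittest.TestCase):
    """One small subclass per implementation is all that is needed."""

    def make_fs(self):
        return DictFS()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDictFS)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each upstream file system would then only need its own `make_fs`, and every implementation would be held to the same checks.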
It might also be worth mentioning the possibility of making this async compatible at some point, maybe with something like aiopath or aiofiles, although I have not tested or used these.
There are a couple of current issues that may be a bit of a hindrance, although I am confident these issues are fixable. To my knowledge (based on very limited and simple testing), it seems that UPath currently does not support chained URLs. Also, some filesystems may not yet work properly without some effort.
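For anyone unfamiliar with chained URLs: fsspec joins several protocol segments with `::`. The snippet below is a simplified, stdlib-only approximation of how such a URL breaks down into links. `split_chain` is a hypothetical helper, not fsspec's own parser, and it assumes that any segment without `://` is a bare protocol name.

```python
def split_chain(url):
    """Split a '::'-chained URL into (protocol, path) pairs, outermost first."""
    links = []
    for part in url.split("::"):
        if "://" in part:
            protocol, path = part.split("://", 1)
        else:
            # Assumption: a bare segment such as "simplecache" is a protocol
            # name with no path of its own.
            protocol, path = part, ""
        links.append((protocol, path))
    return links


chain = split_chain("zip://data.csv::simplecache::s3://bucket/archive.zip")
for protocol, path in chain:
    print(protocol, repr(path))
```

A path object that supports chaining would presumably need to keep track of every link, and the storage options for each, instead of a single protocol.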
Currently, UPath includes some subclassed implementations based on the core implementation. I wonder if it would be possible to reduce UPath to a single implementation, perhaps with tighter integration between the projects? I suspect that any generic top-level file-system would face similar issues and find opportunities to make upstream adjustments to ensure compatibility.
For some additional background, the discussion related to UPath in #434 is worth noting here (which I will quote from slightly out of context), in particular this comment by @martindurant:
Since it needs no further dependencies, it might well be hosted within fsspec
And this subsequent comment by @andrewfulton9:
I am definitely open to merging it into fsspec
I'm eager for any thoughts, issues, concerns, etc. about this idea. If there is interest, and time permitting, I'd be willing to take a swing at an initial POC of the generic filesystem, although that should by no means dissuade anybody else who is interested in making such an attempt.
Finally, thanks to all who have contributed to or supported these awesome projects, which make my job easier and more fun.