Description
Feature or enhancement
This enhancement proposes that we allow state to be shared between related instances of subclasses of pathlib.PurePath and pathlib.Path.
Pitch
Now that #68320 is resolved, users can subclass pathlib.PurePath and pathlib.Path directly:
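```python
import pathlib

# Subclassing pathlib.Path directly relies on the fix for #68320.
class MyPath(pathlib.Path):
    def my_custom_method(self):
        pass

etc = MyPath('/etc')
etc_hosts = etc / 'hosts'      # the derived path keeps the MyPath type
etc_hosts.my_custom_method()
```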
However, some user-defined path classes, such as TarPath or S3Path, would need an underlying tarfile.TarFile or botocore.Resource object to be shared between path objects (etc and etc_hosts in the example above). Such sharing of resources is presently rather difficult, because there's no single instance method that pathlib uses to derive new path objects from existing ones.
This feature request proposes that we add a new PurePath.makepath() method, which is called whenever one path object is derived from another, such as in joinpath(), iterdir(), etc. The default implementation of this method looks something like:
```python
def makepath(self, *args):
    return type(self)(*args)
```
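To make the "single derivation point" idea concrete, here is a self-contained sketch that does not touch pathlib at all. ToyPurePath is a hypothetical, heavily simplified stand-in for PurePath (POSIX paths only, normalized with posixpath.normpath); the point is only that joinpath(), the / operator and parent all funnel through makepath(), so overriding that one method is enough for a subclass to control how every derived path is built.

```python
import posixpath


class ToyPurePath:
    """Hypothetical, simplified stand-in for PurePath (POSIX semantics only)."""

    def __init__(self, *args):
        # Join and normalize the segments. Unlike pathlib, normpath() also
        # collapses '..' segments; that difference doesn't matter here.
        parts = [str(a) for a in args] or ['.']
        self._path = posixpath.normpath(posixpath.join(*parts))

    def __str__(self):
        return self._path

    def __repr__(self):
        return f"{type(self).__name__}({self._path!r})"

    def __eq__(self, other):
        if not isinstance(other, ToyPurePath):
            return NotImplemented
        return type(self) is type(other) and self._path == other._path

    def __hash__(self):
        return hash(self._path)

    def makepath(self, *args):
        # The single place where new path objects are created.
        return type(self)(*args)

    def joinpath(self, *args):
        return self.makepath(self, *args)

    def __truediv__(self, other):
        return self.joinpath(other)

    @property
    def parent(self):
        return self.makepath(posixpath.dirname(self._path) or '.')


etc = ToyPurePath('/etc')
etc_hosts = etc / 'hosts'
print(etc_hosts)                                 # /etc/hosts
print(etc_hosts.parent)                          # /etc
print(etc_hosts == etc.makepath(etc, 'hosts'))   # True
```

A subclass that overrides only makepath() would therefore see its override used by /, joinpath() and parent alike.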
Users may redefine this method in a subclass, in conjunction with a customized initializer:
```python
import pathlib

class SessionPath(pathlib.Path):
    def __init__(self, *args, session_id):
        super().__init__(*args)
        self.session_id = session_id

    # Proposed hook: forward the session to every derived path.
    def makepath(self, *args):
        return type(self)(*args, session_id=self.session_id)

etc = SessionPath('/etc', session_id=42)
etc_hosts = etc / 'hosts'      # derived via makepath()
print(etc_hosts.session_id)    # 42
```
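Since makepath() is a proposal, the SessionPath example depends on pathlib actually calling that hook. The sketch below models the TarPath use case from the pitch without relying on pathlib at all: ToyTarPath and the helper build_demo_archive() are hypothetical, and ToyTarPath's iterdir() and read_bytes() are simplified illustrations, not pathlib's implementations. A single tarfile.TarFile, built in memory, is handed to the first path, and makepath() passes it along to every path derived from it.

```python
import io
import posixpath
import tarfile


class ToyTarPath:
    """Hypothetical read-only path into a tar archive; a sketch, not a real API."""

    def __init__(self, *args, archive):
        parts = [str(a) for a in args] or ['.']
        self._path = posixpath.normpath(posixpath.join(*parts))
        # The TarFile object is shared by reference, never reopened or copied.
        self.archive = archive

    def __str__(self):
        return self._path

    def makepath(self, *args):
        # Like SessionPath.makepath(): forward the shared state to the new path.
        return type(self)(*args, archive=self.archive)

    def __truediv__(self, other):
        return self.makepath(self, other)

    def iterdir(self):
        # Yield direct children of this path; member names carry no leading '/'.
        prefix = '' if self._path == '.' else self._path + '/'
        for name in self.archive.getnames():
            rest = name[len(prefix):]
            if name.startswith(prefix) and rest and '/' not in rest:
                yield self.makepath(name)

    def read_bytes(self):
        return self.archive.extractfile(self._path).read()


def build_demo_archive():
    """Build a small uncompressed tar archive in memory and reopen it for reading."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tf:
        for name, data in [('etc/hosts', b'127.0.0.1 localhost\n'),
                           ('etc/passwd', b'root:x:0:0::/root:/bin/sh\n')]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)  # closing the writer leaves the caller-supplied buffer open
    return tarfile.open(fileobj=buf, mode='r')


archive = build_demo_archive()
etc = ToyTarPath('etc', archive=archive)
etc_hosts = etc / 'hosts'
print(etc_hosts.archive is etc.archive)   # True
print(etc_hosts.read_bytes())             # b'127.0.0.1 localhost\n'
print([str(p) for p in etc.iterdir()])    # ['etc/hosts', 'etc/passwd']
```

Because every derivation goes through makepath(), the archive is opened once and each derived path reuses it instead of looking it up or reopening it.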
I propose the name "makepath" for this method due to its close relationship with the existing "joinpath" method: a.joinpath(b) == a.makepath(a, b).
Performance
Edit: this change has been taken care of elsewhere, so implementing this feature request should no longer have much effect on performance.
This change will affect the performance of some pathlib operations, because it requires us to remove the _from_parsed_parts() constructor, an internal optimization used where path parsing and normalization can be skipped (for example, in the parents sequence). I suggest that pathlib is not a particularly performance-sensitive module within the standard library: few folks reach for pathlib because of its speed. Within pathlib itself, the savings from optimizing these "pure" methods are usually drowned out by the I/O costs of "impure" methods. Given the appeal of this feature, I believe the performance cost is justified.
However, if the performance degradation is considered unacceptable, there's a possible alternative: add a normalize keyword argument to the path initializer and to makepath(). This would require some serious internal surgery to make work, and might be difficult to communicate to users, so at this stage it's not my preferred route forward.
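The normalize alternative isn't specified further, so the following is only one possible reading of it, written against a hypothetical ToyNormalizePath class rather than pathlib itself. It assumes normalize=False means "the caller passes one already-normalized string, so skip joining and normalizing". The parents property shows where that shortcut pays off, analogous to the internal _from_parsed_parts() optimization described above, and the makepath() signature shows the extra flag that every subclass override would have to accept and forward.

```python
import posixpath


class ToyNormalizePath:
    """Hypothetical sketch of the 'normalize' alternative; not pathlib's real code."""

    def __init__(self, *args, normalize=True):
        if normalize:
            # Slow path: join and normalize arbitrary user-supplied segments.
            parts = [str(a) for a in args] or ['.']
            self._path = posixpath.normpath(posixpath.join(*parts))
        else:
            # Fast path: the caller promises a single, already-normalized string.
            (self._path,) = args

    def __str__(self):
        return self._path

    def makepath(self, *args, normalize=True):
        # Subclasses overriding makepath() would need to forward the flag too.
        return type(self)(*args, normalize=normalize)

    @property
    def parents(self):
        # Every ancestor of a normalized path is itself normalized, so each
        # one can be created without re-parsing.
        result = []
        path = self._path
        while True:
            parent = posixpath.dirname(path) or '.'
            if parent == path:
                break
            result.append(self.makepath(parent, normalize=False))
            path = parent
        return tuple(result)


p = ToyNormalizePath('/etc//./hosts')
print(p)                              # /etc/hosts
print([str(a) for a in p.parents])    # ['/etc', '/']
```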
Previous discussion
https://discuss.python.org/t/make-pathlib-extensible/3428/47 (and replies)