Support for sharing state between pathlib subclasses #100479

Closed
@barneygale

Feature or enhancement

This enhancement proposes that we allow state to be shared between related instances of subclasses of pathlib.PurePath and pathlib.Path.

Pitch

Now that #68320 is resolved, users can subclass pathlib.PurePath and pathlib.Path directly:

import pathlib

class MyPath(pathlib.Path):
    def my_custom_method(self):
        pass

etc = MyPath('/etc')
etc_hosts = etc / 'hosts'
etc_hosts.my_custom_method()

However, some user implementations of classes - such as TarPath or S3Path - would require underlying tarfile.TarFile or botocore.Resource objects to be shared between path objects (etc and etc_hosts in the example above). Such sharing of resources is presently rather difficult, as there's no single instance method used to derive new path objects.

This feature request proposes that we add a new PurePath.makepath() method, which is called whenever one path object is derived from another, such as in joinpath(), iterdir(), etc. The default implementation of this method looks something like:

def makepath(self, *args):
    return type(self)(*args)

Users may redefine this method in a subclass, in conjunction with a customized initializer:

class SessionPath(pathlib.Path):
    def __init__(self, *args, session_id):
        super().__init__(*args)
        self.session_id = session_id

    def makepath(self, *args):
        return type(self)(*args, session_id=self.session_id)

etc = SessionPath('/etc', session_id=42)
etc_hosts = etc / 'hosts'
print(etc_hosts.session_id)  # 42

I propose the name "makepath" for this method due to its close relationship with the existing "joinpath" method: a.joinpath(b) == a.makepath(a, b).
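To make the idea concrete, here is a minimal, self-contained sketch of how routing every derivation through one hook lets a subclass carry state along. It uses a toy class rather than pathlib itself: ToyPath, SessionToyPath and the use of posixpath for joining are assumptions made for illustration, and only the makepath()/joinpath() relationship and the session_id attribute come from the proposal above.

```python
import posixpath


class ToyPath:
    """Hypothetical stand-in for PurePath; not the real pathlib implementation."""

    def __init__(self, *args):
        # Accept strings or other ToyPath objects, as joinpath() passes `self`.
        self._raw = posixpath.join(*map(str, args)) if args else "."

    def __str__(self):
        return self._raw

    def makepath(self, *args):
        # The single hook through which every derived path is constructed.
        return type(self)(*args)

    def joinpath(self, *args):
        # Mirrors the relationship a.joinpath(b) == a.makepath(a, b).
        return self.makepath(self, *args)

    def __truediv__(self, other):
        return self.joinpath(other)

    @property
    def parent(self):
        return self.makepath(posixpath.dirname(self._raw) or ".")


class SessionToyPath(ToyPath):
    def __init__(self, *args, session_id):
        super().__init__(*args)
        self.session_id = session_id

    def makepath(self, *args):
        # Overriding the hook once is enough to propagate state everywhere.
        return type(self)(*args, session_id=self.session_id)


etc = SessionToyPath("/etc", session_id=42)
etc_hosts = etc / "hosts"
etc_again = etc_hosts.parent
```

In this sketch both etc_hosts and etc_again are SessionToyPath instances that keep the original session_id, even though neither `/` nor `parent` was overridden in the subclass.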

Performance

edit: this change has been taken care of elsewhere, and so implementing this feature request should no longer have much effect on performance.

This change will affect the performance of some pathlib operations, because it requires us to remove the _from_parsed_parts() constructor, which is an internal optimization used in cases where path parsing and normalization can be skipped (for example, in the parents sequence). I suggest that, within the standard library, pathlib is not a particularly performance-sensitive module: few folks reach for pathlib for reasons of speed. Within pathlib itself, the savings from optimizing these "pure" methods are usually drowned out by the I/O costs of "impure" methods. With the appeal of this feature in mind, I believe the performance cost is justified.

However, if the performance degradation is considered unacceptable, there's a possible alternative: add a normalize keyword argument to the path initializer and to makepath(). This would require some serious internal surgery to make work, and might be difficult to communicate to users, so at this stage it's not my preferred route forward.
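As a rough illustration of what that alternative might look like, the sketch below adds a normalize keyword to a toy initializer and to makepath(), so that paths derived from already-normalized input can skip the normalization step. Everything here is hypothetical: ToyPath is not pathlib, posixpath.normpath() merely stands in for pathlib's own parsing, and the real initializer has no normalize argument.

```python
import posixpath


class ToyPath:
    """Hypothetical toy class used only to illustrate a `normalize` flag."""

    def __init__(self, *args, normalize=True):
        raw = posixpath.join(*map(str, args)) if args else "."
        # Callers that already hold normalized text can skip this step.
        self._raw = posixpath.normpath(raw) if normalize else raw

    def __str__(self):
        return self._raw

    def makepath(self, *args, normalize=True):
        # Subclasses overriding this hook would need to accept and forward the flag.
        return type(self)(*args, normalize=normalize)

    @property
    def parent(self):
        # The parent of a normalized path is itself normalized, so skip the work.
        return self.makepath(posixpath.dirname(self._raw) or ".", normalize=False)


messy = ToyPath("/etc//./hosts")
parent = messy.parent
untouched = ToyPath("/etc//hosts", normalize=False)
```

The extra keyword has to be threaded through every makepath() override, which hints at why this route might be harder to communicate to users.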

Previous discussion

https://discuss.python.org/t/make-pathlib-extensible/3428/47 (and replies)
