Skip to content

Update dirfs.py: propagate transaction context to underlying filesystem #1824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

patrickwolf
Copy link

@patrickwolf patrickwolf commented Apr 23, 2025

DirFileSystem now delegates its transaction and transaction_type to the wrapped filesystem, ensuring that with fs.transaction: on a dir:// instance correctly toggles the underlying file:// transaction. This restores atomic write semantics when using the directory wrapper.

Closes #1823 - the ticket also contains code to test the fix

DirFileSystem now delegates its `transaction` and `transaction_type` to the wrapped filesystem, ensuring that `with fs.transaction:` on a `dir://` instance correctly toggles the underlying `file://` transaction. This restores atomic write semantics when using the directory wrapper.

Closes fsspec#1823
@martindurant
Copy link
Member

Can we please have a test to show that this works?

This test verifies that the fix for issue fsspec#1823 works correctly by ensuring
that DirFileSystem properly propagates its transaction context to the underlying
filesystem. When a transaction is started on a DirFileSystem, both the wrapper
and the wrapped filesystem should have their _intrans flags set, allowing writes
to be deferred until the transaction is committed.

Test confirms that:
- Both filesystems have _intrans=True inside transaction
- Files do not appear on disk until transaction commits
- Files appear correctly after transaction commit
This test identifies that CachingFileSystem has an issue similar to the one fixed for DirFileSystem in fsspec#1824. When a transaction is started on a CachingFileSystem, the transaction context is not propagated to the underlying filesystem, causing files to be written immediately rather than deferred.

Test demonstrates that:
- CachingFileSystem sets _intrans=True inside transaction
- But wrapped filesystem remains with _intrans=False
- Files are written immediately during transaction instead of being deferred

This test documents the current behavior and could help guide a future fix.
@patrickwolf
Copy link
Author

Add tests for transaction context propagation in filesystem wrappers

This PR adds tests related to issue #1823 and its fix in PR #1824:

  1. test_dirfs_transaction_propagation: Verifies that DirFileSystem now
    correctly propagates transaction context to its wrapped filesystem,
    ensuring files are properly deferred until commit.

  2. test_cachingfs_transaction_missing_propagation: Documents that
    CachingFileSystem has a similar issue where transaction context is not
    propagated, causing immediate writes instead of deferred ones.

These tests both validate the current fix and identify areas for potential
future improvements in the transaction handling system.

…o base class

def start_transaction(self):
def end_transaction(self):    
def invalidate_cache(self, path=None):
@patrickwolf
Copy link
Author

I updated it in dirfs.py but really it also needs to be in cached.py and seeing that its in two classes there needs to be actually a base class for implementations that wrap existing file systems.

Currently they inherit from AbstractFileSystem or AsynFileSystem but some of them declare fs in the init method and others don't...

Any class that declares fs is a wrapper class and might need to inherit from the same class or interface so that wrappers can forward to implementations

@martindurant
Copy link
Member

You are probably right, and we can later extract some of this common functionality for "wrapper" classes. Whether or not they delegate to the wrapped class will vary, though: write transactions for simplecache, for example, should write to local before committing, rather than depending on the remote storage's commit mechanism.


# Check if file exists on disk - bug: it will exist immediately since
# transaction context was not propagated
assert os.path.exists(test_file), "File exists during transaction - bug confirmed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The desired behaviour is not exists(), right? Perhaps make the assert what we want it to be, and mark the test as xfail until it is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DirFileSystem does not propagate transaction context to underlying filesystem
2 participants