-
Notifications
You must be signed in to change notification settings - Fork 382
Update dirfs.py: propagate transaction context to underlying filesystem #1824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
DirFileSystem now delegates its `transaction` and `transaction_type` to the wrapped filesystem, ensuring that `with fs.transaction:` on a `dir://` instance correctly toggles the underlying `file://` transaction. This restores atomic write semantics when using the directory wrapper. Closes fsspec#1823
Can we please have a test to show that this works? |
This test verifies that the fix for issue fsspec#1823 works correctly by ensuring that DirFileSystem properly propagates its transaction context to the underlying filesystem. When a transaction is started on a DirFileSystem, both the wrapper and the wrapped filesystem should have their _intrans flags set, allowing writes to be deferred until the transaction is committed. Test confirms that: - Both filesystems have _intrans=True inside transaction - Files do not appear on disk until transaction commits - Files appear correctly after transaction commit
This test identifies that CachingFileSystem has an issue similar to the one fixed for DirFileSystem in fsspec#1824. When a transaction is started on a CachingFileSystem, the transaction context is not propagated to the underlying filesystem, causing files to be written immediately rather than deferred. Test demonstrates that: - CachingFileSystem sets _intrans=True inside transaction - But wrapped filesystem remains with _intrans=False - Files are written immediately during transaction instead of being deferred This test documents the current behavior and could help guide a future fix.
Add tests for transaction context propagation in filesystem wrappers This PR adds tests related to issue #1823 and its fix in PR #1824:
These tests both validate the current fix and identify areas for potential |
…o base class def start_transaction(self): def end_transaction(self): def invalidate_cache(self, path=None):
I updated it in dirfs.py but really it also needs to be in cached.py and seeing that its in two classes there needs to be actually a base class for implementations that wrap existing file systems. Currently they inherit from AbstractFileSystem or AsynFileSystem but some of them declare fs in the init method and others don't... Any class that declares fs is a wrapper class and might need to inherit from the same class or interface so that wrappers can forward to implementations |
You are probably right, and we can later extract some of this common functionality for "wrapper" classes. Whether or not they delegate to the wrapped class will vary, though: write transactions for simplecache, for example, should write to local before committing, rather than depending on the remote storage's commit mechanism. |
|
||
# Check if file exists on disk - bug: it will exist immediately since | ||
# transaction context was not propagated | ||
assert os.path.exists(test_file), "File exists during transaction - bug confirmed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The desired behaviour is not exists()
, right? Perhaps make the assert what we want it to be, and mark the test as xfail
until it is fixed.
DirFileSystem now delegates its
transaction
andtransaction_type
to the wrapped filesystem, ensuring thatwith fs.transaction:
on adir://
instance correctly toggles the underlyingfile://
transaction. This restores atomic write semantics when using the directory wrapper.Closes #1823 - the ticket also contains code to test the fix