Skip to content

finalize_entry uses path and FD that may diverge, could use just FD #1786

Closed
@EliahKagan

Description

@EliahKagan

Current behavior 😯

When finishing checking out a file, we do this:

// For possibly existing, overwritten files, we must change the file mode explicitly.
#[cfg(unix)]
if let Some(path) = set_executable_after_creation {
let old_perm = std::fs::symlink_metadata(path)?.permissions();
if let Some(new_perm) = set_mode_executable(old_perm) {
std::fs::set_permissions(path, new_perm)?;
}
}
// NOTE: we don't call `file.sync_all()` here knowing that some filesystems don't handle this well.
// revisit this once there is a bug to fix.
entry.stat = Stat::from_fs(&gix_index::fs::Metadata::from_file(&file)?)?;
file.close()?;

This has both the path and an open file object, the latter of which on a Unix-like system has an associated open file descriptor. The path is used to set permissions on the file when it is being made executable (and when this was not done eagerly when creating the file anew). But the File object is used to access metadata and update the index. The metadata it accesses does not include the permissions, so this does not inherently produce any inconsistencies (and the commented note is not likely related to permissions). But this combination is nonetheless unintuitive, especially if the file is moved or replaced:

  • We set permissions on a path that may, in rare cases, be the path to a different file that has replaced this one. See be0abaf (#1764) and d674a43 (here, #1779).
  • The index gets information from the open file descriptor, which if I understand correctly applies to the original open file even it is no longer where it was originally and even if something else is there now.

As noted in #1783 ("Expected behavior", 5(iii)), we should be able to set the permissions through the open file descriptor (from the File object) rather than using the path:

File permissions on a Unix-like system can be set through an open file descriptor, without using the path. (If an OS is not Unix-like then we are not setting +x anyway.) POSIX standardizes fchmod for this purpose.

I believe gix_index::fs, used there, already depends on rustix where cfg!(unix). One way to call fchmod is through rustix::fs::fchmod. (It does not support file descriptors opened with the O_PATH flag, but I believe that is inapplicable here, because that's only useful when the file contents are not to be accessed in any way, whereas in contrast the file descriptor associated with our File object represents a file that is open in the full traditional sense and that has, presumably, just been written to.)

Expected behavior 🤔

I am not sure what the expected behavior is, because I don't know if there is a specific reason for timestamp and size metadata to be read through the file descriptor that was presumably used for writing, while using the original path to set permissions instead of using fchmod.

  • If this distinction supports valuable semantics, then it should be kept, and the only bug is that nothing explains that this is intentional. A comment would fix it. Depending on the reason and how important it is, a documentation comment on a function might be warranted.
  • If this distinction does not support valuable semantics, then fchmod (possibly through a higher-level library like rustix) should be used. Unless there is a separate reason not to do that.

This may also eventually no longer be relevant: Setting permissions might stop being needed at all if files that don't have the target permissions are replaced instead of having their permissions changed.

Git behavior

I'm not sure, in full. However, as noted in #1783 and #1784, Git replaces files during checkout (deleting/unlinking them and creating a new file of the same), rather than reusing them (keeping them but overwriting their contents), in more circumstances than gitoxide does. So I wouldn't expect the choice between chmod and fchmod to arise as often in the Git checkout implementation.

Steps to reproduce 🕹

For the aspect of this that is about it being unclear why one operation uses the open file and the other uses the path, see above links.

For the aspect that speculates that this could worsen some race conditions, I don't have steps to reproduce that. I don't know how much of a problem it is and, as noted above, it may be that the current code is correct except for being under-documented with respect to why this disparity is actually useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    acknowledgedan issue is accepted as shortcoming to be fixed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions