Skip to content

stat.ino() is assumed to be small #1817

Closed
@EliahKagan

Description

@EliahKagan

Current behavior 😯

gix_index::entry::Stat::from_fs obtains information from a stat structure on like this on all Unix-like systems:

#[cfg(not(windows))]
let res = {
Stat {
mtime: mtime.try_into().unwrap_or_default(),
ctime: ctime.try_into().unwrap_or_default(),
// truncating to 32 bits is fine here because
// that's what the linux syscalls returns
// just rust upcasts to 64 bits for some reason?
// numbers this large are impractical anyway (that's a lot of hard-drives).
dev: stat.dev() as u32,
ino: stat.ino() as u32,
uid: stat.uid(),
gid: stat.gid(),
// truncation to 32 bits is on purpose (git does the same).
size: stat.len() as u32,
}
};

But even if that the rationale holds for Linux, it is not clear that the device number always fits in a 32-bit integer without loss of information, because enormous device numbers do exist on some notable Unix-like operating systems with some filesystems. It is furthermore not accurate that such large numbers are impractical, because devices need not be assigned sequentially. For example, this forum discussion pertains to huge device numbers for ZFS volumes on FreeBSD.

In addition, I am not confident this holds on Linux, since even on the same architecture there is more than one stat-related syscall. This reference of syscalls gives stat as an example of an entry point with multiple syscalls on the same platforms that use data types with different sizes.

Expected behavior 🤔

I don't know if this actually causes a problem. This issue is thus incomplete in that I am not sure what happens with such devices. Maybe the truncation is okay? But it is not obvious why it would be.

However, even if the implementation is not changed, the comments should be revised, once the effect is known, to avoid implying that device numbers must be sequential or that the field fits in 32-bits on all systems, and to explain either:

  • why is okay to cast it to u32, or
  • what kinds of limitations or problems may arise.

Git behavior

It looks like Git is similar. On most systems, unsigned int is 32-bit, even if unsigned long is 64-bit, and Git defines:

struct stat_data {
	struct cache_time sd_ctime;
	struct cache_time sd_mtime;
	unsigned int sd_dev;
	unsigned int sd_ino;
	unsigned int sd_uid;
	unsigned int sd_gid;
	unsigned int sd_size;
};

Which is populated like this:

void fill_stat_data(struct stat_data *sd, struct stat *st)
{
	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
	sd->sd_mtime.sec = (unsigned int)st->st_mtime;
	sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);
	sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);
	sd->sd_dev = st->st_dev;
	sd->sd_ino = st->st_ino;
	sd->sd_uid = st->st_uid;
	sd->sd_gid = st->st_gid;
	sd->sd_size = munge_st_size(st->st_size);
}

It is not clear that this is intended in Git either. Other lossy conversions are identified as such.

Steps to reproduce 🕹

I don't currently have a procedure to attempt to produce any incorrect runtime behavior based on this. I am unsure if it is only the comment that should be corrected, or if the representation and/or runtime behavior should also be changed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    acknowledgedan issue is accepted as shortcoming to be fixedhelp wantedExtra attention is needed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions