Description
Current behavior 😯
gix_index::entry::Stat::from_fs
obtains information from a stat
structure on like this on all Unix-like systems:
gitoxide/gix-index/src/entry/stat.rs
Lines 94 to 110 in ffb73b5
But even if that the rationale holds for Linux, it is not clear that the device number always fits in a 32-bit integer without loss of information, because enormous device numbers do exist on some notable Unix-like operating systems with some filesystems. It is furthermore not accurate that such large numbers are impractical, because devices need not be assigned sequentially. For example, this forum discussion pertains to huge device numbers for ZFS volumes on FreeBSD.
In addition, I am not confident this holds on Linux, since even on the same architecture there is more than one stat-related syscall. This reference of syscalls gives stat as an example of an entry point with multiple syscalls on the same platforms that use data types with different sizes.
Expected behavior 🤔
I don't know if this actually causes a problem. This issue is thus incomplete in that I am not sure what happens with such devices. Maybe the truncation is okay? But it is not obvious why it would be.
However, even if the implementation is not changed, the comments should be revised, once the effect is known, to avoid implying that device numbers must be sequential or that the field fits in 32-bits on all systems, and to explain either:
- why is okay to cast it to
u32
, or - what kinds of limitations or problems may arise.
Git behavior
It looks like Git is similar. On most systems, unsigned int
is 32-bit, even if unsigned long
is 64-bit, and Git defines:
struct stat_data {
struct cache_time sd_ctime;
struct cache_time sd_mtime;
unsigned int sd_dev;
unsigned int sd_ino;
unsigned int sd_uid;
unsigned int sd_gid;
unsigned int sd_size;
};
Which is populated like this:
void fill_stat_data(struct stat_data *sd, struct stat *st)
{
sd->sd_ctime.sec = (unsigned int)st->st_ctime;
sd->sd_mtime.sec = (unsigned int)st->st_mtime;
sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);
sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);
sd->sd_dev = st->st_dev;
sd->sd_ino = st->st_ino;
sd->sd_uid = st->st_uid;
sd->sd_gid = st->st_gid;
sd->sd_size = munge_st_size(st->st_size);
}
It is not clear that this is intended in Git either. Other lossy conversions are identified as such.
Steps to reproduce 🕹
I don't currently have a procedure to attempt to produce any incorrect runtime behavior based on this. I am unsure if it is only the comment that should be corrected, or if the representation and/or runtime behavior should also be changed.