Skip to content

Commit 809fe33

Browse files
Guillermo PérezGuillermo Pérez
Guillermo Pérez
authored and
Guillermo Pérez
committed
Make pygit2 throw if tree of a commit is not found
Commit objects in git always have a tree_id associated, that points to the corresponding Tree object. When the tree object is missing, the repo is corrupted. In those cases: * official git cli fatals with status code 128 and message: fatal: unable to read tree <hash> * libgit2 returns error GIT_ENOTFOUND when calling git_commit_tree() * pygit2 when accessing the tree by id with repo[commit.tree_id] raises a KeyError: <hash> But on the other hand, on the commit object, rather than throwing and exception, pygit2 is swallowing the error returned by libgit2 and setting the <Commit object>.tree property to None. None is arguable the wrong choice to encode an error condition, specially in python that is used heavily. In particular this caused in our system to assume there was an empty tree, and the sync service that tails git repo changes decided to DELETE everything. The code was using None to represent empty tree, usefull for example when we need to compare a path between two commits (the path might be non-existant at one of the commits you are comparing). I think that in this case the right decision would be to raise since is an exceptional case, caused by a corrupted repo, is more consistent with other tools, and ensures user code does not take the wrong decissions. For curiosity the corrupted repository can happen more commonly than expected. We run our repositories on a shared NFS filer, and one of our servers didn't have the lookupcache=positive flag. This makes NFS cache the metadata (files on a directory for example) and use that for negative lookups (to deny existance of files). In this case, the commit object was on a directory not cached, so the commit was seen immediately, but the tree object was in a folder that was cached, the cache didn't contained the tree object, and thus for some seconds the tree was not existing and the repo was corrupted. Our sync service saw tree being None and decided to delete everything, causing a lot of issues down the way.
1 parent 8029765 commit 809fe33

File tree

1 file changed

+6
-2
lines changed

1 file changed

+6
-2
lines changed

src/commit.c

+6-2
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@
3535
#include "oid.h"
3636

3737
extern PyTypeObject TreeType;
38+
extern PyObject *GitError;
3839

3940

4041
PyDoc_STRVAR(Commit_message_encoding__doc__, "Message encoding.");
@@ -131,8 +132,11 @@ Commit_tree__get__(Commit *commit)
131132
int err;
132133

133134
err = git_commit_tree(&tree, commit->commit);
134-
if (err == GIT_ENOTFOUND)
135-
Py_RETURN_NONE;
135+
if (err == GIT_ENOTFOUND) {
136+
char tree_id[GIT_OID_HEXSZ + 1] = { 0 };
137+
git_oid_fmt(tree_id, git_commit_tree_id(commit->commit));
138+
return PyErr_Format(GitError, "Unable to read tree %s", tree_id);
139+
}
136140

137141
if (err < 0)
138142
return Error_set(err);

0 commit comments

Comments
 (0)