Description
Context: for DeepSpeech, we perform TensorFlow builds and keep the resulting cache in a tar archive (capturing the whole home directory of the build user). We later untar it, and the DeepSpeech build, run through bazel build, picks up the cached items so nothing gets rebuilt.
Recently, we started to see a 2.5x increase in build time on CUDA-enabled builds. Debugging with Bazel showed that it was rebuilding because the actionKey computed for stream_executor_impl was different. After instrumenting Bazel to get more information, I tracked the differing actionKey down to its cause: the ordering of the CUDA includes was different. The list itself contained exactly the same content, just in a different order.
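To illustrate why ordering alone triggers a rebuild, here is a minimal sketch (in Python, not Bazel's actual Java implementation) of an actionKey-style digest computed over a list of include paths. The function name and hashing scheme are assumptions for illustration; the point is only that a digest over an ordered list changes when the order changes, even if the set of elements does not.

```python
import hashlib

def action_key(include_paths):
    # Hypothetical stand-in for Bazel's actionKey: a digest over the
    # action's inputs, where the include paths are hashed in list order.
    h = hashlib.sha256()
    for p in include_paths:
        h.update(p.encode() + b"\n")
    return h.hexdigest()

# The same set of CUDA include symlinks, enumerated in two different
# filesystem orders, yields two different keys, so the action is
# considered changed and gets re-executed.
before = ["cuda/include/a.h", "cuda/include/b.h"]
after = ["cuda/include/b.h", "cuda/include/a.h"]
```

With `set(before) == set(after)` but `action_key(before) != action_key(after)`, the cache misses even though nothing material changed.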
Those includes are symlinks, and they are generated from a genrule. This is all taken care of by tensorflow/third_party/gpus/cuda_configure.bzl, lines 915 to 1035 at ba64f53.
Checking more carefully, one will see that the headers are discovered by the _read_dir function (tensorflow/third_party/gpus/cuda_configure.bzl, lines 891 to 894 at ba64f53), which relies on find. Its output is therefore dependent on the ordering provided by the readdir syscall.
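As a rough Python analogue of what _read_dir does (a sketch, not the actual Starlark code), the function amounts to running find over the repository directory and splitting its output into a file list:

```python
import subprocess

def read_dir(src_dir):
    # Rough analogue of _read_dir in cuda_configure.bzl: run `find` and
    # split its output into one path per line. `find` emits entries in
    # whatever order readdir() returns them, which is filesystem- and
    # state-dependent, so the resulting list order is not stable.
    out = subprocess.run(
        ["find", src_dir, "-follow", "-type", "f"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()
```

The set of returned files is always the same; only their order can differ between runs or between filesystems.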
In our case, the ordering on the filesystem before making the tar archive and after untarring it turned out to be different.
One simple fix is to sort the list of headers: this way we are sure the order is always the same, and we no longer depend on whatever readdir happens to return.
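The proposed fix can be sketched the same way (again in Python rather than Starlark, with an assumed function name): keep the find-based listing but sort the result, so the header list, and hence the actionKey derived from it, is deterministic.

```python
import subprocess

def read_dir_sorted(src_dir):
    # Sketch of the proposed fix: same find-based listing as before,
    # but sorted, so the returned order no longer depends on readdir.
    out = subprocess.run(
        ["find", src_dir, "-follow", "-type", "f"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(out.splitlines())
```

Sorting is cheap relative to the build itself, and it makes the generated symlink list identical across filesystems and tar round-trips.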
In the past, Bazel would force the ordering of the elements considered when computing the actionKey. This was removed in 0.3.0 (bazelbuild/bazel@9dc3211), which may have been masking the issue until then.