When setting up the index used for benchmarking, use math/rand instead of
crypto/rand since the generated ids don't need to be evenly distributed,
and not be secure against guessing. As such, use a different random id
function (only available during tests) that uses math/rand instead.
This reduces the chance of duplicate blobs, otherwise the tests fail
(make the contents of a blob depend on a pseudo-random number instead of
the size, sizes may be duplicate).
When backing up several million files (>14M tested here) with few changes,
a large amount of time is spent failing to find an id in an index and creating
an error to signify this. Since this is checked using the Has method,
which doesn't use this error, this time creating the error is wasted.
Instead, directly check if the given id and type are present in the index.
This also avoids reporting all the packs containing this blob, further
reducing cpu usage.
This commits adds rudimentary support for a cache directory, enabled by
default. The cache directory is created if it does not exist. The cache
is used if there's anything in it, newly created snapshot and index
files are written to the cache automatically.
The code now bundles tree blobs and data blobs into different pack
files, so we'll end up with pack files that either only contain data or
trees. This is in preparation to adding a cache (#1040), because
tree-only pack files can easily be cached later on.