This commit changes the signatures for repository.LoadAndDecrypt and
utils.LoadAll to allow passing in a []byte as the buffer to use. This
buffer is enlarged as needed, and returned back to the caller for
further use.
In later commits, this allows reducing allocations by reusing a buffer
for multiple calls, e.g. in a worker function.
As mentioned in issue [#1560](https://github.com/restic/restic/pull/1560#issuecomment-364689346)
this changes the signature for `backend.Save()`. It now takes a
parameter of interface type `RewindReader`, so that the backend
implementations or our `RetryBackend` middleware can reset the reader to
the beginning and then retry an upload operation.
The `RewindReader` interface also provides a `Length()` method, which is
used in the backend to get the size of the data to be saved. This
removes several ugly hacks we had to do to pull the size back out of the
`io.Reader` passed to `Save()` before. In the `s3` and `rest` backend
this is actively used.
The logging in these functions double the time they take to execute.
However, it is only really useful on failures, which are better
reported by the calling functions.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 897 395 -55.96%
BenchmarkMasterIndexLookupMultipleIndex-6 2001 1090 -45.53%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 492 215 -56.30%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 1649 912 -44.69%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 1 -88.89%
BenchmarkMasterIndexLookupMultipleIndex-6 19 1 -94.74%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 6 0 -100.00%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 16 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 96 -40.00%
BenchmarkMasterIndexLookupMultipleIndex-6 240 96 -60.00%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 48 0 -100.00%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 128 0 -100.00%
When looking up a blob in the master index, with several
indexes present in the master index, a significant amount of time
is spent generating errors for each failed lookup. However, these
errors are often used to check if a blob is present, but the contents
are not inspected making the overhead of the error not useful.
Instead, change Index.Lookup (and Index.LookupSize) to instead return
a boolean denoting if the blob was found instead of an error. Also change
all the calls to these functions to handle the new function signature.
benchmark old ns/op new ns/op delta
BenchmarkMasterIndexLookupSingleIndex-6 820 897 +9.39%
BenchmarkMasterIndexLookupMultipleIndex-6 12821 2001 -84.39%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 5378 492 -90.85%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 17026 1649 -90.31%
benchmark old allocs new allocs delta
BenchmarkMasterIndexLookupSingleIndex-6 9 9 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 59 19 -67.80%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 22 6 -72.73%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 72 16 -77.78%
benchmark old bytes new bytes delta
BenchmarkMasterIndexLookupSingleIndex-6 160 160 +0.00%
BenchmarkMasterIndexLookupMultipleIndex-6 3200 240 -92.50%
BenchmarkMasterIndexLookupSingleIndexUnknown-6 1232 48 -96.10%
BenchmarkMasterIndexLookupMultipleIndexUnknown-6 4272 128 -97.00%
When setting up the index used for benchmarking, use math/rand instead of
crypto/rand since the generated ids don't need to be evenly distributed,
and not be secure against guessing. As such, use a different random id
function (only available during tests) that uses math/rand instead.
This reduces the chance of duplicate blobs, otherwise the tests fail
(make the contents of a blob depend on a pseudo-random number instead of
the size, sizes may be duplicate).
When backing up several million files (>14M tested here) with few changes,
a large amount of time is spent failing to find an id in an index and creating
an error to signify this. Since this is checked using the Has method,
which doesn't use this error, this time creating the error is wasted.
Instead, directly check if the given id and type are present in the index.
This also avoids reporting all the packs containing this blob, further
reducing cpu usage.