The SemaphoreBackend now uniformly enforces the limit of concurrent
backend operations. In addition, it unifies the parameter validation.
The List() methods no longer uses a semaphore. Restic already never runs
multiple list operations in parallel.
By managing the semaphore in a wrapper backend, the sections that hold a
semaphore token grow slightly. However, the main bottleneck is IO, so
this shouldn't make much of a difference.
The key insight that enables the SemaphoreBackend is that all of the
complex semaphore handling in `openReader()` still happens within the
original call to `Load()`. Thus, getting and releasing the semaphore
tokens can be refactored to happen directly in `Load()`. This eliminates
the need for wrapping the reader in `openReader()` to release the token.
The Test method was only used in exactly one place, namely when trying
to create a new repository it was used to check whether a config file
already exists.
Use a combination of Stat() and IsNotExist() instead.
... called backend/sema. I resisted the temptation to call the main
type sema.Phore. Also, semaphores are now passed by value to skip a
level of indirection when using them.
For files below 256MB this uses the md5 hash calculated while assembling
the pack file. For larger files the hash for each 100MB part is
calculated on the fly. That hash is also reused as temporary filename.
As restic only uploads encrypted data which includes among others a
random initialization vector, the file hash shouldn't be susceptible to
md5 collision attacks (even though the algorithm is broken).
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
The azureAdapter was used directly without a pointer, but the Len()
method was only defined with a pointer receiver (which means Len() is
not present on a azureAdapter{}, only on a pointer to it).
Bugs in the error handling while uploading a file to the backend could
cause incomplete files, e.g. https://github.com/golang/go/issues/42400
which could affect the local backend.
Proactively add sanity checks which will treat an upload as failed if
the reported upload size does not match the actual file size.
As mentioned in issue [#1560](https://github.com/restic/restic/pull/1560#issuecomment-364689346)
this changes the signature for `backend.Save()`. It now takes a
parameter of interface type `RewindReader`, so that the backend
implementations or our `RetryBackend` middleware can reset the reader to
the beginning and then retry an upload operation.
The `RewindReader` interface also provides a `Length()` method, which is
used in the backend to get the size of the data to be saved. This
removes several ugly hacks we had to do to pull the size back out of the
`io.Reader` passed to `Save()` before. In the `s3` and `rest` backend
this is actively used.
Before, all backend implementations were required to return an error if
the file that is to be written already exists in the backend. For most
backends, that means making a request (e.g. via HTTP) and returning an
error when the file already exists.
This is not accurate, the file could have been created between the HTTP
request testing for it, and when writing starts. In addition, apart from
the `config` file in the repo, all other file names have pseudo-random
names with a very very low probability of a collision. And even if a
file name is written again, the way the restic repo is structured this
just means that the same content is placed there again. Which is not a
problem, just not very efficient.
So, this commit relaxes the requirement to return an error when the file
in the backend already exists, which allows reducing the number of API
requests and thereby the latency for remote backends.