If our ssh process has died, not only the next, but all subsequent
calls to clientError() should indicate the error.
restic output when the ssh process is killed with "kill -9":
Save(<data/afb68adbf9>) returned error, retrying after 253.661803ms: Write: failed to send packet header: write |1: file already closed
Save(<data/afb68adbf9>) returned error, retrying after 580.752212ms: ssh command exited: signal: killed
Save(<data/afb68adbf9>) returned error, retrying after 790.150468ms: ssh command exited: signal: killed
Save(<data/afb68adbf9>) returned error, retrying after 1.769595051s: ssh command exited: signal: killed
[...]
error in cleanup handler: ssh command exited: signal: killed
Before this patch:
Save(<data/de698d934f>) returned error, retrying after 252.84163ms: Write: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 660.236963ms: OpenFile: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 568.049909ms: OpenFile: failed to send packet header: write |1: file already closed
Save(<data/de698d934f>) returned error, retrying after 2.428813824s: OpenFile: failed to send packet header: write |1: file already closed
[...]
error in cleanup handler: failed to send packet header: write |1: file already closed
Previously, the function read from ARGV[1] (hardcoded) rather than the
value passed to it, the command-line argument as it exists in globalOptions.
Resolves #1745
This change removes the hardcoded Google auth mechanism for the GCS
backend, instead using Google's provided client library to discover and
generate credential material.
Google recommend that client libraries use their common auth mechanism
in order to authorise requests against Google services. Doing so means
you automatically support various types of authentication, from the
standard GOOGLE_APPLICATION_CREDENTIALS environment variable to making
use of Google's metadata API if running within Google Container Engine.
Before this change restic would attempt to JSON decode the error
message resulting in confusing `Decode: invalid character 'B' looking
for beginning of value` messages. Afterwards it will return `List
failed, server response: 400 Bad Request (400)`
As mentioned in issue [#1560](https://github.com/restic/restic/pull/1560#issuecomment-364689346)
this changes the signature for `backend.Save()`. It now takes a
parameter of interface type `RewindReader`, so that the backend
implementations or our `RetryBackend` middleware can reset the reader to
the beginning and then retry an upload operation.
The `RewindReader` interface also provides a `Length()` method, which is
used in the backend to get the size of the data to be saved. This
removes several ugly hacks we had to do to pull the size back out of the
`io.Reader` passed to `Save()` before. In the `s3` and `rest` backend
this is actively used.
This is a bug fix: Before, when the worker function fn in List() of the
RetryBackend returned an error, the operation is retried with the next
file. This is not consistent with the documentation, the intention was
that when fn returns an error, this is passed on to the caller and the
List() operation is aborted. Only errors happening on the underlying
backend are retried.
The error leads to restic ignoring exclusive locks that are present in
the repo, so it may happen that a new backup is written which references
data that is going to be removed by a concurrently running `prune`
operation.
The bug was reported by a user here:
https://forum.restic.net/t/restic-backup-returns-0-exit-code-when-already-locked/484
Before, all backend implementations were required to return an error if
the file that is to be written already exists in the backend. For most
backends, that means making a request (e.g. via HTTP) and returning an
error when the file already exists.
This is not accurate, the file could have been created between the HTTP
request testing for it, and when writing starts. In addition, apart from
the `config` file in the repo, all other file names have pseudo-random
names with a very very low probability of a collision. And even if a
file name is written again, the way the restic repo is structured this
just means that the same content is placed there again. Which is not a
problem, just not very efficient.
So, this commit relaxes the requirement to return an error when the file
in the backend already exists, which allows reducing the number of API
requests and thereby the latency for remote backends.
During the development of #1524 I discovered that the Google Cloud
Storage backend did not yet use the HTTP transport, so things such as
bandwidth limiting did not work. This commit does the necessary magic to
make the GS library use our HTTP transport.
This is a follow-up on fb9729fdb9, which
runs the `ssh` in its own process group and selects that process group
as the foreground group. After the sftp connection is established,
restic switches back to the previous foreground process group.
This allows `ssh` to prompt for the password, but it won't receive
the interrupt signal (SIGINT, ^C) later on, because it is not in the
foreground process group any more, allowing a clean tear down.
We added previously a code to fix the issue of chaining
credentials, we do not need this anymore since the
upstream minio-go already has this relevant change.
Before, creating a new repo via REST would use the defaut HTTP client,
which is not a problem unless the server uses HTTPS and a TLS
certificate which isn't signed by a CA in the system's CA store. In this
case, all commands work except the 'init' command, which fails with a
message like "invalid certificate".
chaining failed because chaining provider
was only looking for subsequent credentials
provider after an error. Writer a new
chaining provider which proceeds to fetch
new credentials also under situations where
providers do not return but instead return
no keys at all.
Fixes https://github.com/restic/restic/issues/1422
List().
move comment regarding problematic List() backend api (it's s3's ListObjects
that has a problem, NOT swift's ObjectsWalk).
As per discussion in PR #1399.
This is a fix for the following situation (gh-1188):
List() grabs a semaphore token upon entry, starts a goroutine, and
does not release the token until the routine exits (via a defer).
The goroutine iterates over the results from ListCurrentObjects(),
sending them one at a time to a channel, where they are ultimately
processed by be.Load().
Since be.Load() also needs a token, this will result in deadlock if
b2.connections=1.
This fix changes List() so that the token is only held during the call
to ListCurrentObjects().
- be explicit when discarding returned errors from .Close(), etc.
- remove named return values from funcs when naked return not used
- fix some "err" shadowing when redeclaration not needed
Sometimes s3 listobjects for a directory includes an entry for that
directory. The restic s3 backend doesn't expect that and returns
an error.
Symptom is:
ReadDir: invalid key name restic/key/, removing prefix
restic/key/ yielded empty string
I'm not sure when s3 does that; I'm unable to reproduce it myself.
But in any case, it seems correct to ignore that when it happens.
Fixes #1068
By default, the GCS Go packages have an internal "chunk size" of 8MB,
used for blob uploads.
Media().Do() will buffer a full 8MB from the io.Reader (or less if EOF
is reached) then write that full 8MB to the network all at once.
This behavior does not play nicely with --limit-upload, which only
limits the Reader passed to Media. While the long-term average upload
rate will be correctly limited, the actual network bandwidth will be
very spikey.
e.g., if an 8MB/s connection is limited to 1MB/s, Media().Do() will
spend 8s reading from the rate-limited reader (performing no network
requests), then 1s writing to the network at 8MB/s.
This is bad for network connections hurt by full-speed uploads,
particularly when writing 8MB will take several seconds.
Disable resumable uploads entirely by setting the chunk size to zero.
This causes the io.Reader to be passed further down the request stack,
where there is less (but still some) buffering.
My connection is around 1.5MB/s up, with nominal ~15ms ping times to
8.8.8.8.
Without this change, --limit-upload 1024 results in several seconds of
~200ms ping times (uploading), followed by several seconds of ~15ms ping
times (reading from rate-limited reader). A bandwidth monitor reports
this as several seconds of ~1.5MB/s followed by several seconds of
0.0MB/s.
With this change, --limit-upload 1024 results in ~20ms ping times and
the bandwidth monitor reports a constant ~1MB/s.
I've elected to make this change unconditional of --limit-upload because
the resumable uploads shouldn't be providing much benefit anyways, as
restic already uploads mostly small blobs and already has a retry
mechanism.
--limit-download is not affected by this problem, as Get().Download()
returns the real http.Response.Body without any internal buffering.
Updates #1216
This PR adds the ability of chaining the credentials provider,
such that restic as a tool attempts to honor credentials from
multiple different ways.
Currently supported mechanisms are
- static (user-provided)
- IAM profile (only valid inside configured ec2 instances)
- Standard AWS envs (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- Standard Minio envs (MINIO_ACCESS_KEY, MINIO_SECRET_KEY)
Refer https://github.com/restic/restic/issues/1341
If the service account used with restic does not have the
storage.buckets.get permission (in the "Storage Admin" role), Create
cannot use Get to determine if the bucket is accessible.
Rather than always trying to create the bucket on Get error, gracefully
fall back to assuming the bucket is accessible. If it is, restic init
will complete successfully. If it is not, it will fail on a later call.
Here is what init looks like now in different cases.
Service account without "Storage Admin":
Bucket exists and is accessible (this is the case that didn't work
before):
$ ./restic init -r gs:this-bucket-does-exist:/
enter password for new backend:
enter password again:
created restic backend c02e2edb67 at gs:this-bucket-does-exist:/
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
Bucket exists but is not accessible:
$ ./restic init -r gs:this-bucket-does-exist:/
enter password for new backend:
enter password again:
create key in backend at gs:this-bucket-does-exist:/ failed:
service.Objects.Insert: googleapi: Error 403:
my-service-account@myproject.iam.gserviceaccount.com does not have
storage.objects.create access to object this-bucket-exists/keys/0fa714e695c8ecd58cb467cdeb04d36f3b710f883496a90f23cae0315daf0b93., forbidden
Bucket does not exist:
$ ./restic init -r gs:this-bucket-does-not-exist:/
create backend at gs:this-bucket-does-not-exist:/ failed:
service.Buckets.Insert: googleapi: Error 403:
my-service-account@myproject.iam.gserviceaccount.com does not have storage.buckets.create access to bucket this-bucket-does-not-exist., forbidden
Service account with "Storage Admin":
Bucket exists and is accessible: Same
Bucket exists but is not accessible: Same. Previously this would fail
when Create tried to create the bucket. Now it fails when trying to
create the keys.
Bucket does not exist:
$ ./restic init -r gs:this-bucket-does-not-exist:/
enter password for new backend:
enter password again:
created restic backend c3c48b481d at gs:this-bucket-does-not-exist:/
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
In the manual, state which standard roles the service account must
have to work correctly, as well as the specific permissions required,
for creating even more specific custom roles.
This was a bit tricky: We start the ssh binary, but we want it to ignore
SIGINT. In contrast, restic itself should process SIGINT and clean up
properly. Before, we used `setsid()` to give the ssh process its own
process group, but that means it cannot prompt the user for a password
because the tty is gone.
So, now we're passing in two functions that ignore SIGINT just before
the ssh process is started and re-install it after start.