The ETA restic displays was based on a rate computed across the entire
backup operation. Often restic can progress at uneven rates. In the worst
case, restic progresses over most of the backup at a very high rate and
then finds new data to back up. The displayed ETA is then unrealistic and
never adapts.
Restic now estimates the transfer rate based on a sliding window, with the
goal of adapting to observed changes in rate. To avoid wild changes in the
estimate, several heuristics are used to keep the sliding window wide
enough to be relatively stable.
Modifies format module to add options for human readable storage size formatting, using size parsing already in ui/format.
Cmd flag --human-readable added to ls and find commands.
Additional option added to formatNode to support printing size in regular or new human readable format
restic must be able to refresh lock files in time. However, large
uploads over slow connections can cause the lock refresh to be stuck
behind the large uploads and thus time out.
Since 0.15 (#4020), inodes are generated as hashes of names, xor'd with
the parent inode. That means that the inode of a/b/b is
h(a/b/b) = h(a) ^ h(b) ^ h(b) = h(a).
I.e., the grandchild has the same inode as the grandparent. GNU find
trips over this because it thinks it has encountered a loop in the
filesystem, and fails to search a/b/b. This happens more generally when
the same name occurs an even number of times.
Fix this by multiplying the parent by a large prime, so the combining
operation is not longer symmetric in its arguments. This is what the FNV
hash does, which we used prior to 0.15. The hash is now
h(a/b/b) = h(b) ^ p*(h(b) ^ p*h(a))
Note that we already ensure that h(x) is never zero.
Collisions can still occur, but they should be much less likely to occur
within a single path.
Fixes #4253.
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes #4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
Fixes restic#719
If the option is passed, restic will wait the specified duration of time
and retry locking the repo every 10 seconds (or more often if the total
timeout is relatively small).
- Play nice with json output
- Reduce wait time in lock tests
- Rework timeout last attempt
- Reduce test wait time to 0.1s
- Use exponential back off for the retry lock
- Don't pass gopts to lockRepo functions
- Use global variable for retry sleep setup
- Exit retry lock on cancel
- Better wording for flag help
- Reorder debug statement
- Refactor tests
- Lower max sleep time to 1m
- Test that we cancel/timeout in time
- Use non blocking sleep function
- Refactor into minDuration func
Co-authored-by: Julian Brost <julian@0x4a42.net>
Since 0.15 (#4020), inodes are generated as hashes of names, xor'd with
the parent inode. That means that the inode of a/b/b is
h(a/b/b) = h(a) ^ h(b) ^ h(b) = h(a).
I.e., the grandchild has the same inode as the grandparent. GNU find
trips over this because it thinks it has encountered a loop in the
filesystem, and fails to search a/b/b. This happens more generally when
the same name occurs an even number of times.
Fix this by multiplying the parent by a large prime, so the combining
operation is not longer symmetric in its arguments. This is what the FNV
hash does, which we used prior to 0.15. The hash is now
h(a/b/b) = h(b) ^ p*(h(b) ^ p*h(a))
Note that we already ensure that h(x) is never zero.
Collisions can still occur, but they should be much less likely to occur
within a single path.
Fixes #4253.
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes #4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
The scanner process has only cosmetic effect for the progress printer,
and can be disabled without impacting functionality when the user does
not need an estimate of completion.
In many cases the scanner process can provide beneficial priming of
the file system cache, so as general advice it should not be disabled.
However, tests have shown that backup of NFS and fuse based filesystems,
where stat(2) is relatively expensive, can be significantly faster
without the scanner.
The Original field is meant to remember the original snapshot id if e.g.
changing its tags. It was only set by the `rewrite` command if it was
not set previously. However, a rewritten snapshot is potentially rather
different from the original snapshot. Thus just always set the Original
field. This also makes it easier to later on detect and potentially
remove the original snapshots.
Automatically fall back to hiding files if not authorized to permanently
delete files. This allows using restic with an append-only application
key with B2. Thus, an attacker cannot directly delete backups with the
API key used by restic.
To use this feature create an application key without the deleteFiles
capability. It is recommended to restrict the key to just one bucket.
For example using the b2 command line tool:
b2 create-key --bucket <bucketName> <keyName> listBuckets,readFiles,writeFiles,listFiles
Suggested-by: Daniel Gröber <dxld@darkboxed.org>
Counting the first occurrence of a duplicate blob as used and counting
all other as duplicates, independent of which instance of the blob is
kept, is only accurate if all copies of the blob have the same size. This
is no longer the case for a repository containing both compressed and
uncompressed blobs.
Thus for duplicated blobs first count all instances as duplicates and
then subtract the actually used instance later on.
The backup command failed if a directory contains duplicate entries.
Downgrade the severity of this problem from fatal error to a warning.
This allows users to still create a backup.
While searching for lock file from concurrently running restic
instances, restic ignored unreadable lock files. These can either be
in fact invalid or just be temporarily unreadable. As it is not really
possible to differentiate between both cases, just err on the side of
caution and consider the repository as already locked.
The code retries searching for other locks up to three times to smooth
out temporarily unreadable lock files.
Some backends generate additional files for each existing file, e.g.
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef.sha256
For some commands this leads to an "multiple IDs with prefix" error when
trying to reference a snapshot.
`restic unlock` now only shows `successfully removed locks` if there were locks to be removed.
In addition, it also reports the number of the removed lock files.
This is especially useful if ssh asks for a password or if closing the
initial connection could return an error due to a problematic server
implementation.
bazil/fuse expects us to return context.Canceled to signal that a
syscall was successfully interrupted. Returning a wrapped version of
that error however causes the fuse library to signal an EIO (input/output
error). Thus unwrap context.Canceled errors before returning them.
`init` and `copy` use `--repo2` with two different meaning which has
proven to be confusing for users. `--from-repo` now consistently marks a
source repository from which data is read. `--repo` is now always the
target/destination repository.
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
Apparently SMB/CIFS on Linux/macOS returns somewhat random errnos when
trying to sync a windows share which does not support calling fsync for
a directory.
The `stats` command checks inodes to not count hardlinked files multiple
times into the restore size. This check applies across all snapshots and
not only within snapshots. As a result the result size was far too low
when calculating it for multiple snapshots and it would vary depending
on the order in which snapshots were listed.
The new option allows prune to operate with nearly no scratch space by only removing
no longer necessary pack files and first deleting the index before
rebuilding it. By first deleting the index it becomes safe to just
delete no longer necessary pack files. However, as a downside there's
now the risk that the repository becomes inaccessible if prune fails.
To recover from that problem a user might have to manually delete the
repository index and then run (a full) `rebuild-index` again.
* Write new file payload to a temp file before touching the original
binary. Minimizes the possibility of failing mid-write and corrupting
the binary.
* On Windows, move the original binary out to a temp file rather than
removing it as the running binary is locked. Fixes issue #2248.
Nodes in trees were always printed with a `+` in diff, regardless of
whether or not a dir was added or removed. Let's use the mode we were
passed in printDir().
Closes #3685
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
This is quite similar to gitignore. If a pattern is suffixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with `--exclude-file` to cancel the exclusion of some files.
Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep `*.c` in some directory should not use:
~/work
!~/work/*.c
But:
~/work/*
!~/work/*.c
I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:
$HOME/**/*
!$HOME/Documents
!$HOME/code
!$HOME/.emacs.d
!$HOME/games
# [...]
node_modules
*~
*.o
*.lo
*.pyc
# [...]
$HOME/code/linux/*
!$HOME/code/linux/.git
# [...]
There are some limitations for this change:
- Patterns are not mixed accross methods: patterns from file are
handled first and if a file is excluded with this method, it's not
possible to reinclude it with `--exclude !something`.
- Patterns starting with `!` are now interpreted as a negative
pattern. I don't think anyone was relying on that.
- The whole list of patterns is walked for each match. We may
optimize later by exiting early if we know no pattern is starting
with `!`.
Fix #233
There's no point in locking the repository just to list the currently
existing lock files. This won't work for an exclusively locked
repository and is also confusing to users.
Loading any parent tree for these only wastes time and memory.
Fixes #3641, where it was shown that the most recent tree will get
picked.
--parent is now implicitly ignored when --stdin is given.
Create a temporary file with a sufficiently random name to essentially
avoid any chance of conflicts. Once the upload has finished remove the
temporary suffix. Interrupted upload thus will be ignored by restic.
Currently, `restic backup` (if a `--parent` is not provided)
will choose the most recent matching snapshot as the parent snapshot.
This makes sense in the usual case,
where we tag the snapshot-being-created with the current time.
However, this doesn't make sense if the user has passed `--time`
and is currently creating a snapshot older than the latest snapshot.
Instead, choose the most recent snapshot
which is not newer than the snapshot-being-created's timestamp,
to avoid any time travel.
Impetus for this change:
I'm using restic for the first time!
I have a number of existing BTRFS snapshots
I am backing up via restic to serve as my initial set of backups.
I initially `restic backup`'d the most recent snapshot to test,
then started backing up each of the other snapshots.
I noticed in `restic cat snapshot <id>` output
that all the remaining snapshots have the most recent as the parent.
Currently restic copy will copy each blob from every snapshot serially,
which has performance implications on high-latency backends such as b2.
This commit introduces 8x parallelism for blob downloads/uploads which
can improve restic copy operations up to 8x for repositories with many
small blobs on b2.
This commit also addresses the TODO comment in the copyTree function.
Related work:
A more thorough improvement of the restic copy performance can be found
in PR #3513
If a request fails with "x509: certificate signed by unknown authority",
the B2 backend now returns the error without retrying the request.
Closes #3556
Closes #2355
When deleting a file, B2 sometimes returns a "500 Service Unavailable"
error but nevertheless correctly deletes the file. Due to retries in
the B2 library blazer, we sometimes also see a "400 File not present"
error. The retries of restic for the delete request then fail with
"404 File with such name does not exist.".
As we have to rely on request retries in a distributed system to handle
temporary errors, also consider a delete request to be successful if the
file is reported as not existing. This should be safe as B2 claims to
provide a strongly consistent bucket listing and thus a missing file
shouldn't mysteriously show up again later on.