Tree packs are cached locally at clients and thus benefit a lot from
being compressed. Ensure this be having prune always repack pack files
containing uncompressed trees.
The new option allows prune to operate with nearly no scratch space by only removing
no longer necessary pack files and first deleting the index before
rebuilding it. By first deleting the index it becomes safe to just
delete no longer necessary pack files. However, as a downside there's
now the risk that the repository becomes inaccessible if prune fails.
To recover from that problem a user might have to manually delete the
repository index and then run (a full) `rebuild-index` again.
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
During a backup the index is written before the corresponding snapshots.
To ensure that a concurrent/later restic run can read a snapshot's data,
restic thus must first load the snapshots and only afterwards the index.
Otherwise it is not possible to ensure that the loaded index is recent
enough to cover all of the snapshot's data.
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
* PrintProgress no longer does unnecessary Sprintf calls, and performs
fewer allocations in general
* newProgressMax's callback checks whether the terminal supports
line updates once instead of once per call
* the callback looks up the terminal width once per call instead of
twice (on Windows)
* the status shortening now uses the Unicode-aware version from
internal/ui/termstatus (future-proofing)
This commit changes the error message so that a list of file names is
printed. Before, just the raw map was printed, which is not a great user
interface.
Note that this fix only solves the statistics problem, if
all duplicates are marked for repacking.
If not all duplicates are marked for repacking, we lack the
information which
The situation that not all duplicates are marked for repacking can occur
when using the `max-repack-size` option
Add a callback to the PruneOptions struct which calculates the number of
bytes allowed to be unused after prune is done. This way, the logic is
closer to the option parsing code.
Also, add an explicit option `unlimited` for the use case when storage
does not matter but bandwidth and time do. Internally, this sets the
maximum number of unused bytes to MaxUint64.
Rework the documentation slightly so that no more "packs" are
mentioned and it talks about "files" instead.
Make it clear in the documentation that the percentage given to
`--max-unused` is relative to the whole repository size after pruning is
done. If specified, it must be below 100%, otherwise the repository
would contain 100% of unused data, which is pointless.
I had a hard time coming up with the correct formula to calculate the
maximum number of unused bytes based on the number of used bytes. For a
fraction `p` (0 ≤ p < 1), a repo with `u` bytes used, and the number of
unused bytes `x` the following holds:
x ≤ p * (u+x)
⇔ x ≤ p*u + p*x
⇔ x - p*x ≤ p*u
⇔ x * (1-p) ≤ p*u
⇔ x ≤ p/(1-p) * u
The previous check only approximately verified whether all required
blobs were found. However, after forgetting a few snapshots the
repository contains lots of unused blobs whose number can be sufficient
to make up for missing packs.
When coupled with a malfunctioning backend that temporarily returns broken
data this could cause restic to regard the corresponding packs as
invalid and thereby delete data that's still in use. This change lets
restic play it safe and refuse to delete anything if data is missing.
The seen BlobSet always contained a subset of the entries in blobs.
Thus use blobs instead and avoid the memory overhead of the second set.
Suggested-by: Alexander Weiss <alex@weissfam.de>