When a file system is mounted at a directory, lstat() returns attributes
of the root node of the mounted file system, including the device ID of
the other file system. With --one-file-system, the previous code
therefore excluded the mountpoint directory itself.
This commit changes the code so that mountpoints are kept as empty
directories, with their attributes set to those of the root node of the
mounted file system. This mimics the behavior of `tar`.
Note that this fix only solves the statistics problem if
all duplicates are marked for repacking.
If not all duplicates are marked for repacking, we lack the
information which
Not all duplicates may be marked for repacking when the
`max-repack-size` option is used.
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
Add a callback to the PruneOptions struct which calculates the number of
bytes allowed to be unused after prune is done. This way, the logic is
closer to the option parsing code.
Also, add an explicit option `unlimited` for the use case when storage
does not matter but bandwidth and time do. Internally, this sets the
maximum number of unused bytes to MaxUint64.
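
A minimal sketch of the callback idea, under stated assumptions: the names
`pruneOptions`, `maxUnusedBytes` and `parseMaxUnused` are hypothetical and
not taken from the actual code, and the sketch only handles `unlimited`
and a plain byte count.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// pruneOptions is a hypothetical stand-in for the PruneOptions struct.
type pruneOptions struct {
	// maxUnusedBytes returns how many bytes may remain unused after
	// pruning, given the number of used bytes.
	maxUnusedBytes func(used uint64) uint64
}

// parseMaxUnused turns an option value into a callback (illustrative only).
func parseMaxUnused(s string) (func(used uint64) uint64, error) {
	if s == "unlimited" {
		// Storage does not matter: allow any amount of unused data.
		return func(uint64) uint64 { return math.MaxUint64 }, nil
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("invalid value %q for max-unused: %w", s, err)
	}
	// A fixed byte limit ignores the number of used bytes.
	return func(uint64) uint64 { return n }, nil
}

func main() {
	for _, arg := range []string{"unlimited", "4096"} {
		limit, err := parseMaxUnused(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		opts := pruneOptions{maxUnusedBytes: limit}
		fmt.Printf("%s -> %d\n", arg, opts.maxUnusedBytes(1<<30))
	}
}
```

Building the callback while parsing keeps the interpretation of the option
value in one place; the prune code only has to call it with the number of
used bytes.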
Rework the documentation slightly so that it no longer mentions "packs"
and talks about "files" instead.
Make it clear in the documentation that the percentage given to
`--max-unused` is relative to the whole repository size after pruning is
done. If specified, it must be below 100%: at 100% the repository
would consist entirely of unused data, which is pointless.
I had a hard time coming up with the correct formula to calculate the
maximum number of unused bytes based on the number of used bytes. For a
fraction `p` (0 ≤ p < 1), a repo with `u` bytes used, and the number of
unused bytes `x` the following holds:
x ≤ p * (u+x)
⇔ x ≤ p*u + p*x
⇔ x - p*x ≤ p*u
⇔ x * (1-p) ≤ p*u
⇔ x ≤ p/(1-p) * u    (dividing by 1-p, which is positive because p < 1)
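
As a worked example, with p = 0.5 and u = 100 bytes the bound is
x ≤ 0.5/0.5 * 100 = 100 bytes: a repository of 200 bytes, half of which
is unused. The formula can be sketched in Go as below. The function name
`maxUnusedFromFraction` is hypothetical, and the use of float64 arithmetic
is an assumption of this sketch, so results for very large repositories
may be off by a few bytes due to rounding.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// maxUnusedFromFraction returns the largest whole number of unused bytes x
// satisfying x ≤ p/(1-p) * used, for a fraction p with 0 ≤ p < 1.
func maxUnusedFromFraction(p float64, used uint64) (uint64, error) {
	// The negated form also rejects NaN.
	if !(p >= 0 && p < 1) {
		return 0, errors.New("fraction must satisfy 0 <= p < 1")
	}
	limit := p / (1 - p) * float64(used)
	// Clamp values that do not fit into a uint64.
	if limit >= float64(math.MaxUint64) {
		return math.MaxUint64, nil
	}
	// The conversion truncates toward zero.
	return uint64(limit), nil
}

func main() {
	for _, p := range []float64{0, 0.05, 0.5} {
		x, err := maxUnusedFromFraction(p, 100)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("p=%.2f used=100 -> max unused=%d\n", p, x)
	}
}
```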