mirror of https://github.com/octoleo/restic.git synced 2024-12-22 10:58:55 +00:00

doc: Add config

Alexander Neumann 2015-05-03 15:00:26 +02:00
parent 13a42ec5ec
commit 062c328f2d

@ -21,49 +21,65 @@ been backed up at some point in time. The state here means the content and meta
data like the name and modification time for the file or the directory and its
contents.

*Storage ID*: A storage ID is the hash of the content of a file stored in the
repository. This ID is needed in order to load the file from the repository.
The storage hash is the SHA-256 hash of the content.
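As a minimal sketch in Go (the language restic is written in), a storage ID can be computed over the bytes of a file exactly as they are stored in the repository. The function name `storageID` is hypothetical and not taken from restic's code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// storageID hashes the bytes of a file exactly as they are stored in the
// repository and returns the SHA-256 digest as lowercase hexadecimal.
func storageID(stored []byte) string {
	sum := sha256.Sum256(stored)
	return hex.EncodeToString(sum[:])
}
```

`hex.EncodeToString` always emits lowercase digits, which matches the form used for file names in the repository.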

Repository Format
=================

All data is stored in a restic repository. A repository can store data of
several different types, which can later be requested based on an ID. This
so-called "storage ID" is the hash (SHA-256) of the content of a file. All
files in a repository are written only once and never modified afterwards. This
allows multiple clients to access and even write to the repository in
parallel. Only the delete operation removes data from the repository.

At the time of writing, the only implemented repository type is based on
directories and files. Such repositories can be accessed locally on the same
system or via the integrated SFTP client. The directory layout is the same for
both access methods. This repository type is described below.

Repositories consist of several directories and a file called `config`. For
all other files stored in the repository, the file name is the lowercase
hexadecimal representation of the storage ID, which is the SHA-256 hash of the
file's contents. This makes it easy to check all files for accidental
modifications, such as disk read errors, by running `sha256sum` and comparing
its output to the file name. If the prefix of a file name is unique among all
the other files in the same directory, the prefix may be used instead of the
complete file name.

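The `sha256sum` check and the prefix rule can be sketched in Go as follows. Both helpers, `verifyFile` and `resolvePrefix`, are hypothetical names chosen for illustration; the sketch treats a prefix as usable only when exactly one file name in the directory starts with it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

// verifyFile reports whether the lowercase hex SHA-256 of data equals the
// file name, which is what comparing `sha256sum` output by hand checks.
func verifyFile(name string, data []byte) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == name
}

// resolvePrefix returns the single name in names that starts with prefix.
// It fails when no name or more than one name matches, because a prefix
// may only replace a file name when it is unique within the directory.
func resolvePrefix(prefix string, names []string) (string, error) {
	match := ""
	for _, n := range names {
		if strings.HasPrefix(n, prefix) {
			if match != "" {
				return "", errors.New("ambiguous prefix: " + prefix)
			}
			match = n
		}
	}
	if match == "" {
		return "", errors.New("no file matches prefix: " + prefix)
	}
	return match, nil
}
```

For example, in a directory holding only `2159dd48...` and `22a5af1b...`, the prefix `215` resolves to the first file, while `2` is rejected as ambiguous.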
Apart from the files stored below the `keys` directory, all files are encrypted
with AES-256 in counter mode (CTR). The integrity of the encrypted data is
secured by a Poly1305-AES message authentication code (sometimes also referred
to as a "signature").

The first 16 bytes of each encrypted file hold the initialisation vector (IV).
It is followed by the encrypted data and completed by the 16-byte MAC. The
format is `IV || CIPHERTEXT || MAC`, so the complete encryption overhead is 32
bytes. A new random IV is selected for each file.

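The following Go sketch only separates the three parts of an encrypted file according to the `IV || CIPHERTEXT || MAC` layout. The function and constant names are hypothetical, and the sketch deliberately leaves out both the Poly1305-AES check and the AES-256-CTR decryption.

```go
package main

import "errors"

// Sizes from the description above: a 16-byte IV in front and a 16-byte
// MAC at the end, i.e. 32 bytes of overhead per encrypted file.
const (
	ivSize   = 16
	macSize  = 16
	overhead = ivSize + macSize
)

// splitCiphertext cuts an encrypted file into IV, ciphertext and MAC.
// It only separates the fields; a reader must still verify the MAC before
// decrypting the ciphertext with AES-256 in CTR mode.
func splitCiphertext(buf []byte) (iv, ciphertext, mac []byte, err error) {
	if len(buf) < overhead {
		return nil, nil, nil, errors.New("encrypted file too short")
	}
	iv = buf[:ivSize]
	ciphertext = buf[ivSize : len(buf)-macSize]
	mac = buf[len(buf)-macSize:]
	return iv, ciphertext, mac, nil
}
```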
The file `config` is encrypted this way and contains a JSON document like the
following:

    {
        "version": 1,
        "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
        "chunker_polynomial": "25b468838dcb75"
    }

After decryption, restic first checks that the `version` field contains a
version number that it understands; otherwise it aborts. At the moment, the
version is expected to be 1. The field `id` holds a unique ID which consists of
32 random bytes, encoded in hexadecimal. This uniquely identifies the
repository, regardless of whether it is accessed via SFTP or locally. The field
`chunker_polynomial` contains a parameter that is used for splitting large
files into smaller chunks (see below).

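These checks can be sketched in Go, assuming the document has already been decrypted. The `Config` type and `parseConfig` function are hypothetical names, and the sketch assumes that `chunker_polynomial` is written in hexadecimal, which the example value suggests but the text does not state.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// Config mirrors the fields of the decrypted `config` document.
type Config struct {
	Version           int    `json:"version"`
	ID                string `json:"id"`
	ChunkerPolynomial string `json:"chunker_polynomial"`
}

// parseConfig decodes the JSON, aborts on any version other than 1,
// checks that id is 32 hex-encoded bytes and parses the polynomial,
// assumed here to be a hexadecimal number.
func parseConfig(plaintext []byte) (Config, uint64, error) {
	var cfg Config
	if err := json.Unmarshal(plaintext, &cfg); err != nil {
		return cfg, 0, err
	}
	if cfg.Version != 1 {
		return cfg, 0, fmt.Errorf("unsupported repository version %d", cfg.Version)
	}
	id, err := hex.DecodeString(cfg.ID)
	if err != nil || len(id) != 32 {
		return cfg, 0, errors.New("id must be 32 bytes encoded in hexadecimal")
	}
	pol, err := strconv.ParseUint(cfg.ChunkerPolynomial, 16, 64)
	if err != nil {
		return cfg, 0, err
	}
	return cfg, pol, nil
}
```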
The basic layout of a sample restic repository is shown here:

    /tmp/restic-repo
    ├── config
    ├── data
    │   ├── 21
    │   │   └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
@ -74,7 +90,6 @@ The basic layout of a sample restic repository is shown below:
    │   ├── 73
    │   │   └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
    │   [...]
    ├── index
    │   ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
    │   └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
@ -83,8 +98,7 @@ The basic layout of a sample restic repository is shown below:
    ├── locks
    ├── snapshots
    │   └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
    └── tmp

A repository can be initialized with the `restic init` command, e.g.:
@ -93,21 +107,21 @@ A repository can be initialized with the `restic init` command, e.g.:

Pack Format
-----------

All files in the repository except Key and Pack files just contain raw data,
stored as `IV || Ciphertext || MAC`. Pack files may contain one or more Blobs
of data.

A Pack's structure is as follows:

    EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length

At the end of the Pack file is a header, which describes the content. The
header is encrypted and authenticated. `Header_Length` is the length of the
encrypted header, encoded as a four-byte little-endian integer. Placing the
header at the end of a file allows writing the blobs in a continuous stream as
soon as they are read during the backup phase. This reduces code complexity and
avoids having to re-write a file once the pack is complete and the content and
length of the header are known.

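Both sides of this trailer can be sketched in Go. The helper names are hypothetical, the header is treated as an opaque, already encrypted byte string, and `Header_Length` is assumed to be unsigned.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// finishPack appends the encrypted header and its length after all blobs
// have been streamed out, so no earlier bytes need to be rewritten.
func finishPack(pack, encHeader []byte) []byte {
	pack = append(pack, encHeader...)
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], uint32(len(encHeader)))
	return append(pack, n[:]...)
}

// encryptedHeader reads Header_Length from the last four bytes of a Pack
// and returns the encrypted header stored directly in front of it.
func encryptedHeader(pack []byte) ([]byte, error) {
	if len(pack) < 4 {
		return nil, errors.New("pack too short to hold Header_Length")
	}
	end := len(pack) - 4
	hdrLen := binary.LittleEndian.Uint32(pack[end:])
	if uint64(hdrLen) > uint64(end) {
		return nil, errors.New("header length exceeds pack size")
	}
	return pack[end-int(hdrLen) : end], nil
}
```

A reader therefore needs only the last four bytes to find the header, which is what makes the streaming write order workable.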
All the blobs (`EncryptedBlob1`, `EncryptedBlobN` etc.) are authenticated and
encrypted independently. This enables repository reorganisation without having
@ -178,7 +192,7 @@ listed afterwards.
There may be an arbitrary number of index files, containing information on
non-disjoint sets of Packs. The number of packs described in a single file is
chosen so that the file size is kept below 8 MiB.

Keys, Encryption and MAC
------------------------

@ -230,9 +244,8 @@ tampered with, the computed MAC will not match the last 16 bytes of the data,
and restic exits with an error. Otherwise, the data is decrypted with the
encryption key derived from `scrypt`. This yields a JSON document which
contains the master encryption and message authentication keys for this
repository (encoded in Base64). The command `restic cat masterkey` can be used
as follows to decrypt and pretty-print the master key:

    $ restic -r /tmp/restic-repo cat masterkey
    {
@ -241,7 +254,6 @@ pretty-print the master key:
            "r": "E9eEDnSJZgqwTOkDtOp+Dw=="
        },
        "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q="
    }

All data in the repository is encrypted and authenticated with these master keys.

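The key document can be decoded with a short Go sketch. The lines of the output before `"r"` are not shown above, so the enclosing key `mac` and the field `k` are assumptions here, as are the type and function names. The sizes follow Poly1305-AES, which uses a 16-byte AES key k and a 16-byte value r, plus a 32-byte key for AES-256.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// MasterKey mirrors the decrypted key document; "mac" and "k" are assumed
// names because those lines are elided in the output above.
type MasterKey struct {
	MAC struct {
		K string `json:"k"`
		R string `json:"r"`
	} `json:"mac"`
	Encrypt string `json:"encrypt"`
}

// decodeKeys base64-decodes every key and checks its length.
func decodeKeys(doc []byte) (k, r, enc []byte, err error) {
	var mk MasterKey
	if err = json.Unmarshal(doc, &mk); err != nil {
		return nil, nil, nil, err
	}
	fields := []struct {
		name string
		in   string
		size int
		out  *[]byte
	}{
		{"mac.k", mk.MAC.K, 16, &k},
		{"mac.r", mk.MAC.R, 16, &r},
		{"encrypt", mk.Encrypt, 32, &enc},
	}
	for _, f := range fields {
		b, decErr := base64.StdEncoding.DecodeString(f.in)
		if decErr != nil || len(b) != f.size {
			return nil, nil, nil, fmt.Errorf("%s: want %d base64-encoded bytes", f.name, f.size)
		}
		*f.out = b
	}
	return k, r, enc, nil
}
```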
@ -284,9 +296,9 @@ hash. Before saving, each file is split into variable sized Blobs of data. The
SHA-256 hashes of all Blobs are saved in an ordered list which then represents
the content of the file.

In order to relate these plain text hashes to the actual location within a Pack
file, an index is used. If the index is not available, the Pack headers, which
describe all data Blobs, can be read instead.

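The role of the index can be sketched in Go as a map from plain text Blob hashes to locations. The `Location` fields and method names are hypothetical; the text only states that the index relates these hashes to a location within a Pack file. Because index files may cover non-disjoint sets of Packs, this sketch simply keeps the last entry seen for a Blob when merging.

```go
package main

// Location records where a Blob lives inside a Pack file (hypothetical
// field names, for illustration only).
type Location struct {
	PackID string // storage ID of the Pack file
	Offset uint   // byte offset of the encrypted Blob within the Pack
	Length uint   // length of the encrypted Blob in bytes
}

// Index maps the hex-encoded plain text SHA-256 of a Blob to its Location.
type Index map[string]Location

// merge adds all entries of other; duplicates from overlapping index
// files overwrite the existing entry.
func (idx Index) merge(other Index) {
	for id, loc := range other {
		idx[id] = loc
	}
}

// lookup returns the location of a Blob, if the index knows it.
func (idx Index) lookup(blobID string) (Location, bool) {
	loc, ok := idx[blobID]
	return loc, ok
}
```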

Trees and Data
--------------
@ -321,7 +333,7 @@ The command `restic cat tree` can be used to inspect the tree referenced above:
A tree contains a list of entries (in the field `nodes`) which contain
metadata like a name and timestamps. When the entry references a directory, the
field `subtree` contains the plain text ID of another tree object.

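Following `subtree` references turns the trees into a directory hierarchy. The Go sketch below walks it recursively. Only `nodes` and `subtree` are named in the text; the `name` JSON key, the types and the `load` callback (standing in for fetching and decrypting a tree by its plain text ID) are hypothetical.

```go
package main

// Node is a cut-down tree entry; timestamps and other metadata are omitted.
type Node struct {
	Name    string `json:"name"`
	Subtree string `json:"subtree,omitempty"`
}

// Tree holds the list of entries stored in the field `nodes`.
type Tree struct {
	Nodes []Node `json:"nodes"`
}

// walk visits every entry below the tree with plain text ID root and
// descends into each entry whose `subtree` field is set.
func walk(root, prefix string, load func(id string) Tree, visit func(path string)) {
	for _, n := range load(root).Nodes {
		path := prefix + "/" + n.Name
		visit(path)
		if n.Subtree != "" {
			walk(n.Subtree, path, load, visit)
		}
	}
}
```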
When the command `restic cat tree` is used, the storage hash is needed to print
a tree. The tree referenced above can be dumped as follows:
@ -372,8 +384,9 @@ For creating a backup, restic scans the source directory for all files,
sub-directories and other entries. The data from each file is split into
variable length Blobs cut at offsets defined by a sliding window of 64 bytes.
The implementation uses Rabin Fingerprints to implement this Content Defined
Chunking (CDC). An irreducible polynomial is selected at random and saved in
the file `config` when a repository is initialized, so that watermark attacks
are much harder.

Files smaller than 512 KiB are not split; Blobs are between 512 KiB and 8 MiB
in size. The implementation aims for an average Blob size of 1 MiB.
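
The chunking loop can be sketched in Go. This is not restic's chunker: to stay short, it swaps the Rabin fingerprint over a random irreducible polynomial for a plain polynomial rolling hash modulo 2^64 (Rabin-Karp style) with an arbitrary, hypothetical multiplier. The 64-byte window and the 512 KiB and 8 MiB limits come from the text. The split mask is an assumption: a cut probability of 2^-19 per byte after the minimum size puts the expected Blob size near 512 KiB + 512 KiB = 1 MiB, ignoring the 8 MiB cap.

```go
package main

const (
	windowSize = 64
	minSize    = 512 * 1024
	maxSize    = 8 * 1024 * 1024
	splitMask  = 1<<19 - 1    // assumed: cut where the low 19 bits are zero
	base       = 1099511628211 // hypothetical multiplier (64-bit FNV prime)
)

// chunkBoundaries returns the end offset of every chunk of data.
func chunkBoundaries(data []byte) []int {
	// base^windowSize mod 2^64 removes the byte that leaves the window.
	var outFactor uint64 = 1
	for i := 0; i < windowSize; i++ {
		outFactor *= base
	}
	var cuts []int
	start := 0
	for start < len(data) {
		var h uint64 // hash of the last (up to) 64 bytes of this chunk
		end := len(data)
		for i := start; i < len(data); i++ {
			h = h*base + uint64(data[i])
			if i-start >= windowSize {
				h -= uint64(data[i-windowSize]) * outFactor
			}
			size := i - start + 1
			if size >= maxSize || (size >= minSize && h&splitMask == 0) {
				end = i + 1
				break
			}
		}
		cuts = append(cuts, end)
		start = end
	}
	return cuts
}
```

Data shorter than 512 KiB yields a single chunk, so small files are not split.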