mirror of
https://github.com/octoleo/restic.git
synced 2024-11-24 13:47:42 +00:00
parent
228a970540
commit
ddf2065ce2
@ -18,793 +18,6 @@ References
|
||||
Design
|
||||
******
|
||||
|
||||
Terminology
|
||||
===========
|
||||
|
||||
This section introduces terminology used in this document.
|
||||
|
||||
*Repository*: All data produced during a backup is sent to and stored in
|
||||
a repository in a structured form, for example in a file system
|
||||
hierarchy with several subdirectories. A repository implementation must
|
||||
be able to fulfill a number of operations, e.g. list the contents.
|
||||
|
||||
*Blob*: A Blob combines a number of data bytes with identifying
|
||||
information like the SHA-256 hash of the data and its length.
|
||||
|
||||
*Pack*: A Pack combines one or more Blobs, e.g. in a single file.
|
||||
|
||||
*Snapshot*: A Snapshot stands for the state of a file or directory that
|
||||
has been backed up at some point in time. The state here means the
|
||||
content and meta data like the name and modification time for the file
|
||||
or the directory and its contents.
|
||||
|
||||
*Storage ID*: A storage ID is the SHA-256 hash of the content stored in
|
||||
the repository. This ID is required in order to load the file from the
|
||||
repository.
|
||||
|
||||
Repository Format
|
||||
=================
|
||||
|
||||
All data is stored in a restic repository. A repository is able to store
|
||||
data of several different types, which can later be requested based on
|
||||
an ID. This so-called "storage ID" is the SHA-256 hash of the content of
|
||||
a file. All files in a repository are only written once and never
|
||||
modified afterwards. This allows accessing and even writing to the
|
||||
repository with multiple clients in parallel. Only the ``prune`` operation
|
||||
removes data from the repository.
|
||||
|
||||
Repositories consist of several directories and a top-level file called
|
||||
``config``. For all other files stored in the repository, the name for
|
||||
the file is the lower case hexadecimal representation of the storage ID,
|
||||
which is the SHA-256 hash of the file's contents. This allows for easy
|
||||
verification of files for accidental modifications, like disk read
|
||||
errors, by simply running the program ``sha256sum`` on the file and
|
||||
comparing its output to the file name. If the prefix of a filename is
|
||||
unique amongst all the other files in the same directory, the prefix may
|
||||
be used instead of the complete filename.
|
||||
|
||||
Apart from the files stored within the ``keys`` directory, all files are
|
||||
encrypted with AES-256 in counter mode (CTR). The integrity of the
|
||||
encrypted data is secured by a Poly1305-AES message authentication code
|
||||
(sometimes also referred to as a "signature").
|
||||
|
||||
In the first 16 bytes of each encrypted file the initialisation vector
|
||||
(IV) is stored. It is followed by the encrypted data and completed by
|
||||
the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The
|
||||
complete encryption overhead is 32 bytes. For each file, a new random IV
|
||||
is selected.
|
||||
|
||||
The file ``config`` is encrypted this way and contains a JSON document
|
||||
like the following:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"version": 1,
|
||||
"id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
|
||||
"chunker_polynomial": "25b468838dcb75"
|
||||
}
|
||||
|
||||
After decryption, restic first checks that the version field contains a
|
||||
version number that it understands, otherwise it aborts. At the moment,
|
||||
the version is expected to be 1. The field ``id`` holds a unique ID
|
||||
which consists of 32 random bytes, encoded in hexadecimal. This uniquely
|
||||
identifies the repository, regardless if it is accessed via SFTP or
|
||||
locally. The field ``chunker_polynomial`` contains a parameter that is
|
||||
used for splitting large files into smaller chunks (see below).
|
||||
|
||||
Repository Layout
|
||||
-----------------
|
||||
|
||||
The ``local`` and ``sftp`` backends are implemented using files and
|
||||
directories stored in a file system. The directory layout is the same
|
||||
for both backend types.
|
||||
|
||||
The basic layout of a repository stored in a ``local`` or ``sftp``
|
||||
backend is shown here:
|
||||
|
||||
::
|
||||
|
||||
/tmp/restic-repo
|
||||
├── config
|
||||
├── data
|
||||
│ ├── 21
|
||||
│ │ └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
|
||||
│ ├── 32
|
||||
│ │ └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
|
||||
│ ├── 59
|
||||
│ │ └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
|
||||
│ ├── 73
|
||||
│ │ └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
|
||||
│ [...]
|
||||
├── index
|
||||
│ ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
|
||||
│ └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
|
||||
├── keys
|
||||
│ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
|
||||
├── locks
|
||||
├── snapshots
|
||||
│ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
|
||||
└── tmp
|
||||
|
||||
A local repository can be initialized with the ``restic init`` command,
|
||||
e.g.:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo init
|
||||
|
||||
The local and sftp backends will auto-detect and accept all layouts described
|
||||
in the following sections, so that remote repositories mounted locally e.g. via
|
||||
fuse can be accessed. The layout auto-detection can be overridden by specifying
|
||||
the option ``-o local.layout=default``, valid values are ``default`` and
|
||||
``s3legacy``. The option for the sftp backend is named ``sftp.layout``, for the
|
||||
s3 backend ``s3.layout``.
|
||||
|
||||
S3 Legacy Layout
|
||||
----------------
|
||||
|
||||
Unfortunately during development the AWS S3 backend uses slightly different
|
||||
paths (directory names use singular instead of plural for ``key``,
|
||||
``lock``, and ``snapshot`` files), and the data files are stored directly below
|
||||
the ``data`` directory. The S3 Legacy repository layout looks like this:
|
||||
|
||||
::
|
||||
|
||||
/config
|
||||
/data
|
||||
├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
|
||||
├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
|
||||
├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
|
||||
├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
|
||||
[...]
|
||||
/index
|
||||
├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
|
||||
└── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
|
||||
/key
|
||||
└── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
|
||||
/lock
|
||||
/snapshot
|
||||
└── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
|
||||
|
||||
The S3 backend understands and accepts both forms, new backends are
|
||||
always created with the default layout for compatibility reasons.
|
||||
|
||||
Pack Format
|
||||
===========
|
||||
|
||||
All files in the repository except Key and Pack files just contain raw
|
||||
data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
|
||||
or more Blobs of data.
|
||||
|
||||
A Pack's structure is as follows:
|
||||
|
||||
::
|
||||
|
||||
EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length
|
||||
|
||||
At the end of the Pack file is a header, which describes the content.
|
||||
The header is encrypted and authenticated. ``Header_Length`` is the
|
||||
length of the encrypted header encoded as a four byte integer in
|
||||
little-endian encoding. Placing the header at the end of a file allows
|
||||
writing the blobs in a continuous stream as soon as they are read during
|
||||
the backup phase. This reduces code complexity and avoids having to
|
||||
re-write a file once the pack is complete and the content and length of
|
||||
the header is known.
|
||||
|
||||
All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
|
||||
authenticated and encrypted independently. This enables repository
|
||||
reorganisation without having to touch the encrypted Blobs. In addition
|
||||
it also allows efficient indexing, for only the header needs to be read
|
||||
in order to find out which Blobs are contained in the Pack. Since the
|
||||
header is authenticated, authenticity of the header can be checked
|
||||
without having to read the complete Pack.
|
||||
|
||||
After decryption, a Pack's header consists of the following elements:
|
||||
|
||||
::
|
||||
|
||||
Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
|
||||
[...]
|
||||
Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) ||
|
||||
|
||||
This is enough to calculate the offsets for all the Blobs in the Pack.
|
||||
Length is the length of a Blob as a four byte integer in little-endian
|
||||
format. The type field is a one byte field and labels the content of a
|
||||
blob according to the following table:
|
||||
|
||||
+--------+-----------+
|
||||
| Type | Meaning |
|
||||
+========+===========+
|
||||
| 0 | data |
|
||||
+--------+-----------+
|
||||
| 1 | tree |
|
||||
+--------+-----------+
|
||||
|
||||
All other types are invalid, more types may be added in the future.
|
||||
|
||||
For reconstructing the index or parsing a pack without an index, first
|
||||
the last four bytes must be read in order to find the length of the
|
||||
header. Afterwards, the header can be read and parsed, which yields all
|
||||
plaintext hashes, types, offsets and lengths of all included blobs.
|
||||
|
||||
Indexing
|
||||
========
|
||||
|
||||
Index files contain information about Data and Tree Blobs and the Packs
|
||||
they are contained in and store this information in the repository. When
|
||||
the local cached index is not accessible any more, the index files can
|
||||
be downloaded and used to reconstruct the index. The files are encrypted
|
||||
and authenticated like Data and Tree Blobs, so the outer structure is
|
||||
``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
|
||||
document like the following:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"supersedes": [
|
||||
"ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
|
||||
],
|
||||
"packs": [
|
||||
{
|
||||
"id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
|
||||
"blobs": [
|
||||
{
|
||||
"id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
|
||||
"type": "data",
|
||||
"offset": 0,
|
||||
"length": 25
|
||||
},{
|
||||
"id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
|
||||
"type": "tree",
|
||||
"offset": 38,
|
||||
"length": 100
|
||||
},
|
||||
{
|
||||
"id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
|
||||
"type": "data",
|
||||
"offset": 150,
|
||||
"length": 123
|
||||
}
|
||||
]
|
||||
}, [...]
|
||||
]
|
||||
}
|
||||
|
||||
This JSON document lists Packs and the blobs contained therein. In this
|
||||
example, the Pack ``73d04e61`` contains two data Blobs and one Tree
|
||||
blob, the plaintext hashes are listed afterwards.
|
||||
|
||||
The field ``supersedes`` lists the storage IDs of index files that have
|
||||
been replaced with the current index file. This happens when index files
|
||||
are repacked, for example when old snapshots are removed and Packs are
|
||||
recombined.
|
||||
|
||||
There may be an arbitrary number of index files, containing information
|
||||
on non-disjoint sets of Packs. The number of packs described in a single
|
||||
file is chosen so that the file size is kept below 8 MiB.
|
||||
|
||||
Keys, Encryption and MAC
|
||||
========================
|
||||
|
||||
All data stored by restic in the repository is encrypted with AES-256 in
|
||||
counter mode and authenticated using Poly1305-AES. For encrypting new
|
||||
data first 16 bytes are read from a cryptographically secure
|
||||
pseudorandom number generator as a random nonce. This is used both as
|
||||
the IV for counter mode and the nonce for Poly1305. This operation needs
|
||||
three keys: A 32 byte for AES-256 for encryption, a 16 byte AES key and
|
||||
a 16 byte key for Poly1305. For details see the original paper `The
|
||||
Poly1305-AES message-authentication
|
||||
code <http://cr.yp.to/mac/poly1305-20050329.pdf>`__ by Dan Bernstein.
|
||||
The data is then encrypted with AES-256 and afterwards a message
|
||||
authentication code (MAC) is computed over the ciphertext, everything is
|
||||
then stored as IV \|\| CIPHERTEXT \|\| MAC.
|
||||
|
||||
The directory ``keys`` contains key files. These are simple JSON
|
||||
documents which contain all data that is needed to derive the
|
||||
repository's master encryption and message authentication keys from a
|
||||
user's password. The JSON document from the repository can be
|
||||
pretty-printed for example by using the Python module ``json``
|
||||
(shortened to increase readability):
|
||||
|
||||
::
|
||||
|
||||
$ python -mjson.tool /tmp/restic-repo/keys/b02de82*
|
||||
{
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0"
|
||||
"kdf": "scrypt",
|
||||
"N": 65536,
|
||||
"r": 8,
|
||||
"p": 1,
|
||||
"created": "2015-01-02T18:10:13.48307196+01:00",
|
||||
"data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
|
||||
"salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w==",
|
||||
}
|
||||
|
||||
When the repository is opened by restic, the user is prompted for the
|
||||
repository password. This is then used with ``scrypt``, a key derivation
|
||||
function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
|
||||
``salt``) to derive 64 key bytes. The first 32 bytes are used as the
|
||||
encryption key (for AES-256) and the last 32 bytes are used as the
|
||||
message authentication key (for Poly1305-AES). These last 32 bytes are
|
||||
divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
|
||||
``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
|
||||
for details).
|
||||
|
||||
Those keys are used to authenticate and decrypt the bytes contained in
|
||||
the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
|
||||
any other blob (after removing the Base64 encoding). If the
|
||||
password is incorrect or the key file has been tampered with, the
|
||||
computed MAC will not match the last 16 bytes of the data, and restic
|
||||
exits with an error. Otherwise, the data yields a JSON document
|
||||
which contains the master encryption and message authentication keys for
|
||||
this repository (encoded in Base64). The command
|
||||
``restic cat masterkey`` can be used as follows to decrypt and
|
||||
pretty-print the master key:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat masterkey
|
||||
{
|
||||
"mac": {
|
||||
"k": "evFWd9wWlndL9jc501268g==",
|
||||
"r": "E9eEDnSJZgqwTOkDtOp+Dw=="
|
||||
},
|
||||
"encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q=",
|
||||
}
|
||||
|
||||
All data in the repository is encrypted and authenticated with these
|
||||
master keys. For encryption, the AES-256 algorithm in Counter mode is
|
||||
used. For message authentication, Poly1305-AES is used as described
|
||||
above.
|
||||
|
||||
A repository can have several different passwords, with a key file for
|
||||
each. This way, the password can be changed without having to re-encrypt
|
||||
all data.
|
||||
|
||||
Snapshots
|
||||
=========
|
||||
|
||||
A snapshot represents a directory with all files and sub-directories at
|
||||
a given point in time. For each backup that is made, a new snapshot is
|
||||
created. A snapshot is a JSON document that is stored in an encrypted
|
||||
file below the directory ``snapshots`` in the repository. The filename
|
||||
is the storage ID. This string is unique and used within restic to
|
||||
uniquely identify a snapshot.
|
||||
|
||||
The command ``restic cat snapshot`` can be used as follows to decrypt
|
||||
and pretty-print the contents of a snapshot file:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat snapshot 251c2e58
|
||||
enter password for repository:
|
||||
{
|
||||
"time": "2015-01-02T18:10:50.895208559+01:00",
|
||||
"tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
|
||||
"dir": "/tmp/testdata",
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"tags": [
|
||||
"NL"
|
||||
]
|
||||
}
|
||||
|
||||
Here it can be seen that this snapshot represents the contents of the
|
||||
directory ``/tmp/testdata``. The most important field is ``tree``. When
|
||||
the meta data (e.g. the tags) of a snapshot change, the snapshot needs
|
||||
to be re-encrypted and saved. This will change the storage ID, so in
|
||||
order to relate these seemingly different snapshots, a field
|
||||
``original`` is introduced which contains the ID of the original
|
||||
snapshot, e.g. after adding the tag ``DE`` to the snapshot above it
|
||||
becomes:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat snapshot 22a5af1b
|
||||
enter password for repository:
|
||||
{
|
||||
"time": "2015-01-02T18:10:50.895208559+01:00",
|
||||
"tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
|
||||
"dir": "/tmp/testdata",
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"tags": [
|
||||
"NL",
|
||||
"DE"
|
||||
],
|
||||
"original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
|
||||
}
|
||||
|
||||
Once introduced, the ``original`` field is not modified when the
|
||||
snapshot's meta data is changed again.
|
||||
|
||||
All content within a restic repository is referenced according to its
|
||||
SHA-256 hash. Before saving, each file is split into variable sized
|
||||
Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
|
||||
list which then represents the content of the file.
|
||||
|
||||
In order to relate these plaintext hashes to the actual location within
|
||||
a Pack file , an index is used. If the index is not available, the
|
||||
header of all data Blobs can be read.
|
||||
|
||||
Trees and Data
|
||||
==============
|
||||
|
||||
A snapshot references a tree by the SHA-256 hash of the JSON string
|
||||
representation of its contents. Trees and data are saved in pack files
|
||||
in a subdirectory of the directory ``data``.
|
||||
|
||||
The command ``restic cat blob`` can be used to inspect the tree
|
||||
referenced above (piping the output of the command to ``jq .`` so that
|
||||
the JSON is indented):
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq .
|
||||
enter password for repository:
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"name": "testdata",
|
||||
"type": "dir",
|
||||
"mode": 493,
|
||||
"mtime": "2014-12-22T14:47:59.912418701+01:00",
|
||||
"atime": "2014-12-06T17:49:21.748468803+01:00",
|
||||
"ctime": "2014-12-22T14:47:59.912418701+01:00",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"user": "fd0",
|
||||
"inode": 409704562,
|
||||
"content": null,
|
||||
"subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc"
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
A tree contains a list of entries (in the field ``nodes``) which contain
|
||||
meta data like a name and timestamps. When the entry references a
|
||||
directory, the field ``subtree`` contains the plain text ID of another
|
||||
tree object.
|
||||
|
||||
When the command ``restic cat blob`` is used, the plaintext ID is needed
|
||||
to print a tree. The tree referenced above can be dumped as follows:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc
|
||||
enter password for repository:
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"name": "testfile",
|
||||
"type": "file",
|
||||
"mode": 420,
|
||||
"mtime": "2014-12-06T17:50:23.34513538+01:00",
|
||||
"atime": "2014-12-06T17:50:23.338468713+01:00",
|
||||
"ctime": "2014-12-06T17:50:23.34513538+01:00",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"user": "fd0",
|
||||
"inode": 416863351,
|
||||
"size": 1234,
|
||||
"links": 1,
|
||||
"content": [
|
||||
"50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d"
|
||||
]
|
||||
},
|
||||
[...]
|
||||
]
|
||||
}
|
||||
|
||||
This tree contains a file entry. This time, the ``subtree`` field is not
|
||||
present and the ``content`` field contains a list with one plain text
|
||||
SHA-256 hash.
|
||||
|
||||
The command ``restic cat blob`` can also be used to extract and decrypt
|
||||
data given a plaintext ID, e.g. for the data mentioned above:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
|
||||
enter password for repository:
|
||||
50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d -
|
||||
|
||||
As can be seen from the output of the program ``sha256sum``, the hash
|
||||
matches the plaintext hash from the map included in the tree above, so
|
||||
the correct data has been returned.
|
||||
|
||||
Locks
|
||||
=====
|
||||
|
||||
The restic repository structure is designed in a way that allows
|
||||
parallel access of multiple instance of restic and even parallel writes.
|
||||
However, there are some functions that work more efficient or even
|
||||
require exclusive access of the repository. In order to implement these
|
||||
functions, restic processes are required to create a lock on the
|
||||
repository before doing anything.
|
||||
|
||||
Locks come in two types: Exclusive and non-exclusive locks. At most one
|
||||
process can have an exclusive lock on the repository, and during that
|
||||
time there must not be any other locks (exclusive and non-exclusive).
|
||||
There may be multiple non-exclusive locks in parallel.
|
||||
|
||||
A lock is a file in the subdir ``locks`` whose filename is the storage
|
||||
ID of the contents. It is encrypted and authenticated the same way as
|
||||
other files in the repository and contains the following JSON structure:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"time": "2015-06-27T12:18:51.759239612+02:00",
|
||||
"exclusive": false,
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"pid": 13607,
|
||||
"uid": 1000,
|
||||
"gid": 100
|
||||
}
|
||||
|
||||
The field ``exclusive`` defines the type of lock. When a new lock is to
|
||||
be created, restic checks all locks in the repository. When a lock is
|
||||
found, it is tested if the lock is stale, which is the case for locks
|
||||
with timestamps older than 30 minutes. If the lock was created on the
|
||||
same machine, even for younger locks it is tested whether the process is
|
||||
still alive by sending a signal to it. If that fails, restic assumes
|
||||
that the process is dead and considers the lock to be stale.
|
||||
|
||||
When a new lock is to be created and no other conflicting locks are
|
||||
detected, restic creates a new lock, waits, and checks if other locks
|
||||
appeared in the repository. Depending on the type of the other locks and
|
||||
the lock to be created, restic either continues or fails.
|
||||
|
||||
Backups and Deduplication
|
||||
=========================
|
||||
|
||||
For creating a backup, restic scans the source directory for all files,
|
||||
sub-directories and other entries. The data from each file is split into
|
||||
variable length Blobs cut at offsets defined by a sliding window of 64
|
||||
byte. The implementation uses Rabin Fingerprints for implementing this
|
||||
Content Defined Chunking (CDC). An irreducible polynomial is selected at
|
||||
random and saved in the file ``config`` when a repository is
|
||||
initialized, so that watermark attacks are much harder.
|
||||
|
||||
Files smaller than 512 KiB are not split, Blobs are of 512 KiB to 8 MiB
|
||||
in size. The implementation aims for 1 MiB Blob size on average.
|
||||
|
||||
For modified files, only modified Blobs have to be saved in a subsequent
|
||||
backup. This even works if bytes are inserted or removed at arbitrary
|
||||
positions within the file.
|
||||
|
||||
Threat Model
|
||||
============
|
||||
|
||||
The design goals for restic include being able to securely store backups
|
||||
in a location that is not completely trusted, e.g. a shared system where
|
||||
others can potentially access the files or (in the case of the system
|
||||
administrator) even modify or delete them.
|
||||
|
||||
General assumptions:
|
||||
|
||||
- The host system a backup is created on is trusted. This is the most
|
||||
basic requirement, and essential for creating trustworthy backups.
|
||||
|
||||
The restic backup program guarantees the following:
|
||||
|
||||
- Accessing the unencrypted content of stored files and metadata should
|
||||
not be possible without a password for the repository. Everything
|
||||
except the metadata included for informational purposes in the key
|
||||
files is encrypted and authenticated.
|
||||
|
||||
- Modifications (intentional or unintentional) can be detected
|
||||
automatically on several layers:
|
||||
|
||||
1. For all accesses of data stored in the repository it is checked
|
||||
whether the cryptographic hash of the contents matches the storage
|
||||
ID (the file's name). This way, modifications (bad RAM, broken
|
||||
harddisk) can be detected easily.
|
||||
|
||||
2. Before decrypting any data, the MAC on the encrypted data is
|
||||
checked. If there has been a modification, the MAC check will
|
||||
fail. This step happens even before the data is decrypted, so data
|
||||
that has been tampered with is not decrypted at all.
|
||||
|
||||
However, the restic backup program is not designed to protect against
|
||||
attackers deleting files at the storage location. There is nothing that
|
||||
can be done about this. If this needs to be guaranteed, get a secure
|
||||
location without any access from third parties. If you assume that
|
||||
attackers have write access to your files at the storage location,
|
||||
attackers are able to figure out (e.g. based on the timestamps of the
|
||||
stored files) which files belong to what snapshot. When only these files
|
||||
are deleted, the particular snapshot vanished and all snapshots
|
||||
depending on data that has been added in the snapshot cannot be restored
|
||||
completely. Restic is not designed to detect this attack.
|
||||
|
||||
************
|
||||
Local Cache
|
||||
***********
|
||||
|
||||
In order to speed up certain operations, restic manages a local cache of data.
|
||||
This document describes the data structures for the local cache with version 1.
|
||||
|
||||
Versions
|
||||
========
|
||||
|
||||
The cache directory is selected according to the `XDG base dir specification
|
||||
<http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html>`__.
|
||||
Each repository has its own cache sub-directory, consting of the repository ID
|
||||
which is chosen at ``init``. All cache directories for different repos are
|
||||
independent of each other.
|
||||
|
||||
The cache dir for a repo contains a file named ``version``, which contains a
|
||||
single ASCII integer line that stands for the current version of the cache. If
|
||||
a lower version number is found the cache is recreated with the current
|
||||
version. If a higher version number is found the cache is ignored and left as
|
||||
is.
|
||||
|
||||
Snapshots, Data and Indexes
|
||||
===========================
|
||||
|
||||
Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
|
||||
``data`` and ``index``, as read from the repository.
|
||||
|
||||
Expiry
|
||||
======
|
||||
|
||||
Whenever a cache directory for a repo is used, that directory's modification
|
||||
timestamp is updated to the current time. By looking at the modification
|
||||
timestamps of the repo cache directories it is easy to decide which directories
|
||||
are old and haven't been used in a long time. Those are probably stale and can
|
||||
be removed.
|
||||
|
||||
|
||||
************
|
||||
REST Backend
|
||||
************
|
||||
|
||||
Restic can interact with HTTP Backend that respects the following REST
|
||||
API.
|
||||
|
||||
The following values are valid for ``{type}``:
|
||||
|
||||
* ``data``
|
||||
* ``keys``
|
||||
* ``locks``
|
||||
* ``snapshots``
|
||||
* ``index``
|
||||
* ``config``
|
||||
|
||||
The API version is selected via the ``Accept`` HTTP header in the request. The
|
||||
following values are defined:
|
||||
|
||||
* ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1
|
||||
* ``application/vnd.x.restic.rest.v2``: Select API version 2
|
||||
|
||||
The server will respond with the value of the highest version it supports in
|
||||
the ``Content-Type`` HTTP response header for the HTTP requests which should
|
||||
return JSON. Any different value for this header means API version 1.
|
||||
|
||||
The placeholder ``{path}`` in this document is a path to the repository, so
|
||||
that multiple different repositories can be accessed. The default path is
|
||||
``/``. The path must end with a slash.
|
||||
|
||||
POST {path}?create=true
|
||||
=======================
|
||||
|
||||
This request is used to initially create a new repository. The server
|
||||
responds with "200 OK" if the repository structure was created
|
||||
successfully or already exists, otherwise an error is returned.
|
||||
|
||||
DELETE {path}
|
||||
=============
|
||||
|
||||
Deletes the repository on the server side. The server responds with "200
|
||||
OK" if the repository was successfully removed. If this function is not
|
||||
implemented the server returns "501 Not Implemented", if this it is
|
||||
denied by the server it returns "403 Forbidden".
|
||||
|
||||
HEAD {path}/config
|
||||
==================
|
||||
|
||||
Returns "200 OK" if the repository has a configuration, an HTTP error
|
||||
otherwise.
|
||||
|
||||
GET {path}/config
|
||||
=================
|
||||
|
||||
Returns the content of the configuration file if the repository has a
|
||||
configuration, an HTTP error otherwise.
|
||||
|
||||
Response format: binary/octet-stream
|
||||
|
||||
POST {path}/config
|
||||
==================
|
||||
|
||||
Returns "200 OK" if the configuration of the request body has been
|
||||
saved, an HTTP error otherwise.
|
||||
|
||||
GET {path}/{type}/
|
||||
==================
|
||||
|
||||
API version 1
|
||||
-------------
|
||||
|
||||
Returns a JSON array containing the names of all the blobs stored for a given
|
||||
type, example:
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
"245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
|
||||
"85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
|
||||
"8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
|
||||
"e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649"
|
||||
]
|
||||
|
||||
API version 2
|
||||
-------------
|
||||
|
||||
Returns a JSON array containing an object for each file of the given type. The
|
||||
objects have two keys: ``name`` for the file name, and ``size`` for the size in
|
||||
bytes.
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
{
|
||||
"name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
|
||||
"size": 2341058
|
||||
},
|
||||
{
|
||||
"name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
|
||||
"size": 2908900
|
||||
},
|
||||
{
|
||||
"name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
|
||||
"size": 3030712
|
||||
},
|
||||
{
|
||||
"name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649",
|
||||
"size": 2804
|
||||
}
|
||||
]
|
||||
|
||||
HEAD {path}/{type}/{name}
|
||||
=========================
|
||||
|
||||
Returns "200 OK" if the blob with the given name and type is stored in
|
||||
the repository, "404 not found" otherwise. If the blob exists, the HTTP
|
||||
header ``Content-Length`` is set to the file size.
|
||||
|
||||
GET {path}/{type}/{name}
|
||||
========================
|
||||
|
||||
Returns the content of the blob with the given name and type if it is
|
||||
stored in the repository, "404 not found" otherwise.
|
||||
|
||||
If the request specifies a partial read with a Range header field, then
|
||||
the status code of the response is 206 instead of 200 and the response
|
||||
only contains the specified range.
|
||||
|
||||
Response format: binary/octet-stream
|
||||
|
||||
POST {path}/{type}/{name}
|
||||
=========================
|
||||
|
||||
Saves the content of the request body as a blob with the given name and
|
||||
type, an HTTP error otherwise.
|
||||
|
||||
Request format: binary/octet-stream
|
||||
|
||||
DELETE {path}/{type}/{name}
|
||||
===========================
|
||||
|
||||
Returns "200 OK" if the blob with the given name and type has been
|
||||
deleted from the repository, an HTTP error otherwise.
|
||||
|
||||
|
||||
.. include:: design.rst
|
||||
.. include:: cache.rst
|
||||
.. include:: REST_backend.rst
|
||||
|
145
doc/REST_backend.rst
Normal file
145
doc/REST_backend.rst
Normal file
@ -0,0 +1,145 @@
|
||||
************
|
||||
REST Backend
|
||||
************
|
||||
|
||||
Restic can interact with HTTP Backend that respects the following REST
|
||||
API.
|
||||
|
||||
The following values are valid for ``{type}``:
|
||||
|
||||
* ``data``
|
||||
* ``keys``
|
||||
* ``locks``
|
||||
* ``snapshots``
|
||||
* ``index``
|
||||
* ``config``
|
||||
|
||||
The API version is selected via the ``Accept`` HTTP header in the request. The
|
||||
following values are defined:
|
||||
|
||||
* ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1
|
||||
* ``application/vnd.x.restic.rest.v2``: Select API version 2
|
||||
|
||||
The server will respond with the value of the highest version it supports in
|
||||
the ``Content-Type`` HTTP response header for the HTTP requests which should
|
||||
return JSON. Any different value for this header means API version 1.
|
||||
|
||||
The placeholder ``{path}`` in this document is a path to the repository, so
|
||||
that multiple different repositories can be accessed. The default path is
|
||||
``/``. The path must end with a slash.
|
||||
|
||||
POST {path}?create=true
|
||||
=======================
|
||||
|
||||
This request is used to initially create a new repository. The server
|
||||
responds with "200 OK" if the repository structure was created
|
||||
successfully or already exists, otherwise an error is returned.
|
||||
|
||||
DELETE {path}
|
||||
=============
|
||||
|
||||
Deletes the repository on the server side. The server responds with "200
|
||||
OK" if the repository was successfully removed. If this function is not
|
||||
implemented the server returns "501 Not Implemented", if this it is
|
||||
denied by the server it returns "403 Forbidden".
|
||||
|
||||
HEAD {path}/config
|
||||
==================
|
||||
|
||||
Returns "200 OK" if the repository has a configuration, an HTTP error
|
||||
otherwise.
|
||||
|
||||
GET {path}/config
|
||||
=================
|
||||
|
||||
Returns the content of the configuration file if the repository has a
|
||||
configuration, an HTTP error otherwise.
|
||||
|
||||
Response format: binary/octet-stream
|
||||
|
||||
POST {path}/config
|
||||
==================
|
||||
|
||||
Returns "200 OK" if the configuration of the request body has been
|
||||
saved, an HTTP error otherwise.
|
||||
|
||||
GET {path}/{type}/
|
||||
==================
|
||||
|
||||
API version 1
|
||||
-------------
|
||||
|
||||
Returns a JSON array containing the names of all the blobs stored for a given
|
||||
type, example:
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
"245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
|
||||
"85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
|
||||
"8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
|
||||
"e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649"
|
||||
]
|
||||
|
||||
API version 2
|
||||
-------------
|
||||
|
||||
Returns a JSON array containing an object for each file of the given type. The
|
||||
objects have two keys: ``name`` for the file name, and ``size`` for the size in
|
||||
bytes.
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
{
|
||||
"name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
|
||||
"size": 2341058
|
||||
},
|
||||
{
|
||||
"name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
|
||||
"size": 2908900
|
||||
},
|
||||
{
|
||||
"name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
|
||||
"size": 3030712
|
||||
},
|
||||
{
|
||||
"name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649",
|
||||
"size": 2804
|
||||
}
|
||||
]
|
||||
|
||||
HEAD {path}/{type}/{name}
|
||||
=========================
|
||||
|
||||
Returns "200 OK" if the blob with the given name and type is stored in
|
||||
the repository, "404 not found" otherwise. If the blob exists, the HTTP
|
||||
header ``Content-Length`` is set to the file size.
|
||||
|
||||
GET {path}/{type}/{name}
|
||||
========================
|
||||
|
||||
Returns the content of the blob with the given name and type if it is
|
||||
stored in the repository, "404 not found" otherwise.
|
||||
|
||||
If the request specifies a partial read with a Range header field, then
|
||||
the status code of the response is 206 instead of 200 and the response
|
||||
only contains the specified range.
|
||||
|
||||
Response format: binary/octet-stream
|
||||
|
||||
POST {path}/{type}/{name}
|
||||
=========================
|
||||
|
||||
Saves the content of the request body as a blob with the given name and
|
||||
type, an HTTP error otherwise.
|
||||
|
||||
Request format: binary/octet-stream
|
||||
|
||||
DELETE {path}/{type}/{name}
|
||||
===========================
|
||||
|
||||
Returns "200 OK" if the blob with the given name and type has been
|
||||
deleted from the repository, an HTTP error otherwise.
|
||||
|
||||
|
36
doc/cache.rst
Normal file
36
doc/cache.rst
Normal file
@ -0,0 +1,36 @@
|
||||
***********
|
||||
Local Cache
|
||||
***********
|
||||
|
||||
In order to speed up certain operations, restic manages a local cache of data.
|
||||
This document describes the data structures for the local cache with version 1.
|
||||
|
||||
Versions
|
||||
========
|
||||
|
||||
The cache directory is selected according to the `XDG base dir specification
|
||||
<http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html>`__.
|
||||
Each repository has its own cache sub-directory, consting of the repository ID
|
||||
which is chosen at ``init``. All cache directories for different repos are
|
||||
independent of each other.
|
||||
|
||||
The cache dir for a repo contains a file named ``version``, which contains a
|
||||
single ASCII integer line that stands for the current version of the cache. If
|
||||
a lower version number is found the cache is recreated with the current
|
||||
version. If a higher version number is found the cache is ignored and left as
|
||||
is.
|
||||
|
||||
Snapshots, Data and Indexes
|
||||
===========================
|
||||
|
||||
Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
|
||||
``data`` and ``index``, as read from the repository.
|
||||
|
||||
Expiry
|
||||
======
|
||||
|
||||
Whenever a cache directory for a repo is used, that directory's modification
|
||||
timestamp is updated to the current time. By looking at the modification
|
||||
timestamps of the repo cache directories it is easy to decide which directories
|
||||
are old and haven't been used in a long time. Those are probably stale and can
|
||||
be removed.
|
608
doc/design.rst
Normal file
608
doc/design.rst
Normal file
@ -0,0 +1,608 @@
|
||||
|
||||
Terminology
|
||||
===========
|
||||
|
||||
This section introduces terminology used in this document.
|
||||
|
||||
*Repository*: All data produced during a backup is sent to and stored in
|
||||
a repository in a structured form, for example in a file system
|
||||
hierarchy with several subdirectories. A repository implementation must
|
||||
be able to fulfill a number of operations, e.g. list the contents.
|
||||
|
||||
*Blob*: A Blob combines a number of data bytes with identifying
|
||||
information like the SHA-256 hash of the data and its length.
|
||||
|
||||
*Pack*: A Pack combines one or more Blobs, e.g. in a single file.
|
||||
|
||||
*Snapshot*: A Snapshot stands for the state of a file or directory that
|
||||
has been backed up at some point in time. The state here means the
|
||||
content and meta data like the name and modification time for the file
|
||||
or the directory and its contents.
|
||||
|
||||
*Storage ID*: A storage ID is the SHA-256 hash of the content stored in
|
||||
the repository. This ID is required in order to load the file from the
|
||||
repository.
|
||||
|
||||
Repository Format
|
||||
=================
|
||||
|
||||
All data is stored in a restic repository. A repository is able to store
|
||||
data of several different types, which can later be requested based on
|
||||
an ID. This so-called "storage ID" is the SHA-256 hash of the content of
|
||||
a file. All files in a repository are only written once and never
|
||||
modified afterwards. This allows accessing and even writing to the
|
||||
repository with multiple clients in parallel. Only the ``prune`` operation
|
||||
removes data from the repository.
|
||||
|
||||
Repositories consist of several directories and a top-level file called
|
||||
``config``. For all other files stored in the repository, the name for
|
||||
the file is the lower case hexadecimal representation of the storage ID,
|
||||
which is the SHA-256 hash of the file's contents. This allows for easy
|
||||
verification of files for accidental modifications, like disk read
|
||||
errors, by simply running the program ``sha256sum`` on the file and
|
||||
comparing its output to the file name. If the prefix of a filename is
|
||||
unique amongst all the other files in the same directory, the prefix may
|
||||
be used instead of the complete filename.
|
||||
|
||||
Apart from the files stored within the ``keys`` directory, all files are
|
||||
encrypted with AES-256 in counter mode (CTR). The integrity of the
|
||||
encrypted data is secured by a Poly1305-AES message authentication code
|
||||
(sometimes also referred to as a "signature").
|
||||
|
||||
In the first 16 bytes of each encrypted file the initialisation vector
|
||||
(IV) is stored. It is followed by the encrypted data and completed by
|
||||
the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The
|
||||
complete encryption overhead is 32 bytes. For each file, a new random IV
|
||||
is selected.
|
||||
|
||||
The file ``config`` is encrypted this way and contains a JSON document
|
||||
like the following:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"version": 1,
|
||||
"id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
|
||||
"chunker_polynomial": "25b468838dcb75"
|
||||
}
|
||||
|
||||
After decryption, restic first checks that the version field contains a
|
||||
version number that it understands, otherwise it aborts. At the moment,
|
||||
the version is expected to be 1. The field ``id`` holds a unique ID
|
||||
which consists of 32 random bytes, encoded in hexadecimal. This uniquely
|
||||
identifies the repository, regardless if it is accessed via SFTP or
|
||||
locally. The field ``chunker_polynomial`` contains a parameter that is
|
||||
used for splitting large files into smaller chunks (see below).
|
||||
|
||||
Repository Layout
|
||||
-----------------
|
||||
|
||||
The ``local`` and ``sftp`` backends are implemented using files and
|
||||
directories stored in a file system. The directory layout is the same
|
||||
for both backend types.
|
||||
|
||||
The basic layout of a repository stored in a ``local`` or ``sftp``
|
||||
backend is shown here:
|
||||
|
||||
::
|
||||
|
||||
/tmp/restic-repo
|
||||
├── config
|
||||
├── data
|
||||
│ ├── 21
|
||||
│ │ └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
|
||||
│ ├── 32
|
||||
│ │ └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
|
||||
│ ├── 59
|
||||
│ │ └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
|
||||
│ ├── 73
|
||||
│ │ └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
|
||||
│ [...]
|
||||
├── index
|
||||
│ ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
|
||||
│ └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
|
||||
├── keys
|
||||
│ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
|
||||
├── locks
|
||||
├── snapshots
|
||||
│ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
|
||||
└── tmp
|
||||
|
||||
A local repository can be initialized with the ``restic init`` command,
|
||||
e.g.:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo init
|
||||
|
||||
The local and sftp backends will auto-detect and accept all layouts described
|
||||
in the following sections, so that remote repositories mounted locally e.g. via
|
||||
fuse can be accessed. The layout auto-detection can be overridden by specifying
|
||||
the option ``-o local.layout=default``, valid values are ``default`` and
|
||||
``s3legacy``. The option for the sftp backend is named ``sftp.layout``, for the
|
||||
s3 backend ``s3.layout``.
|
||||
|
||||
S3 Legacy Layout
|
||||
----------------
|
||||
|
||||
Unfortunately during development the AWS S3 backend uses slightly different
|
||||
paths (directory names use singular instead of plural for ``key``,
|
||||
``lock``, and ``snapshot`` files), and the data files are stored directly below
|
||||
the ``data`` directory. The S3 Legacy repository layout looks like this:
|
||||
|
||||
::
|
||||
|
||||
/config
|
||||
/data
|
||||
├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
|
||||
├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
|
||||
├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
|
||||
├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
|
||||
[...]
|
||||
/index
|
||||
├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
|
||||
└── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
|
||||
/key
|
||||
└── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
|
||||
/lock
|
||||
/snapshot
|
||||
└── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
|
||||
|
||||
The S3 backend understands and accepts both forms, new backends are
|
||||
always created with the default layout for compatibility reasons.
|
||||
|
||||
Pack Format
|
||||
===========
|
||||
|
||||
All files in the repository except Key and Pack files just contain raw
|
||||
data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
|
||||
or more Blobs of data.
|
||||
|
||||
A Pack's structure is as follows:
|
||||
|
||||
::
|
||||
|
||||
EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length
|
||||
|
||||
At the end of the Pack file is a header, which describes the content.
|
||||
The header is encrypted and authenticated. ``Header_Length`` is the
|
||||
length of the encrypted header encoded as a four byte integer in
|
||||
little-endian encoding. Placing the header at the end of a file allows
|
||||
writing the blobs in a continuous stream as soon as they are read during
|
||||
the backup phase. This reduces code complexity and avoids having to
|
||||
re-write a file once the pack is complete and the content and length of
|
||||
the header is known.
|
||||
|
||||
All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
|
||||
authenticated and encrypted independently. This enables repository
|
||||
reorganisation without having to touch the encrypted Blobs. In addition
|
||||
it also allows efficient indexing, for only the header needs to be read
|
||||
in order to find out which Blobs are contained in the Pack. Since the
|
||||
header is authenticated, authenticity of the header can be checked
|
||||
without having to read the complete Pack.
|
||||
|
||||
After decryption, a Pack's header consists of the following elements:
|
||||
|
||||
::
|
||||
|
||||
Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
|
||||
[...]
|
||||
Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) ||
|
||||
|
||||
This is enough to calculate the offsets for all the Blobs in the Pack.
|
||||
Length is the length of a Blob as a four byte integer in little-endian
|
||||
format. The type field is a one byte field and labels the content of a
|
||||
blob according to the following table:
|
||||
|
||||
+--------+-----------+
|
||||
| Type | Meaning |
|
||||
+========+===========+
|
||||
| 0 | data |
|
||||
+--------+-----------+
|
||||
| 1 | tree |
|
||||
+--------+-----------+
|
||||
|
||||
All other types are invalid, more types may be added in the future.
|
||||
|
||||
For reconstructing the index or parsing a pack without an index, first
|
||||
the last four bytes must be read in order to find the length of the
|
||||
header. Afterwards, the header can be read and parsed, which yields all
|
||||
plaintext hashes, types, offsets and lengths of all included blobs.
|
||||
|
||||
Indexing
|
||||
========
|
||||
|
||||
Index files contain information about Data and Tree Blobs and the Packs
|
||||
they are contained in and store this information in the repository. When
|
||||
the local cached index is not accessible any more, the index files can
|
||||
be downloaded and used to reconstruct the index. The files are encrypted
|
||||
and authenticated like Data and Tree Blobs, so the outer structure is
|
||||
``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
|
||||
document like the following:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"supersedes": [
|
||||
"ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
|
||||
],
|
||||
"packs": [
|
||||
{
|
||||
"id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
|
||||
"blobs": [
|
||||
{
|
||||
"id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
|
||||
"type": "data",
|
||||
"offset": 0,
|
||||
"length": 25
|
||||
},{
|
||||
"id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
|
||||
"type": "tree",
|
||||
"offset": 38,
|
||||
"length": 100
|
||||
},
|
||||
{
|
||||
"id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
|
||||
"type": "data",
|
||||
"offset": 150,
|
||||
"length": 123
|
||||
}
|
||||
]
|
||||
}, [...]
|
||||
]
|
||||
}
|
||||
|
||||
This JSON document lists Packs and the blobs contained therein. In this
|
||||
example, the Pack ``73d04e61`` contains two data Blobs and one Tree
|
||||
blob, the plaintext hashes are listed afterwards.
|
||||
|
||||
The field ``supersedes`` lists the storage IDs of index files that have
|
||||
been replaced with the current index file. This happens when index files
|
||||
are repacked, for example when old snapshots are removed and Packs are
|
||||
recombined.
|
||||
|
||||
There may be an arbitrary number of index files, containing information
|
||||
on non-disjoint sets of Packs. The number of packs described in a single
|
||||
file is chosen so that the file size is kept below 8 MiB.
|
||||
|
||||
Keys, Encryption and MAC
|
||||
========================
|
||||
|
||||
All data stored by restic in the repository is encrypted with AES-256 in
|
||||
counter mode and authenticated using Poly1305-AES. For encrypting new
|
||||
data first 16 bytes are read from a cryptographically secure
|
||||
pseudorandom number generator as a random nonce. This is used both as
|
||||
the IV for counter mode and the nonce for Poly1305. This operation needs
|
||||
three keys: A 32 byte for AES-256 for encryption, a 16 byte AES key and
|
||||
a 16 byte key for Poly1305. For details see the original paper `The
|
||||
Poly1305-AES message-authentication
|
||||
code <http://cr.yp.to/mac/poly1305-20050329.pdf>`__ by Dan Bernstein.
|
||||
The data is then encrypted with AES-256 and afterwards a message
|
||||
authentication code (MAC) is computed over the ciphertext, everything is
|
||||
then stored as IV \|\| CIPHERTEXT \|\| MAC.
|
||||
|
||||
The directory ``keys`` contains key files. These are simple JSON
|
||||
documents which contain all data that is needed to derive the
|
||||
repository's master encryption and message authentication keys from a
|
||||
user's password. The JSON document from the repository can be
|
||||
pretty-printed for example by using the Python module ``json``
|
||||
(shortened to increase readability):
|
||||
|
||||
::
|
||||
|
||||
$ python -mjson.tool /tmp/restic-repo/keys/b02de82*
|
||||
{
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0"
|
||||
"kdf": "scrypt",
|
||||
"N": 65536,
|
||||
"r": 8,
|
||||
"p": 1,
|
||||
"created": "2015-01-02T18:10:13.48307196+01:00",
|
||||
"data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
|
||||
"salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w==",
|
||||
}
|
||||
|
||||
When the repository is opened by restic, the user is prompted for the
|
||||
repository password. This is then used with ``scrypt``, a key derivation
|
||||
function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
|
||||
``salt``) to derive 64 key bytes. The first 32 bytes are used as the
|
||||
encryption key (for AES-256) and the last 32 bytes are used as the
|
||||
message authentication key (for Poly1305-AES). These last 32 bytes are
|
||||
divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
|
||||
``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
|
||||
for details).
|
||||
|
||||
Those keys are used to authenticate and decrypt the bytes contained in
|
||||
the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
|
||||
any other blob (after removing the Base64 encoding). If the
|
||||
password is incorrect or the key file has been tampered with, the
|
||||
computed MAC will not match the last 16 bytes of the data, and restic
|
||||
exits with an error. Otherwise, the data yields a JSON document
|
||||
which contains the master encryption and message authentication keys for
|
||||
this repository (encoded in Base64). The command
|
||||
``restic cat masterkey`` can be used as follows to decrypt and
|
||||
pretty-print the master key:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat masterkey
|
||||
{
|
||||
"mac": {
|
||||
"k": "evFWd9wWlndL9jc501268g==",
|
||||
"r": "E9eEDnSJZgqwTOkDtOp+Dw=="
|
||||
},
|
||||
"encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q=",
|
||||
}
|
||||
|
||||
All data in the repository is encrypted and authenticated with these
|
||||
master keys. For encryption, the AES-256 algorithm in Counter mode is
|
||||
used. For message authentication, Poly1305-AES is used as described
|
||||
above.
|
||||
|
||||
A repository can have several different passwords, with a key file for
|
||||
each. This way, the password can be changed without having to re-encrypt
|
||||
all data.
|
||||
|
||||
Snapshots
|
||||
=========
|
||||
|
||||
A snapshot represents a directory with all files and sub-directories at
|
||||
a given point in time. For each backup that is made, a new snapshot is
|
||||
created. A snapshot is a JSON document that is stored in an encrypted
|
||||
file below the directory ``snapshots`` in the repository. The filename
|
||||
is the storage ID. This string is unique and used within restic to
|
||||
uniquely identify a snapshot.
|
||||
|
||||
The command ``restic cat snapshot`` can be used as follows to decrypt
|
||||
and pretty-print the contents of a snapshot file:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat snapshot 251c2e58
|
||||
enter password for repository:
|
||||
{
|
||||
"time": "2015-01-02T18:10:50.895208559+01:00",
|
||||
"tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
|
||||
"dir": "/tmp/testdata",
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"tags": [
|
||||
"NL"
|
||||
]
|
||||
}
|
||||
|
||||
Here it can be seen that this snapshot represents the contents of the
|
||||
directory ``/tmp/testdata``. The most important field is ``tree``. When
|
||||
the meta data (e.g. the tags) of a snapshot change, the snapshot needs
|
||||
to be re-encrypted and saved. This will change the storage ID, so in
|
||||
order to relate these seemingly different snapshots, a field
|
||||
``original`` is introduced which contains the ID of the original
|
||||
snapshot, e.g. after adding the tag ``DE`` to the snapshot above it
|
||||
becomes:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat snapshot 22a5af1b
|
||||
enter password for repository:
|
||||
{
|
||||
"time": "2015-01-02T18:10:50.895208559+01:00",
|
||||
"tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
|
||||
"dir": "/tmp/testdata",
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"tags": [
|
||||
"NL",
|
||||
"DE"
|
||||
],
|
||||
"original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
|
||||
}
|
||||
|
||||
Once introduced, the ``original`` field is not modified when the
|
||||
snapshot's meta data is changed again.
|
||||
|
||||
All content within a restic repository is referenced according to its
|
||||
SHA-256 hash. Before saving, each file is split into variable sized
|
||||
Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
|
||||
list which then represents the content of the file.
|
||||
|
||||
In order to relate these plaintext hashes to the actual location within
|
||||
a Pack file , an index is used. If the index is not available, the
|
||||
header of all data Blobs can be read.
|
||||
|
||||
Trees and Data
|
||||
==============
|
||||
|
||||
A snapshot references a tree by the SHA-256 hash of the JSON string
|
||||
representation of its contents. Trees and data are saved in pack files
|
||||
in a subdirectory of the directory ``data``.
|
||||
|
||||
The command ``restic cat blob`` can be used to inspect the tree
|
||||
referenced above (piping the output of the command to ``jq .`` so that
|
||||
the JSON is indented):
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq .
|
||||
enter password for repository:
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"name": "testdata",
|
||||
"type": "dir",
|
||||
"mode": 493,
|
||||
"mtime": "2014-12-22T14:47:59.912418701+01:00",
|
||||
"atime": "2014-12-06T17:49:21.748468803+01:00",
|
||||
"ctime": "2014-12-22T14:47:59.912418701+01:00",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"user": "fd0",
|
||||
"inode": 409704562,
|
||||
"content": null,
|
||||
"subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc"
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
A tree contains a list of entries (in the field ``nodes``) which contain
|
||||
meta data like a name and timestamps. When the entry references a
|
||||
directory, the field ``subtree`` contains the plain text ID of another
|
||||
tree object.
|
||||
|
||||
When the command ``restic cat blob`` is used, the plaintext ID is needed
|
||||
to print a tree. The tree referenced above can be dumped as follows:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc
|
||||
enter password for repository:
|
||||
{
|
||||
"nodes": [
|
||||
{
|
||||
"name": "testfile",
|
||||
"type": "file",
|
||||
"mode": 420,
|
||||
"mtime": "2014-12-06T17:50:23.34513538+01:00",
|
||||
"atime": "2014-12-06T17:50:23.338468713+01:00",
|
||||
"ctime": "2014-12-06T17:50:23.34513538+01:00",
|
||||
"uid": 1000,
|
||||
"gid": 100,
|
||||
"user": "fd0",
|
||||
"inode": 416863351,
|
||||
"size": 1234,
|
||||
"links": 1,
|
||||
"content": [
|
||||
"50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d"
|
||||
]
|
||||
},
|
||||
[...]
|
||||
]
|
||||
}
|
||||
|
||||
This tree contains a file entry. This time, the ``subtree`` field is not
|
||||
present and the ``content`` field contains a list with one plain text
|
||||
SHA-256 hash.
|
||||
|
||||
The command ``restic cat blob`` can also be used to extract and decrypt
|
||||
data given a plaintext ID, e.g. for the data mentioned above:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
|
||||
enter password for repository:
|
||||
50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d -
|
||||
|
||||
As can be seen from the output of the program ``sha256sum``, the hash
|
||||
matches the plaintext hash from the map included in the tree above, so
|
||||
the correct data has been returned.
|
||||
|
||||
Locks
|
||||
=====
|
||||
|
||||
The restic repository structure is designed in a way that allows
|
||||
parallel access of multiple instance of restic and even parallel writes.
|
||||
However, there are some functions that work more efficient or even
|
||||
require exclusive access of the repository. In order to implement these
|
||||
functions, restic processes are required to create a lock on the
|
||||
repository before doing anything.
|
||||
|
||||
Locks come in two types: Exclusive and non-exclusive locks. At most one
|
||||
process can have an exclusive lock on the repository, and during that
|
||||
time there must not be any other locks (exclusive and non-exclusive).
|
||||
There may be multiple non-exclusive locks in parallel.
|
||||
|
||||
A lock is a file in the subdir ``locks`` whose filename is the storage
|
||||
ID of the contents. It is encrypted and authenticated the same way as
|
||||
other files in the repository and contains the following JSON structure:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{
|
||||
"time": "2015-06-27T12:18:51.759239612+02:00",
|
||||
"exclusive": false,
|
||||
"hostname": "kasimir",
|
||||
"username": "fd0",
|
||||
"pid": 13607,
|
||||
"uid": 1000,
|
||||
"gid": 100
|
||||
}
|
||||
|
||||
The field ``exclusive`` defines the type of lock. When a new lock is to
|
||||
be created, restic checks all locks in the repository. When a lock is
|
||||
found, it is tested if the lock is stale, which is the case for locks
|
||||
with timestamps older than 30 minutes. If the lock was created on the
|
||||
same machine, even for younger locks it is tested whether the process is
|
||||
still alive by sending a signal to it. If that fails, restic assumes
|
||||
that the process is dead and considers the lock to be stale.
|
||||
|
||||
When a new lock is to be created and no other conflicting locks are
|
||||
detected, restic creates a new lock, waits, and checks if other locks
|
||||
appeared in the repository. Depending on the type of the other locks and
|
||||
the lock to be created, restic either continues or fails.
|
||||
|
||||
Backups and Deduplication
|
||||
=========================
|
||||
|
||||
For creating a backup, restic scans the source directory for all files,
|
||||
sub-directories and other entries. The data from each file is split into
|
||||
variable length Blobs cut at offsets defined by a sliding window of 64
|
||||
byte. The implementation uses Rabin Fingerprints for implementing this
|
||||
Content Defined Chunking (CDC). An irreducible polynomial is selected at
|
||||
random and saved in the file ``config`` when a repository is
|
||||
initialized, so that watermark attacks are much harder.
|
||||
|
||||
Files smaller than 512 KiB are not split, Blobs are of 512 KiB to 8 MiB
|
||||
in size. The implementation aims for 1 MiB Blob size on average.
|
||||
|
||||
For modified files, only modified Blobs have to be saved in a subsequent
|
||||
backup. This even works if bytes are inserted or removed at arbitrary
|
||||
positions within the file.
|
||||
|
||||
Threat Model
|
||||
============
|
||||
|
||||
The design goals for restic include being able to securely store backups
|
||||
in a location that is not completely trusted, e.g. a shared system where
|
||||
others can potentially access the files or (in the case of the system
|
||||
administrator) even modify or delete them.
|
||||
|
||||
General assumptions:
|
||||
|
||||
- The host system a backup is created on is trusted. This is the most
|
||||
basic requirement, and essential for creating trustworthy backups.
|
||||
|
||||
The restic backup program guarantees the following:
|
||||
|
||||
- Accessing the unencrypted content of stored files and metadata should
|
||||
not be possible without a password for the repository. Everything
|
||||
except the metadata included for informational purposes in the key
|
||||
files is encrypted and authenticated.
|
||||
|
||||
- Modifications (intentional or unintentional) can be detected
|
||||
automatically on several layers:
|
||||
|
||||
1. For all accesses of data stored in the repository it is checked
|
||||
whether the cryptographic hash of the contents matches the storage
|
||||
ID (the file's name). This way, modifications (bad RAM, broken
|
||||
harddisk) can be detected easily.
|
||||
|
||||
2. Before decrypting any data, the MAC on the encrypted data is
|
||||
checked. If there has been a modification, the MAC check will
|
||||
fail. This step happens even before the data is decrypted, so data
|
||||
that has been tampered with is not decrypted at all.
|
||||
|
||||
However, the restic backup program is not designed to protect against
|
||||
attackers deleting files at the storage location. There is nothing that
|
||||
can be done about this. If this needs to be guaranteed, get a secure
|
||||
location without any access from third parties. If you assume that
|
||||
attackers have write access to your files at the storage location,
|
||||
attackers are able to figure out (e.g. based on the timestamps of the
|
||||
stored files) which files belong to what snapshot. When only these files
|
||||
are deleted, the particular snapshot vanished and all snapshots
|
||||
depending on data that has been added in the snapshot cannot be restored
|
||||
completely. Restic is not designed to detect this attack.
|
||||
|
Loading…
Reference in New Issue
Block a user