doc: Split references out into smaller files

Closes #1852
2024-11-24 13:47:42 +00:00 · 2018-06-18 22:17:30 +02:00 · 2018-06-18 22:17:30 +02:00 · ddf2065ce2
commit ddf2065ce2
parent 228a970540
4 changed files with 792 additions and 790 deletions
--- a/doc/100_references.rst
+++ b/doc/100_references.rst
@ -18,793 +18,6 @@ References
 Design
 ******

-Terminology
-===========
-
-This section introduces terminology used in this document.
-
-*Repository*: All data produced during a backup is sent to and stored in
-a repository in a structured form, for example in a file system
-hierarchy with several subdirectories. A repository implementation must
-be able to fulfill a number of operations, e.g. list the contents.
-
-*Blob*: A Blob combines a number of data bytes with identifying
-information like the SHA-256 hash of the data and its length.
-
-*Pack*: A Pack combines one or more Blobs, e.g. in a single file.
-
-*Snapshot*: A Snapshot stands for the state of a file or directory that
-has been backed up at some point in time. The state here means the
-content and meta data like the name and modification time for the file
-or the directory and its contents.
-
-*Storage ID*: A storage ID is the SHA-256 hash of the content stored in
-the repository. This ID is required in order to load the file from the
-repository.
-
-Repository Format
-=================
-
-All data is stored in a restic repository. A repository is able to store
-data of several different types, which can later be requested based on
-an ID. This so-called "storage ID" is the SHA-256 hash of the content of
-a file. All files in a repository are only written once and never
-modified afterwards. This allows accessing and even writing to the
-repository with multiple clients in parallel. Only the ``prune`` operation
-removes data from the repository.
-
-Repositories consist of several directories and a top-level file called
-``config``. For all other files stored in the repository, the name for
-the file is the lower case hexadecimal representation of the storage ID,
-which is the SHA-256 hash of the file's contents. This allows for easy
-verification of files for accidental modifications, like disk read
-errors, by simply running the program ``sha256sum`` on the file and
-comparing its output to the file name. If the prefix of a filename is
-unique amongst all the other files in the same directory, the prefix may
-be used instead of the complete filename.
-
-Apart from the files stored within the ``keys`` directory, all files are
-encrypted with AES-256 in counter mode (CTR). The integrity of the
-encrypted data is secured by a Poly1305-AES message authentication code
-(sometimes also referred to as a "signature").
-
-In the first 16 bytes of each encrypted file the initialisation vector
-(IV) is stored. It is followed by the encrypted data and completed by
-the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The
-complete encryption overhead is 32 bytes. For each file, a new random IV
-is selected.
-
-The file ``config`` is encrypted this way and contains a JSON document
-like the following:
-
-.. code:: json
-
-    {
-      "version": 1,
-      "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
-      "chunker_polynomial": "25b468838dcb75"
-    }
-
-After decryption, restic first checks that the version field contains a
-version number that it understands, otherwise it aborts. At the moment,
-the version is expected to be 1. The field ``id`` holds a unique ID
-which consists of 32 random bytes, encoded in hexadecimal. This uniquely
-identifies the repository, regardless if it is accessed via SFTP or
-locally. The field ``chunker_polynomial`` contains a parameter that is
-used for splitting large files into smaller chunks (see below).
-
-Repository Layout
-----------------
-
-The ``local`` and ``sftp`` backends are implemented using files and
-directories stored in a file system. The directory layout is the same
-for both backend types.
-
-The basic layout of a repository stored in a ``local`` or ``sftp``
-backend is shown here:
-
-::
-
-    /tmp/restic-repo
-    ├── config
-    ├── data
-    │   ├── 21
-    │   │   └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
-    │   ├── 32
-    │   │   └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
-    │   ├── 59
-    │   │   └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
-    │   ├── 73
-    │   │   └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
-    │   [...]
-    ├── index
-    │   ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
-    │   └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
-    ├── keys
-    │   └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
-    ├── locks
-    ├── snapshots
-    │   └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
-    └── tmp
-
-A local repository can be initialized with the ``restic init`` command,
-e.g.:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo init
-
-The local and sftp backends will auto-detect and accept all layouts described
-in the following sections, so that remote repositories mounted locally e.g. via
-fuse can be accessed. The layout auto-detection can be overridden by specifying
-the option ``-o local.layout=default``, valid values are ``default`` and
-``s3legacy``. The option for the sftp backend is named ``sftp.layout``, for the
-s3 backend ``s3.layout``.
-
-S3 Legacy Layout
----------------
-
-Unfortunately during development the AWS S3 backend uses slightly different
-paths (directory names use singular instead of plural for ``key``,
-``lock``, and ``snapshot`` files), and the data files are stored directly below
-the ``data`` directory. The S3 Legacy repository layout looks like this:
-
-::
-
-    /config
-    /data
-     ├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
-     ├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
-     ├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
-     ├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
-    [...]
-    /index
-     ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
-     └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
-    /key
-     └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
-    /lock
-    /snapshot
-     └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
-
-The S3 backend understands and accepts both forms, new backends are
-always created with the default layout for compatibility reasons.
-
-Pack Format
-===========
-
-All files in the repository except Key and Pack files just contain raw
-data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
-or more Blobs of data.
-
-A Pack's structure is as follows:
-
-::
-
-    EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length
-
-At the end of the Pack file is a header, which describes the content.
-The header is encrypted and authenticated. ``Header_Length`` is the
-length of the encrypted header encoded as a four byte integer in
-little-endian encoding. Placing the header at the end of a file allows
-writing the blobs in a continuous stream as soon as they are read during
-the backup phase. This reduces code complexity and avoids having to
-re-write a file once the pack is complete and the content and length of
-the header is known.
-
-All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
-authenticated and encrypted independently. This enables repository
-reorganisation without having to touch the encrypted Blobs. In addition
-it also allows efficient indexing, for only the header needs to be read
-in order to find out which Blobs are contained in the Pack. Since the
-header is authenticated, authenticity of the header can be checked
-without having to read the complete Pack.
-
-After decryption, a Pack's header consists of the following elements:
-
-::
-
-    Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
-    [...]
-    Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) ||
-
-This is enough to calculate the offsets for all the Blobs in the Pack.
-Length is the length of a Blob as a four byte integer in little-endian
-format. The type field is a one byte field and labels the content of a
-blob according to the following table:
-
-+--------+-----------+
-| Type   | Meaning   |
-+========+===========+
-| 0      | data      |
-+--------+-----------+
-| 1      | tree      |
-+--------+-----------+
-
-All other types are invalid, more types may be added in the future.
-
-For reconstructing the index or parsing a pack without an index, first
-the last four bytes must be read in order to find the length of the
-header. Afterwards, the header can be read and parsed, which yields all
-plaintext hashes, types, offsets and lengths of all included blobs.
-
-Indexing
-========
-
-Index files contain information about Data and Tree Blobs and the Packs
-they are contained in and store this information in the repository. When
-the local cached index is not accessible any more, the index files can
-be downloaded and used to reconstruct the index. The files are encrypted
-and authenticated like Data and Tree Blobs, so the outer structure is
-``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
-document like the following:
-
-.. code:: json
-
-    {
-      "supersedes": [
-        "ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
-      ],
-      "packs": [
-        {
-          "id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
-          "blobs": [
-            {
-              "id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
-              "type": "data",
-              "offset": 0,
-              "length": 25
-            },{
-              "id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
-              "type": "tree",
-              "offset": 38,
-              "length": 100
-            },
-            {
-              "id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
-              "type": "data",
-              "offset": 150,
-              "length": 123
-            }
-          ]
-        }, [...]
-      ]
-    }
-
-This JSON document lists Packs and the blobs contained therein. In this
-example, the Pack ``73d04e61`` contains two data Blobs and one Tree
-blob, the plaintext hashes are listed afterwards.
-
-The field ``supersedes`` lists the storage IDs of index files that have
-been replaced with the current index file. This happens when index files
-are repacked, for example when old snapshots are removed and Packs are
-recombined.
-
-There may be an arbitrary number of index files, containing information
-on non-disjoint sets of Packs. The number of packs described in a single
-file is chosen so that the file size is kept below 8 MiB.
-
-Keys, Encryption and MAC
-========================
-
-All data stored by restic in the repository is encrypted with AES-256 in
-counter mode and authenticated using Poly1305-AES. For encrypting new
-data first 16 bytes are read from a cryptographically secure
-pseudorandom number generator as a random nonce. This is used both as
-the IV for counter mode and the nonce for Poly1305. This operation needs
-three keys: A 32 byte for AES-256 for encryption, a 16 byte AES key and
-a 16 byte key for Poly1305. For details see the original paper `The
-Poly1305-AES message-authentication
-code <http://cr.yp.to/mac/poly1305-20050329.pdf>`__ by Dan Bernstein.
-The data is then encrypted with AES-256 and afterwards a message
-authentication code (MAC) is computed over the ciphertext, everything is
-then stored as IV \|\| CIPHERTEXT \|\| MAC.
-
-The directory ``keys`` contains key files. These are simple JSON
-documents which contain all data that is needed to derive the
-repository's master encryption and message authentication keys from a
-user's password. The JSON document from the repository can be
-pretty-printed for example by using the Python module ``json``
-(shortened to increase readability):
-
-::
-
-    $ python -mjson.tool /tmp/restic-repo/keys/b02de82*
-    {
-        "hostname": "kasimir",
-        "username": "fd0"
-        "kdf": "scrypt",
-        "N": 65536,
-        "r": 8,
-        "p": 1,
-        "created": "2015-01-02T18:10:13.48307196+01:00",
-        "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
-        "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w==",
-    }
-
-When the repository is opened by restic, the user is prompted for the
-repository password. This is then used with ``scrypt``, a key derivation
-function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
-``salt``) to derive 64 key bytes. The first 32 bytes are used as the
-encryption key (for AES-256) and the last 32 bytes are used as the
-message authentication key (for Poly1305-AES). These last 32 bytes are
-divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
-``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
-for details).
-
-Those keys are used to authenticate and decrypt the bytes contained in
-the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
-any other blob (after removing the Base64 encoding). If the
-password is incorrect or the key file has been tampered with, the
-computed MAC will not match the last 16 bytes of the data, and restic
-exits with an error. Otherwise, the data yields a JSON document
-which contains the master encryption and message authentication keys for
-this repository (encoded in Base64). The command
-``restic cat masterkey`` can be used as follows to decrypt and
-pretty-print the master key:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat masterkey
-    {
-        "mac": {
-          "k": "evFWd9wWlndL9jc501268g==",
-          "r": "E9eEDnSJZgqwTOkDtOp+Dw=="
-        },
-        "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q=",
-    }
-
-All data in the repository is encrypted and authenticated with these
-master keys. For encryption, the AES-256 algorithm in Counter mode is
-used. For message authentication, Poly1305-AES is used as described
-above.
-
-A repository can have several different passwords, with a key file for
-each. This way, the password can be changed without having to re-encrypt
-all data.
-
-Snapshots
-=========
-
-A snapshot represents a directory with all files and sub-directories at
-a given point in time. For each backup that is made, a new snapshot is
-created. A snapshot is a JSON document that is stored in an encrypted
-file below the directory ``snapshots`` in the repository. The filename
-is the storage ID. This string is unique and used within restic to
-uniquely identify a snapshot.
-
-The command ``restic cat snapshot`` can be used as follows to decrypt
-and pretty-print the contents of a snapshot file:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat snapshot 251c2e58
-    enter password for repository:
-    {
-      "time": "2015-01-02T18:10:50.895208559+01:00",
-      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
-      "dir": "/tmp/testdata",
-      "hostname": "kasimir",
-      "username": "fd0",
-      "uid": 1000,
-      "gid": 100,
-      "tags": [
-        "NL"
-      ]
-    }
-
-Here it can be seen that this snapshot represents the contents of the
-directory ``/tmp/testdata``. The most important field is ``tree``. When
-the meta data (e.g. the tags) of a snapshot change, the snapshot needs
-to be re-encrypted and saved. This will change the storage ID, so in
-order to relate these seemingly different snapshots, a field
-``original`` is introduced which contains the ID of the original
-snapshot, e.g. after adding the tag ``DE`` to the snapshot above it
-becomes:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat snapshot 22a5af1b
-    enter password for repository:
-    {
-      "time": "2015-01-02T18:10:50.895208559+01:00",
-      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
-      "dir": "/tmp/testdata",
-      "hostname": "kasimir",
-      "username": "fd0",
-      "uid": 1000,
-      "gid": 100,
-      "tags": [
-        "NL",
-        "DE"
-      ],
-      "original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
-    }
-
-Once introduced, the ``original`` field is not modified when the
-snapshot's meta data is changed again.
-
-All content within a restic repository is referenced according to its
-SHA-256 hash. Before saving, each file is split into variable sized
-Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
-list which then represents the content of the file.
-
-In order to relate these plaintext hashes to the actual location within
-a Pack file , an index is used. If the index is not available, the
-header of all data Blobs can be read.
-
-Trees and Data
-==============
-
-A snapshot references a tree by the SHA-256 hash of the JSON string
-representation of its contents. Trees and data are saved in pack files
-in a subdirectory of the directory ``data``.
-
-The command ``restic cat blob`` can be used to inspect the tree
-referenced above (piping the output of the command to ``jq .`` so that
-the JSON is indented):
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq .
-    enter password for repository:
-    {
-      "nodes": [
-        {
-          "name": "testdata",
-          "type": "dir",
-          "mode": 493,
-          "mtime": "2014-12-22T14:47:59.912418701+01:00",
-          "atime": "2014-12-06T17:49:21.748468803+01:00",
-          "ctime": "2014-12-22T14:47:59.912418701+01:00",
-          "uid": 1000,
-          "gid": 100,
-          "user": "fd0",
-          "inode": 409704562,
-          "content": null,
-          "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc"
-        }
-      ]
-    }
-
-A tree contains a list of entries (in the field ``nodes``) which contain
-meta data like a name and timestamps. When the entry references a
-directory, the field ``subtree`` contains the plain text ID of another
-tree object.
-
-When the command ``restic cat blob`` is used, the plaintext ID is needed
-to print a tree. The tree referenced above can be dumped as follows:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc
-    enter password for repository:
-    {
-      "nodes": [
-        {
-          "name": "testfile",
-          "type": "file",
-          "mode": 420,
-          "mtime": "2014-12-06T17:50:23.34513538+01:00",
-          "atime": "2014-12-06T17:50:23.338468713+01:00",
-          "ctime": "2014-12-06T17:50:23.34513538+01:00",
-          "uid": 1000,
-          "gid": 100,
-          "user": "fd0",
-          "inode": 416863351,
-          "size": 1234,
-          "links": 1,
-          "content": [
-            "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d"
-          ]
-        },
-        [...]
-      ]
-    }
-
-This tree contains a file entry. This time, the ``subtree`` field is not
-present and the ``content`` field contains a list with one plain text
-SHA-256 hash.
-
-The command ``restic cat blob`` can also be used to extract and decrypt
-data given a plaintext ID, e.g. for the data mentioned above:
-
-.. code-block:: console
-
-    $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
-    enter password for repository:
-    50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d  -
-
-As can be seen from the output of the program ``sha256sum``, the hash
-matches the plaintext hash from the map included in the tree above, so
-the correct data has been returned.
-
-Locks
-=====
-
-The restic repository structure is designed in a way that allows
-parallel access of multiple instance of restic and even parallel writes.
-However, there are some functions that work more efficient or even
-require exclusive access of the repository. In order to implement these
-functions, restic processes are required to create a lock on the
-repository before doing anything.
-
-Locks come in two types: Exclusive and non-exclusive locks. At most one
-process can have an exclusive lock on the repository, and during that
-time there must not be any other locks (exclusive and non-exclusive).
-There may be multiple non-exclusive locks in parallel.
-
-A lock is a file in the subdir ``locks`` whose filename is the storage
-ID of the contents. It is encrypted and authenticated the same way as
-other files in the repository and contains the following JSON structure:
-
-.. code:: json
-
-    {
-      "time": "2015-06-27T12:18:51.759239612+02:00",
-      "exclusive": false,
-      "hostname": "kasimir",
-      "username": "fd0",
-      "pid": 13607,
-      "uid": 1000,
-      "gid": 100
-    }
-
-The field ``exclusive`` defines the type of lock. When a new lock is to
-be created, restic checks all locks in the repository. When a lock is
-found, it is tested if the lock is stale, which is the case for locks
-with timestamps older than 30 minutes. If the lock was created on the
-same machine, even for younger locks it is tested whether the process is
-still alive by sending a signal to it. If that fails, restic assumes
-that the process is dead and considers the lock to be stale.
-
-When a new lock is to be created and no other conflicting locks are
-detected, restic creates a new lock, waits, and checks if other locks
-appeared in the repository. Depending on the type of the other locks and
-the lock to be created, restic either continues or fails.
-
-Backups and Deduplication
-=========================
-
-For creating a backup, restic scans the source directory for all files,
-sub-directories and other entries. The data from each file is split into
-variable length Blobs cut at offsets defined by a sliding window of 64
-byte. The implementation uses Rabin Fingerprints for implementing this
-Content Defined Chunking (CDC). An irreducible polynomial is selected at
-random and saved in the file ``config`` when a repository is
-initialized, so that watermark attacks are much harder.
-
-Files smaller than 512 KiB are not split, Blobs are of 512 KiB to 8 MiB
-in size. The implementation aims for 1 MiB Blob size on average.
-
-For modified files, only modified Blobs have to be saved in a subsequent
-backup. This even works if bytes are inserted or removed at arbitrary
-positions within the file.
-
-Threat Model
-============
-
-The design goals for restic include being able to securely store backups
-in a location that is not completely trusted, e.g. a shared system where
-others can potentially access the files or (in the case of the system
-administrator) even modify or delete them.
-
-General assumptions:
-
-  The host system a backup is created on is trusted. This is the most
-   basic requirement, and essential for creating trustworthy backups.
-
-The restic backup program guarantees the following:
-
-  Accessing the unencrypted content of stored files and metadata should
-   not be possible without a password for the repository. Everything
-   except the metadata included for informational purposes in the key
-   files is encrypted and authenticated.
-
-  Modifications (intentional or unintentional) can be detected
-   automatically on several layers:
-
-   1. For all accesses of data stored in the repository it is checked
-      whether the cryptographic hash of the contents matches the storage
-      ID (the file's name). This way, modifications (bad RAM, broken
-      harddisk) can be detected easily.
-
-   2. Before decrypting any data, the MAC on the encrypted data is
-      checked. If there has been a modification, the MAC check will
-      fail. This step happens even before the data is decrypted, so data
-      that has been tampered with is not decrypted at all.
-
-However, the restic backup program is not designed to protect against
-attackers deleting files at the storage location. There is nothing that
-can be done about this. If this needs to be guaranteed, get a secure
-location without any access from third parties. If you assume that
-attackers have write access to your files at the storage location,
-attackers are able to figure out (e.g. based on the timestamps of the
-stored files) which files belong to what snapshot. When only these files
-are deleted, the particular snapshot vanished and all snapshots
-depending on data that has been added in the snapshot cannot be restored
-completely. Restic is not designed to detect this attack.
-
-************
-Local Cache
-***********
-
-In order to speed up certain operations, restic manages a local cache of data.
-This document describes the data structures for the local cache with version 1.
-
-Versions
-========
-
-The cache directory is selected according to the `XDG base dir specification
-<http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html>`__.
-Each repository has its own cache sub-directory, consting of the repository ID
-which is chosen at ``init``. All cache directories for different repos are
-independent of each other.
-
-The cache dir for a repo contains a file named ``version``, which contains a
-single ASCII integer line that stands for the current version of the cache. If
-a lower version number is found the cache is recreated with the current
-version. If a higher version number is found the cache is ignored and left as
-is.
-
-Snapshots, Data and Indexes
-===========================
-
-Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
-``data`` and  ``index``, as read from the repository.
-
-Expiry
-======
-
-Whenever a cache directory for a repo is used, that directory's modification
-timestamp is updated to the current time. By looking at the modification
-timestamps of the repo cache directories it is easy to decide which directories
-are old and haven't been used in a long time. Those are probably stale and can
-be removed.
-
-
-************
-REST Backend
-************
-
-Restic can interact with HTTP Backend that respects the following REST
-API.
-
-The following values are valid for ``{type}``:
-
- * ``data``
- * ``keys``
- * ``locks``
- * ``snapshots``
- * ``index``
- * ``config``
-
-The API version is selected via the ``Accept`` HTTP header in the request. The
-following values are defined:
-
- * ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1
- * ``application/vnd.x.restic.rest.v2``: Select API version 2
-
-The server will respond with the value of the highest version it supports in
-the ``Content-Type`` HTTP response header for the HTTP requests which should
-return JSON. Any different value for this header means API version 1.
-
-The placeholder ``{path}`` in this document is a path to the repository, so
-that multiple different repositories can be accessed. The default path is
-``/``. The path must end with a slash.
-
-POST {path}?create=true
-=======================
-
-This request is used to initially create a new repository. The server
-responds with "200 OK" if the repository structure was created
-successfully or already exists, otherwise an error is returned.
-
-DELETE {path}
-=============
-
-Deletes the repository on the server side. The server responds with "200
-OK" if the repository was successfully removed. If this function is not
-implemented the server returns "501 Not Implemented", if this it is
-denied by the server it returns "403 Forbidden".
-
-HEAD {path}/config
-==================
-
-Returns "200 OK" if the repository has a configuration, an HTTP error
-otherwise.
-
-GET {path}/config
-=================
-
-Returns the content of the configuration file if the repository has a
-configuration, an HTTP error otherwise.
-
-Response format: binary/octet-stream
-
-POST {path}/config
-==================
-
-Returns "200 OK" if the configuration of the request body has been
-saved, an HTTP error otherwise.
-
-GET {path}/{type}/
-==================
-
-API version 1
-------------
-
-Returns a JSON array containing the names of all the blobs stored for a given
-type, example:
-
-.. code:: json
-
-    [
-      "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
-      "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
-      "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
-      "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649"
-    ]
-
-API version 2
-------------
-
-Returns a JSON array containing an object for each file of the given type. The
-objects have two keys: ``name`` for the file name, and ``size`` for the size in
-bytes.
-
-.. code:: json
-
-    [
-      {
-        "name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
-        "size": 2341058
-      },
-      {
-        "name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
-        "size": 2908900
-      },
-      {
-        "name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
-        "size": 3030712
-      },
-      {
-        "name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649",
-        "size": 2804
-      }
-    ]
-
-HEAD {path}/{type}/{name}
-=========================
-
-Returns "200 OK" if the blob with the given name and type is stored in
-the repository, "404 not found" otherwise. If the blob exists, the HTTP
-header ``Content-Length`` is set to the file size.
-
-GET {path}/{type}/{name}
-========================
-
-Returns the content of the blob with the given name and type if it is
-stored in the repository, "404 not found" otherwise.
-
-If the request specifies a partial read with a Range header field, then
-the status code of the response is 206 instead of 200 and the response
-only contains the specified range.
-
-Response format: binary/octet-stream
-
-POST {path}/{type}/{name}
-=========================
-
-Saves the content of the request body as a blob with the given name and
-type, an HTTP error otherwise.
-
-Request format: binary/octet-stream
-
-DELETE {path}/{type}/{name}
-===========================
-
-Returns "200 OK" if the blob with the given name and type has been
-deleted from the repository, an HTTP error otherwise.
-
-
+.. include:: design.rst
+.. include:: cache.rst
+.. include:: REST_backend.rst
--- a/doc/REST_backend.rst
+++ b/doc/REST_backend.rst
@ -0,0 +1,145 @@
+************
+REST Backend
+************
+
+Restic can interact with HTTP Backend that respects the following REST
+API.
+
+The following values are valid for ``{type}``:
+
+ * ``data``
+ * ``keys``
+ * ``locks``
+ * ``snapshots``
+ * ``index``
+ * ``config``
+
+The API version is selected via the ``Accept`` HTTP header in the request. The
+following values are defined:
+
+ * ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1
+ * ``application/vnd.x.restic.rest.v2``: Select API version 2
+
+The server will respond with the value of the highest version it supports in
+the ``Content-Type`` HTTP response header for the HTTP requests which should
+return JSON. Any different value for this header means API version 1.
+
+The placeholder ``{path}`` in this document is a path to the repository, so
+that multiple different repositories can be accessed. The default path is
+``/``. The path must end with a slash.
+
+POST {path}?create=true
+=======================
+
+This request is used to initially create a new repository. The server
+responds with "200 OK" if the repository structure was created
+successfully or already exists, otherwise an error is returned.
+
+DELETE {path}
+=============
+
+Deletes the repository on the server side. The server responds with "200
+OK" if the repository was successfully removed. If this function is not
+implemented the server returns "501 Not Implemented", if this it is
+denied by the server it returns "403 Forbidden".
+
+HEAD {path}/config
+==================
+
+Returns "200 OK" if the repository has a configuration, an HTTP error
+otherwise.
+
+GET {path}/config
+=================
+
+Returns the content of the configuration file if the repository has a
+configuration, an HTTP error otherwise.
+
+Response format: binary/octet-stream
+
+POST {path}/config
+==================
+
+Returns "200 OK" if the configuration of the request body has been
+saved, an HTTP error otherwise.
+
+GET {path}/{type}/
+==================
+
+API version 1
+-------------
+
+Returns a JSON array containing the names of all the blobs stored for a given
+type, example:
+
+.. code:: json
+
+    [
+      "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
+      "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
+      "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
+      "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649"
+    ]
+
+API version 2
+-------------
+
+Returns a JSON array containing an object for each file of the given type. The
+objects have two keys: ``name`` for the file name, and ``size`` for the size in
+bytes.
+
+.. code:: json
+
+    [
+      {
+        "name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
+        "size": 2341058
+      },
+      {
+        "name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
+        "size": 2908900
+      },
+      {
+        "name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
+        "size": 3030712
+      },
+      {
+        "name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649",
+        "size": 2804
+      }
+    ]
+
+HEAD {path}/{type}/{name}
+=========================
+
+Returns "200 OK" if the blob with the given name and type is stored in
+the repository, "404 not found" otherwise. If the blob exists, the HTTP
+header ``Content-Length`` is set to the file size.
+
+GET {path}/{type}/{name}
+========================
+
+Returns the content of the blob with the given name and type if it is
+stored in the repository, "404 not found" otherwise.
+
+If the request specifies a partial read with a Range header field, then
+the status code of the response is 206 instead of 200 and the response
+only contains the specified range.
+
+Response format: binary/octet-stream
+
+POST {path}/{type}/{name}
+=========================
+
+Saves the content of the request body as a blob with the given name and
+type, an HTTP error otherwise.
+
+Request format: binary/octet-stream
+
+DELETE {path}/{type}/{name}
+===========================
+
+Returns "200 OK" if the blob with the given name and type has been
+deleted from the repository, an HTTP error otherwise.
+
+
--- a/doc/cache.rst
+++ b/doc/cache.rst
@ -0,0 +1,36 @@
+***********
+Local Cache
+***********
+
+In order to speed up certain operations, restic manages a local cache of data.
+This document describes the data structures for the local cache with version 1.
+
+Versions
+========
+
+The cache directory is selected according to the `XDG base dir specification
+<http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html>`__.
+Each repository has its own cache sub-directory, consting of the repository ID
+which is chosen at ``init``. All cache directories for different repos are
+independent of each other.
+
+The cache dir for a repo contains a file named ``version``, which contains a
+single ASCII integer line that stands for the current version of the cache. If
+a lower version number is found the cache is recreated with the current
+version. If a higher version number is found the cache is ignored and left as
+is.
+
+Snapshots, Data and Indexes
+===========================
+
+Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
+``data`` and  ``index``, as read from the repository.
+
+Expiry
+======
+
+Whenever a cache directory for a repo is used, that directory's modification
+timestamp is updated to the current time. By looking at the modification
+timestamps of the repo cache directories it is easy to decide which directories
+are old and haven't been used in a long time. Those are probably stale and can
+be removed.
--- a/doc/design.rst
+++ b/doc/design.rst
@ -0,0 +1,608 @@
+
+Terminology
+===========
+
+This section introduces terminology used in this document.
+
+*Repository*: All data produced during a backup is sent to and stored in
+a repository in a structured form, for example in a file system
+hierarchy with several subdirectories. A repository implementation must
+be able to fulfill a number of operations, e.g. list the contents.
+
+*Blob*: A Blob combines a number of data bytes with identifying
+information like the SHA-256 hash of the data and its length.
+
+*Pack*: A Pack combines one or more Blobs, e.g. in a single file.
+
+*Snapshot*: A Snapshot stands for the state of a file or directory that
+has been backed up at some point in time. The state here means the
+content and meta data like the name and modification time for the file
+or the directory and its contents.
+
+*Storage ID*: A storage ID is the SHA-256 hash of the content stored in
+the repository. This ID is required in order to load the file from the
+repository.
+
+Repository Format
+=================
+
+All data is stored in a restic repository. A repository is able to store
+data of several different types, which can later be requested based on
+an ID. This so-called "storage ID" is the SHA-256 hash of the content of
+a file. All files in a repository are only written once and never
+modified afterwards. This allows accessing and even writing to the
+repository with multiple clients in parallel. Only the ``prune`` operation
+removes data from the repository.
+
+Repositories consist of several directories and a top-level file called
+``config``. For all other files stored in the repository, the name for
+the file is the lower case hexadecimal representation of the storage ID,
+which is the SHA-256 hash of the file's contents. This allows for easy
+verification of files for accidental modifications, like disk read
+errors, by simply running the program ``sha256sum`` on the file and
+comparing its output to the file name. If the prefix of a filename is
+unique amongst all the other files in the same directory, the prefix may
+be used instead of the complete filename.
+
+Apart from the files stored within the ``keys`` directory, all files are
+encrypted with AES-256 in counter mode (CTR). The integrity of the
+encrypted data is secured by a Poly1305-AES message authentication code
+(sometimes also referred to as a "signature").
+
+In the first 16 bytes of each encrypted file the initialisation vector
+(IV) is stored. It is followed by the encrypted data and completed by
+the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The
+complete encryption overhead is 32 bytes. For each file, a new random IV
+is selected.
+
+The file ``config`` is encrypted this way and contains a JSON document
+like the following:
+
+.. code:: json
+
+    {
+      "version": 1,
+      "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
+      "chunker_polynomial": "25b468838dcb75"
+    }
+
+After decryption, restic first checks that the version field contains a
+version number that it understands, otherwise it aborts. At the moment,
+the version is expected to be 1. The field ``id`` holds a unique ID
+which consists of 32 random bytes, encoded in hexadecimal. This uniquely
+identifies the repository, regardless if it is accessed via SFTP or
+locally. The field ``chunker_polynomial`` contains a parameter that is
+used for splitting large files into smaller chunks (see below).
+
+Repository Layout
+-----------------
+
+The ``local`` and ``sftp`` backends are implemented using files and
+directories stored in a file system. The directory layout is the same
+for both backend types.
+
+The basic layout of a repository stored in a ``local`` or ``sftp``
+backend is shown here:
+
+::
+
+    /tmp/restic-repo
+    ├── config
+    ├── data
+    │   ├── 21
+    │   │   └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
+    │   ├── 32
+    │   │   └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
+    │   ├── 59
+    │   │   └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
+    │   ├── 73
+    │   │   └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
+    │   [...]
+    ├── index
+    │   ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
+    │   └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
+    ├── keys
+    │   └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
+    ├── locks
+    ├── snapshots
+    │   └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
+    └── tmp
+
+A local repository can be initialized with the ``restic init`` command,
+e.g.:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo init
+
+The local and sftp backends will auto-detect and accept all layouts described
+in the following sections, so that remote repositories mounted locally e.g. via
+fuse can be accessed. The layout auto-detection can be overridden by specifying
+the option ``-o local.layout=default``, valid values are ``default`` and
+``s3legacy``. The option for the sftp backend is named ``sftp.layout``, for the
+s3 backend ``s3.layout``.
+
+S3 Legacy Layout
+----------------
+
+Unfortunately during development the AWS S3 backend uses slightly different
+paths (directory names use singular instead of plural for ``key``,
+``lock``, and ``snapshot`` files), and the data files are stored directly below
+the ``data`` directory. The S3 Legacy repository layout looks like this:
+
+::
+
+    /config
+    /data
+     ├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
+     ├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
+     ├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
+     ├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
+    [...]
+    /index
+     ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
+     └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
+    /key
+     └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
+    /lock
+    /snapshot
+     └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
+
+The S3 backend understands and accepts both forms, new backends are
+always created with the default layout for compatibility reasons.
+
+Pack Format
+===========
+
+All files in the repository except Key and Pack files just contain raw
+data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
+or more Blobs of data.
+
+A Pack's structure is as follows:
+
+::
+
+    EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length
+
+At the end of the Pack file is a header, which describes the content.
+The header is encrypted and authenticated. ``Header_Length`` is the
+length of the encrypted header encoded as a four byte integer in
+little-endian encoding. Placing the header at the end of a file allows
+writing the blobs in a continuous stream as soon as they are read during
+the backup phase. This reduces code complexity and avoids having to
+re-write a file once the pack is complete and the content and length of
+the header is known.
+
+All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
+authenticated and encrypted independently. This enables repository
+reorganisation without having to touch the encrypted Blobs. In addition
+it also allows efficient indexing, for only the header needs to be read
+in order to find out which Blobs are contained in the Pack. Since the
+header is authenticated, authenticity of the header can be checked
+without having to read the complete Pack.
+
+After decryption, a Pack's header consists of the following elements:
+
+::
+
+    Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
+    [...]
+    Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) ||
+
+This is enough to calculate the offsets for all the Blobs in the Pack.
+Length is the length of a Blob as a four byte integer in little-endian
+format. The type field is a one byte field and labels the content of a
+blob according to the following table:
+
+--------+-----------+
+| Type   | Meaning   |
+========+===========+
+| 0      | data      |
+--------+-----------+
+| 1      | tree      |
+--------+-----------+
+
+All other types are invalid, more types may be added in the future.
+
+For reconstructing the index or parsing a pack without an index, first
+the last four bytes must be read in order to find the length of the
+header. Afterwards, the header can be read and parsed, which yields all
+plaintext hashes, types, offsets and lengths of all included blobs.
+
+Indexing
+========
+
+Index files contain information about Data and Tree Blobs and the Packs
+they are contained in and store this information in the repository. When
+the local cached index is not accessible any more, the index files can
+be downloaded and used to reconstruct the index. The files are encrypted
+and authenticated like Data and Tree Blobs, so the outer structure is
+``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
+document like the following:
+
+.. code:: json
+
+    {
+      "supersedes": [
+        "ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
+      ],
+      "packs": [
+        {
+          "id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
+          "blobs": [
+            {
+              "id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
+              "type": "data",
+              "offset": 0,
+              "length": 25
+            },{
+              "id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
+              "type": "tree",
+              "offset": 38,
+              "length": 100
+            },
+            {
+              "id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
+              "type": "data",
+              "offset": 150,
+              "length": 123
+            }
+          ]
+        }, [...]
+      ]
+    }
+
+This JSON document lists Packs and the blobs contained therein. In this
+example, the Pack ``73d04e61`` contains two data Blobs and one Tree
+blob, the plaintext hashes are listed afterwards.
+
+The field ``supersedes`` lists the storage IDs of index files that have
+been replaced with the current index file. This happens when index files
+are repacked, for example when old snapshots are removed and Packs are
+recombined.
+
+There may be an arbitrary number of index files, containing information
+on non-disjoint sets of Packs. The number of packs described in a single
+file is chosen so that the file size is kept below 8 MiB.
+
+Keys, Encryption and MAC
+========================
+
+All data stored by restic in the repository is encrypted with AES-256 in
+counter mode and authenticated using Poly1305-AES. For encrypting new
+data first 16 bytes are read from a cryptographically secure
+pseudorandom number generator as a random nonce. This is used both as
+the IV for counter mode and the nonce for Poly1305. This operation needs
+three keys: A 32 byte for AES-256 for encryption, a 16 byte AES key and
+a 16 byte key for Poly1305. For details see the original paper `The
+Poly1305-AES message-authentication
+code <http://cr.yp.to/mac/poly1305-20050329.pdf>`__ by Dan Bernstein.
+The data is then encrypted with AES-256 and afterwards a message
+authentication code (MAC) is computed over the ciphertext, everything is
+then stored as IV \|\| CIPHERTEXT \|\| MAC.
+
+The directory ``keys`` contains key files. These are simple JSON
+documents which contain all data that is needed to derive the
+repository's master encryption and message authentication keys from a
+user's password. The JSON document from the repository can be
+pretty-printed for example by using the Python module ``json``
+(shortened to increase readability):
+
+::
+
+    $ python -mjson.tool /tmp/restic-repo/keys/b02de82*
+    {
+        "hostname": "kasimir",
+        "username": "fd0"
+        "kdf": "scrypt",
+        "N": 65536,
+        "r": 8,
+        "p": 1,
+        "created": "2015-01-02T18:10:13.48307196+01:00",
+        "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
+        "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w==",
+    }
+
+When the repository is opened by restic, the user is prompted for the
+repository password. This is then used with ``scrypt``, a key derivation
+function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
+``salt``) to derive 64 key bytes. The first 32 bytes are used as the
+encryption key (for AES-256) and the last 32 bytes are used as the
+message authentication key (for Poly1305-AES). These last 32 bytes are
+divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
+``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
+for details).
+
+Those keys are used to authenticate and decrypt the bytes contained in
+the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
+any other blob (after removing the Base64 encoding). If the
+password is incorrect or the key file has been tampered with, the
+computed MAC will not match the last 16 bytes of the data, and restic
+exits with an error. Otherwise, the data yields a JSON document
+which contains the master encryption and message authentication keys for
+this repository (encoded in Base64). The command
+``restic cat masterkey`` can be used as follows to decrypt and
+pretty-print the master key:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat masterkey
+    {
+        "mac": {
+          "k": "evFWd9wWlndL9jc501268g==",
+          "r": "E9eEDnSJZgqwTOkDtOp+Dw=="
+        },
+        "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q=",
+    }
+
+All data in the repository is encrypted and authenticated with these
+master keys. For encryption, the AES-256 algorithm in Counter mode is
+used. For message authentication, Poly1305-AES is used as described
+above.
+
+A repository can have several different passwords, with a key file for
+each. This way, the password can be changed without having to re-encrypt
+all data.
+
+Snapshots
+=========
+
+A snapshot represents a directory with all files and sub-directories at
+a given point in time. For each backup that is made, a new snapshot is
+created. A snapshot is a JSON document that is stored in an encrypted
+file below the directory ``snapshots`` in the repository. The filename
+is the storage ID. This string is unique and used within restic to
+uniquely identify a snapshot.
+
+The command ``restic cat snapshot`` can be used as follows to decrypt
+and pretty-print the contents of a snapshot file:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat snapshot 251c2e58
+    enter password for repository:
+    {
+      "time": "2015-01-02T18:10:50.895208559+01:00",
+      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
+      "dir": "/tmp/testdata",
+      "hostname": "kasimir",
+      "username": "fd0",
+      "uid": 1000,
+      "gid": 100,
+      "tags": [
+        "NL"
+      ]
+    }
+
+Here it can be seen that this snapshot represents the contents of the
+directory ``/tmp/testdata``. The most important field is ``tree``. When
+the meta data (e.g. the tags) of a snapshot change, the snapshot needs
+to be re-encrypted and saved. This will change the storage ID, so in
+order to relate these seemingly different snapshots, a field
+``original`` is introduced which contains the ID of the original
+snapshot, e.g. after adding the tag ``DE`` to the snapshot above it
+becomes:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat snapshot 22a5af1b
+    enter password for repository:
+    {
+      "time": "2015-01-02T18:10:50.895208559+01:00",
+      "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
+      "dir": "/tmp/testdata",
+      "hostname": "kasimir",
+      "username": "fd0",
+      "uid": 1000,
+      "gid": 100,
+      "tags": [
+        "NL",
+        "DE"
+      ],
+      "original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
+    }
+
+Once introduced, the ``original`` field is not modified when the
+snapshot's meta data is changed again.
+
+All content within a restic repository is referenced according to its
+SHA-256 hash. Before saving, each file is split into variable sized
+Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
+list which then represents the content of the file.
+
+In order to relate these plaintext hashes to the actual location within
+a Pack file , an index is used. If the index is not available, the
+header of all data Blobs can be read.
+
+Trees and Data
+==============
+
+A snapshot references a tree by the SHA-256 hash of the JSON string
+representation of its contents. Trees and data are saved in pack files
+in a subdirectory of the directory ``data``.
+
+The command ``restic cat blob`` can be used to inspect the tree
+referenced above (piping the output of the command to ``jq .`` so that
+the JSON is indented):
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq .
+    enter password for repository:
+    {
+      "nodes": [
+        {
+          "name": "testdata",
+          "type": "dir",
+          "mode": 493,
+          "mtime": "2014-12-22T14:47:59.912418701+01:00",
+          "atime": "2014-12-06T17:49:21.748468803+01:00",
+          "ctime": "2014-12-22T14:47:59.912418701+01:00",
+          "uid": 1000,
+          "gid": 100,
+          "user": "fd0",
+          "inode": 409704562,
+          "content": null,
+          "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc"
+        }
+      ]
+    }
+
+A tree contains a list of entries (in the field ``nodes``) which contain
+meta data like a name and timestamps. When the entry references a
+directory, the field ``subtree`` contains the plain text ID of another
+tree object.
+
+When the command ``restic cat blob`` is used, the plaintext ID is needed
+to print a tree. The tree referenced above can be dumped as follows:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc
+    enter password for repository:
+    {
+      "nodes": [
+        {
+          "name": "testfile",
+          "type": "file",
+          "mode": 420,
+          "mtime": "2014-12-06T17:50:23.34513538+01:00",
+          "atime": "2014-12-06T17:50:23.338468713+01:00",
+          "ctime": "2014-12-06T17:50:23.34513538+01:00",
+          "uid": 1000,
+          "gid": 100,
+          "user": "fd0",
+          "inode": 416863351,
+          "size": 1234,
+          "links": 1,
+          "content": [
+            "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d"
+          ]
+        },
+        [...]
+      ]
+    }
+
+This tree contains a file entry. This time, the ``subtree`` field is not
+present and the ``content`` field contains a list with one plain text
+SHA-256 hash.
+
+The command ``restic cat blob`` can also be used to extract and decrypt
+data given a plaintext ID, e.g. for the data mentioned above:
+
+.. code-block:: console
+
+    $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
+    enter password for repository:
+    50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d  -
+
+As can be seen from the output of the program ``sha256sum``, the hash
+matches the plaintext hash from the map included in the tree above, so
+the correct data has been returned.
+
+Locks
+=====
+
+The restic repository structure is designed in a way that allows
+parallel access of multiple instance of restic and even parallel writes.
+However, there are some functions that work more efficient or even
+require exclusive access of the repository. In order to implement these
+functions, restic processes are required to create a lock on the
+repository before doing anything.
+
+Locks come in two types: Exclusive and non-exclusive locks. At most one
+process can have an exclusive lock on the repository, and during that
+time there must not be any other locks (exclusive and non-exclusive).
+There may be multiple non-exclusive locks in parallel.
+
+A lock is a file in the subdir ``locks`` whose filename is the storage
+ID of the contents. It is encrypted and authenticated the same way as
+other files in the repository and contains the following JSON structure:
+
+.. code:: json
+
+    {
+      "time": "2015-06-27T12:18:51.759239612+02:00",
+      "exclusive": false,
+      "hostname": "kasimir",
+      "username": "fd0",
+      "pid": 13607,
+      "uid": 1000,
+      "gid": 100
+    }
+
+The field ``exclusive`` defines the type of lock. When a new lock is to
+be created, restic checks all locks in the repository. When a lock is
+found, it is tested if the lock is stale, which is the case for locks
+with timestamps older than 30 minutes. If the lock was created on the
+same machine, even for younger locks it is tested whether the process is
+still alive by sending a signal to it. If that fails, restic assumes
+that the process is dead and considers the lock to be stale.
+
+When a new lock is to be created and no other conflicting locks are
+detected, restic creates a new lock, waits, and checks if other locks
+appeared in the repository. Depending on the type of the other locks and
+the lock to be created, restic either continues or fails.
+
+Backups and Deduplication
+=========================
+
+For creating a backup, restic scans the source directory for all files,
+sub-directories and other entries. The data from each file is split into
+variable length Blobs cut at offsets defined by a sliding window of 64
+byte. The implementation uses Rabin Fingerprints for implementing this
+Content Defined Chunking (CDC). An irreducible polynomial is selected at
+random and saved in the file ``config`` when a repository is
+initialized, so that watermark attacks are much harder.
+
+Files smaller than 512 KiB are not split, Blobs are of 512 KiB to 8 MiB
+in size. The implementation aims for 1 MiB Blob size on average.
+
+For modified files, only modified Blobs have to be saved in a subsequent
+backup. This even works if bytes are inserted or removed at arbitrary
+positions within the file.
+
+Threat Model
+============
+
+The design goals for restic include being able to securely store backups
+in a location that is not completely trusted, e.g. a shared system where
+others can potentially access the files or (in the case of the system
+administrator) even modify or delete them.
+
+General assumptions:
+
+-  The host system a backup is created on is trusted. This is the most
+   basic requirement, and essential for creating trustworthy backups.
+
+The restic backup program guarantees the following:
+
+-  Accessing the unencrypted content of stored files and metadata should
+   not be possible without a password for the repository. Everything
+   except the metadata included for informational purposes in the key
+   files is encrypted and authenticated.
+
+-  Modifications (intentional or unintentional) can be detected
+   automatically on several layers:
+
+   1. For all accesses of data stored in the repository it is checked
+      whether the cryptographic hash of the contents matches the storage
+      ID (the file's name). This way, modifications (bad RAM, broken
+      harddisk) can be detected easily.
+
+   2. Before decrypting any data, the MAC on the encrypted data is
+      checked. If there has been a modification, the MAC check will
+      fail. This step happens even before the data is decrypted, so data
+      that has been tampered with is not decrypted at all.
+
+However, the restic backup program is not designed to protect against
+attackers deleting files at the storage location. There is nothing that
+can be done about this. If this needs to be guaranteed, get a secure
+location without any access from third parties. If you assume that
+attackers have write access to your files at the storage location,
+attackers are able to figure out (e.g. based on the timestamps of the
+stored files) which files belong to what snapshot. When only these files
+are deleted, the particular snapshot vanished and all snapshots
+depending on data that has been added in the snapshot cannot be restored
+completely. Restic is not designed to detect this attack.
+