From ddf2065ce2add0600ca144b9b884c8c13c6986be Mon Sep 17 00:00:00 2001 From: Alexander Neumann Date: Mon, 18 Jun 2018 22:17:30 +0200 Subject: [PATCH] doc: Split references out into smaller files Closes #1852 --- doc/100_references.rst | 793 +---------------------------------------- doc/REST_backend.rst | 145 ++++++++ doc/cache.rst | 36 ++ doc/design.rst | 608 +++++++++++++++++++++++++++++++ 4 files changed, 792 insertions(+), 790 deletions(-) create mode 100644 doc/REST_backend.rst create mode 100644 doc/cache.rst create mode 100644 doc/design.rst diff --git a/doc/100_references.rst b/doc/100_references.rst index 43dc53f98..4be4e0dae 100644 --- a/doc/100_references.rst +++ b/doc/100_references.rst @@ -18,793 +18,6 @@ References Design ****** -Terminology -=========== - -This section introduces terminology used in this document. - -*Repository*: All data produced during a backup is sent to and stored in -a repository in a structured form, for example in a file system -hierarchy with several subdirectories. A repository implementation must -be able to fulfill a number of operations, e.g. list the contents. - -*Blob*: A Blob combines a number of data bytes with identifying -information like the SHA-256 hash of the data and its length. - -*Pack*: A Pack combines one or more Blobs, e.g. in a single file. - -*Snapshot*: A Snapshot stands for the state of a file or directory that -has been backed up at some point in time. The state here means the -content and meta data like the name and modification time for the file -or the directory and its contents. - -*Storage ID*: A storage ID is the SHA-256 hash of the content stored in -the repository. This ID is required in order to load the file from the -repository. - -Repository Format -================= - -All data is stored in a restic repository. A repository is able to store -data of several different types, which can later be requested based on -an ID. 
This so-called "storage ID" is the SHA-256 hash of the content of -a file. All files in a repository are only written once and never -modified afterwards. This allows accessing and even writing to the -repository with multiple clients in parallel. Only the ``prune`` operation -removes data from the repository. - -Repositories consist of several directories and a top-level file called -``config``. For all other files stored in the repository, the name for -the file is the lower case hexadecimal representation of the storage ID, -which is the SHA-256 hash of the file's contents. This allows for easy -verification of files for accidental modifications, like disk read -errors, by simply running the program ``sha256sum`` on the file and -comparing its output to the file name. If the prefix of a filename is -unique amongst all the other files in the same directory, the prefix may -be used instead of the complete filename. - -Apart from the files stored within the ``keys`` directory, all files are -encrypted with AES-256 in counter mode (CTR). The integrity of the -encrypted data is secured by a Poly1305-AES message authentication code -(sometimes also referred to as a "signature"). - -In the first 16 bytes of each encrypted file the initialisation vector -(IV) is stored. It is followed by the encrypted data and completed by -the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The -complete encryption overhead is 32 bytes. For each file, a new random IV -is selected. - -The file ``config`` is encrypted this way and contains a JSON document -like the following: - -.. code:: json - - { - "version": 1, - "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b", - "chunker_polynomial": "25b468838dcb75" - } - -After decryption, restic first checks that the version field contains a -version number that it understands, otherwise it aborts. At the moment, -the version is expected to be 1. 
The field ``id`` holds a unique ID -which consists of 32 random bytes, encoded in hexadecimal. This uniquely -identifies the repository, regardless if it is accessed via SFTP or -locally. The field ``chunker_polynomial`` contains a parameter that is -used for splitting large files into smaller chunks (see below). - -Repository Layout ------------------ - -The ``local`` and ``sftp`` backends are implemented using files and -directories stored in a file system. The directory layout is the same -for both backend types. - -The basic layout of a repository stored in a ``local`` or ``sftp`` -backend is shown here: - -:: - - /tmp/restic-repo - ├── config - ├── data - │ ├── 21 - │ │ └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1 - │ ├── 32 - │ │ └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5 - │ ├── 59 - │ │ └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426 - │ ├── 73 - │ │ └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c - │ [...] - ├── index - │ ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d - │ └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd - ├── keys - │ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7 - ├── locks - ├── snapshots - │ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec - └── tmp - -A local repository can be initialized with the ``restic init`` command, -e.g.: - -.. code-block:: console - - $ restic -r /tmp/restic-repo init - -The local and sftp backends will auto-detect and accept all layouts described -in the following sections, so that remote repositories mounted locally e.g. via -fuse can be accessed. The layout auto-detection can be overridden by specifying -the option ``-o local.layout=default``, valid values are ``default`` and -``s3legacy``. The option for the sftp backend is named ``sftp.layout``, for the -s3 backend ``s3.layout``. 
- -S3 Legacy Layout ----------------- - -Unfortunately during development the AWS S3 backend uses slightly different -paths (directory names use singular instead of plural for ``key``, -``lock``, and ``snapshot`` files), and the data files are stored directly below -the ``data`` directory. The S3 Legacy repository layout looks like this: - -:: - - /config - /data - ├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1 - ├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5 - ├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426 - ├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c - [...] - /index - ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d - └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd - /key - └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7 - /lock - /snapshot - └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec - -The S3 backend understands and accepts both forms, new backends are -always created with the default layout for compatibility reasons. - -Pack Format -=========== - -All files in the repository except Key and Pack files just contain raw -data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one -or more Blobs of data. - -A Pack's structure is as follows: - -:: - - EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length - -At the end of the Pack file is a header, which describes the content. -The header is encrypted and authenticated. ``Header_Length`` is the -length of the encrypted header encoded as a four byte integer in -little-endian encoding. Placing the header at the end of a file allows -writing the blobs in a continuous stream as soon as they are read during -the backup phase. This reduces code complexity and avoids having to -re-write a file once the pack is complete and the content and length of -the header is known. 
- -All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are -authenticated and encrypted independently. This enables repository -reorganisation without having to touch the encrypted Blobs. In addition -it also allows efficient indexing, for only the header needs to be read -in order to find out which Blobs are contained in the Pack. Since the -header is authenticated, authenticity of the header can be checked -without having to read the complete Pack. - -After decryption, a Pack's header consists of the following elements: - -:: - - Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) || - [...] - Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) || - -This is enough to calculate the offsets for all the Blobs in the Pack. -Length is the length of a Blob as a four byte integer in little-endian -format. The type field is a one byte field and labels the content of a -blob according to the following table: - -+--------+-----------+ -| Type | Meaning | -+========+===========+ -| 0 | data | -+--------+-----------+ -| 1 | tree | -+--------+-----------+ - -All other types are invalid, more types may be added in the future. - -For reconstructing the index or parsing a pack without an index, first -the last four bytes must be read in order to find the length of the -header. Afterwards, the header can be read and parsed, which yields all -plaintext hashes, types, offsets and lengths of all included blobs. - -Indexing -======== - -Index files contain information about Data and Tree Blobs and the Packs -they are contained in and store this information in the repository. When -the local cached index is not accessible any more, the index files can -be downloaded and used to reconstruct the index. The files are encrypted -and authenticated like Data and Tree Blobs, so the outer structure is -``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON -document like the following: - -.. 
code:: json - - { - "supersedes": [ - "ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452" - ], - "packs": [ - { - "id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c", - "blobs": [ - { - "id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce", - "type": "data", - "offset": 0, - "length": 25 - },{ - "id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae", - "type": "tree", - "offset": 38, - "length": 100 - }, - { - "id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66", - "type": "data", - "offset": 150, - "length": 123 - } - ] - }, [...] - ] - } - -This JSON document lists Packs and the blobs contained therein. In this -example, the Pack ``73d04e61`` contains two data Blobs and one Tree -blob, the plaintext hashes are listed afterwards. - -The field ``supersedes`` lists the storage IDs of index files that have -been replaced with the current index file. This happens when index files -are repacked, for example when old snapshots are removed and Packs are -recombined. - -There may be an arbitrary number of index files, containing information -on non-disjoint sets of Packs. The number of packs described in a single -file is chosen so that the file size is kept below 8 MiB. - -Keys, Encryption and MAC -======================== - -All data stored by restic in the repository is encrypted with AES-256 in -counter mode and authenticated using Poly1305-AES. For encrypting new -data first 16 bytes are read from a cryptographically secure -pseudorandom number generator as a random nonce. This is used both as -the IV for counter mode and the nonce for Poly1305. This operation needs -three keys: A 32 byte for AES-256 for encryption, a 16 byte AES key and -a 16 byte key for Poly1305. For details see the original paper `The -Poly1305-AES message-authentication -code `__ by Dan Bernstein. 
-The data is then encrypted with AES-256 and afterwards a message -authentication code (MAC) is computed over the ciphertext, everything is -then stored as IV \|\| CIPHERTEXT \|\| MAC. - -The directory ``keys`` contains key files. These are simple JSON -documents which contain all data that is needed to derive the -repository's master encryption and message authentication keys from a -user's password. The JSON document from the repository can be -pretty-printed for example by using the Python module ``json`` -(shortened to increase readability): - -:: - - $ python -mjson.tool /tmp/restic-repo/keys/b02de82* - { - "hostname": "kasimir", - "username": "fd0" - "kdf": "scrypt", - "N": 65536, - "r": 8, - "p": 1, - "created": "2015-01-02T18:10:13.48307196+01:00", - "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==", - "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w==", - } - -When the repository is opened by restic, the user is prompted for the -repository password. This is then used with ``scrypt``, a key derivation -function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and -``salt``) to derive 64 key bytes. The first 32 bytes are used as the -encryption key (for AES-256) and the last 32 bytes are used as the -message authentication key (for Poly1305-AES). These last 32 bytes are -divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key -``r``. The key ``r`` is then masked for use with Poly1305 (see the paper -for details). - -Those keys are used to authenticate and decrypt the bytes contained in -the JSON field ``data`` with AES-256 and Poly1305-AES as if they were -any other blob (after removing the Base64 encoding). 
If the -password is incorrect or the key file has been tampered with, the -computed MAC will not match the last 16 bytes of the data, and restic -exits with an error. Otherwise, the data yields a JSON document -which contains the master encryption and message authentication keys for -this repository (encoded in Base64). The command -``restic cat masterkey`` can be used as follows to decrypt and -pretty-print the master key: - -.. code-block:: console - - $ restic -r /tmp/restic-repo cat masterkey - { - "mac": { - "k": "evFWd9wWlndL9jc501268g==", - "r": "E9eEDnSJZgqwTOkDtOp+Dw==" - }, - "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q=", - } - -All data in the repository is encrypted and authenticated with these -master keys. For encryption, the AES-256 algorithm in Counter mode is -used. For message authentication, Poly1305-AES is used as described -above. - -A repository can have several different passwords, with a key file for -each. This way, the password can be changed without having to re-encrypt -all data. - -Snapshots -========= - -A snapshot represents a directory with all files and sub-directories at -a given point in time. For each backup that is made, a new snapshot is -created. A snapshot is a JSON document that is stored in an encrypted -file below the directory ``snapshots`` in the repository. The filename -is the storage ID. This string is unique and used within restic to -uniquely identify a snapshot. - -The command ``restic cat snapshot`` can be used as follows to decrypt -and pretty-print the contents of a snapshot file: - -.. 
code-block:: console - - $ restic -r /tmp/restic-repo cat snapshot 251c2e58 - enter password for repository: - { - "time": "2015-01-02T18:10:50.895208559+01:00", - "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf", - "dir": "/tmp/testdata", - "hostname": "kasimir", - "username": "fd0", - "uid": 1000, - "gid": 100, - "tags": [ - "NL" - ] - } - -Here it can be seen that this snapshot represents the contents of the -directory ``/tmp/testdata``. The most important field is ``tree``. When -the meta data (e.g. the tags) of a snapshot change, the snapshot needs -to be re-encrypted and saved. This will change the storage ID, so in -order to relate these seemingly different snapshots, a field -``original`` is introduced which contains the ID of the original -snapshot, e.g. after adding the tag ``DE`` to the snapshot above it -becomes: - -.. code-block:: console - - $ restic -r /tmp/restic-repo cat snapshot 22a5af1b - enter password for repository: - { - "time": "2015-01-02T18:10:50.895208559+01:00", - "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf", - "dir": "/tmp/testdata", - "hostname": "kasimir", - "username": "fd0", - "uid": 1000, - "gid": 100, - "tags": [ - "NL", - "DE" - ], - "original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837" - } - -Once introduced, the ``original`` field is not modified when the -snapshot's meta data is changed again. - -All content within a restic repository is referenced according to its -SHA-256 hash. Before saving, each file is split into variable sized -Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered -list which then represents the content of the file. - -In order to relate these plaintext hashes to the actual location within -a Pack file , an index is used. If the index is not available, the -header of all data Blobs can be read. 
- -Trees and Data -============== - -A snapshot references a tree by the SHA-256 hash of the JSON string -representation of its contents. Trees and data are saved in pack files -in a subdirectory of the directory ``data``. - -The command ``restic cat blob`` can be used to inspect the tree -referenced above (piping the output of the command to ``jq .`` so that -the JSON is indented): - -.. code-block:: console - - $ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq . - enter password for repository: - { - "nodes": [ - { - "name": "testdata", - "type": "dir", - "mode": 493, - "mtime": "2014-12-22T14:47:59.912418701+01:00", - "atime": "2014-12-06T17:49:21.748468803+01:00", - "ctime": "2014-12-22T14:47:59.912418701+01:00", - "uid": 1000, - "gid": 100, - "user": "fd0", - "inode": 409704562, - "content": null, - "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc" - } - ] - } - -A tree contains a list of entries (in the field ``nodes``) which contain -meta data like a name and timestamps. When the entry references a -directory, the field ``subtree`` contains the plain text ID of another -tree object. - -When the command ``restic cat blob`` is used, the plaintext ID is needed -to print a tree. The tree referenced above can be dumped as follows: - -.. code-block:: console - - $ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc - enter password for repository: - { - "nodes": [ - { - "name": "testfile", - "type": "file", - "mode": 420, - "mtime": "2014-12-06T17:50:23.34513538+01:00", - "atime": "2014-12-06T17:50:23.338468713+01:00", - "ctime": "2014-12-06T17:50:23.34513538+01:00", - "uid": 1000, - "gid": 100, - "user": "fd0", - "inode": 416863351, - "size": 1234, - "links": 1, - "content": [ - "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d" - ] - }, - [...] - ] - } - -This tree contains a file entry. 
This time, the ``subtree`` field is not -present and the ``content`` field contains a list with one plain text -SHA-256 hash. - -The command ``restic cat blob`` can also be used to extract and decrypt -data given a plaintext ID, e.g. for the data mentioned above: - -.. code-block:: console - - $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum - enter password for repository: - 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d - - -As can be seen from the output of the program ``sha256sum``, the hash -matches the plaintext hash from the map included in the tree above, so -the correct data has been returned. - -Locks -===== - -The restic repository structure is designed in a way that allows -parallel access of multiple instance of restic and even parallel writes. -However, there are some functions that work more efficient or even -require exclusive access of the repository. In order to implement these -functions, restic processes are required to create a lock on the -repository before doing anything. - -Locks come in two types: Exclusive and non-exclusive locks. At most one -process can have an exclusive lock on the repository, and during that -time there must not be any other locks (exclusive and non-exclusive). -There may be multiple non-exclusive locks in parallel. - -A lock is a file in the subdir ``locks`` whose filename is the storage -ID of the contents. It is encrypted and authenticated the same way as -other files in the repository and contains the following JSON structure: - -.. code:: json - - { - "time": "2015-06-27T12:18:51.759239612+02:00", - "exclusive": false, - "hostname": "kasimir", - "username": "fd0", - "pid": 13607, - "uid": 1000, - "gid": 100 - } - -The field ``exclusive`` defines the type of lock. When a new lock is to -be created, restic checks all locks in the repository. 
When a lock is -found, it is tested if the lock is stale, which is the case for locks -with timestamps older than 30 minutes. If the lock was created on the -same machine, even for younger locks it is tested whether the process is -still alive by sending a signal to it. If that fails, restic assumes -that the process is dead and considers the lock to be stale. - -When a new lock is to be created and no other conflicting locks are -detected, restic creates a new lock, waits, and checks if other locks -appeared in the repository. Depending on the type of the other locks and -the lock to be created, restic either continues or fails. - -Backups and Deduplication -========================= - -For creating a backup, restic scans the source directory for all files, -sub-directories and other entries. The data from each file is split into -variable length Blobs cut at offsets defined by a sliding window of 64 -byte. The implementation uses Rabin Fingerprints for implementing this -Content Defined Chunking (CDC). An irreducible polynomial is selected at -random and saved in the file ``config`` when a repository is -initialized, so that watermark attacks are much harder. - -Files smaller than 512 KiB are not split, Blobs are of 512 KiB to 8 MiB -in size. The implementation aims for 1 MiB Blob size on average. - -For modified files, only modified Blobs have to be saved in a subsequent -backup. This even works if bytes are inserted or removed at arbitrary -positions within the file. - -Threat Model -============ - -The design goals for restic include being able to securely store backups -in a location that is not completely trusted, e.g. a shared system where -others can potentially access the files or (in the case of the system -administrator) even modify or delete them. - -General assumptions: - -- The host system a backup is created on is trusted. This is the most - basic requirement, and essential for creating trustworthy backups. 
- -The restic backup program guarantees the following: - -- Accessing the unencrypted content of stored files and metadata should - not be possible without a password for the repository. Everything - except the metadata included for informational purposes in the key - files is encrypted and authenticated. - -- Modifications (intentional or unintentional) can be detected - automatically on several layers: - - 1. For all accesses of data stored in the repository it is checked - whether the cryptographic hash of the contents matches the storage - ID (the file's name). This way, modifications (bad RAM, broken - harddisk) can be detected easily. - - 2. Before decrypting any data, the MAC on the encrypted data is - checked. If there has been a modification, the MAC check will - fail. This step happens even before the data is decrypted, so data - that has been tampered with is not decrypted at all. - -However, the restic backup program is not designed to protect against -attackers deleting files at the storage location. There is nothing that -can be done about this. If this needs to be guaranteed, get a secure -location without any access from third parties. If you assume that -attackers have write access to your files at the storage location, -attackers are able to figure out (e.g. based on the timestamps of the -stored files) which files belong to what snapshot. When only these files -are deleted, the particular snapshot vanished and all snapshots -depending on data that has been added in the snapshot cannot be restored -completely. Restic is not designed to detect this attack. - -************ -Local Cache -*********** - -In order to speed up certain operations, restic manages a local cache of data. -This document describes the data structures for the local cache with version 1. - -Versions -======== - -The cache directory is selected according to the `XDG base dir specification -`__. 
-Each repository has its own cache sub-directory, consting of the repository ID -which is chosen at ``init``. All cache directories for different repos are -independent of each other. - -The cache dir for a repo contains a file named ``version``, which contains a -single ASCII integer line that stands for the current version of the cache. If -a lower version number is found the cache is recreated with the current -version. If a higher version number is found the cache is ignored and left as -is. - -Snapshots, Data and Indexes -=========================== - -Snapshot, Data and Index files are cached in the sub-directories ``snapshots``, -``data`` and ``index``, as read from the repository. - -Expiry -====== - -Whenever a cache directory for a repo is used, that directory's modification -timestamp is updated to the current time. By looking at the modification -timestamps of the repo cache directories it is easy to decide which directories -are old and haven't been used in a long time. Those are probably stale and can -be removed. - - -************ -REST Backend -************ - -Restic can interact with HTTP Backend that respects the following REST -API. - -The following values are valid for ``{type}``: - - * ``data`` - * ``keys`` - * ``locks`` - * ``snapshots`` - * ``index`` - * ``config`` - -The API version is selected via the ``Accept`` HTTP header in the request. The -following values are defined: - - * ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1 - * ``application/vnd.x.restic.rest.v2``: Select API version 2 - -The server will respond with the value of the highest version it supports in -the ``Content-Type`` HTTP response header for the HTTP requests which should -return JSON. Any different value for this header means API version 1. - -The placeholder ``{path}`` in this document is a path to the repository, so -that multiple different repositories can be accessed. The default path is -``/``. The path must end with a slash. 
- -POST {path}?create=true -======================= - -This request is used to initially create a new repository. The server -responds with "200 OK" if the repository structure was created -successfully or already exists, otherwise an error is returned. - -DELETE {path} -============= - -Deletes the repository on the server side. The server responds with "200 -OK" if the repository was successfully removed. If this function is not -implemented the server returns "501 Not Implemented", if this it is -denied by the server it returns "403 Forbidden". - -HEAD {path}/config -================== - -Returns "200 OK" if the repository has a configuration, an HTTP error -otherwise. - -GET {path}/config -================= - -Returns the content of the configuration file if the repository has a -configuration, an HTTP error otherwise. - -Response format: binary/octet-stream - -POST {path}/config -================== - -Returns "200 OK" if the configuration of the request body has been -saved, an HTTP error otherwise. - -GET {path}/{type}/ -================== - -API version 1 -------------- - -Returns a JSON array containing the names of all the blobs stored for a given -type, example: - -.. code:: json - - [ - "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b", - "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec", - "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60", - "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649" - ] - -API version 2 -------------- - -Returns a JSON array containing an object for each file of the given type. The -objects have two keys: ``name`` for the file name, and ``size`` for the size in -bytes. - -.. 
code:: json - - [ - { - "name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b", - "size": 2341058 - }, - { - "name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec", - "size": 2908900 - }, - { - "name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60", - "size": 3030712 - }, - { - "name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649", - "size": 2804 - } - ] - -HEAD {path}/{type}/{name} -========================= - -Returns "200 OK" if the blob with the given name and type is stored in -the repository, "404 not found" otherwise. If the blob exists, the HTTP -header ``Content-Length`` is set to the file size. - -GET {path}/{type}/{name} -======================== - -Returns the content of the blob with the given name and type if it is -stored in the repository, "404 not found" otherwise. - -If the request specifies a partial read with a Range header field, then -the status code of the response is 206 instead of 200 and the response -only contains the specified range. - -Response format: binary/octet-stream - -POST {path}/{type}/{name} -========================= - -Saves the content of the request body as a blob with the given name and -type, an HTTP error otherwise. - -Request format: binary/octet-stream - -DELETE {path}/{type}/{name} -=========================== - -Returns "200 OK" if the blob with the given name and type has been -deleted from the repository, an HTTP error otherwise. - - +.. include:: design.rst +.. include:: cache.rst +.. include:: REST_backend.rst diff --git a/doc/REST_backend.rst b/doc/REST_backend.rst new file mode 100644 index 000000000..f9d72cf06 --- /dev/null +++ b/doc/REST_backend.rst @@ -0,0 +1,145 @@ +************ +REST Backend +************ + +Restic can interact with an HTTP backend that implements the following +REST API.
+ +The following values are valid for ``{type}``: + + * ``data`` + * ``keys`` + * ``locks`` + * ``snapshots`` + * ``index`` + * ``config`` + +The API version is selected via the ``Accept`` HTTP header in the request. The +following values are defined: + + * ``application/vnd.x.restic.rest.v1`` or empty: Select API version 1 + * ``application/vnd.x.restic.rest.v2``: Select API version 2 + +The server responds with the value for the highest version it supports in +the ``Content-Type`` HTTP response header of requests that are expected to +return JSON. Any other value for this header means API version 1. + +The placeholder ``{path}`` in this document is a path to the repository, so +that multiple different repositories can be accessed. The default path is +``/``. The path must end with a slash. + +POST {path}?create=true +======================= + +This request is used to initially create a new repository. The server +responds with "200 OK" if the repository structure was created +successfully or already exists; otherwise, an error is returned. + +DELETE {path} +============= + +Deletes the repository on the server side. The server responds with "200 +OK" if the repository was successfully removed. If this function is not +implemented, the server returns "501 Not Implemented"; if the request is +denied by the server, it returns "403 Forbidden". + +HEAD {path}/config +================== + +Returns "200 OK" if the repository has a configuration, an HTTP error +otherwise. + +GET {path}/config +================= + +Returns the content of the configuration file if the repository has a +configuration, an HTTP error otherwise. + +Response format: binary/octet-stream + +POST {path}/config +================== + +Returns "200 OK" if the configuration in the request body has been +saved, an HTTP error otherwise.
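Because `{path}` must end with a slash, request URLs can be built with standard relative URL resolution. The sketch below uses Python's `urllib.parse.urljoin`. The helper names and the example host are hypothetical. It assumes the per-file requests of this API use the `{path}/{type}/{name}` form.

```python
from urllib.parse import urljoin

# Values allowed for {type}, as listed above.
VALID_TYPES = {"data", "keys", "locks", "snapshots", "index", "config"}


def _base(repo_url: str) -> str:
    # {path} must end with a slash; without it, urljoin would replace
    # the last path segment instead of appending to it.
    if not repo_url.endswith("/"):
        raise ValueError(f"repository URL must end with a slash: {repo_url!r}")
    return repo_url


def create_url(repo_url: str) -> str:
    """URL for POST {path}?create=true."""
    return _base(repo_url) + "?create=true"


def config_url(repo_url: str) -> str:
    """URL for HEAD/GET/POST {path}/config."""
    return urljoin(_base(repo_url), "config")


def list_url(repo_url: str, file_type: str) -> str:
    """URL for GET {path}/{type}/."""
    if file_type not in VALID_TYPES:
        raise ValueError(f"invalid type: {file_type!r}")
    return urljoin(_base(repo_url), file_type + "/")


def file_url(repo_url: str, file_type: str, name: str) -> str:
    """URL for requests on {path}/{type}/{name}."""
    return urljoin(list_url(repo_url, file_type), name)
```

The trailing-slash check matters in practice. Resolving `config` against `https://host/repo` gives `https://host/config`, not a URL inside the repository.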
+
+GET {path}/{type}/
+==================
+
+API version 1
+-------------
+
+Returns a JSON array containing the names of all the blobs stored for the
+given type, for example:
+
+.. code:: json
+
+ [
+ "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
+ "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
+ "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
+ "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649"
+ ]
+
+API version 2
+-------------
+
+Returns a JSON array containing an object for each file of the given type. The
+objects have two keys: ``name`` for the file name, and ``size`` for the size in
+bytes.
+
+.. code:: json
+
+ [
+ {
+ "name": "245bc4c430d393f74fbe7b13325e30dbde9fb0745e50caad57c446c93d20096b",
+ "size": 2341058
+ },
+ {
+ "name": "85b420239efa1132c41cea0065452a40ebc20c6f8e0b132a5b2f5848360973ec",
+ "size": 2908900
+ },
+ {
+ "name": "8e2006bb5931a520f3c7009fe278d1ebb87eb72c3ff92a50c30e90f1b8cf3e60",
+ "size": 3030712
+ },
+ {
+ "name": "e75c8c407ea31ba399ab4109f28dd18c4c68303d8d86cc275432820c42ce3649",
+ "size": 2804
+ }
+ ]
+
+HEAD {path}/{type}/{name}
+=========================
+
+Returns "200 OK" if the blob with the given name and type is stored in
+the repository, "404 Not Found" otherwise. If the blob exists, the HTTP
+header ``Content-Length`` is set to the file size.
+
+GET {path}/{type}/{name}
+========================
+
+Returns the content of the blob with the given name and type if it is
+stored in the repository, "404 Not Found" otherwise.
+
+If the request specifies a partial read with a Range header field, then
+the status code of the response is 206 instead of 200 and the response
+only contains the specified range.
+
+Response format: binary/octet-stream
+
+POST {path}/{type}/{name}
+=========================
+
+Saves the content of the request body as a blob with the given name and
+type. An HTTP error is returned if this fails.
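The repository format names every file except `config` after the lower-case hex SHA-256 of its content. Because of this, a client could derive `{name}` for an upload from the request body itself. Below is a hedged sketch of that idea. The function `upload_request` and the base URL are hypothetical, and the request is only built, not sent.

```python
import hashlib
from urllib.request import Request


def upload_request(repo_url, file_type, body):
    """Build (but do not send) a POST {path}/{type}/{name} request.

    `repo_url` is assumed to already end with the repository path,
    including its trailing slash, e.g. "http://localhost:8000/".
    """
    # The name is the lower-case hex SHA-256 of the (already encrypted) body.
    name = hashlib.sha256(body).hexdigest()
    req = Request(repo_url + file_type + "/" + name, data=body, method="POST")
    return name, req
```

Deriving the name from the body lets the server, or any later reader, check the upload simply by hashing what it received.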
+
+Request format: binary/octet-stream
+
+DELETE {path}/{type}/{name}
+===========================
+
+Returns "200 OK" if the blob with the given name and type has been
+deleted from the repository, an HTTP error otherwise.
+
+
diff --git a/doc/cache.rst b/doc/cache.rst
new file mode 100644
index 000000000..1b4a874e0
--- /dev/null
+++ b/doc/cache.rst
@@ -0,0 +1,36 @@
+***********
+Local Cache
+***********
+
+In order to speed up certain operations, restic manages a local cache of data.
+This document describes the data structures of version 1 of the local cache.
+
+Versions
+========
+
+The cache directory is selected according to the `XDG base dir specification
+`__.
+Each repository has its own cache sub-directory, named after the repository
+ID, which is chosen at ``init``. The cache directories of different
+repositories are independent of each other.
+
+The cache dir for a repo contains a file named ``version``, which contains a
+single ASCII integer line that stands for the current version of the cache. If
+a lower version number is found, the cache is recreated with the current
+version. If a higher version number is found, the cache is ignored and left as
+is.
+
+Snapshots, Data and Indexes
+===========================
+
+Snapshot, Data and Index files are cached in the sub-directories ``snapshots``,
+``data`` and ``index``, as read from the repository.
+
+Expiry
+======
+
+Whenever a cache directory for a repo is used, that directory's modification
+timestamp is updated to the current time. By looking at the modification
+timestamps of the repo cache directories, it is easy to decide which directories
+are old and haven't been used in a long time. Those are probably stale and can
+be removed.
diff --git a/doc/design.rst b/doc/design.rst
new file mode 100644
index 000000000..890645649
--- /dev/null
+++ b/doc/design.rst
@@ -0,0 +1,608 @@
+
+Terminology
+===========
+
+This section introduces terminology used in this document.
+ +*Repository*: All data produced during a backup is sent to and stored in +a repository in a structured form, for example in a file system +hierarchy with several subdirectories. A repository implementation must +be able to fulfill a number of operations, e.g. list the contents. + +*Blob*: A Blob combines a number of data bytes with identifying +information like the SHA-256 hash of the data and its length. + +*Pack*: A Pack combines one or more Blobs, e.g. in a single file. + +*Snapshot*: A Snapshot stands for the state of a file or directory that +has been backed up at some point in time. The state here means the +content and meta data like the name and modification time for the file +or the directory and its contents. + +*Storage ID*: A storage ID is the SHA-256 hash of the content stored in +the repository. This ID is required in order to load the file from the +repository. + +Repository Format +================= + +All data is stored in a restic repository. A repository is able to store +data of several different types, which can later be requested based on +an ID. This so-called "storage ID" is the SHA-256 hash of the content of +a file. All files in a repository are only written once and never +modified afterwards. This allows accessing and even writing to the +repository with multiple clients in parallel. Only the ``prune`` operation +removes data from the repository. + +Repositories consist of several directories and a top-level file called +``config``. For all other files stored in the repository, the name for +the file is the lower case hexadecimal representation of the storage ID, +which is the SHA-256 hash of the file's contents. This allows for easy +verification of files for accidental modifications, like disk read +errors, by simply running the program ``sha256sum`` on the file and +comparing its output to the file name. 
If the prefix of a filename is
+unique amongst all the other files in the same directory, the prefix may
+be used instead of the complete filename.
+
+Apart from the files stored within the ``keys`` directory, all files are
+encrypted with AES-256 in counter mode (CTR). The integrity of the
+encrypted data is secured by a Poly1305-AES message authentication code
+(sometimes also referred to as a "signature").
+
+The first 16 bytes of each encrypted file hold the initialisation vector
+(IV). It is followed by the encrypted data and completed by
+the 16 byte MAC. The format is: ``IV || CIPHERTEXT || MAC``. The
+complete encryption overhead is 32 bytes. For each file, a new random IV
+is selected.
+
+The file ``config`` is encrypted this way and contains a JSON document
+like the following:
+
+.. code:: json
+
+ {
+ "version": 1,
+ "id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
+ "chunker_polynomial": "25b468838dcb75"
+ }
+
+After decryption, restic first checks that the version field contains a
+version number that it understands, otherwise it aborts. At the moment,
+the version is expected to be 1. The field ``id`` holds a unique ID
+which consists of 32 random bytes, encoded in hexadecimal. This uniquely
+identifies the repository, regardless of whether it is accessed via SFTP or
+locally. The field ``chunker_polynomial`` contains a parameter that is
+used for splitting large files into smaller chunks (see below).
+
+Repository Layout
+-----------------
+
+The ``local`` and ``sftp`` backends are implemented using files and
+directories stored in a file system. The directory layout is the same
+for both backend types.
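This section and the next show two directory trees: the default layout and the S3 legacy layout. They differ only in some directory names and in whether data files get a two-character subdirectory. Based on those two trees, the mapping from a file's kind and storage ID to its path can be sketched as follows. The function `object_path` and the kind keys are illustrative and are not restic's internal API.

```python
import posixpath

# Directory name for each kind of file, taken from the two layout trees.
LAYOUT_DIRS = {
    "default": {"data": "data", "index": "index", "key": "keys",
                "lock": "locks", "snapshot": "snapshots"},
    "s3legacy": {"data": "data", "index": "index", "key": "key",
                 "lock": "lock", "snapshot": "snapshot"},
}


def object_path(root, layout, kind, storage_id):
    """Return the path of a file with the given storage ID."""
    directory = posixpath.join(root, LAYOUT_DIRS[layout][kind])
    if layout == "default" and kind == "data":
        # In the default layout, data files live in a subdirectory named
        # after the first two hex characters of their storage ID.
        directory = posixpath.join(directory, storage_id[:2])
    return posixpath.join(directory, storage_id)
```

For example, `object_path("/tmp/restic-repo", "default", "data", "2159dd48...")` yields a path below `data/21/`, matching the first tree.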
+
+The basic layout of a repository stored in a ``local`` or ``sftp``
+backend is shown here:
+
+::
+
+ /tmp/restic-repo
+ ├── config
+ ├── data
+ │ ├── 21
+ │ │ └── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
+ │ ├── 32
+ │ │ └── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
+ │ ├── 59
+ │ │ └── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
+ │ ├── 73
+ │ │ └── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
+ │ [...]
+ ├── index
+ │ ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
+ │ └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
+ ├── keys
+ │ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
+ ├── locks
+ ├── snapshots
+ │ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
+ └── tmp
+
+A local repository can be initialized with the ``restic init`` command,
+e.g.:
+
+.. code-block:: console
+
+ $ restic -r /tmp/restic-repo init
+
+The local and sftp backends will auto-detect and accept all layouts described
+in the following sections, so that remote repositories mounted locally (e.g. via
+FUSE) can be accessed. The layout auto-detection can be overridden by specifying
+the option ``-o local.layout=default``; valid values are ``default`` and
+``s3legacy``. The option for the sftp backend is named ``sftp.layout``, and for
+the s3 backend ``s3.layout``.
+
+S3 Legacy Layout
+----------------
+
+Unfortunately, during development the AWS S3 backend used slightly different
+paths (directory names use singular instead of plural for ``key``,
+``lock``, and ``snapshot`` files), and the data files are stored directly below
+the ``data`` directory.
The S3 Legacy repository layout looks like this:
+
+::
+
+ /config
+ /data
+ ├── 2159dd48f8a24f33c307b750592773f8b71ff8d11452132a7b2e2a6a01611be1
+ ├── 32ea976bc30771cebad8285cd99120ac8786f9ffd42141d452458089985043a5
+ ├── 59fe4bcde59bd6222eba87795e35a90d82cd2f138a27b6835032b7b58173a426
+ ├── 73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c
+ [...]
+ /index
+ ├── c38f5fb68307c6a3e3aa945d556e325dc38f5fb68307c6a3e3aa945d556e325d
+ └── ca171b1b7394d90d330b265d90f506f9984043b342525f019788f97e745c71fd
+ /key
+ └── b02de829beeb3c01a63e6b25cbd421a98fef144f03b9a02e46eff9e2ca3f0bd7
+ /lock
+ /snapshot
+ └── 22a5af1bdc6e616f8a29579458c49627e01b32210d09adb288d1ecda7c5711ec
+
+The S3 backend understands and accepts both forms; new repositories are
+always created with the default layout for compatibility reasons.
+
+Pack Format
+===========
+
+All files in the repository except Key and Pack files just contain raw
+data, stored as ``IV || Ciphertext || MAC``. Pack files may contain one
+or more Blobs of data.
+
+A Pack's structure is as follows:
+
+::
+
+ EncryptedBlob1 || ... || EncryptedBlobN || EncryptedHeader || Header_Length
+
+At the end of the Pack file is a header, which describes the content.
+The header is encrypted and authenticated. ``Header_Length`` is the
+length of the encrypted header, encoded as a four byte integer in
+little-endian byte order. Placing the header at the end of a file allows
+writing the blobs in a continuous stream as soon as they are read during
+the backup phase. This reduces code complexity and avoids having to
+re-write a file once the pack is complete and the content and length of
+the header are known.
+
+All the blobs (``EncryptedBlob1``, ``EncryptedBlobN`` etc.) are
+authenticated and encrypted independently. This enables repository
+reorganisation without having to touch the encrypted Blobs.
In addition,
+it allows efficient indexing, because only the header needs to be read
+in order to find out which Blobs are contained in the Pack. Since the
+header is authenticated, its authenticity can be checked
+without having to read the complete Pack.
+
+After decryption, a Pack's header consists of the following elements:
+
+::
+
+ Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
+ [...]
+ Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_BlobN) ||
+
+This is enough to calculate the offsets for all the Blobs in the Pack.
+Length is the length of a Blob as a four byte integer in little-endian
+format. The type field is a one byte field and labels the content of a
+blob according to the following table:
+
++--------+-----------+
+| Type   | Meaning   |
++========+===========+
+| 0      | data      |
++--------+-----------+
+| 1      | tree      |
++--------+-----------+
+
+All other types are invalid; more types may be added in the future.
+
+For reconstructing the index or parsing a pack without an index, first
+the last four bytes must be read in order to find the length of the
+header. Afterwards, the header can be read and parsed, which yields all
+plaintext hashes, types, offsets and lengths of all included blobs.
+
+Indexing
+========
+
+Index files describe Data and Tree Blobs and the Packs they are
+contained in; they are stored in the repository as well. When
+the locally cached index is not accessible any more, the index files can
+be downloaded and used to reconstruct the index. The files are encrypted
+and authenticated like Data and Tree Blobs, so the outer structure is
+``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
+document like the following:
+
+..
code:: json
+
+ {
+ "supersedes": [
+ "ed54ae36197f4745ebc4b54d10e0f623eaaaedd03013eb7ae90df881b7781452"
+ ],
+ "packs": [
+ {
+ "id": "73d04e6125cf3c28a299cc2f3cca3b78ceac396e4fcf9575e34536b26782413c",
+ "blobs": [
+ {
+ "id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
+ "type": "data",
+ "offset": 0,
+ "length": 25
+ },{
+ "id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
+ "type": "tree",
+ "offset": 38,
+ "length": 100
+ },
+ {
+ "id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
+ "type": "data",
+ "offset": 150,
+ "length": 123
+ }
+ ]
+ }, [...]
+ ]
+ }
+
+This JSON document lists Packs and the blobs contained therein. In this
+example, the Pack ``73d04e61`` contains two data Blobs and one Tree
+Blob; the plaintext hash of each Blob is given in its ``id`` field.
+
+The field ``supersedes`` lists the storage IDs of index files that have
+been replaced with the current index file. This happens when index files
+are repacked, for example when old snapshots are removed and Packs are
+recombined.
+
+There may be an arbitrary number of index files, containing information
+on non-disjoint sets of Packs. The number of packs described in a single
+file is chosen so that the file size is kept below 8 MiB.
+
+Keys, Encryption and MAC
+========================
+
+All data stored by restic in the repository is encrypted with AES-256 in
+counter mode and authenticated using Poly1305-AES. To encrypt new
+data, 16 bytes are first read from a cryptographically secure
+pseudorandom number generator as a random nonce. This is used both as
+the IV for counter mode and the nonce for Poly1305. This operation needs
+three keys: a 32 byte key for AES-256 encryption, a 16 byte AES key and
+a 16 byte key for Poly1305. For details, see the original paper `The
+Poly1305-AES message-authentication
+code `__ by Dan Bernstein.
+The data is then encrypted with AES-256 and afterwards a message
+authentication code (MAC) is computed over the ciphertext. Everything is
+then stored as ``IV || CIPHERTEXT || MAC``.
+
+The directory ``keys`` contains key files. These are simple JSON
+documents which contain all data that is needed to derive the
+repository's master encryption and message authentication keys from a
+user's password. The JSON document from the repository can be
+pretty-printed, for example by using the Python module ``json``
+(shortened to increase readability):
+
+::
+
+ $ python -mjson.tool /tmp/restic-repo/keys/b02de82*
+ {
+ "hostname": "kasimir",
+ "username": "fd0",
+ "kdf": "scrypt",
+ "N": 65536,
+ "r": 8,
+ "p": 1,
+ "created": "2015-01-02T18:10:13.48307196+01:00",
+ "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
+ "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w=="
+ }
+
+When the repository is opened by restic, the user is prompted for the
+repository password. This is then used with ``scrypt``, a key derivation
+function (KDF), and the supplied parameters (``N``, ``r``, ``p`` and
+``salt``) to derive 64 key bytes. The first 32 bytes are used as the
+encryption key (for AES-256) and the last 32 bytes are used as the
+message authentication key (for Poly1305-AES). These last 32 bytes are
+divided into a 16 byte AES key ``k`` followed by 16 bytes of secret key
+``r``. The key ``r`` is then masked for use with Poly1305 (see the paper
+for details).
+
+Those keys are used to authenticate and decrypt the bytes contained in
+the JSON field ``data`` with AES-256 and Poly1305-AES as if they were
+any other blob (after removing the Base64 encoding).
If the
+password is incorrect or the key file has been tampered with, the
+computed MAC will not match the last 16 bytes of the data, and restic
+exits with an error. Otherwise, the data yields a JSON document
+which contains the master encryption and message authentication keys for
+this repository (encoded in Base64). The command
+``restic cat masterkey`` can be used as follows to decrypt and
+pretty-print the master key:
+
+.. code-block:: console
+
+ $ restic -r /tmp/restic-repo cat masterkey
+ {
+ "mac": {
+ "k": "evFWd9wWlndL9jc501268g==",
+ "r": "E9eEDnSJZgqwTOkDtOp+Dw=="
+ },
+ "encrypt": "UQCqa0lKZ94PygPxMRqkePTZnHRYh1k1pX2k2lM2v3Q="
+ }
+
+All data in the repository is encrypted and authenticated with these
+master keys. For encryption, the AES-256 algorithm in counter mode is
+used. For message authentication, Poly1305-AES is used as described
+above.
+
+A repository can have several different passwords, with a key file for
+each. This way, the password can be changed without having to re-encrypt
+all data.
+
+Snapshots
+=========
+
+A snapshot represents a directory with all files and sub-directories at
+a given point in time. For each backup that is made, a new snapshot is
+created. A snapshot is a JSON document that is stored in an encrypted
+file below the directory ``snapshots`` in the repository. The filename
+is the storage ID. This string is unique and is used within restic to
+identify a snapshot.
+
+The command ``restic cat snapshot`` can be used as follows to decrypt
+and pretty-print the contents of a snapshot file:
+
+..
code-block:: console
+
+ $ restic -r /tmp/restic-repo cat snapshot 251c2e58
+ enter password for repository:
+ {
+ "time": "2015-01-02T18:10:50.895208559+01:00",
+ "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
+ "dir": "/tmp/testdata",
+ "hostname": "kasimir",
+ "username": "fd0",
+ "uid": 1000,
+ "gid": 100,
+ "tags": [
+ "NL"
+ ]
+ }
+
+Here it can be seen that this snapshot represents the contents of the
+directory ``/tmp/testdata``. The most important field is ``tree``. When
+the meta data (e.g. the tags) of a snapshot change, the snapshot needs
+to be re-encrypted and saved. This will change the storage ID, so in
+order to relate these seemingly different snapshots, a field
+``original`` is introduced, which contains the ID of the original
+snapshot. For example, after adding the tag ``DE`` to the snapshot above,
+it becomes:
+
+.. code-block:: console
+
+ $ restic -r /tmp/restic-repo cat snapshot 22a5af1b
+ enter password for repository:
+ {
+ "time": "2015-01-02T18:10:50.895208559+01:00",
+ "tree": "2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf",
+ "dir": "/tmp/testdata",
+ "hostname": "kasimir",
+ "username": "fd0",
+ "uid": 1000,
+ "gid": 100,
+ "tags": [
+ "NL",
+ "DE"
+ ],
+ "original": "251c2e5841355f743f9d4ffd3260bee765acee40a6229857e32b60446991b837"
+ }
+
+Once introduced, the ``original`` field is not modified when the
+snapshot's meta data is changed again.
+
+All content within a restic repository is referenced according to its
+SHA-256 hash. Before saving, each file is split into variable sized
+Blobs of data. The SHA-256 hashes of all Blobs are saved in an ordered
+list which then represents the content of the file.
+
+In order to relate these plaintext hashes to the actual location within
+a Pack file, an index is used. If the index is not available, the
+headers of all Pack files can be read instead.
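The two ways of locating a Blob described above can be sketched in Python. The sketch assumes the plaintext index format from the Indexing section and the Pack layout from the Pack Format section:

- `load_index` turns an already decrypted index document into a map from plaintext hash to pack location.
- `encrypted_header` uses the trailing `Header_Length` to cut out a pack's header.
- `parse_pack_header` walks an already decrypted header to recover the same information when no index is available.

Decryption itself is out of scope, and all function names are hypothetical.

```python
import json
import struct

BLOB_TYPES = {0: "data", 1: "tree"}
# One header entry: 1 byte type, 4 byte little-endian length, 32 byte SHA-256.
ENTRY = struct.Struct("<BI32s")


def load_index(document):
    """Map plaintext blob hash -> (pack ID, type, offset, length)."""
    lookup = {}
    for pack in json.loads(document)["packs"]:
        for blob in pack["blobs"]:
            lookup[blob["id"]] = (pack["id"], blob["type"],
                                  blob["offset"], blob["length"])
    return lookup


def encrypted_header(pack):
    """Return the (still encrypted) header using the trailing Header_Length."""
    (length,) = struct.unpack("<I", pack[-4:])
    if length + 4 > len(pack):
        raise ValueError("header length exceeds pack size")
    return pack[-4 - length:-4]


def parse_pack_header(plain_header):
    """Yield (hash, type, offset, length) from a decrypted pack header."""
    if len(plain_header) % ENTRY.size:
        raise ValueError("truncated pack header")
    offset = 0
    for blob_type, length, digest in ENTRY.iter_unpack(plain_header):
        if blob_type not in BLOB_TYPES:
            raise ValueError("invalid blob type %d" % blob_type)
        yield digest.hex(), BLOB_TYPES[blob_type], offset, length
        # Blobs are stored back to back, so the next one starts here.
        offset += length
```

Each header entry is 37 bytes, and offsets are obtained by summing the lengths of the preceding Blobs. This is why the header alone is enough to locate every Blob in a Pack.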
+ +Trees and Data +============== + +A snapshot references a tree by the SHA-256 hash of the JSON string +representation of its contents. Trees and data are saved in pack files +in a subdirectory of the directory ``data``. + +The command ``restic cat blob`` can be used to inspect the tree +referenced above (piping the output of the command to ``jq .`` so that +the JSON is indented): + +.. code-block:: console + + $ restic -r /tmp/restic-repo cat blob 2da81727b6585232894cfbb8f8bdab8d1eccd3d8f7c92bc934d62e62e618ffdf | jq . + enter password for repository: + { + "nodes": [ + { + "name": "testdata", + "type": "dir", + "mode": 493, + "mtime": "2014-12-22T14:47:59.912418701+01:00", + "atime": "2014-12-06T17:49:21.748468803+01:00", + "ctime": "2014-12-22T14:47:59.912418701+01:00", + "uid": 1000, + "gid": 100, + "user": "fd0", + "inode": 409704562, + "content": null, + "subtree": "b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc" + } + ] + } + +A tree contains a list of entries (in the field ``nodes``) which contain +meta data like a name and timestamps. When the entry references a +directory, the field ``subtree`` contains the plain text ID of another +tree object. + +When the command ``restic cat blob`` is used, the plaintext ID is needed +to print a tree. The tree referenced above can be dumped as follows: + +.. code-block:: console + + $ restic -r /tmp/restic-repo cat blob b26e315b0988ddcd1cee64c351d13a100fedbc9fdbb144a67d1b765ab280b4dc + enter password for repository: + { + "nodes": [ + { + "name": "testfile", + "type": "file", + "mode": 420, + "mtime": "2014-12-06T17:50:23.34513538+01:00", + "atime": "2014-12-06T17:50:23.338468713+01:00", + "ctime": "2014-12-06T17:50:23.34513538+01:00", + "uid": 1000, + "gid": 100, + "user": "fd0", + "inode": 416863351, + "size": 1234, + "links": 1, + "content": [ + "50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d" + ] + }, + [...] + ] + } + +This tree contains a file entry. 
This time, the ``subtree`` field is not
+present and the ``content`` field contains a list with one plain text
+SHA-256 hash.
+
+The command ``restic cat blob`` can also be used to extract and decrypt
+data given a plaintext ID, e.g. for the data mentioned above:
+
+.. code-block:: console
+
+ $ restic -r /tmp/restic-repo cat blob 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d | sha256sum
+ enter password for repository:
+ 50f77b3b4291e8411a027b9f9b9e64658181cc676ce6ba9958b95f268cb1109d -
+
+As can be seen from the output of the program ``sha256sum``, the hash
+matches the plaintext hash from the ``content`` list of the tree above, so
+the correct data has been returned.
+
+Locks
+=====
+
+The restic repository structure is designed in a way that allows
+parallel access of multiple instances of restic and even parallel writes.
+However, there are some functions that work more efficiently or even
+require exclusive access to the repository. In order to implement these
+functions, restic processes are required to create a lock on the
+repository before doing anything.
+
+Locks come in two types: exclusive and non-exclusive locks. At most one
+process can have an exclusive lock on the repository, and during that
+time there must not be any other locks (exclusive or non-exclusive).
+There may be multiple non-exclusive locks in parallel.
+
+A lock is a file in the subdirectory ``locks`` whose filename is the storage
+ID of the contents. It is encrypted and authenticated the same way as
+other files in the repository and contains the following JSON structure:
+
+.. code:: json
+
+ {
+ "time": "2015-06-27T12:18:51.759239612+02:00",
+ "exclusive": false,
+ "hostname": "kasimir",
+ "username": "fd0",
+ "pid": 13607,
+ "uid": 1000,
+ "gid": 100
+ }
+
+The field ``exclusive`` defines the type of lock. When a new lock is to
+be created, restic checks all locks in the repository.
When a lock is
+found, restic tests whether it is stale, which is the case for locks
+with timestamps older than 30 minutes. If the lock was created on the
+same machine, restic also checks for younger locks whether the process is
+still alive by sending a signal to it. If that fails, restic assumes
+that the process is dead and considers the lock to be stale.
+
+When a new lock is to be created and no other conflicting locks are
+detected, restic creates a new lock, waits, and checks whether other locks
+have appeared in the repository. Depending on the type of the other locks
+and the lock to be created, restic either continues or fails.
+
+Backups and Deduplication
+=========================
+
+To create a backup, restic scans the source directory for all files,
+sub-directories and other entries. The data from each file is split into
+variable length Blobs cut at offsets defined by a sliding window of 64
+bytes. The implementation uses Rabin Fingerprints to implement this
+Content Defined Chunking (CDC). An irreducible polynomial is selected at
+random and saved in the file ``config`` when a repository is
+initialized, so that watermark attacks are much harder.
+
+Files smaller than 512 KiB are not split; Blobs are between 512 KiB and
+8 MiB in size. The implementation aims for an average Blob size of 1 MiB.
+
+For modified files, only modified Blobs have to be saved in a subsequent
+backup. This even works if bytes are inserted or removed at arbitrary
+positions within the file.
+
+Threat Model
+============
+
+The design goals for restic include being able to securely store backups
+in a location that is not completely trusted, e.g. a shared system where
+others can potentially access the files or (in the case of the system
+administrator) even modify or delete them.
+
+General assumptions:
+
+- The host system a backup is created on is trusted. This is the most
+  basic requirement, and essential for creating trustworthy backups.
+
+The restic backup program guarantees the following:
+
+- Accessing the unencrypted content of stored files and metadata should
+  not be possible without a password for the repository. Everything
+  except the metadata included for informational purposes in the key
+  files is encrypted and authenticated.
+
+- Modifications (intentional or unintentional) can be detected
+  automatically on several layers:
+
+  1. For all accesses of data stored in the repository, restic checks
+     whether the cryptographic hash of the contents matches the storage
+     ID (the file's name). This way, modifications (bad RAM, broken
+     hard disk) can be detected easily.
+
+  2. Before decrypting any data, the MAC on the encrypted data is
+     checked. If there has been a modification, the MAC check will
+     fail. This step happens even before the data is decrypted, so data
+     that has been tampered with is not decrypted at all.
+
+However, the restic backup program is not designed to protect against
+attackers deleting files at the storage location. There is nothing that
+can be done about this. If this needs to be guaranteed, get a secure
+location without any access from third parties. If you assume that
+attackers have write access to your files at the storage location,
+attackers are able to figure out (e.g. based on the timestamps of the
+stored files) which files belong to which snapshot. When only these files
+are deleted, the particular snapshot vanishes and all snapshots
+depending on data that has been added in the snapshot cannot be restored
+completely. Restic is not designed to detect this attack.
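The first detection layer listed above compares a file's SHA-256 hash with its storage ID. It amounts to the same check as running `sha256sum` on a repository file by hand, and can be sketched in a few lines of Python. The functions `verify_storage_id` and `find_modified` are hypothetical. They cover only this hash comparison, not the MAC check of the second layer, and they do not apply to the `config` file, which is not named after its hash.

```python
import hashlib


def verify_storage_id(storage_id, content):
    """True if the SHA-256 of `content` equals the storage ID (file name)."""
    return hashlib.sha256(content).hexdigest() == storage_id.lower()


def find_modified(files):
    """Return the names from a {name: content} mapping that fail the check."""
    return sorted(name for name, content in files.items()
                  if not verify_storage_id(name, content))
```

A file that fails this check has been altered or damaged at the storage location, independently of whether it can still be decrypted.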