doc: Add repository compression support documentation
Co-authored-by: Michael Eischer <michael.eischer@fau.de>
parent
4e1ef7804a
commit
270ed00d1f
1 changed files with 88 additions and 33 deletions
121
doc/design.rst
|
@ -62,18 +62,21 @@ like the following:
|
||||||
.. code:: json
|
.. code:: json
|
||||||
|
|
||||||
{
|
{
|
||||||
"version": 1,
|
"version": 2,
|
||||||
"id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
|
"id": "5956a3f67a6230d4a92cefb29529f10196c7d92582ec305fd71ff6d331d6271b",
|
||||||
"chunker_polynomial": "25b468838dcb75"
|
"chunker_polynomial": "25b468838dcb75"
|
||||||
}
|
}
|
||||||
|
|
||||||
After decryption, restic first checks that the version field contains a
|
After decryption, restic first checks that the version field contains a
|
||||||
version number that it understands, otherwise it aborts. At the moment,
|
version number that it understands, otherwise it aborts. At the moment, the
|
||||||
the version is expected to be 1. The field ``id`` holds a unique ID
|
version is expected to be 1 or 2. The list of changes in the repository
|
||||||
which consists of 32 random bytes, encoded in hexadecimal. This uniquely
|
format is contained in the section "Changes" below.
|
||||||
identifies the repository, regardless if it is accessed via SFTP or
|
|
||||||
locally. The field ``chunker_polynomial`` contains a parameter that is
|
The field ``id`` holds a unique ID which consists of 32 random bytes, encoded
|
||||||
used for splitting large files into smaller chunks (see below).
|
in hexadecimal. This uniquely identifies the repository, regardless if it is
|
||||||
|
accessed via SFTP or locally. The field ``chunker_polynomial`` contains a
|
||||||
|
parameter that is used for splitting large files into smaller chunks (see
|
||||||
|
below).
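The version check described above can be sketched roughly as follows. This is a minimal illustration, not restic's actual implementation: the function name ``check_config`` and the constant ``SUPPORTED_VERSIONS`` are hypothetical, and the input is assumed to be the already decrypted config plaintext.

```python
import json

# Versions the text above says are currently understood.
SUPPORTED_VERSIONS = {1, 2}


def check_config(plaintext: bytes) -> dict:
    """Parse a decrypted config document and reject unknown versions."""
    config = json.loads(plaintext)
    version = config.get("version")
    if version not in SUPPORTED_VERSIONS:
        # The design says to abort on a version that is not understood.
        raise ValueError(f"unsupported repository version: {version!r}")
    # "id" is 32 random bytes encoded in hexadecimal (64 hex characters).
    repo_id = bytes.fromhex(config["id"])
    if len(repo_id) != 32:
        raise ValueError("repository id must be 32 bytes")
    return config
```

Calling ``check_config`` on the example document shown earlier returns the parsed dictionary, while a document with ``"version": 3`` raises ``ValueError``.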
|
||||||
|
|
||||||
Repository Layout
|
Repository Layout
|
||||||
-----------------
|
-----------------
|
||||||
|
@ -186,40 +189,75 @@ After decryption, a Pack's header consists of the following elements:
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
Type_Blob1 || Length(EncryptedBlob1) || Hash(Plaintext_Blob1) ||
|
Type_Blob1 || Data_Blob1 ||
|
||||||
[...]
|
[...]
|
||||||
Type_BlobN || Length(EncryptedBlobN) || Hash(Plaintext_Blobn) ||
|
Type_BlobN || Data_BlobN ||
|
||||||
|
|
||||||
|
The Blob type field is a single byte. What follows it depends on the type. The
|
||||||
|
following Blob types are defined:
|
||||||
|
|
||||||
|
+-----------+----------------------+-------------------------------------------------------------------------------+
| Type      | Meaning              | Data                                                                          |
+===========+======================+===============================================================================+
| 0b00      | data blob            | ``Length(encrypted_blob) || Hash(plaintext_blob)``                            |
+-----------+----------------------+-------------------------------------------------------------------------------+
| 0b01      | tree blob            | ``Length(encrypted_blob) || Hash(plaintext_blob)``                            |
+-----------+----------------------+-------------------------------------------------------------------------------+
| 0b10      | compressed data blob | ``Length(encrypted_blob) || Length(plaintext_blob) || Hash(plaintext_blob)``  |
+-----------+----------------------+-------------------------------------------------------------------------------+
| 0b11      | compressed tree blob | ``Length(encrypted_blob) || Length(plaintext_blob) || Hash(plaintext_blob)``  |
+-----------+----------------------+-------------------------------------------------------------------------------+
|
||||||
|
|
||||||
This is enough to calculate the offsets for all the Blobs in the Pack.
|
This is enough to calculate the offsets for all the Blobs in the Pack.
|
||||||
Length is the length of a Blob as a four byte integer in little-endian
|
The length fields are encoded as four byte integers in little-endian
|
||||||
format. The type field is a one byte field and labels the content of a
|
format. In the Data column, ``Length(plaintext_blob)`` means the length
|
||||||
blob according to the following table:
|
of the decrypted and uncompressed data a blob consists of.
|
||||||
|
|
||||||
+--------+-----------+
|
All other types are invalid, more types may be added in the future. The
|
||||||
| Type | Meaning |
|
compressed types are only valid for repository format version 2. Data and
|
||||||
+========+===========+
|
tree blobs may be compressed with the zstandard compression algorithm.
|
||||||
| 0 | data |
|
|
||||||
+--------+-----------+
|
|
||||||
| 1 | tree |
|
|
||||||
+--------+-----------+
|
|
||||||
|
|
||||||
All other types are invalid, more types may be added in the future.
|
In repository format version 1, data and tree blobs should be stored in
|
||||||
|
separate pack files. In version 2, they must be stored in separate files.
|
||||||
|
Compressed and non-compressed blobs of the same type may be mixed in a pack
|
||||||
|
file.
|
||||||
|
|
||||||
For reconstructing the index or parsing a pack without an index, first
|
For reconstructing the index or parsing a pack without an index, first
|
||||||
the last four bytes must be read in order to find the length of the
|
the last four bytes must be read in order to find the length of the
|
||||||
header. Afterwards, the header can be read and parsed, which yields all
|
header. Afterwards, the header can be read and parsed, which yields all
|
||||||
plaintext hashes, types, offsets and lengths of all included blobs.
|
plaintext hashes, types, offsets and lengths of all included blobs.
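As a rough sketch of the header parsing described above, the following Python walks a decrypted header and derives each blob's offset from the preceding encrypted lengths. Several things here are assumptions for illustration only: the name ``parse_header``, the 32-byte hash size (SHA-256), and the premise that blobs are stored back to back starting at offset 0. Reading the trailing four-byte header length and decrypting the header are out of scope.

```python
import struct

HASH_SIZE = 32  # assumption: plaintext hashes are SHA-256 (32 bytes)

# Type byte -> (blob type, compressed?), following the table above.
BLOB_TYPES = {
    0b00: ("data", False),
    0b01: ("tree", False),
    0b10: ("data", True),
    0b11: ("tree", True),
}


def parse_header(header: bytes, repo_version: int = 2) -> list:
    """Parse a decrypted pack header into a list of blob entries."""
    entries = []
    pos = 0
    offset = 0  # assumed: blobs follow each other from the start of the pack
    while pos < len(header):
        type_byte = header[pos]
        pos += 1
        if type_byte not in BLOB_TYPES:
            raise ValueError(f"invalid blob type {type_byte}")
        blob_type, compressed = BLOB_TYPES[type_byte]
        if compressed and repo_version < 2:
            raise ValueError("compressed blobs require repository version 2")
        # One or two four-byte lengths, followed by the plaintext hash.
        size = (8 if compressed else 4) + HASH_SIZE
        if pos + size > len(header):
            raise ValueError("truncated header entry")
        (length,) = struct.unpack_from("<I", header, pos)  # little-endian
        pos += 4
        entry = {"type": blob_type, "offset": offset, "length": length}
        if compressed:
            (plain_len,) = struct.unpack_from("<I", header, pos)
            entry["uncompressed_length"] = plain_len
            pos += 4
        entry["id"] = header[pos:pos + HASH_SIZE].hex()
        pos += HASH_SIZE
        entries.append(entry)
        offset += length
    return entries


# Usage: one plain and one compressed data blob with placeholder hashes.
example = (
    bytes([0b00]) + struct.pack("<I", 38) + b"\x11" * HASH_SIZE
    + bytes([0b10]) + struct.pack("<II", 112, 511) + b"\x22" * HASH_SIZE
)
parsed = parse_header(example)
```

The second entry's offset follows from the first entry's encrypted length, which is why the header alone is enough to locate every blob.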
|
||||||
|
|
||||||
|
Unpacked Data Format
|
||||||
|
====================
|
||||||
|
|
||||||
|
Individual files for the index, locks or snapshots are encrypted
|
||||||
|
and authenticated like Data and Tree Blobs, so the outer structure is
|
||||||
|
``IV || Ciphertext || MAC`` again. In repository format version 1 the
|
||||||
|
plaintext always consists of a JSON document which must either be an
|
||||||
|
object or an array.
|
||||||
|
|
||||||
|
Repository format version 2 adds support for compression. The plaintext
|
||||||
|
now starts with a header to indicate the encoding version to distinguish
|
||||||
|
it from plain JSON and to allow for further evolution of the storage format:
|
||||||
|
``encoding_version || data``
|
||||||
|
The ``encoding_version`` field is encoded as one byte.
|
||||||
|
For backwards compatibility the encoding versions '[' (0x5b) and '{' (0x7b)
|
||||||
|
are used to mark that the whole plaintext (including the encoding version
|
||||||
|
byte) should be treated as a JSON document.
|
||||||
|
|
||||||
|
For new data the encoding version is currently always ``2``. For that
|
||||||
|
version ``data`` contains a JSON document compressed using the zstandard
|
||||||
|
compression algorithm.
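The dispatch on the first plaintext byte could look roughly like the sketch below. It is illustrative only: ``decode_unpacked`` is a hypothetical name, the encoding version ``2`` is assumed to be the byte value 2 rather than the ASCII character, and the zstandard decompressor is passed in as a callable because the Python standard library may not provide one.

```python
import json
from typing import Callable


def decode_unpacked(plaintext: bytes,
                    zstd_decompress: Callable[[bytes], bytes]):
    """Decode the decrypted plaintext of an index, lock or snapshot file."""
    if not plaintext:
        raise ValueError("empty plaintext")
    version = plaintext[0]
    if version in (0x5B, 0x7B):
        # '[' or '{': legacy encoding, the whole plaintext is JSON,
        # including the byte that was just inspected.
        return json.loads(plaintext)
    if version == 2:
        # Everything after the version byte is zstandard-compressed JSON.
        return json.loads(zstd_decompress(plaintext[1:]))
    raise ValueError(f"unknown encoding version {version}")
```

Passing the decompressor as a parameter keeps the sketch free of third-party dependencies; a real caller would supply a zstandard binding of its choice.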
|
||||||
|
|
||||||
Indexing
|
Indexing
|
||||||
========
|
========
|
||||||
|
|
||||||
Index files contain information about Data and Tree Blobs and the Packs
|
Index files contain information about Data and Tree Blobs and the Packs
|
||||||
they are contained in and store this information in the repository. When
|
they are contained in and store this information in the repository. When
|
||||||
the local cached index is not accessible any more, the index files can
|
the local cached index is not accessible any more, the index files can
|
||||||
be downloaded and used to reconstruct the index. The files are encrypted
|
be downloaded and used to reconstruct the index. The file encoding is
|
||||||
and authenticated like Data and Tree Blobs, so the outer structure is
|
described in the "Unpacked Data Format" section. The plaintext consists
|
||||||
``IV || Ciphertext || MAC`` again. The plaintext consists of a JSON
|
of a JSON document like the following:
|
||||||
document like the following:
|
|
||||||
|
|
||||||
.. code:: json
|
.. code:: json
|
||||||
|
|
||||||
|
@ -235,18 +273,22 @@ document like the following:
|
||||||
"id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
|
"id": "3ec79977ef0cf5de7b08cd12b874cd0f62bbaf7f07f3497a5b1bbcc8cb39b1ce",
|
||||||
"type": "data",
|
"type": "data",
|
||||||
"offset": 0,
|
"offset": 0,
|
||||||
"length": 25
|
"length": 38
|
||||||
},{
|
// no 'uncompressed_length' as blob is not compressed
|
||||||
|
},
|
||||||
|
{
|
||||||
"id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
|
"id": "9ccb846e60d90d4eb915848add7aa7ea1e4bbabfc60e573db9f7bfb2789afbae",
|
||||||
"type": "tree",
|
"type": "tree",
|
||||||
"offset": 38,
|
"offset": 38,
|
||||||
"length": 100
|
"length": 112,
|
||||||
|
"uncompressed_length": 511
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
|
"id": "d3dc577b4ffd38cc4b32122cabf8655a0223ed22edfd93b353dc0c3f2b0fdf66",
|
||||||
"type": "data",
|
"type": "data",
|
||||||
"offset": 150,
|
"offset": 150,
|
||||||
"length": 123
|
"length": 123,
|
||||||
|
"uncompressed_length": 234
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}, [...]
|
}, [...]
|
||||||
|
@ -255,7 +297,11 @@ document like the following:
|
||||||
|
|
||||||
This JSON document lists Packs and the blobs contained therein. In this
|
This JSON document lists Packs and the blobs contained therein. In this
|
||||||
example, the Pack ``73d04e61`` contains two data Blobs and one Tree
|
example, the Pack ``73d04e61`` contains two data Blobs and one Tree
|
||||||
blob, the plaintext hashes are listed afterwards.
|
blob, the plaintext hashes are listed afterwards. The ``length`` field
|
||||||
|
corresponds to ``Length(encrypted_blob)`` in the pack file header.
|
||||||
|
Field ``uncompressed_length`` is only present for compressed blobs and
|
||||||
|
therefore is never present in version 1. It is set to the value of
|
||||||
|
``Length(plaintext_blob)``.
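To illustrate how these fields relate, the sketch below takes the three blob entries from the example index (IDs omitted) and checks which ones are compressed and whether their offsets line up. The helper names are hypothetical, and contiguous offsets are a property of this example rather than something the format description guarantees.

```python
# Blob entries from the example index above, without the "id" fields.
blobs = [
    {"type": "data", "offset": 0, "length": 38},
    {"type": "tree", "offset": 38, "length": 112, "uncompressed_length": 511},
    {"type": "data", "offset": 150, "length": 123, "uncompressed_length": 234},
]


def is_compressed(blob: dict) -> bool:
    """A blob is compressed exactly when 'uncompressed_length' is present."""
    return "uncompressed_length" in blob


def offsets_are_contiguous(entries: list) -> bool:
    """Check that each blob starts where the previous one ends."""
    ordered = sorted(entries, key=lambda b: b["offset"])
    return all(
        prev["offset"] + prev["length"] == cur["offset"]
        for prev, cur in zip(ordered, ordered[1:])
    )


compressed_flags = [is_compressed(b) for b in blobs]
contiguous = offsets_are_contiguous(blobs)
```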
|
||||||
|
|
||||||
The field ``supersedes`` lists the storage IDs of index files that have
|
The field ``supersedes`` lists the storage IDs of index files that have
|
||||||
been replaced with the current index file. This happens when index files
|
been replaced with the current index file. This happens when index files
|
||||||
|
@ -350,8 +396,9 @@ Snapshots
|
||||||
|
|
||||||
A snapshot represents a directory with all files and sub-directories at
|
A snapshot represents a directory with all files and sub-directories at
|
||||||
a given point in time. For each backup that is made, a new snapshot is
|
a given point in time. For each backup that is made, a new snapshot is
|
||||||
created. A snapshot is a JSON document that is stored in an encrypted
|
created. A snapshot is a JSON document that is stored in a file below
|
||||||
file below the directory ``snapshots`` in the repository. The filename
|
the directory ``snapshots`` in the repository. It uses the file encoding
|
||||||
|
described in the "Unpacked Data Format" section. The filename
|
||||||
is the storage ID. This string is unique and used within restic to
|
is the storage ID. This string is unique and used within restic to
|
||||||
uniquely identify a snapshot.
|
uniquely identify a snapshot.
|
||||||
|
|
||||||
|
@ -517,8 +564,8 @@ time there must not be any other locks (exclusive and non-exclusive).
|
||||||
There may be multiple non-exclusive locks in parallel.
|
There may be multiple non-exclusive locks in parallel.
|
||||||
|
|
||||||
A lock is a file in the subdir ``locks`` whose filename is the storage
|
A lock is a file in the subdir ``locks`` whose filename is the storage
|
||||||
ID of the contents. It is encrypted and authenticated the same way as
|
ID of the contents. It is stored in the file encoding described in the
|
||||||
other files in the repository and contains the following JSON structure:
|
"Unpacked Data Format" section and contains the following JSON structure:
|
||||||
|
|
||||||
.. code:: json
|
.. code:: json
|
||||||
|
|
||||||
|
@ -721,3 +768,11 @@ An adversary who has a leaked (decrypted) key for a repository could:
|
||||||
only be done using the ``copy`` command, which moves the data into a new
|
only be done using the ``copy`` command, which moves the data into a new
|
||||||
repository with a new master key, or by making a completely new repository
|
repository with a new master key, or by making a completely new repository
|
||||||
and new backup.
|
and new backup.
|
||||||
|
|
||||||
|
Changes
|
||||||
|
=======
|
||||||
|
|
||||||
|
Repository Version 2
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
* Support compression for blobs (data/tree) and index / lock / snapshot files
|
||||||
|
|