forked from TrueCloudLab/rclone
chunker: finish meta-format before release
changes: - chunker: remove GetTier and SetTier - remove wdmrcompat metaformat - remove fastopen strategy - make hash_type option non-advanced - adverise hash support when possible - add metadata field "ver", run strict checks - describe internal behavior in comments - improve documentation note: wdmrcompat used to write file name in the metadata, so maximum metadata size was 1K; removing it allows to cap size by 200 bytes now.
This commit is contained in:
parent
c41812fc88
commit
ccecfa9cb1
5 changed files with 303 additions and 312 deletions
|
@ -4,11 +4,11 @@ description: "Split-chunking overlay remote"
|
|||
date: "2019-08-30"
|
||||
---
|
||||
|
||||
<i class="fa fa-cut"></i>Chunker
|
||||
<i class="fa fa-cut"></i>Chunker (BETA)
|
||||
----------------------------------------
|
||||
|
||||
The `chunker` overlay transparently splits large files into smaller chunks
|
||||
during the upload to wrapped remote and transparently assembles them back
|
||||
during upload to wrapped remote and transparently assembles them back
|
||||
when the file is downloaded. This allows to effectively overcome size limits
|
||||
imposed by storage providers.
|
||||
|
||||
|
@ -41,10 +41,27 @@ Storage> chunker
|
|||
Remote to chunk/unchunk.
|
||||
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
|
||||
"myremote:bucket" or maybe "myremote:" (not recommended).
|
||||
Enter a string value. Press Enter for the default ("").
|
||||
remote> remote:path
|
||||
Files larger than chunk_size will be split in chunks. By default 2 Gb.
|
||||
Files larger than chunk size will be split in chunks.
|
||||
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
|
||||
chunk_size> 1G
|
||||
chunk_size> 100M
|
||||
Choose how chunker handles hash sums.
|
||||
Enter a string value. Press Enter for the default ("md5").
|
||||
Choose a number from below, or type in your own value
|
||||
/ Chunker can pass any hash supported by wrapped remote
|
||||
1 | for a single-chunk file but returns nothing otherwise.
|
||||
\ "none"
|
||||
2 / MD5 for multi-chunk files. Requires "simplejson".
|
||||
\ "md5"
|
||||
3 / SHA1 for multi-chunk files. Requires "simplejson".
|
||||
\ "sha1"
|
||||
/ Copying a file to chunker will request MD5 from the source
|
||||
4 | falling back to SHA1 if unsupported. Requires "simplejson".
|
||||
\ "md5quick"
|
||||
5 / Similar to "md5quick" but prefers SHA1 over MD5. Requires "simplejson".
|
||||
\ "sha1quick"
|
||||
hash_type> md5
|
||||
Edit advanced config? (y/n)
|
||||
y) Yes
|
||||
n) No
|
||||
|
@ -53,8 +70,9 @@ Remote config
|
|||
--------------------
|
||||
[overlay]
|
||||
type = chunker
|
||||
remote = TestLocal:
|
||||
chunk_size = 2G
|
||||
remote = remote:bucket
|
||||
chunk_size = 100M
|
||||
hash_type = md5
|
||||
--------------------
|
||||
y) Yes this is OK
|
||||
e) Edit this remote
|
||||
|
@ -73,8 +91,8 @@ will put files in a directory called `name` in the current directory.
|
|||
|
||||
### Chunking
|
||||
|
||||
When rclone starts a file upload, chunker checks the file size.
|
||||
If it doesn't exceed the configured chunk size, chunker will just pass it
|
||||
When rclone starts a file upload, chunker checks the file size. If it
|
||||
doesn't exceed the configured chunk size, chunker will just pass the file
|
||||
to the wrapped remote. If a file is large, chunker will transparently cut
|
||||
data in pieces with temporary names and stream them one by one, on the fly.
|
||||
Each chunk will contain the specified number of data byts, except for the
|
||||
|
@ -84,7 +102,7 @@ a temporary copy, record its size and repeat the above process.
|
|||
When upload completes, temporary chunk files are finally renamed.
|
||||
This scheme guarantees that operations look from outside as atomic.
|
||||
A similar method with hidden temporary chunks is used for other operations
|
||||
(copy/move/rename etc). If operation fails, hidden chunks are normally
|
||||
(copy/move/rename etc). If an operation fails, hidden chunks are normally
|
||||
destroyed, and the destination composite file stays intact.
|
||||
|
||||
#### Chunk names
|
||||
|
@ -94,58 +112,52 @@ By default chunk names are `BIG_FILE_NAME.rclone-chunk.001`,
|
|||
format is `*.rclone-chunk.###`. You can configure another name format
|
||||
using the `--chunker-name-format` option. The format uses asterisk
|
||||
`*` as a placeholder for the base file name and one or more consecutive
|
||||
hash characters `#` as a placeholder for the chunk number. There must be
|
||||
one and only one asterisk. The number of consecutive hashes defines the
|
||||
minimum length of a string representing a chunk number. If a chunk number
|
||||
has less digits than the number of hashes, it is left-padded by zeros.
|
||||
If there are more digits in the number, they are left as is.
|
||||
hash characters `#` as a placeholder for sequential chunk number.
|
||||
There must be one and only one asterisk. The number of consecutive hash
|
||||
characters defines the minimum length of a string representing a chunk number.
|
||||
If decimal chunk number has less digits than the number of hashes, it is
|
||||
left-padded by zeros. If the number stringis longer, it is left intact.
|
||||
By default numbering starts from 1 but there is another option that allows
|
||||
user to start from 0, eg. for compatibility with legacy software.
|
||||
|
||||
For example, if name format is `big_*-##.part`, and original file was
|
||||
named `data.txt` and numbering starts from 0, then the first chunk will be
|
||||
named `big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
|
||||
and the 302nd chunk will be `big_data.txt-301.part`.
|
||||
For example, if name format is `big_*-##.part` and original file name is
|
||||
`data.txt` and numbering starts from 0, then the first chunk will be named
|
||||
`big_data.txt-00.part`, the 99th chunk will be `big_data.txt-98.part`
|
||||
and the 302nd chunk will become `big_data.txt-301.part`.
|
||||
|
||||
Would-be chunk files are ignored if their name does not match given format.
|
||||
The list command might encounter composite files with missinng or invalid
|
||||
chunks. By default, if chunker detects a missing chunk it will silently
|
||||
ignore the whole group. Use the `--chunker-fail-on-bad-chunks` flag
|
||||
to make it fail with an error message.
|
||||
When the `list` rclone command scans a directory on wrapped remote, the
|
||||
potential chunk files are accounted for and merged into composite directory
|
||||
entries only if their names match the configured format. All other files
|
||||
are ignored, including temporary chunks.
|
||||
The list command might encounter composite files with missing or invalid
|
||||
chunks. If chunker detects a missing chunk it will by default silently
|
||||
ignore the whole group. You can use the `--chunker-fail-on-bad-chunks`
|
||||
command line flag to make `list` fail with an error message.
|
||||
|
||||
|
||||
### Metadata
|
||||
|
||||
By default when a file is large enough, chunker will create a metadata
|
||||
object besides data chunks. The object is named after the original file.
|
||||
Chunker allows to choose between few metadata formats. Please note that
|
||||
currently metadata is not created for files smaller than configured
|
||||
chunk size. This may change in future as new formats are developed.
|
||||
Chunker allows user to disable metadata completely (the `none` format).
|
||||
Please note that currently metadata is not created for files smaller
|
||||
than configured chunk size. This may change in future as new formats
|
||||
are developed.
|
||||
|
||||
#### Simple JSON metadata format
|
||||
|
||||
This is the default format. It supports hash sums and chunk validation
|
||||
for composite files. Meta objects carry the following fields:
|
||||
|
||||
- `size` - total size of chunks
|
||||
- `nchunks` - number of chunks
|
||||
- `md5` - MD5 hashsum (if present)
|
||||
- `ver` - version of format, currently `1`
|
||||
- `size` - total size of composite file
|
||||
- `nchunks` - number of chunks in the file
|
||||
- `md5` - MD5 hashsum of composite file (if present)
|
||||
- `sha1` - SHA1 hashsum (if present)
|
||||
|
||||
There is no field for composite file name as it's simply equal to the name
|
||||
of meta object on the wrapped remote. Please refer to respective sections
|
||||
for detils on hashsums and modified time handling.
|
||||
|
||||
#### WedDavMailRu compatible metadata format
|
||||
|
||||
The `wdmrcompat` metadata format is only useful to support historical files
|
||||
created by [WebDriveMailru](https://github.com/yar229/WebDavMailRuCloud).
|
||||
It keeps the following fields (most are ignored, though):
|
||||
|
||||
- `Name` - name of the composite file (always equal to the meta file name)
|
||||
- `Size` - total size of chunks
|
||||
- `PublicKey` - ignored, always "null"
|
||||
- `CreationDate` - last modification (sic!) time, ignored.
|
||||
for detils on hashsums and handling of modified time.
|
||||
|
||||
#### No metadata
|
||||
|
||||
|
@ -161,8 +173,8 @@ errors (especially missing last chunk) than metadata-enabled formats.
|
|||
### Hashsums
|
||||
|
||||
Chunker supports hashsums only when a compatible metadata is present.
|
||||
Thus, if you choose metadata format of `none` or `wdmrcompat`, chunker
|
||||
will return `UNSUPPORTED` as hashsum.
|
||||
Thus, if you choose metadata format of `none`, chunker will return
|
||||
`UNSUPPORTED` as hashsum.
|
||||
|
||||
Please note that metadata is stored only for composite files. If a file
|
||||
is small (smaller than configured chunk size), chunker will transparently
|
||||
|
@ -175,16 +187,16 @@ Currently you can choose one or another but not both.
|
|||
MD5 is set by default as the most supported type.
|
||||
Since chunker keeps hashes for composite files and falls back to the
|
||||
wrapped remote hash for small ones, we advise you to choose the same
|
||||
hash type as wrapped remote, so your file listings look coherent.
|
||||
hash type as wrapped remote so that your file listings look coherent.
|
||||
|
||||
Normally, when a file is copied to chunker controlled remote, chunker
|
||||
will ask its source for compatible file hash and revert to on-the-fly
|
||||
Normally, when a file is copied to a chunker controlled remote, chunker
|
||||
will ask the file source for compatible file hash and revert to on-the-fly
|
||||
calculation if none is found. This involves some CPU overhead but provides
|
||||
a guarantee that given hashsum is available. Also, chunker will reject
|
||||
a server-side copy or move operation if source and destination hashsum
|
||||
types are different, resulting in the extra network bandwidth, too.
|
||||
In some rare cases this may be undesired, so chunker provides two optional
|
||||
choices: `sha1quick` and `md5quick`. If source does not have the primary
|
||||
choices: `sha1quick` and `md5quick`. If the source does not support primary
|
||||
hash type and the quick mode is enabled, chunker will try to fall back to
|
||||
the secondary type. This will save CPU and bandwidth but can result in empty
|
||||
hashsums at destination. Beware of consequences: the `sync` command will
|
||||
|
@ -215,13 +227,14 @@ chunk naming scheme is to:
|
|||
hash type, chunk naming etc.
|
||||
- Now run `rclone sync oldchunks: newchunks:` and all your data
|
||||
will be transparently converted at transfer.
|
||||
This may take some time.
|
||||
This may take some time, yet chunker will try server-side
|
||||
copy if possible.
|
||||
- After checking data integrity you may remove configuration section
|
||||
of the old remote.
|
||||
|
||||
If rclone gets killed during a long operation on a big composite file,
|
||||
hidden temporary chunks may stay in the directory. They will not be
|
||||
shown by the list command but will eat up your account quota.
|
||||
shown by the `list` command but will eat up your account quota.
|
||||
Please note that the `deletefile` rclone command deletes only active
|
||||
chunks of a file. As a workaround, you can use remote of the wrapped
|
||||
file system to see them.
|
||||
|
@ -234,17 +247,18 @@ remove everything including garbage.
|
|||
### Caveats and Limitations
|
||||
|
||||
Chunker requires wrapped remote to support server side `move` (or `copy` +
|
||||
delete) operations, otherwise it will explicitly refuse to start.
|
||||
`delete`) operations, otherwise it will explicitly refuse to start.
|
||||
This is because it internally renames temporary chunk files to their final
|
||||
names when an operation completes successfully.
|
||||
|
||||
Note that moves done using the copy-and-delete method may incur double
|
||||
charging with some cloud storage providers.
|
||||
Note that a move implemented using the copy-and-delete method may incur
|
||||
double charging with some cloud storage providers.
|
||||
|
||||
Chunker will not automatically rename existing chunks when you change the
|
||||
chunk name format. Beware that in result of this some files which have been
|
||||
treated as chunks before the change can pop up in directory listings as
|
||||
normal files and vice versa. The same warning holds for the chunk size.
|
||||
Chunker will not automatically rename existing chunks when you run
|
||||
`rclone config` on a live remote and change the chunk name format.
|
||||
Beware that in result of this some files which have been treated as chunks
|
||||
before the change can pop up in directory listings as normal files
|
||||
and vice versa. The same warning holds for the chunk size.
|
||||
If you desperately need to change critical chunking setings, you should
|
||||
run data migration as described in a dedicated section.
|
||||
|
||||
|
@ -278,6 +292,28 @@ Files larger than chunk size will be split in chunks.
|
|||
- Type: SizeSuffix
|
||||
- Default: 2G
|
||||
|
||||
#### --chunker-hash-type
|
||||
|
||||
Choose how chunker handles hash sums.
|
||||
|
||||
- Config: hash_type
|
||||
- Env Var: RCLONE_CHUNKER_HASH_TYPE
|
||||
- Type: string
|
||||
- Default: "md5"
|
||||
- Examples:
|
||||
- "none"
|
||||
- Chunker can pass any hash supported by wrapped remote
|
||||
- for a single-chunk file but returns nothing otherwise.
|
||||
- "md5"
|
||||
- MD5 for multi-chunk files. Requires "simplejson".
|
||||
- "sha1"
|
||||
- SHA1 for multi-chunk files. Requires "simplejson".
|
||||
- "md5quick"
|
||||
- Copying a file to chunker will request MD5 from the source
|
||||
- falling back to SHA1 if unsupported. Requires "simplejson".
|
||||
- "sha1quick"
|
||||
- Similar to "md5quick" but prefers SHA1 over MD5. Requires "simplejson".
|
||||
|
||||
### Advanced Options
|
||||
|
||||
Here are the advanced options specific to chunker (Transparently chunk/split large files).
|
||||
|
@ -321,33 +357,6 @@ Metadata is a small JSON file named after the composite file.
|
|||
- "simplejson"
|
||||
- Simple JSON supports hash sums and chunk validation.
|
||||
- It has the following fields: size, nchunks, md5, sha1.
|
||||
- "wdmrcompat"
|
||||
- This format brings compatibility with WebDavMailRuCloud.
|
||||
- It does not support hash sums or validation, most fields are ignored.
|
||||
- It has the following fields: Name, Size, PublicKey, CreationDate.
|
||||
- Requires hash type "none".
|
||||
|
||||
#### --chunker-hash-type
|
||||
|
||||
Choose how chunker handles hash sums.
|
||||
|
||||
- Config: hash_type
|
||||
- Env Var: RCLONE_CHUNKER_HASH_TYPE
|
||||
- Type: string
|
||||
- Default: "md5"
|
||||
- Examples:
|
||||
- "none"
|
||||
- Chunker can pass any hash supported by wrapped remote
|
||||
- for a single-chunk file but returns nothing otherwise.
|
||||
- "md5"
|
||||
- MD5 for multi-chunk files. Requires "simplejson".
|
||||
- "sha1"
|
||||
- SHA1 for multi-chunk files. Requires "simplejson".
|
||||
- "md5quick"
|
||||
- When a file is copied on to chunker, MD5 is taken from its source
|
||||
- falling back to SHA1 if the source doesn't support it. Requires "simplejson".
|
||||
- "sha1quick"
|
||||
- Similar to "md5quick" but prefers SHA1 over MD5. Requires "simplejson".
|
||||
|
||||
#### --chunker-fail-on-bad-chunks
|
||||
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue