s3: update docs with a Reducing Costs section - Fixes #2889
parent 979bb07c86, commit 506342317b
1 changed file with 81 additions and 19 deletions

@@ -248,25 +248,6 @@ d) Delete this remote
y/e/d>
```

### --fast-list ###

This remote supports `--fast-list` which allows you to use fewer
transactions in exchange for more memory. See the [rclone
docs](/docs/#fast-list) for more details.

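For example, a sync that lists with fewer transactions could look like
this (the source path and bucket name are placeholders):

    rclone sync --fast-list /path/to/source s3:bucket
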
### --update and --use-server-modtime ###

As noted below, the modified time is stored as metadata on the object. It is
used by default for all operations that require checking the time a file was
last updated. It allows rclone to treat the remote more like a true filesystem,
but it is inefficient because it requires an extra API call to retrieve the
metadata.

For many operations, the time the object was last uploaded to the remote is
sufficient to determine if it is "dirty". By using `--update` along with
`--use-server-modtime`, you can avoid the extra API call and simply upload
files whose local modtime is newer than the time it was last uploaded.

### Modified time ###

The modified time is stored as metadata on the object as

@@ -280,6 +261,87 @@ storage the object will be uploaded rather than copied.
Note that reading this from the object takes an additional `HEAD`
request as the metadata isn't returned in object listings.

### Reducing costs

#### Avoiding HEAD requests to read the modification time

By default rclone will use the modification time of objects stored in
S3 for syncing. This is stored in object metadata which unfortunately
takes an extra HEAD request to read, which can be expensive (in time
and money).

The modification time is used by default for all operations that
require checking the time a file was last updated. It allows rclone to
treat the remote more like a true filesystem, but it is inefficient on
S3 because it requires an extra API call to retrieve the metadata.

The extra API calls can be avoided when syncing (using `rclone sync`
or `rclone copy`) in a few different ways, each with its own
tradeoffs.

- `--size-only`
    - Only checks the size of files.
    - Uses no extra transactions.
    - If the file doesn't change size then rclone won't detect it has
      changed.
    - `rclone sync --size-only /path/to/source s3:bucket`
- `--checksum`
    - Checks the size and MD5 checksum of files.
    - Uses no extra transactions.
    - The most accurate detection of changes possible.
    - Will cause the source to read an MD5 checksum which, if it is a
      local disk, will cause lots of disk activity.
    - If the source and destination are both S3 this is the
      **recommended** flag to use for maximum efficiency.
    - `rclone sync --checksum /path/to/source s3:bucket`
- `--update --use-server-modtime`
    - Uses no extra transactions.
    - Modification time becomes the time the object was uploaded.
    - For many operations this is sufficient to determine if it needs
      uploading.
    - Using `--update` along with `--use-server-modtime` avoids the
      extra API call and uploads files whose local modification time
      is newer than the time they were last uploaded.
    - Files created with timestamps in the past will be missed by the sync.
    - `rclone sync --update --use-server-modtime /path/to/source s3:bucket`

These flags can and should be used in combination with `--fast-list` -
see below.

|
||||
|
||||
If using `rclone mount` or any command using the VFS (eg `rclone
|
||||
serve`) commands then you might want to consider using the VFS flag
|
||||
`--no-modtime` which will stop rclone reading the modification time
|
||||
for every object. You could also use `--use-server-modtime` if you are
|
||||
happy with the modification times of the objects being the time of
|
||||
upload.
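As a minimal sketch of the mount case (the mount point path below is
just a placeholder), you might run:

    rclone mount --no-modtime s3:bucket /path/to/mountpoint
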
#### Avoiding GET requests to read directory listings

Rclone's default directory traversal is to process each directory
individually. This takes one API call per directory. Using the
`--fast-list` flag will read all info about the objects into memory
first using a smaller number of API calls (one per 1000 objects). See
the [rclone docs](/docs/#fast-list) for more details.

    rclone sync --fast-list --checksum /path/to/source s3:bucket
`--fast-list` trades off API transactions for memory use. As a rough
guide rclone uses 1k of memory per object stored, so using
`--fast-list` on a sync of a million objects will use roughly 1 GB of
RAM.

If you are only copying a small number of files into a big repository
then using `--no-traverse` is a good idea. This finds objects directly
instead of through directory listings. You can do a "top-up" sync very
cheaply by using `--max-age` and `--no-traverse` to copy only recent
files, eg

    rclone copy --max-age 24h --no-traverse /path/to/source s3:bucket
You'd then do a full `rclone sync` less often.

Note that `--fast-list` isn't required in the top-up sync.

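As an illustration of that cadence, a hypothetical cron schedule might
run the cheap top-up copy every hour and the full sync once a night
(paths and times are only placeholders):

    # hypothetical schedule: hourly top-up copy, nightly full sync
    0 * * * * rclone copy --max-age 1h --no-traverse /path/to/source s3:bucket
    0 3 * * * rclone sync --fast-list --checksum /path/to/source s3:bucket
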
### Hashes ###

For small objects which weren't uploaded as multipart uploads (objects