s3: Add MD5 metadata to objects uploaded with SSE-AWS/SSE-C

Before this change, small objects uploaded with SSE-AWS/SSE-C would
not have MD5 sums.

This change stores the MD5 sum in the object metadata for these
objects, in the same way it is already stored for multipart uploaded
objects.

See: #1824 #2827
Nick Craig-Wood 2020-11-23 11:53:31 +00:00
parent 4bb241c435
commit 76ee3060d1
2 changed files with 28 additions and 1 deletion


@@ -3118,6 +3118,7 @@ func (o *Object) Update(ctx context.Context, in io.Reader, src fs.ObjectInfo, op
 	// read the md5sum if available
 	// - for non multipart
 	//    - so we can add a ContentMD5
+	//    - so we can add the md5sum in the metadata as metaMD5Hash if using SSE/SSE-C
 	// - for multipart provided checksums aren't disabled
 	//    - so we can add the md5sum in the metadata as metaMD5Hash
 	var md5sum string
@@ -3127,7 +3128,11 @@ func (o *Object) Update(ctx context.Context, in io.Reader, src fs.ObjectInfo, op
 			hashBytes, err := hex.DecodeString(hash)
 			if err == nil {
 				md5sum = base64.StdEncoding.EncodeToString(hashBytes)
-				if multipart {
+				if (multipart || o.fs.etagIsNotMD5) && !o.fs.opt.DisableChecksum {
+					// Set the md5sum as metadata on the object if
+					// - a multipart upload
+					// - the Etag is not an MD5, eg when using SSE/SSE-C
+					// provided checksums aren't disabled
 					metadata[metaMD5Hash] = &md5sum
 				}
 			}
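
The hunk above decodes the hex MD5 reported by rclone's hashing layer and re-encodes it as base64 before storing it under the `metaMD5Hash` key. As a point of reference, here is a minimal, self-contained sketch of that conversion; the helper name `hexMD5ToBase64` is made up for illustration and is not part of the rclone source:

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"log"
)

// hexMD5ToBase64 is an illustrative helper (not from the rclone code base).
// It converts an MD5 checksum given as a hex string into the base64 form
// that this commit stores under the metaMD5Hash metadata key.
func hexMD5ToBase64(hexSum string) (string, error) {
	raw, err := hex.DecodeString(hexSum)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	// MD5 of the empty string, used here as a worked example.
	b64, err := hexMD5ToBase64("d41d8cd98f00b204e9800998ecf8427e")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(b64) // prints 1B2M2Y8AsgTpgAmY7PhCfg==
}
```

The same base64 string is what S3 expects in the `Content-MD5` header, which is why a single conversion serves both purposes.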


@@ -277,6 +277,28 @@ side copy to update the modification if the object can be copied in a single par
 In the case the object is larger than 5Gb or is in Glacier or Glacier Deep Archive
 storage the object will be uploaded rather than copied.
 
+Note that reading this from the object takes an additional `HEAD`
+request as the metadata isn't returned in object listings.
+
+### Hashes ###
+
+For small objects which weren't uploaded as multipart uploads (objects
+sized below `--s3-upload-cutoff` if uploaded with rclone) rclone uses
+the `ETag:` header as an MD5 checksum.
+
+However for objects which were uploaded as multipart uploads or with
+server side encryption (SSE-AWS or SSE-C) the `ETag` header is no
+longer the MD5 sum of the data, so rclone adds an additional piece of
+metadata `X-Amz-Meta-Md5chksum` which is a base64 encoded MD5 hash (in
+the same format as is required for `Content-MD5`).
+
+For large objects, calculating this hash can take some time so the
+addition of this hash can be disabled with `--s3-disable-checksum`.
+This will mean that these objects do not have an MD5 checksum.
+
+Note that reading this from the object takes an additional `HEAD`
+request as the metadata isn't returned in object listings.
+
 ### Cleanup ###
 
 If you run `rclone cleanup s3:bucket` then it will remove all pending
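
Following on from the `### Hashes ###` section added above, here is a hedged sketch of how a client might check an object against the stored hash, assuming the base64 value has already been read back from the `X-Amz-Meta-Md5chksum` metadata via a `HEAD` request. The helper name and file path are hypothetical, not part of rclone:

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"os"
)

// verifyMd5chksum is an illustrative helper, not part of rclone. It compares
// the base64 MD5 read from an object's X-Amz-Meta-Md5chksum metadata against
// the MD5 of a local copy of the same data.
func verifyMd5chksum(localPath, md5chksum string) (bool, error) {
	want, err := base64.StdEncoding.DecodeString(md5chksum)
	if err != nil {
		return false, err
	}
	f, err := os.Open(localPath)
	if err != nil {
		return false, err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return false, err
	}
	return bytes.Equal(h.Sum(nil), want), nil
}

func main() {
	// Placeholder inputs: a local file and the metadata value returned by a
	// HEAD request on the object.
	ok, err := verifyMd5chksum("file.bin", "1B2M2Y8AsgTpgAmY7PhCfg==")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("md5chksum matches:", ok)
}
```

Reading the value itself costs the extra `HEAD` request mentioned above, since the metadata is not returned in object listings.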