[#205] Add md5 checksum in header #228

Merged
alexvanin merged 1 commit from mbiryukova/frostfs-s3-gw:feature/add_md5_checksum_in_header into master 2023-10-25 14:50:49 +00:00
Member

Closes #205

Signed-off-by: Marina Biryukova m.biryukova@yadro.com

Closes #205 Signed-off-by: Marina Biryukova <m.biryukova@yadro.com>
mbiryukova self-assigned this 2023-10-02 09:13:08 +00:00
mbiryukova requested review from storage-services-committers 2023-10-02 09:20:03 +00:00
mbiryukova requested review from storage-services-developers 2023-10-02 09:20:05 +00:00
mbiryukova force-pushed feature/add_md5_checksum_in_header from 68df04ff1e to ea753fff1a 2023-10-02 09:41:54 +00:00 Compare
dkirillov reviewed 2023-10-04 09:12:08 +00:00
@ -95,2 +95,4 @@
}
if len(info.MD5Sum) > 0 {
h.Set(api.ContentMD5, "\""+info.MD5Sum+"\"")
Member

If info.MD5Sum contains base64 encoded hash can we rename it to MD5SumBase64? But I would like to store hex encoded values and encode/decode where it's need

If `info.MD5Sum` contains base64 encoded hash can we rename it to `MD5SumBase64`? But I would like to store hex encoded values and encode/decode where it's need
Author
Member

info.MD5Sum is stored as hex encoded value

`info.MD5Sum` is stored as hex encoded value
Member

Then we have to encode it to base64 here (and probably without quotes)

Then we have to encode it to base64 here (and probably without quotes)
Author
Member

I fixed this

I fixed this
dkirillov marked this conversation as resolved
@ -113,3 +117,3 @@
for key, val := range info.Headers {
if layer.IsSystemHeader(key) {
if layer.IsSystemHeader(key) || key == api.ContentMD5 {
Member

Let's add api.ContentMD5 to api.SystemMetadata map

Let's add `api.ContentMD5` to `api.SystemMetadata` map
dkirillov marked this conversation as resolved
@ -245,2 +245,4 @@
Reader: body,
}
if md5Hash := r.Header.Get(api.ContentMD5); len(md5Hash) > 0 {
p.MD5 = md5Hash
Member

It seems we can simplify this and write:

p := &layer.UploadPartParams{
    // ...
    MD5: r.Header.Get(api.ContentMD5)
}
It seems we can simplify this and write: ```golang p := &layer.UploadPartParams{ // ... MD5: r.Header.Get(api.ContentMD5) } ```
dkirillov marked this conversation as resolved
@ -120,0 +118,4 @@
Lock *data.ObjectLock
Encryption encryption.Params
CopiesNumbers []uint32
IsCompleteMultipart bool
Member

I would rename this to something like NeedComputeMD5

I would rename this to something like `NeedComputeMD5`
dkirillov marked this conversation as resolved
@ -294,0 +299,4 @@
if err != nil {
return nil, err
}
if hex.EncodeToString(md5HashBytes) != hex.EncodeToString(md5Hash) {
Member

Actually we can compare raw bytes (to avoid extra conversion)

Actually we can compare raw bytes (to avoid extra conversion)
dkirillov marked this conversation as resolved
@ -305,2 +321,4 @@
IsCombined: p.Header[MultipartObjectSize] != "",
}
if n.features.MD5Enable() {
if p.IsCompleteMultipart && ok {
Member

The ok variable was defined quite a long time ago. So it's better to use a more meaningful name

The `ok` variable was defined quite a long time ago. So it's better to use a more meaningful name
dkirillov marked this conversation as resolved
@ -307,0 +326,4 @@
} else if p.IsCompleteMultipart && !ok {
return nil, fmt.Errorf("MD5 checksum of complete object wasn't provided")
} else {
newVersion.MD5 = hex.EncodeToString(md5Hash)
Member

Why do we store md5 in different formats in this field (hex (newVersion.MD5 = hex.EncodeToString(md5Hash)) vs base64 (newVersion.MD5 = md5EncodedHash) ?

Why do we store md5 in different formats in this field (hex (`newVersion.MD5 = hex.EncodeToString(md5Hash)`) vs base64 (`newVersion.MD5 = md5EncodedHash`) ?
Author
Member

Format of header depends on IsCompleteMultipart, I'll change this to be more logical

Format of header depends on `IsCompleteMultipart`, I'll change this to be more logical
dkirillov marked this conversation as resolved
@ -465,3 +495,4 @@
prm.Payload = wrapReader(prm.Payload, 64*1024, func(buf []byte) {
size += uint64(len(buf))
hash.Write(buf)
md5Hash.Write(buf)
Member

As I understand we don't have to compute md5 at all if this feature is disabled

As I understand we don't have to compute md5 at all if this feature is disabled
Member

It seems the fix for this comment somehow be missed in master
@mbiryukova @alexvanin

It seems the fix for this comment somehow be missed in master @mbiryukova @alexvanin
Owner

For v0.29.0 we do compute both hashes, unless it produce significant slowdown. As for future releases, we may drop SHA and use MD5 as new default for everything.

For v0.29.0 we do compute both hashes, unless it produce significant slowdown. As for future releases, we may drop SHA and use MD5 as new default for everything.
@ -177,0 +176,4 @@
soft_memory_limit: 1gb
md5:
enable: false
Member

Maybe we can introduce new section features or hashes instead of just md5?

Also we should add description of this param to docs/configuration.md and to config/config.env.

Please run pre-commit run --all to see and fix some linter errors (e.g. no empty line at the end of the file)

Maybe we can introduce new section `features` or `hashes` instead of just `md5`? Also we should add description of this param to `docs/configuration.md` and to `config/config.env`. Please run `pre-commit run --all` to see and fix some linter errors (e.g. no empty line at the end of the file)
Author
Member

And put md5 in new section?

And put `md5` in new section?
dkirillov marked this conversation as resolved
@ -992,6 +997,7 @@ func (c *Tree) AddPart(ctx context.Context, bktInfo *data.BucketInfo, multipartN
sizeKV: strconv.FormatUint(info.Size, 10),
createdKV: strconv.FormatInt(info.Created.UTC().UnixMilli(), 10),
etagKV: info.ETag,
md5KV: info.MD5,
Member

Let's do not set md5KV key if value is empty.

Let's do not set `md5KV` key if value is empty.
dkirillov marked this conversation as resolved
mbiryukova force-pushed feature/add_md5_checksum_in_header from ea753fff1a to 768b0ccc64 2023-10-04 09:52:15 +00:00 Compare
ironbee approved these changes 2023-10-06 07:34:40 +00:00
mbiryukova force-pushed feature/add_md5_checksum_in_header from 768b0ccc64 to c302fb5226 2023-10-06 15:28:38 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from c302fb5226 to 230259be40 2023-10-06 15:29:36 +00:00 Compare
dkirillov approved these changes 2023-10-10 12:25:56 +00:00
@ -563,0 +573,4 @@
| Parameter | Type | SIGHUP reload | Default value | Description |
|--------------|--------|---------------|---------------|----------------------------------------------|
| `md5.enable` | `bool` | yes | false | Enable calculating and storing MD5 checksum. |
Member

Let's name this enabled (not enable) to be more consistent with other config values (e.g. pprof.enabled, nats.enabled, etc)

Let's name this `enabled` (not `enable`) to be more consistent with other config values (e.g. `pprof.enabled`, `nats.enabled`, etc)
dkirillov marked this conversation as resolved
mbiryukova force-pushed feature/add_md5_checksum_in_header from 230259be40 to 1d1fb10684 2023-10-20 14:35:04 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from 1d1fb10684 to 1ed9df8fbb 2023-10-20 14:59:51 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from 1ed9df8fbb to 5264e2f7e3 2023-10-20 15:06:44 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from 5264e2f7e3 to 9713b1cfec 2023-10-20 15:14:00 +00:00 Compare
dkirillov reviewed 2023-10-23 07:35:06 +00:00
@ -464,3 +510,4 @@
prm.WithoutHomomorphicHash = bktInfo.HomomorphicHashDisabled
var size uint64
hash := sha256.New()
md5Hash := md5.New()
Member

If we returns md5 as ETag why do we need to compute sha256? It seems we can just calculate md5 and save it as ETag (I mean use the same field in data.ObjectInfo, the same key in for tree node meta etc.)

If we returns md5 as `ETag` why do we need to compute sha256? It seems we can just calculate md5 and save it as ETag (I mean use the same field in `data.ObjectInfo`, the same key in for tree node meta etc.)
Author
Member

We decided not to replace ETag for now. @alexvanin
And using setting we can switch to returning sha256 in ETag

We decided not to replace ETag for now. @alexvanin And using setting we can switch to returning sha256 in ETag
dkirillov marked this conversation as resolved
dkirillov reviewed 2023-10-24 07:41:29 +00:00
api/data/info.go Outdated
@ -47,2 +47,4 @@
CreationEpoch uint64
HashSum string
MD5Sum string
UseMD5 bool
Member

It seems we can drop the UseMD5 field and check for MD5Sum != "". Or encapsulate this check to method

func (o ObjectInfo) UseMD5() bool {
    return o.MD5Sum != ""
}
It seems we can drop the `UseMD5` field and check for `MD5Sum != ""`. Or encapsulate this check to method ``` func (o ObjectInfo) UseMD5() bool { return o.MD5Sum != "" } ```
Author
Member

In current implementation MD5Sum will not be empty if value was stored for object

In current implementation `MD5Sum` will not be empty if value was stored for object
Member

Then maybe we can write something like

func (o ObjectInfo) ETag(md5Enabled bool) string {
	if md5Enabled && len(o.MD5Sum) > 0 {
		return o.MD5Sum
	}

	return o.HashSum
}

and then instead of aff675a8eb/api/handler/put.go (L331-L335)

we can write (we can extend interface with MD5Enabled() bool since appSettings implements it anyway )

w.Header().Set(api.ETag, objInfo.ETag(h.cfg.MD5Enable()))
Then maybe we can write something like ``` func (o ObjectInfo) ETag(md5Enabled bool) string { if md5Enabled && len(o.MD5Sum) > 0 { return o.MD5Sum } return o.HashSum } ``` and then instead of https://git.frostfs.info/mbiryukova/frostfs-s3-gw/src/commit/aff675a8eb80a15b634fc93d82b17d8a2372d620/api/handler/put.go#L331-L335 we can write (we can extend [interface](https://git.frostfs.info/mbiryukova/frostfs-s3-gw/src/commit/aff675a8eb80a15b634fc93d82b17d8a2372d620/api/handler/api.go#L33) with `MD5Enabled() bool` since `appSettings` implements it anyway ) ``` w.Header().Set(api.ETag, objInfo.ETag(h.cfg.MD5Enable())) ```
dkirillov marked this conversation as resolved
@ -184,8 +184,6 @@ func TestGetObject(t *testing.T) {
bktName, objName := "bucket", "obj"
bktInfo, objInfo := createVersionedBucketAndObject(hc.t, hc, bktName, objName)
putObject(hc, bktName, objName)
Member

Why do we drop this?

Why do we drop this?
dkirillov marked this conversation as resolved
@ -141,6 +141,7 @@ func prepareHandlerContextBase(t *testing.T, minCache bool) *handlerContext {
}
features := &layer.FeatureSettingsMock{}
features.SetMD5Enable(false)
Member

It isn't necessary. The false is default value

It isn't necessary. The `false` is default value
dkirillov marked this conversation as resolved
@ -248,2 +248,4 @@
}
if md5Hash := r.Header.Get(api.ContentMD5); len(md5Hash) > 0 {
params.ContentMD5 = md5Hash
Member

This can be simplified to

params.ContentMD5 = r.Header.Get(api.ContentMD5)

or even

params := &layer.PutObjectParams{
	// ...
	ContentMD5: r.Header.Get(api.ContentMD5),
}
This can be simplified to ``` params.ContentMD5 = r.Header.Get(api.ContentMD5) ``` or even ``` params := &layer.PutObjectParams{ // ... ContentMD5: r.Header.Get(api.ContentMD5), } ```
dkirillov marked this conversation as resolved
@ -350,2 +372,4 @@
n.cache.PutObjectWithName(n.BearerOwner(ctx), extendedObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Member

Can be simplified

objInfo.UseMD5 = n.features.MD5Enable()
Can be simplified ``` objInfo.UseMD5 = n.features.MD5Enable() ```
dkirillov marked this conversation as resolved
@ -358,0 +384,4 @@
if n.features.MD5Enable() {
extObjInfo.ObjectInfo.UseMD5 = true
} else {
extObjInfo.ObjectInfo.UseMD5 = false
Member

Can be simplified

extObjInfo.ObjectInfo.UseMD5 = n.features.MD5Enable()
Can be simplified ``` extObjInfo.ObjectInfo.UseMD5 = n.features.MD5Enable() ```
dkirillov marked this conversation as resolved
@ -387,2 +419,4 @@
n.cache.PutObjectWithName(owner, extObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Member

Can be simplified

objInfo.UseMD5 = n.features.MD5Enable()
Can be simplified ``` objInfo.UseMD5 = n.features.MD5Enable() ```
dkirillov marked this conversation as resolved
@ -422,0 +458,4 @@
if n.features.MD5Enable() {
extObjInfo.ObjectInfo.UseMD5 = true
} else {
extObjInfo.ObjectInfo.UseMD5 = false
Member

Can be simplified

extObjInfo.ObjectInfo.UseMD5 = n.features.MD5Enable()
Can be simplified ``` extObjInfo.ObjectInfo.UseMD5 = n.features.MD5Enable() ```
dkirillov marked this conversation as resolved
@ -439,2 +481,4 @@
n.cache.PutObject(owner, extObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Member

Can be simplified

objInfo.UseMD5 = n.features.MD5Enable()
Can be simplified ``` objInfo.UseMD5 = n.features.MD5Enable() ```
dkirillov marked this conversation as resolved
@ -800,0 +848,4 @@
if n.features.MD5Enable() {
extInfo.ObjectInfo.UseMD5 = true
} else {
extInfo.ObjectInfo.UseMD5 = false
Member

Can be simplified (see previous comments)

Can be simplified (see previous comments)
dkirillov marked this conversation as resolved
@ -810,2 +864,4 @@
n.cache.PutObject(owner, &data.ExtendedObjectInfo{ObjectInfo: oi, NodeVersion: node})
if n.features.MD5Enable() {
oi.UseMD5 = true
Member

Can be simplified (see previous comments)

Can be simplified (see previous comments)
dkirillov marked this conversation as resolved
@ -538,6 +541,8 @@ func newSettings() *viper.Viper {
v.SetDefault(cfgKludgeCompleteMultipartUploadKeepalive, 10*time.Second)
v.SetDefault(cfgKludgeBypassContentEncodingCheckInChunks, false)
v.SetDefault(cfgMD5Enable, false)
Member

It isn't necessary. Default value be false anyway

It isn't necessary. Default value be `false` anyway
dkirillov marked this conversation as resolved
mbiryukova force-pushed feature/add_md5_checksum_in_header from 9713b1cfec to 3788863fd1 2023-10-24 10:27:04 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from 3788863fd1 to aff675a8eb 2023-10-24 10:44:53 +00:00 Compare
mbiryukova force-pushed feature/add_md5_checksum_in_header from aff675a8eb to c9109de09b 2023-10-24 14:34:46 +00:00 Compare
dkirillov reviewed 2023-10-24 14:49:27 +00:00
@ -260,3 +262,3 @@
CreateMultipartUpload(ctx context.Context, p *CreateMultipartParams) error
CompleteMultipartUpload(ctx context.Context, p *CompleteMultipartParams) (*UploadData, *data.ExtendedObjectInfo, error)
UploadPart(ctx context.Context, p *UploadPartParams) (string, error)
UploadPart(ctx context.Context, p *UploadPartParams, md5Enabled bool) (string, error)
Member

It seems we can get md5Enabled from here

diff --git a/api/layer/layer.go b/api/layer/layer.go
index 9e4832ec..ecbf9632 100644
--- a/api/layer/layer.go
+++ b/api/layer/layer.go
@@ -50,6 +50,7 @@ type (
        FeatureSettings interface {
                ClientCut() bool
                BufferMaxSizeForPut() uint64
+               MD5Enabled() bool
        }
 
        layer struct {
diff --git a/api/layer/multipart_upload.go b/api/layer/multipart_upload.go
index 0c83174c..5ea515d7 100644
--- a/api/layer/multipart_upload.go
+++ b/api/layer/multipart_upload.go
@@ -200,7 +200,7 @@ func (n *layer) UploadPart(ctx context.Context, p *UploadPartParams, md5Enabled
                return "", err
        }
 
-       return objInfo.ETag(md5Enabled), nil
+       return objInfo.ETag(n.features.MD5Enabled()), nil
 }
 
 func (n *layer) uploadPart(ctx context.Context, multipartInfo *data.MultipartInfo, p *UploadPartParams) (*data.ObjectInfo, error) {

It seems we can get md5Enabled from here ```diff diff --git a/api/layer/layer.go b/api/layer/layer.go index 9e4832ec..ecbf9632 100644 --- a/api/layer/layer.go +++ b/api/layer/layer.go @@ -50,6 +50,7 @@ type ( FeatureSettings interface { ClientCut() bool BufferMaxSizeForPut() uint64 + MD5Enabled() bool } layer struct { diff --git a/api/layer/multipart_upload.go b/api/layer/multipart_upload.go index 0c83174c..5ea515d7 100644 --- a/api/layer/multipart_upload.go +++ b/api/layer/multipart_upload.go @@ -200,7 +200,7 @@ func (n *layer) UploadPart(ctx context.Context, p *UploadPartParams, md5Enabled return "", err } - return objInfo.ETag(md5Enabled), nil + return objInfo.ETag(n.features.MD5Enabled()), nil } func (n *layer) uploadPart(ctx context.Context, multipartInfo *data.MultipartInfo, p *UploadPartParams) (*data.ObjectInfo, error) { ```
Author
Member

I moved MD5Enabled() bool to other interface as you suggested above. Leave it in both interfaces?

I moved `MD5Enabled() bool` to other interface as you suggested above. Leave it in both interfaces?
Member

I suppose yes. We have to know if md5 enabled in both places: handler and layer

I suppose yes. We have to know if md5 enabled in both places: `handler` and `layer`
dkirillov marked this conversation as resolved
mbiryukova force-pushed feature/add_md5_checksum_in_header from c9109de09b to 25bb581fee 2023-10-25 08:08:36 +00:00 Compare
dkirillov approved these changes 2023-10-25 13:50:14 +00:00
dkirillov left a comment
Member

LGTM

LGTM
alexvanin approved these changes 2023-10-25 14:24:46 +00:00
alexvanin merged commit 25bb581fee into master 2023-10-25 14:50:49 +00:00
alexvanin deleted branch feature/add_md5_checksum_in_header 2023-10-25 14:50:50 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-s3-gw#228
No description provided.