[#205] Add md5 checksum in header #228
Labels
No labels
P0
P1
P2
P3
good first issue
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-s3-gw#228
Loading…
Reference in a new issue
No description provided.
Delete branch "mbiryukova/frostfs-s3-gw:feature/add_md5_checksum_in_header"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #205
Signed-off-by: Marina Biryukova m.biryukova@yadro.com
68df04ff1e
toea753fff1a
@ -95,2 +95,4 @@
}
if len(info.MD5Sum) > 0 {
h.Set(api.ContentMD5, "\""+info.MD5Sum+"\"")
If
info.MD5Sum
contains base64 encoded hash can we rename it toMD5SumBase64
? But I would like to store hex encoded values and encode/decode where it's needinfo.MD5Sum
is stored as hex encoded valueThen we have to encode it to base64 here (and probably without quotes)
I fixed this
@ -113,3 +117,3 @@
for key, val := range info.Headers {
if layer.IsSystemHeader(key) {
if layer.IsSystemHeader(key) || key == api.ContentMD5 {
Let's add
api.ContentMD5
toapi.SystemMetadata
map@ -245,2 +245,4 @@
Reader: body,
}
if md5Hash := r.Header.Get(api.ContentMD5); len(md5Hash) > 0 {
p.MD5 = md5Hash
It seems we can simplify this and write:
@ -120,0 +118,4 @@
Lock *data.ObjectLock
Encryption encryption.Params
CopiesNumbers []uint32
IsCompleteMultipart bool
I would rename this to something like
NeedComputeMD5
@ -294,0 +299,4 @@
if err != nil {
return nil, err
}
if hex.EncodeToString(md5HashBytes) != hex.EncodeToString(md5Hash) {
Actually we can compare raw bytes (to avoid extra conversion)
@ -305,2 +321,4 @@
IsCombined: p.Header[MultipartObjectSize] != "",
}
if n.features.MD5Enable() {
if p.IsCompleteMultipart && ok {
The
ok
variable was defined quite a long time ago. So it's better to use a more meaningful name@ -307,0 +326,4 @@
} else if p.IsCompleteMultipart && !ok {
return nil, fmt.Errorf("MD5 checksum of complete object wasn't provided")
} else {
newVersion.MD5 = hex.EncodeToString(md5Hash)
Why do we store md5 in different formats in this field (hex (
newVersion.MD5 = hex.EncodeToString(md5Hash)
) vs base64 (newVersion.MD5 = md5EncodedHash
) ?Format of header depends on
IsCompleteMultipart
, I'll change this to be more logical@ -465,3 +495,4 @@
prm.Payload = wrapReader(prm.Payload, 64*1024, func(buf []byte) {
size += uint64(len(buf))
hash.Write(buf)
md5Hash.Write(buf)
As I understand we don't have to compute md5 at all if this feature is disabled
It seems the fix for this comment somehow be missed in master
@mbiryukova @alexvanin
For v0.29.0 we do compute both hashes, unless it produce significant slowdown. As for future releases, we may drop SHA and use MD5 as new default for everything.
@ -177,0 +176,4 @@
soft_memory_limit: 1gb
md5:
enable: false
Maybe we can introduce new section
features
orhashes
instead of justmd5
?Also we should add description of this param to
docs/configuration.md
and toconfig/config.env
.Please run
pre-commit run --all
to see and fix some linter errors (e.g. no empty line at the end of the file)And put
md5
in new section?@ -992,6 +997,7 @@ func (c *Tree) AddPart(ctx context.Context, bktInfo *data.BucketInfo, multipartN
sizeKV: strconv.FormatUint(info.Size, 10),
createdKV: strconv.FormatInt(info.Created.UTC().UnixMilli(), 10),
etagKV: info.ETag,
md5KV: info.MD5,
Let's do not set
md5KV
key if value is empty.ea753fff1a
to768b0ccc64
768b0ccc64
toc302fb5226
c302fb5226
to230259be40
@ -563,0 +573,4 @@
| Parameter | Type | SIGHUP reload | Default value | Description |
|--------------|--------|---------------|---------------|----------------------------------------------|
| `md5.enable` | `bool` | yes | false | Enable calculating and storing MD5 checksum. |
Let's name this
enabled
(notenable
) to be more consistent with other config values (e.g.pprof.enabled
,nats.enabled
, etc)230259be40
to1d1fb10684
1d1fb10684
to1ed9df8fbb
1ed9df8fbb
to5264e2f7e3
5264e2f7e3
to9713b1cfec
@ -464,3 +510,4 @@
prm.WithoutHomomorphicHash = bktInfo.HomomorphicHashDisabled
var size uint64
hash := sha256.New()
md5Hash := md5.New()
If we returns md5 as
ETag
why do we need to compute sha256? It seems we can just calculate md5 and save it as ETag (I mean use the same field indata.ObjectInfo
, the same key in for tree node meta etc.)We decided not to replace ETag for now. @alexvanin
And using setting we can switch to returning sha256 in ETag
@ -47,2 +47,4 @@
CreationEpoch uint64
HashSum string
MD5Sum string
UseMD5 bool
It seems we can drop the
UseMD5
field and check forMD5Sum != ""
. Or encapsulate this check to methodIn current implementation
MD5Sum
will not be empty if value was stored for objectThen maybe we can write something like
and then instead of
aff675a8eb/api/handler/put.go (L331-L335)
we can write (we can extend interface with
MD5Enabled() bool
sinceappSettings
implements it anyway )@ -184,8 +184,6 @@ func TestGetObject(t *testing.T) {
bktName, objName := "bucket", "obj"
bktInfo, objInfo := createVersionedBucketAndObject(hc.t, hc, bktName, objName)
putObject(hc, bktName, objName)
Why do we drop this?
@ -141,6 +141,7 @@ func prepareHandlerContextBase(t *testing.T, minCache bool) *handlerContext {
}
features := &layer.FeatureSettingsMock{}
features.SetMD5Enable(false)
It isn't necessary. The
false
is default value@ -248,2 +248,4 @@
}
if md5Hash := r.Header.Get(api.ContentMD5); len(md5Hash) > 0 {
params.ContentMD5 = md5Hash
This can be simplified to
or even
@ -350,2 +372,4 @@
n.cache.PutObjectWithName(n.BearerOwner(ctx), extendedObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Can be simplified
@ -358,0 +384,4 @@
if n.features.MD5Enable() {
extObjInfo.ObjectInfo.UseMD5 = true
} else {
extObjInfo.ObjectInfo.UseMD5 = false
Can be simplified
@ -387,2 +419,4 @@
n.cache.PutObjectWithName(owner, extObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Can be simplified
@ -422,0 +458,4 @@
if n.features.MD5Enable() {
extObjInfo.ObjectInfo.UseMD5 = true
} else {
extObjInfo.ObjectInfo.UseMD5 = false
Can be simplified
@ -439,2 +481,4 @@
n.cache.PutObject(owner, extObjInfo)
if n.features.MD5Enable() {
objInfo.UseMD5 = true
Can be simplified
@ -800,0 +848,4 @@
if n.features.MD5Enable() {
extInfo.ObjectInfo.UseMD5 = true
} else {
extInfo.ObjectInfo.UseMD5 = false
Can be simplified (see previous comments)
@ -810,2 +864,4 @@
n.cache.PutObject(owner, &data.ExtendedObjectInfo{ObjectInfo: oi, NodeVersion: node})
if n.features.MD5Enable() {
oi.UseMD5 = true
Can be simplified (see previous comments)
@ -538,6 +541,8 @@ func newSettings() *viper.Viper {
v.SetDefault(cfgKludgeCompleteMultipartUploadKeepalive, 10*time.Second)
v.SetDefault(cfgKludgeBypassContentEncodingCheckInChunks, false)
v.SetDefault(cfgMD5Enable, false)
It isn't necessary. Default value be
false
anyway9713b1cfec
to3788863fd1
3788863fd1
toaff675a8eb
aff675a8eb
toc9109de09b
@ -260,3 +262,3 @@
CreateMultipartUpload(ctx context.Context, p *CreateMultipartParams) error
CompleteMultipartUpload(ctx context.Context, p *CompleteMultipartParams) (*UploadData, *data.ExtendedObjectInfo, error)
UploadPart(ctx context.Context, p *UploadPartParams) (string, error)
UploadPart(ctx context.Context, p *UploadPartParams, md5Enabled bool) (string, error)
It seems we can get md5Enabled from here
I moved
MD5Enabled() bool
to other interface as you suggested above. Leave it in both interfaces?I suppose yes. We have to know if md5 enabled in both places:
handler
andlayer
c9109de09b
to25bb581fee
LGTM