[#63] poc: Fast multipart upload #157

Merged
alexvanin merged 1 commit from dkirillov/frostfs-s3-gw:feature/63-fast_multipart_upload into master 2023-07-26 21:08:01 +00:00
Member

Please note:

  • The hash of the resulting object doesn't match the real hash the user may expect (though the ETag for a multipart upload [is not the real hash of the payload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html)).

  • When we copy an object that was created by multipart upload and its overall size is less than 5 GB (if it is larger, we return an error, since in that case the user must copy via multipart upload, see the [note in CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)), the new copied object gets a real payload rather than a list of parts.

  • Take a look at the [test](https://git.frostfs.info/dkirillov/frostfs-s3-gw/src/commit/167c610340aa163e9acbd19101d9e1636881f484/api/handler/copy_test.go#L67). If copying is performed with the replace-metadata directive, the resulting object loses the info about its parts and we will not see it in the [GetObjectAttributes](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html) response. Actually, I'm not sure whether that's OK. I'm fine with always preserving this info (especially considering that for really big objects we have to copy via multipart upload, so the resulting object will have some parts info anyway).

  • There is no special locking procedure for the combined object. That's probably fine, because we don't fully support locks, and if we lock the combined object without its parts, a user of the S3 protocol will not be able to bypass this lock (they have access only to the combined object, not to its parts). But if something happens at the storage level and the parts are gone (e.g. because of expiration, manual removal via gRPC, etc.), it will be really sad.

Signed-off-by: Denis Kirillov d.kirillov@yadro.com
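
As a rough illustration of the "list of parts" idea mentioned above, here is a minimal sketch (the type names, the JSON layout, and the helper are hypothetical, not necessarily what the gateway actually stores):

```go
// Hypothetical sketch: on CompleteMultipartUpload the gateway could store a
// small "combined" object whose payload is just a listing of the already
// uploaded parts instead of re-uploading the merged payload.
package main

import (
	"encoding/json"
	"fmt"
)

// partInfo is a hypothetical record describing one stored part.
type partInfo struct {
	OID  string `json:"oid"`  // ID of the part object in storage
	Size uint64 `json:"size"` // payload size of the part in bytes
}

// buildCombinedPayload serializes the part listing that becomes the payload
// of the combined object; readers later resolve it back into the real parts.
func buildCombinedPayload(parts []partInfo) ([]byte, error) {
	return json.Marshal(parts)
}

func main() {
	payload, err := buildCombinedPayload([]partInfo{
		{OID: "part-1", Size: 5 << 20},
		{OID: "part-2", Size: 3 << 20},
	})
	fmt.Println(string(payload), err)
}
```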

dkirillov self-assigned this 2023-06-29 14:02:56 +00:00
Author
Member

Multipart upload comparison

part size: 5mb
max object size: 64mb

local:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 117 MB/s | 96 MB/s  | 9.3 MB/s  | 18 MB/s   |
| **64mb**  | 162 MB/s | 113 MB/s | 10 MB/s   | 21 MB/s   |
| **256mb** | 161 MB/s | 121 MB/s | 8.5 MB/s  | 22 MB/s   |
| **512mb** | 126 MB/s | 111 MB/s | 8.6 MB/s  | 16 MB/s   |

bare metal:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 64 MB/s  | 42 MB/s  | 6.3 MB/s  | 13 MB/s   |
| **64mb**  | 93 MB/s  | 56 MB/s  | 7.2 MB/s  | 16 MB/s   |
| **256mb** | 76 MB/s  | 59 MB/s  | 6.3 MB/s  | 17 MB/s   |
| **512mb** | 74 MB/s  | 59 MB/s  | 6.4 MB/s  | 17 MB/s   |

part size: 16mb
max object size: 64mb

local:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 127 MB/s | 117 MB/s | 10 MB/s   | 20 MB/s   |
| **64mb**  | 156 MB/s | 146 MB/s | 12 MB/s   | 24 MB/s   |
| **256mb** | 156 MB/s | 145 MB/s | 8 MB/s    | 21 MB/s   |
| **512mb** | 163 MB/s | 146 MB/s | 7.9 MB/s  | 25 MB/s   |

bare metal:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 62 MB/s  | 64 MB/s  | 8.2 MB/s  | 15 MB/s   |
| **64mb**  | 89 MB/s  | 83 MB/s  | 9.7 MB/s  | 20 MB/s   |
| **256mb** | 81 MB/s  | 85 MB/s  | 9.5 MB/s  | 20 MB/s   |
| **512mb** | 81 MB/s  | 83 MB/s  | 9.5 MB/s  | 20 MB/s   |

part size: 64mb
max object size: 64mb

local:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 124 MB/s | 111 MB/s | 9.9 MB/s  | 19 MB/s   |
| **64mb**  | 181 MB/s | 151 MB/s | 13 MB/s   | 23 MB/s   |
| **256mb** | 183 MB/s | 145 MB/s | 8 MB/s    | 26 MB/s   |
| **512mb** | 177 MB/s | 150 MB/s | 7.9 MB/s  | 25 MB/s   |

bare metal:

| size | READ old | READ new | WRITE old | WRITE new |
| --------- | -------- | -------- | --------- | --------- |
| **8mb**   | 64 MB/s  | 62 MB/s  | 7.5 MB/s  | 14 MB/s   |
| **64mb**  | 92 MB/s  | 95 MB/s  | 9.7 MB/s  | 20 MB/s   |
| **256mb** | 94 MB/s  | 97 MB/s  | 8.3 MB/s  | 20 MB/s   |
| **512mb** | 83 MB/s  | 95 MB/s  | 9.3 MB/s  | 21 MB/s   |
Owner

@dkirillov thanks for the comparison! A couple more questions:

  • How many parts have you used? I suppose the more parts you have, the faster the new `Write` is going to be.
  • Is this a read comparison between two multipart-upload objects? Is there any significant downgrade in read operations for regular upload objects?
Author
Member
  • How many parts have you used?

Every part has a size of 5 MB (except the last).

Is this a read comparison between two multipart-upload objects? Is there any significant downgrade in read operations for regular upload objects?

Yes, it's a read comparison between two multipart-upload objects.
There is no significant downgrade in read operations for regular upload objects, because we check whether an object should be treated specially by its header (which we always handle anyway).
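
A minimal sketch of the kind of dispatch meant here (the attribute name and the helper are assumptions for illustration, not the gateway's actual API):

```go
package main

import (
	"io"
	"strings"
)

// getPayloadReader picks the read path based on attributes from the object
// header, which is fetched for every GET anyway, so regular objects keep
// their usual fast path. The attribute name is hypothetical.
func getPayloadReader(attrs map[string]string, plain io.Reader, combined func() io.Reader) io.Reader {
	if _, ok := attrs["S3-Completed-Parts"]; ok {
		return combined() // payload is a parts listing: assemble a multi-part reader
	}
	return plain // regular object: stream its payload directly
}

func main() {
	r := getPayloadReader(map[string]string{}, strings.NewReader("payload"),
		func() io.Reader { return strings.NewReader("assembled from parts") })
	_, _ = io.ReadAll(r)
}
```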

dkirillov force-pushed feature/63-fast_multipart_upload from f4aaa4ec81 to f481f9ad43 2023-07-12 07:42:34 +00:00 Compare
dkirillov force-pushed feature/63-fast_multipart_upload from f481f9ad43 to 167c610340 2023-07-12 09:12:10 +00:00 Compare
dkirillov changed title from WIP: [#63] poc: Fast multipart upload to [#63] poc: Fast multipart upload 2023-07-12 09:26:12 +00:00
dkirillov requested review from storage-services-developers 2023-07-12 09:26:21 +00:00
dkirillov requested review from storage-services-committers 2023-07-12 09:26:21 +00:00
dkirillov force-pushed feature/63-fast_multipart_upload from 167c610340 to 0fa9fa657c 2023-07-12 09:36:05 +00:00 Compare
alexvanin added this to the v0.28.0 milestone 2023-07-12 09:44:21 +00:00
dkirillov changed title from [#63] poc: Fast multipart upload to WIP: [#63] poc: Fast multipart upload 2023-07-17 14:22:34 +00:00
dkirillov changed title from WIP: [#63] poc: Fast multipart upload to [#63] poc: Fast multipart upload 2023-07-18 09:12:19 +00:00
alexvanin reviewed 2023-07-18 11:13:21 +00:00
@ -0,0 +20,4 @@
layer *layer
off, ln uint64
Owner

Some comments would be appreciated. I guess it is the offset within the _complete_ object and the total size of the _complete_ object.

@ -0,0 +44,4 @@
for x.off != 0 {
if x.parts[0].Size < x.off {
x.parts = x.parts[1:]
x.off -= x.parts[0].Size
Owner

What if `len(x.parts) == 1`? Then after `x.parts[1:]` the slice is empty and a panic happens on `x.parts[0].Size`. Is that valid?

Also, is it possible for `x.off` to go negative here?

Author
Member

Is it valid?

In the current implementation we check the range in the handler and when initializing reading from FrostFS, so we cannot get such an invalid value here.
But yes, theoretically a panic can happen. I'll add a test for that.

Also, is it possible for `x.off` to go negative here?

No, we have the check `if x.parts[0].Size < x.off` above.

@ -0,0 +74,4 @@
x.parts = x.parts[1:]
next, err := x.Read(p[n:])
Owner

Should it be recursive, and not `x.curReader.Read`? What is the max recursion depth here? As far as I can see, the max depth is one, because `x.curReader` is set above and we expect an early return from the recursive call.

Author
Member

Should it be recursive and not `x.curReader.Read`?

It seems it should be. We have to handle the case when we have finished reading one part and must start reading another.

What is the max recursion depth here?

The max depth is the number of parts (if `p` is large enough to contain the payload of all parts at once).

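
To summarize this thread, here is a minimal, self-contained sketch of such a multi-part reader (the in-memory `part` type and all names are hypothetical; the actual PR streams parts from FrostFS by their IDs). It shows the offset-skipping loop with the bounds check discussed above and the recursive `Read`, whose depth is bounded by the number of remaining parts:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// part describes one stored part of a combined object (hypothetical shape).
type part struct {
	Size    uint64
	Payload []byte // stands in for a part fetched from storage by its ID
}

// multiReader reads a byte range of length ln starting at offset off across
// consecutive parts of a combined object.
type multiReader struct {
	parts     []part
	off, ln   uint64 // remaining offset to skip and remaining bytes to return
	curReader io.Reader
}

func (x *multiReader) Read(p []byte) (int, error) {
	if x.curReader == nil {
		// Skip whole parts that lie entirely before the requested offset.
		// Checking the size first keeps x.off from going negative and
		// guards against indexing past the last part.
		for len(x.parts) > 0 && x.parts[0].Size <= x.off {
			x.off -= x.parts[0].Size
			x.parts = x.parts[1:]
		}
		if len(x.parts) == 0 || x.ln == 0 {
			return 0, io.EOF
		}
		// Start reading the current part from the in-part offset.
		x.curReader = bytes.NewReader(x.parts[0].Payload[x.off:])
		x.off = 0
	}

	if x.ln == 0 {
		return 0, io.EOF
	}
	limit := uint64(len(p))
	if limit > x.ln {
		limit = x.ln
	}
	n, err := x.curReader.Read(p[:limit])
	x.ln -= uint64(n)
	if err == io.EOF && x.ln > 0 {
		// Current part is exhausted: move on to the next one and recurse.
		// Recursion depth is bounded by the number of remaining parts.
		x.curReader = nil
		x.parts = x.parts[1:]
		next, nerr := x.Read(p[n:])
		return n + next, nerr
	}
	return n, err
}

func main() {
	r := &multiReader{
		parts: []part{
			{Size: 4, Payload: []byte("aaaa")},
			{Size: 4, Payload: []byte("bbbb")},
			{Size: 4, Payload: []byte("cccc")},
		},
		off: 2, // start inside the first part
		ln:  8, // read 8 bytes in total
	}
	data, err := io.ReadAll(r)
	fmt.Printf("%q %v\n", data, err) // "aabbbbcc" <nil>
}
```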
dkirillov force-pushed feature/63-fast_multipart_upload from 0027a94878 to b0214811e0 2023-07-18 15:43:43 +00:00 Compare
dkirillov force-pushed feature/63-fast_multipart_upload from b0214811e0 to 68ec470c18 2023-07-19 06:22:07 +00:00 Compare
dkirillov force-pushed feature/63-fast_multipart_upload from 68ec470c18 to 438c35bc9f 2023-07-19 13:17:33 +00:00 Compare
dkirillov force-pushed feature/63-fast_multipart_upload from 438c35bc9f to 1a09041cd1 2023-07-20 12:59:32 +00:00 Compare
Author
Member

The [multipart ceph test results](https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/src/commit/361d10cc786249fbb084a6961a764680c3218d59/docs/s3_test_results.md#multipart) are the same with this PR.

alexvanin approved these changes 2023-07-20 13:35:43 +00:00
Author
Member

Test cases also passed:

![image](/attachments/ae53846a-f3a7-4efb-88be-0e9349a28066)

alexvanin merged commit 1a09041cd1 into master 2023-07-20 14:50:55 +00:00
alexvanin deleted branch feature/63-fast_multipart_upload 2023-07-20 14:50:56 +00:00