[#469] List multipart uploads streaming #527
No reviewers
TrueCloudLab/storage-services-committers
TrueCloudLab/storage-services-developers
Labels
No labels
P0
P1
P2
P3
good first issue
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-s3-gw#527
Loading…
Reference in a new issue
No description provided.
Delete branch "nzinkevich/frostfs-s3-gw:multiparts_list_streaming"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Signed-off-by: Nikita Zinkevich n.zinkevich@yadro.com
7a5333fe35
to1cb1299bb1
WIP: [#469] List multipart uploads streamingto [#469] List multipart uploads streaming1cb1299bb1
to6970f52d2d
6970f52d2d
toc726e2f058
c726e2f058
toe673d727e9
@ -20,0 +23,4 @@
}
type MultipartInfoStream interface {
Next(marker, uploadID string) (*MultipartInfo, error)
Why do we pass
marker
anduploadID
?@ -515,0 +516,4 @@
isTruncated := true
var last data.MultipartInfo
var info *data.MultipartInfo
for uploadsCount <= p.MaxUploads {
Why don't just
for uploadsCount < p.MaxUploads {
?I need to check next element to determine Truncated state. Extracted this check outside the loop
@ -515,0 +531,4 @@
last = *info
}
upload := uploadInfoFromMultipartInfo(info, p.Prefix, p.Delimiter)
if upload != nil {
It seems now we can skip checking prefix inside
uploadInfoFromMultipart
and result always be non nilThis isn't changed
Without checking there is a fail in a test - a call with prefix "/my" returns also item with "/zzz" prefix. Because GetSubTreeStream trims prefix to "/"
Then we should fix streaming in tree, because we it must return only nodes that have provided prefix. (
/my
in this case, and object/zzz/object/name3
doesn't have such prefix )@ -556,0 +574,4 @@
owner := n.BearerOwner(ctx)
cacheKey := cache.CreateListMultipartSessionCacheKey(p.Bkt.CID, p.Prefix, p.KeyMarker, p.UploadIDMarker)
session = n.cache.GetListMultipartSession(owner, cacheKey)
if session == nil {
Why do this if differ from the similar from object listing?
if session == nil || session.Acquired.Swap(true) {
Changed method to be more similar
@ -556,0 +575,4 @@
cacheKey := cache.CreateListMultipartSessionCacheKey(p.Bkt.CID, p.Prefix, p.KeyMarker, p.UploadIDMarker)
session = n.cache.GetListMultipartSession(owner, cacheKey)
if session == nil {
ctx, cancel := context.WithCancel(ctx)
We cannot use
ctx
from current request. It will be canceled after first request be finished@ -44,3 +44,3 @@
GetVersions(ctx context.Context, bktInfo *data.BucketInfo, objectName string) ([]*data.NodeVersion, error)
GetLatestVersion(ctx context.Context, bktInfo *data.BucketInfo, objectName string) (*data.NodeVersion, error)
InitVersionsByPrefixStream(ctx context.Context, bktInfo *data.BucketInfo, prefix string, latestOnly bool) (data.VersionsStream, error)
InitVersionsByPrefixStream(ctx context.Context, bktInfo *data.BucketInfo, prefix, treeID string, latestOnly bool) (data.VersionsStream, error)
Why?
@ -900,28 +900,10 @@ func (s *DummySubTreeStream) Next() (NodeResponse, error) {
return s.data, nil
}
type MultiID []uint64
Why did we move this to
data
package?@ -1084,0 +1084,4 @@
}, nil
}
func (c *Tree) InitMultipartsByPrefixStream(ctx context.Context, bktInfo *data.BucketInfo, prefix, treeID string, latestOnly bool) (data.VersionsStream, error) {
This method isn't used
Please, rebase branch
e673d727e9
to9f4840528f
9f4840528f
toc6dabf62bf
c6dabf62bf
toaaca4f84b8
@ -556,0 +577,4 @@
cacheKey := cache.CreateListMultipartSessionCacheKey(p.Bkt.CID, p.Prefix, p.KeyMarker, p.UploadIDMarker)
session = n.cache.GetListMultipartSession(owner, cacheKey)
if session == nil || session.Acquired.Swap(true) {
ctx, cancel := context.WithCancel(context.Background())
We also have to add AccessBox to this context to be able to get access to tree service. Otherwise currently we get
See how this done for regular listing
if bd, err := middleware.GetBoxData(ctx); err == nil {
Fixed
@ -74,3 +74,3 @@
}
commonVersionsListingParams struct {
commonListingParams struct {
Why do we change this?
It seems we don't use it for multipart listing
@ -80,3 +80,3 @@
MaxKeys int
Marker string
Bookmark string
// key to store session in cache
Not only. It also contains
Marker
for listing v1 andContinuationToken
for listing v2@ -0,0 +31,4 @@
// DefaultListMultipartSessionCacheLifetime is a default lifetime of entries in cache of ListMultipartUploads.
DefaultListMultipartSessionCacheLifetime = time.Second * 60
// DefaultListMultipartSessionCacheSize is a default size of cache of ListMultipartUploads.
DefaultListMultipartSessionCacheSize = 100
If we add new cache we also should be able configure this
Added config params
Please mention these params in https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/src/branch/master/config/config.yaml and https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/src/branch/master/config/config.env
Also we can write
time.Minute
instead oftime.Second * 60
@ -540,3 +553,1 @@
uploads = uploads[:p.MaxUploads]
result.NextUploadIDMarker = uploads[len(uploads)-1].UploadID
result.NextKeyMarker = uploads[len(uploads)-1].Key
result.NextUploadIDMarker = info.UploadID
It seems here we must use
next
rather thaninfo
.Please add tests for that case and others
In previous implementation
NextKeyMarker
andNextUploadIDMarker
was brought from the last element of current list. And the new one acts the same. So usinginfo
in this case is correct. Anyway, I'm going to write tests for thisWell, maybe I point to different error. Output sometime doesn't contain NextUploadIDMarker
Fixed and added test for this scenario
@ -515,0 +521,4 @@
break
}
n.reqLogger(ctx).Warn(logs.CouldNotGetMultipartUploadInfo, zap.Error(err))
continue
Probably we should return error rather than continue. And consider cancel context (also for regular listing) @alexvanin
@ -553,6 +567,40 @@ func (n *Layer) ListMultipartUploads(ctx context.Context, p *ListMultipartUpload
return &result, nil
Result must be sorted by upload-id if some uploads have the same key
@ -1340,0 +1344,4 @@
}
func (c *Tree) GetMultipartUploadsByPrefix(ctx context.Context, bktInfo *data.BucketInfo, params data.MultipartStreamParams) (data.MultipartInfoStream, error) {
stream, err := c.getSubTreeByPrefixStream(ctx, bktInfo, systemTree, params.Prefix)
Consider using approach like here
subTree, err := c.service.GetSubTreeStream(ctx, bktInfo, treeID, rootID, 2)
Otherwise if we will have objects:
dir/objdir1/obj
dir/objdir1/obj2
dir/obj1/obj1
dir/obj1/obj1000000
and request will contain
prefix: dir/objdir
we will list that greater than 1000000 objects and just filter them (but we will get them anyway from storage) despite we are interested in only 2 of themChanged the implementation. Now, at first,
c.service.GetSubTreeStream(ctx, bktInfo, treeID, rootID, 2)
is called, and then theGetSubTreeStream
with max depth is applied to the children as needed. Which should prevents from getting redundant nodes@ -1253,6 +1211,26 @@ func formFilePath(node NodeResponse, fileName string, namesMap map[uint64]string
return filepath, nil
}
func formFilePathV2(node treeNode, filename string, namesMap map[uint64]*treeNode) (string, error) {
It's better to use more meaningful name.
By the way why do we need separate function? And why do we use only 0th parentID below (
namesMap[curNode.ParentID[0]]
)?It seems to me that multipart info can't be split so I use only first element
https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/src/branch/master/pkg/service/tree/tree.go#L332-L334
@ -1329,0 +1313,4 @@
continue
}
m.nodeNames[tNode.ID[0]] = tNode
Why do we use only
tNode.ID[0]
?aaca4f84b8
tofe9d7fbe7d
fe9d7fbe7d
to844b4d92be
844b4d92be
to4a32fde2bf
@ -531,2 +542,2 @@
if p.UploadIDMarker != "" {
uploads = trimAfterUploadIDAndKey(p.KeyMarker, p.UploadIDMarker, uploads)
isTruncated := true
next, err := session.Stream.Next(ctx)
I'm not sure if it's ok to separate this invocation. At least as it's done now.
See test:
Modified a bit this test (because there are two objects in
/my/
folder so this common prefix will appear twice and only on the third call it will be/zzz/
. And fixed this scenario. But I'm not quite sure whether I understand the problem withseparate invocations
you proposed, could you clarify a bit?@ -1340,0 +1316,4 @@
if filePath, err = formFilePath(tNode, fileName, m.nodePaths); err != nil {
if errors.Is(err, errParentPathNotFound) {
filePath = fileName
m.nodePaths[tNode.ID[0]] = filePath
I suppose we should check if
tNode.IsSplit()
and skip if necessary. And only after that usetNode.ID[0]
@ -1340,3 +1341,4 @@
if err != nil {
return nil, err
}
if node.ID[0] == m.rootID[0] {
Please, use the similar check
if !s.rootID.Equal(node.GetNodeID()) && strings.HasPrefix(getFilename(node), s.tailPrefix) {
@ -1344,3 +1355,1 @@
var parentPrefix string
if parentFilePath != "" { // The root of subTree can also have a parent
parentPrefix = strings.TrimSuffix(parentFilePath, separator) + separator // To avoid 'foo//bar'
func (m *multipartInfoStream) getTreeNodeFromStream(stream SubTreeStream) (*treeNode, error) {
This function should be
func getTreeNodeFromStream(stream SubTreeStream) (*treeNode, error)
or
func (m *multipartInfoStream) getTreeNodeFromStream() (*treeNode, error)
@ -515,0 +520,4 @@
for uploadsCount < p.MaxUploads {
info, err = session.Stream.Next(ctx)
if err != nil {
if err == io.EOF {
I would use
errors.Is(err, io.EOF)
. Inservice/tree
also@ -1340,0 +1300,4 @@
tNode, err = m.getTreeNodeFromStream(m.currentStream)
if err != nil {
if err == io.EOF {
var err2 error
Do we really need new variable
err2
?It seems we can write
because of
continue
below@ -1340,0 +1309,4 @@
return nil, err
}
var ok bool
fileName, ok = tNode.FileName()
Why not just
fileName, ok := tNode.FileName()
?@ -1340,0 +1314,4 @@
continue
}
if filePath, err = formFilePath(tNode, fileName, m.nodePaths); err != nil {
if errors.Is(err, errParentPathNotFound) {
Why do we treat this error special? It seems normally we traverse node in right order and if we don't see parent it's a bug in storage node. Root node filepath we can set initially to this map
@ -1340,0 +1326,4 @@
continue
}
if id, ok := tNode.Meta[uploadIDKV]; ok {
if m.keyMarker == "" || filePath > m.keyMarker || (filePath == m.keyMarker && m.uploadID != "" && id > m.uploadID) {
It seems we can simplify this to
4a32fde2bf
to5586b8fb96
5586b8fb96
tocea8d4b324
cea8d4b324
to88c4c117e5
88c4c117e5
to0f5a2e0a15
We have some problem with streaming #561
View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.