WIP: Process EC container for Get/GetRange/Head concurrently #1237
Reference: TrueCloudLab/frostfs-node#1237
No description provided.
Now get requests for EC containers run concurrently, with a concurrency limit equal to the EC data count.
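A minimal sketch of that idea, assuming hypothetical node and fetchChunk names (errgroup is mentioned later in the review as the mechanism actually used): one task per candidate node, with at most dataCount chunk requests in flight.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// node and fetchChunk are placeholders for the real node info and RPC call.
type node struct{ id int }

func fetchChunk(ctx context.Context, n node) error {
	fmt.Println("fetching chunk from node", n.id) // real code would issue a GET here
	return nil
}

func getEC(ctx context.Context, nodes []node, dataCount int) error {
	eg, egCtx := errgroup.WithContext(ctx)
	eg.SetLimit(dataCount) // at most dataCount requests in flight at once
	for _, n := range nodes {
		n := n
		eg.Go(func() error { return fetchChunk(egCtx, n) })
	}
	return eg.Wait()
}

func main() {
	_ = getEC(context.Background(), []node{{1}, {2}, {3}, {4}}, 2)
}
```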
32afe410d3 to c9b1af0635
"WIP: Process EC container for Get/GetRange/Head concurrently" to "Process EC container for Get/GetRange/Head concurrently"
c9b1af0635 to 30c51fbd5d
31e94be0b6 to e33752e034
@ -94,0 +149,4 @@
}()
err := r.processECNodesRequests(ctx, nodes, dataCount)
cancel()
You are canceling the context here and using it below for r.writeCollectedObject(ctx), looks wrong.
Right, fixed.
e33752e034 to c4d310d669
@ -94,0 +170,4 @@
defer cancel()
var wg sync.WaitGroup
wg.Add(1)
go func() {
[Optional]
What do you think: won't this approach cause goroutine spam if too many get-like requests are processed at the same time? Using errgroup would be safe, I guess.
I didn't catch the thought. errgroup creates the same goroutines.
Yes, it does, but with a fixed number of workers.
Never mind, sorry. I incorrectly read this part: I mistook this goroutine for goroutine generation within a for-loop.
@ -55,1 +65,4 @@
// In most cases, the value from the cache will be returned.
// Container appears in the cache when traverser generates.
cnr, err := r.containerSource.Get(r.address().Container())
Can we get it from the traverser directly?
The container is not part of the traverser, but the container is required to generate the traverser.
Fixed: now the traverser generator returns the container.
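A hypothetical sketch of the change described above (traverserGenerator, containerSource and the types below are illustrative, not the actual frostfs-node API): the generator has to load the container anyway to build the traverser, so it can return both and spare the caller a second lookup.

```go
package object

// Illustrative types only; the real ones live in frostfs-node's object service.
type (
	Container struct{ /* placement policy, attributes, ... */ }
	Traverser struct{ /* iteration state over placement vectors */ }
)

type containerSource interface {
	Get(cnrID string) (*Container, error)
}

type traverserGenerator struct {
	containers containerSource
}

// GenerateTraverser builds the traverser and also returns the container it
// had to load on the way, so the caller avoids a second cache lookup.
func (g *traverserGenerator) GenerateTraverser(cnrID string) (*Traverser, *Container, error) {
	cnr, err := g.containers.Get(cnrID)
	if err != nil {
		return nil, nil, err
	}
	// placeholder for building the real traverser from the container's policy
	return &Traverser{}, cnr, nil
}
```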
@ -94,0 +129,4 @@
var errECInfo *objectSDK.ECInfoError
var errRemoved *apistatus.ObjectAlreadyRemoved
var errOutOfRange *apistatus.ObjectOutOfRange
var errSuccess *ecGetSuccessErr
What is the benefit of using a separate error type for successfully performed requests? Go has multiple value returns, can we use them?
To cancel errgroup's context and not run the other requests: if eg.Go's argument returns an error, then errgroup cancels the context. Of course it is possible to do the same thing without errgroup, but it would not be as simple. Also added a ctx.Done check.
It can be done with a special error, e.g. errStopIteration is what we use somewhere.
But ecGetSuccessErr is a special error used only to stop EC handling and pass the result object.
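A compact sketch of that pattern outside of frostfs specifics (successErr and fetchFromAny are illustrative names): the "success" error exists only so that errgroup cancels its context and stops scheduling the remaining node requests, and the caller unwraps it back into a result.

```go
package main

import (
	"context"
	"errors"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// successErr plays the role of ecGetSuccessErr: not a real failure, just a
// signal that the object is ready and remaining requests can be skipped.
type successErr struct{ obj string }

func (e *successErr) Error() string { return "success" }

func fetchFromAny(ctx context.Context, nodes []string) (string, error) {
	eg, egCtx := errgroup.WithContext(ctx)
	for _, n := range nodes {
		n := n
		eg.Go(func() error {
			select {
			case <-egCtx.Done(): // another worker already finished, skip this node
				return egCtx.Err()
			default:
			}
			// pretend this node returned the full object
			return &successErr{obj: "object from " + n}
		})
	}
	err := eg.Wait()
	var s *successErr
	if errors.As(err, &s) {
		return s.obj, nil // the "error" actually carries the result
	}
	return "", err
}

func main() {
	obj, err := fetchFromAny(context.Background(), []string{"node-1", "node-2", "node-3"})
	fmt.Println(obj, err)
}
```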
@ -94,0 +132,4 @@
var errSuccess *ecGetSuccessErr
switch {
case err == nil: // nil returns if all nodes failed or incomplete EC info received
s/returns/is returned/
done
@ -94,0 +209,4 @@
r.log.Debug(logs.GetCouldNotConstructRemoteNodeClient, zap.String("node_key", hex.EncodeToString(info.PublicKey())))
return err
}
obj, err := r.getRemote(egCtx, rs, info)
So for each node we call getRemote with the same parameters. Where is the place in the code where we combine the chunks into the final object?
func (a *assemblerec) retrieveParts(mainCtx context.Context, headOnly bool) []*objectSDK.Object {
I have realised where my confusion comes from: we have 2 steps.
The "enough nodes" set is defined at step 1, even though nodes may go down at step 2. In this case will we fail to fetch the object? The situation will happen during failover tests.
Right, if a node fails between step 1 and step 2, then such a request will fail.
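A schematic of the two steps discussed here, with hypothetical names and the chunk fetches simplified to a sequential loop: step 1 plans which nodes to ask based on the collected EC info, step 2 actually fetches the chunks, and a node that was alive at step 1 but gone at step 2 fails the whole assembly.

```go
package object

import (
	"context"
	"fmt"
)

// chunkLocation records, for one EC chunk, which node reported holding it
// (step 1 output, built from the merged EC info).
type chunkLocation struct {
	chunkIndex int
	nodeAddr   string
}

// planRetrieval is step 1: pick enough chunk holders from what the nodes reported.
func planRetrieval(locations []chunkLocation, dataCount int) ([]chunkLocation, error) {
	if len(locations) < dataCount {
		return nil, fmt.Errorf("not enough chunks: %d < %d", len(locations), dataCount)
	}
	return locations[:dataCount], nil
}

// retrieveParts is step 2: fetch the planned chunks. A node that went down
// after planning surfaces as an error here and fails the whole assembly,
// which is the window the failover tests hit.
func retrieveParts(ctx context.Context, plan []chunkLocation,
	fetch func(context.Context, chunkLocation) ([]byte, error)) ([][]byte, error) {
	parts := make([][]byte, 0, len(plan))
	for _, loc := range plan {
		p, err := fetch(ctx, loc)
		if err != nil {
			return nil, fmt.Errorf("chunk %d from %s: %w", loc.chunkIndex, loc.nodeAddr, err)
		}
		parts = append(parts, p)
	}
	return parts, nil
}
```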
@ -94,0 +230,4 @@
ecInfoGuard.Lock()
defer ecInfoGuard.Unlock()
r.infoEC = util.MergeECInfo(errECInfo.ECInfo(), r.infoEC)
if r.isRaw() {
Could you explain this line a bit? Why do we NOT return an error if the number of chunks is not equal to the expected one?
In case of a HEAD request with the raw flag, it is expected to get a raw EC info error with all chunks.
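A rough illustration of the merging behaviour (hypothetical types, not the util.MergeECInfo signature): each node only knows about its own chunks, so the handler keeps folding per-node EC info into one view, and for a raw HEAD request that merged chunk map is itself the expected answer rather than an error.

```go
package object

// ecInfo maps a chunk index to the node that reported holding that chunk.
type ecInfo map[int]string

// mergeECInfo folds one node's report into the accumulated view; chunks
// learned from earlier reports are never erased by later ones.
func mergeECInfo(fresh, acc ecInfo) ecInfo {
	if acc == nil {
		acc = make(ecInfo, len(fresh))
	}
	for idx, node := range fresh {
		if _, ok := acc[idx]; !ok {
			acc[idx] = node
		}
	}
	return acc
}
```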
c4d310d669 to d731be8011
d731be8011 to 7f7d04882a
@ -54,2 +64,4 @@
}
// In most cases, the value from the cache will be returned.
// Container appears in the cache when traverser generates.
Looks like a grammar mistake: "traverser generates" --> "traverser is generated".
Fixed
99790c09a0 to e89f8bb6c7
e89f8bb6c7 to eebf7b4bfb
@ -65,19 +65,10 @@ func (r *request) processNode(ctx context.Context, info client.NodeInfo) bool {
mergeSplitInfo(r.splitInfo(), errSplitInfo.SplitInfo())
r.err = objectSDK.NewSplitInfoError(r.infoSplit)
case errors.As(err, &errECInfo):
Looks like part of the code in the default section is no longer needed.
done
eebf7b4bfb to 5d36e847b0
5d36e847b0 to 906570789c
906570789c to 50b9dea0fb
"Process EC container for Get/GetRange/Head concurrently" to "WIP: Process EC container for Get/GetRange/Head concurrently"
Incorrect implementation, close.
Pull request closed