Make rpc client stream initialization get cancelled by dial timeout #304
Labels
No labels
P0
P1
P2
P3
good first issue
pool
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-sdk-go#304
Loading…
Reference in a new issue
No description provided.
Delete branch "aarifullin/frostfs-sdk-go:fix/grpc_client"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
c.conn
may be already invalidated but the rpc client can't detect this.NewStream
may hang trying to open a stream with invalidated connection. Using timer withdialTimeout
forNewStream
fixes this problem.Close #301
dialer
for open connections tooIt turns out that an additional RPC call will always be made for existing connections. Moreover, this will happen in 99% of cases. At first glance it looks very expensive.
e295f3afa5
to720dc2a1f1
I've introduced dial timeout for
NewStream
Callto Make rpc client stream initialization get cancelled by dial timeoutdialer
for open connections too@ -51,3 +51,3 @@
}
ctx, cancel := context.WithCancel(prm.ctx)
ctx, cancel := context.WithTimeout(prm.ctx, c.dialTimeout)
I believe, the gRPC client has one context for both dialing and streaming, and there's no separate dialing step in gRPC-Go v1.63+. Could you please double-check it?
You was right. I didn't have a clue that this value also cancels streaming. The approach has been changed, please, check this out
720dc2a1f1
to8f2c8007fd
I like it
If we incorporate this approach, we should consider whether we still need the Dial method. We currently use it to check if the target node is alive before making a request (example). As far as I know, it was another way to cope with the lack of distinction between the dialing and streaming steps
@ -59,0 +60,4 @@
// We use a separate timer instead of context timeout because the latter
// would propagate to all subsequent read/write operations on the opened stream,
// which is not desired for the stream's lifecycle management.
dialTimeoutTimer := time.NewTimer(c.dialTimeout)
Do we also need
dialTimeoutTimer.Stop()
call?Indeed! Fixed
@ -59,0 +61,4 @@
// would propagate to all subsequent read/write operations on the opened stream,
// which is not desired for the stream's lifecycle management.
dialTimeoutTimer := time.NewTimer(c.dialTimeout)
newStreamCh := make(chan interface{})
please, use
struct{}
if you do nothing with the value -- it better communicates the intent and could be optimized by compiler as zero-size type.Good to know! Fixed
8f2c8007fd
to2352156ee9
New commits pushed, approval review dismissed automatically according to repository settings
New commits pushed, approval review dismissed automatically according to repository settings
@ -59,0 +75,4 @@
close(newStreamCh)
}()
select {
case <-dialTimeoutTimer.C:
I think we still have a race-condition here and a possible goroutine leak.
stream, nil
is returned fromNewStream
It is true that
stream
from (2) should be closed?To prevent this we must wait
<-newStreamCh
and then check the stream.I have fixed something
newStreamCh
is closed, select goes further and the call returnserr
orstreamWrapper
cancel()
)This must work out
I suppose we don't need to close if it's successfully/unsuccessfully opened . The goal of this fix is to prevent hung open_ing_. So, if we have the result of
NewStream
, then we can return either opened stream or an errorIs there any way you can transform this supposition into knowledge? Please, do.
Check the changes out, please. Now it's closed if timer has expired
2352156ee9
toa5a399f748
@ -56,3 +55,1 @@
ServerStreams: info.ServerStream(),
ClientStreams: info.ClientStream(),
}, toMethodName(info))
defer cancel()
Should we save
cancel
as field ofsttreamWrapper
below if invoke it indefer
?Sure we don't. Fixed
a5a399f748
tocbc59f82fb
@ -59,0 +77,4 @@
select {
case <-dialTimeoutTimer.C:
select {
case <-newStreamCh:
You have just entered
<-dialTimeoutTimer.C
branch (newStreamCh
could be closed or not closed)goroutine leak can still happen in the
default
branch, because that's wherenewStreamCh
I mean the only question is whether we need to close the stream.
This inner select is useless.
Sorry. When I read your comment first time:
I mistook these points for "any of this" case. So, now I understand you mean what if timer has expired, but
NewStream
is fine.So, check the changes out: if stream has been opened but timer - expired, we close the stream
cbc59f82fb
toc3bb283cce
c3bb283cce
to7d53887170
7d53887170
toe08403faba
(*Client).Dial
seems to be redundant #308