Subscriber's Close() seems to work incorrectly #710
Expected Behavior
Close() unsubscribes the client from all notifications on the neo-go side and shuts the subscriber down without a panic.
Current Behavior
Consider the situation when a subscriber instance could not subscribe for notifications and is going to invoke Close() after checking the error. There is a chance that a panic occurs earlier than UnsubscribeAll():

sub.Close() -> c.client.Close() -> close(c.closeChan)

In this case neo-go will never get the unsubscribe request, but it cannot "pong" the client if it is off, so it unsubscribes it on its own after some time.
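A rough, hypothetical sketch of that call chain (the types and names below are assumptions for illustration, not the actual frostfs-node code): Close only closes closeChan, while the real teardown and the unsubscribe attempt happen asynchronously in a background loop, so the unsubscribe request may never reach neo-go.

```go
package sketch

// wsClient stands in for the neo-go websocket client used under the hood.
type wsClient interface {
	UnsubscribeAll() error
	Close()
}

type client struct {
	ws        wsClient
	closeChan chan struct{}
}

// Close signals shutdown and returns immediately; the actual teardown
// happens asynchronously in the background loop below.
func (c *client) Close() { close(c.closeChan) }

// run owns the websocket connection. Once closeChan is closed it tears the
// connection down, so an unsubscribe issued around the same time may find
// the connection already unusable and never reach neo-go.
func (c *client) run() {
	<-c.closeChan
	_ = c.ws.UnsubscribeAll() // the connection may already be dead here
	c.ws.Close()
}

type subscriber struct{ client *client }

// The chain from the report: sub.Close() -> c.client.Close() -> close(c.closeChan).
func (s *subscriber) Close() { s.client.Close() }
```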
Also, we have a problem within _ = c.UnsubscribeAll(). If sub.BlockNotifications() happens because of a broken/unstable connection, the subscriber can try to resubscribe for events by switching the RPC endpoint. This leads to the point where the subscriber uses the newly established websocket connection "under the hood", and c.client.UnsubscribeAll() may be sent to nowhere. That happens because c.client.UnsubscribeAll() does not check whether the connection is being re-established at that moment.
Possible Solution
For Close(): introduce a closeDone channel and wait for it to be closed (you can check #707).
For UnsubscribeAll(): add some loop like for c.switchIsActive.Load() {} before this invocation (both suggestions are sketched below).
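A minimal sketch of both suggestions, assuming a closeDone channel closed by the background loop and a switchIsActive atomic flag; the names do not have to match the real frostfs-node code.

```go
package sketch

import "sync/atomic"

// wsClient stands in for the neo-go websocket client.
type wsClient interface {
	UnsubscribeAll() error
	Close()
}

type client struct {
	ws             wsClient
	closeChan      chan struct{} // signals the background loop to shut down
	closeDone      chan struct{} // closed by the loop once teardown has finished
	switchIsActive atomic.Bool   // true while an RPC endpoint switch is in progress
}

// Close now waits until the background loop has actually finished, so the
// unsubscribe request gets a chance to be delivered before the caller moves on.
func (c *client) Close() {
	close(c.closeChan)
	<-c.closeDone
}

// UnsubscribeAll waits out an in-flight RPC switch before sending the
// request, so it is not written to a connection that is being replaced.
func (c *client) UnsubscribeAll() error {
	for c.switchIsActive.Load() {
		// busy-wait, as suggested above; a mutex or sync.Cond would also work
	}
	return c.ws.UnsubscribeAll()
}
```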
Your Environment
This can be reproduced in devenv - one just needs to create such a scenario.
I have reconsidered the problem and come to the following conclusion:

Neo-go actually deals with that very well - it unsubscribes dead subscribers. So, a hanging subscriber exists only for the pong interval, which lasts a few seconds. If neo-go is full of event subscribers, the client can request a subscription again after some time (when neo-go wipes out the hanging ones).

I also missed the fact that the method locks the switch mutex (see the sketch below), so the problem is no longer relevant.
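A minimal sketch, assuming "the method" above refers to UnsubscribeAll and that it shares a mutex with the RPC switch; with that locking, the unsubscribe request can never be written to a connection that is being replaced, which is why the second race does not actually occur.

```go
package sketch

import "sync"

// wsClient stands in for the neo-go websocket client.
type wsClient interface {
	UnsubscribeAll() error
}

type client struct {
	switchLock sync.RWMutex // guards the underlying connection during RPC switches
	ws         wsClient
}

// UnsubscribeAll takes the same lock as the RPC switch, so the request is
// never written to a connection that is being replaced.
func (c *client) UnsubscribeAll() error {
	c.switchLock.RLock()
	defer c.switchLock.RUnlock()
	return c.ws.UnsubscribeAll()
}

// switchRPC holds the write lock while it re-establishes the connection.
func (c *client) switchRPC() {
	c.switchLock.Lock()
	defer c.switchLock.Unlock()
	// replace c.ws with a connection to another endpoint ...
}
```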