Control service doesn't start when only the local interface is UP #963
Reference: TrueCloudLab/frostfs-node#963
Expected Behavior
The control service should start when only the local interface is UP, for example to run local commands and show the healthcheck.
Current Behavior
The control service doesn't start when only the local interface is UP.
Steps to Reproduce (for bugs)
The node has mgmt, internal0/1, and data0/1 interfaces.
The control service is then bound to the local interface.
Regression
No
Version
Your Environment
Cloud
4 nodes
How do I reproduce the bug?
frostfs-storage
can't create API client: can't init SDK client: gRPC dial: context deadline exceeded
Potential problems that cause the bug
The control server creates a TCP socket on `127.0.0.1:8091`, but the gRPC service is not actually bound to the socket yet, even though the storage node logs `control service has been successfully initialized`. The binding is done asynchronously. So when the node tries to make a notary deposit and wait for it, it hangs there. Other jobs cannot complete because of this hang, so the control gRPC server never binds to `127.0.0.1:8091`.
I suppose the hang happens because the local neo-go instance gets isolated from the others after the interfaces are shut down: `Wait` sees that no blocks are being produced.
Possible solution
`neo-go` should also be restarted so that it fails. The failure must cause frostfs-storage to switch RPC to an available endpoint, and that could fix the problem. But with the data* and internal* interfaces shut down, a switch is barely possible: here a client is trying to iterate over the other endpoints