[#963] node: Go on initialization even deposit notary is hung #1014
Reference: TrueCloudLab/frostfs-node#1014
Boot-up may wait for the execution of a notary deposit transaction, and the waiting loop may hang indefinitely. In this case, we need to let frostfs-node continue initialization, although its functionality will be only partially available. The initial motivation was to successfully initialize the control service so that the control API is available no matter which morph problems occur.
Close #963
5d95e29bce to aa8cf973e3
Hm, so now some tests may be tricked by the healthcheck: it will show ready when in reality nothing could be done with the node (even Object.GET requires getting container from the blockchain).
We need to see how the integration tests will react, have you tried running them locally?
Also, do we have Warn logs if deposit is not handled?
No, I didn't. But I will try - it is definitely inappropriate to run them against dev-env
So, in the case of an error, makeAndWaitNotaryDeposit forcefully finishes the application: fatalOnError, fatalOnError. This means that if the wait loop is eventually broken by an error, the partially functioning application exits fatally.
It would be nice to have Warn logs while polling for result, because the node is not fully functional during this.
Alright 👍
Can we also ensure that we have STARTING in the healthcheck if any of this is false:
aa8cf973e3 to 5c16240335
I suppose this proposal works only for the second case.
The other workers like
do not "freeze" initialization of some components, so they do not need to indicate that they are ready.
I have introduced switching of the healthcheck status after the notary deposit job is done. But I have to check it with failover tests, and I'll report whether they are broken.
UPD: ran failovers, the result looks fine
5c16240335 to 78dd55088b
What about logs in waitNotaryDeposit? We can log success with INFO and subsequent attempts in the loop with DEBUG.
78dd55088b to 74328ff8ee
Alright. I have added helpful debug and info logs
@@ -159,2 +159,4 @@
ClientNotaryRequestWithPreparedMainTXInvoked = "notary request with prepared main TX invoked"
ClientNotaryRequestInvoked = "notary request invoked"
ClientNotaryDepositTransactionWasSuccessfullyPersisted = "notary deposit transaction was successfully persisted"
ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted = "attempt to wait for notary deposit transaction gets persisted"
It is either "for notary deposit transaction to get persisted" or "until notary deposit transaction is persisted".
Fixed: I preferred the first one
74328ff8ee to c86e6582fe
@@ -158,6 +158,7 @@ var (
func waitNotaryDeposit(ctx context.Context, c *cfg, tx util.Uint256) error {
for i := 0; i < notaryDepositRetriesAmount; i++ {
c.log.Debug(logs.ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted)
ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted -> ClientAttemptToWaitForNotaryDepositTransactionToGetPersisted
cmd/frostfs-node/morph.go:161:20: undefined: logs.ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted
c86e6582fe to 921a43fc10