[#963] node: Go on initialization even deposit notary is hung #1014

Merged
fyrchik merged 1 commits from aarifullin/frostfs-node:fix/963-control_svc_iface_down into master 2024-03-12 10:18:18 +00:00
Collaborator
  • Make makeAndWaitNotaryDeposit run asynchronously as worker during application boot-up.

It may happen that boot-up waits for the execution of a notary deposit transaction and waiting loop may hang for an indefinite time. In this case, we need to let frostfs-node go on initialization, although its functionality will be available partially - the initial motivation was successfuly initialize control service to make control API available no matter which morph problems occur

Close #963

* Make makeAndWaitNotaryDeposit run asynchronously as worker during application boot-up. It may happen that boot-up waits for the execution of a notary deposit transaction and waiting loop may hang for an indefinite time. In this case, we need to let frostfs-node go on initialization, although its functionality will be available partially - the initial motivation was successfuly initialize control service to make control API available no matter which morph problems occur Close #963
aarifullin requested review from storage-core-committers 2024-02-29 15:34:04 +00:00
aarifullin requested review from storage-core-developers 2024-02-29 15:34:11 +00:00
aarifullin added the
frostfs-node
label 2024-02-29 15:34:21 +00:00
aarifullin force-pushed fix/963-control_svc_iface_down from 5d95e29bce to aa8cf973e3 2024-02-29 15:52:55 +00:00 Compare

Hm, so now some tests may be tricked by the healthcheck: it will show ready when in reality nothing could be done with the node (even Object.GET requires getting container from the blockchain).
We need to see how the integration tests will react, have you tried running them locally?

Hm, so now some tests may be tricked by the healthcheck: it will show ready when in reality nothing could be done with the node (even Object.GET requires getting container from the blockchain). We need to see how the integration tests will react, have you tried running them locally?

Also, do we have Warn logs if deposit is not handled?

Also, do we have Warn logs if deposit is not handled?
Poster
Collaborator

have you tried running them locally?

No, I didn't. But I will try - it is definitely inappropriate to run them against dev-env

> have you tried running them locally? No, I didn't. But I will try - it is definitely inappropriate to run them against dev-env
Poster
Collaborator

Also, do we have Warn logs if deposit is not handled?

So, in the case of error, makeAndWaitNotaryDeposit forcefully finishes the application fatalOnError, fatalOnError

It means, that if wait-loop finally is broken due to error, the partially-functioning application fatally exits.

> Also, do we have Warn logs if deposit is not handled? - [waitNotaryDeposit](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/morph.go#L159-L183) returns error - [makeNotaryDeposit](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/morph.go#L151) returns error So, in the case of error, `makeAndWaitNotaryDeposit` forcefully finishes the application [fatalOnError](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/morph.go#L120), [fatalOnError](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/morph.go#L131) It means, that if wait-loop finally is broken due to error, the partially-functioning application fatally exits.

It would be nice to have Warn logs while polling for result, because the node is not fully functional during this.

It would be nice to have Warn logs while polling for result, because the node is not fully functional during this.
Poster
Collaborator

It would be nice to have Warn logs while polling for result, because the node is not fully functional during this.

Alright 👍

> It would be nice to have Warn logs while polling for result, because the node is not fully functional during this. Alright 👍

Can we also ensure that we have STARTING in the healthcheck if any of this is false:

  1. Node has finished initialization.
  2. Background notary waiter has exited.
Can we also ensure that we have `STARTING` in the healthcheck if any of this is false: 1. Node has finished initialization. 2. Background notary waiter has exited.
aarifullin force-pushed fix/963-control_svc_iface_down from aa8cf973e3 to 5c16240335 2024-03-07 09:12:10 +00:00 Compare
Poster
Collaborator

Can we also ensure that we have STARTING in the healthcheck if any of this is false:

  1. Node has finished initialization.
  2. Background notary waiter has exited.

I suppose this proposal works only for the second case.
The other workers like

c.workers = append(c.workers, newWorkerFromFunc(func(ctx context.Context) {
		c.treeService.Start(ctx)
	}))

do not "freeze" initialization of some components - so, they do not need to inidicate they are ready.
I have introduced switching healthcheck status after notary deposit job is done. But I have to check it with failover tests and I'll tell if they are not broken

UPD: ran failovers, the result looks fine

> Can we also ensure that we have `STARTING` in the healthcheck if any of this is false: > 1. Node has finished initialization. > 2. Background notary waiter has exited. I suppose this proposal works only for the second case. The other workers like ```go c.workers = append(c.workers, newWorkerFromFunc(func(ctx context.Context) { c.treeService.Start(ctx) })) ``` do not "freeze" initialization of some components - so, they do not need to inidicate they are ready. I have introduced switching healthcheck status after `notary deposit` job is done. But I have to check it with failover tests and I'll tell if they are not broken UPD: ran failovers, the result looks fine
aarifullin force-pushed fix/963-control_svc_iface_down from 5c16240335 to 78dd55088b 2024-03-07 09:22:55 +00:00 Compare
fyrchik approved these changes 2024-03-07 11:34:05 +00:00
fyrchik approved these changes 2024-03-07 11:36:26 +00:00
fyrchik left a comment
Owner

What about logs in waitNotaryDeposit?
We can log success with INFO and subsequent attempts in loops with DEBUG

What about logs in `waitNotaryDeposit`? We can log success with INFO and subsequent attempts in loops with DEBUG
aarifullin force-pushed fix/963-control_svc_iface_down from 78dd55088b to 74328ff8ee 2024-03-11 11:48:06 +00:00 Compare
Poster
Collaborator

What about logs in waitNotaryDeposit?
We can log success with INFO and subsequent attempts in loops with DEBUG

Alright. I have added helpful debug and info logs

> What about logs in `waitNotaryDeposit`? > We can log success with INFO and subsequent attempts in loops with DEBUG Alright. I have added helpful debug and info logs
acid-ant approved these changes 2024-03-11 13:25:07 +00:00
fyrchik reviewed 2024-03-11 18:37:38 +00:00
@ -159,2 +159,4 @@
ClientNotaryRequestWithPreparedMainTXInvoked = "notary request with prepared main TX invoked"
ClientNotaryRequestInvoked = "notary request invoked"
ClientNotaryDepositTransactionWasSuccessfullyPersisted = "notary deposit transaction was successfully persisted"
ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted = "attempt to wait for notary deposit transaction gets persisted"

It is either for notary deposit transaction to get persisted or until notary deposit transaction is persisted

It is either `for notary deposit transaction to get persisted` or `until notary deposit transaction is persisted`
Poster
Collaborator

Fixed: I preferred the first one

Fixed: I preferred the first one
fyrchik approved these changes 2024-03-11 18:37:48 +00:00
aarifullin force-pushed fix/963-control_svc_iface_down from 74328ff8ee to c86e6582fe 2024-03-11 20:59:34 +00:00 Compare
aarifullin requested review from acid-ant 2024-03-11 21:00:28 +00:00
aarifullin requested review from fyrchik 2024-03-11 21:00:29 +00:00
acid-ant reviewed 2024-03-12 05:59:42 +00:00
@ -158,6 +158,7 @@ var (
func waitNotaryDeposit(ctx context.Context, c *cfg, tx util.Uint256) error {
for i := 0; i < notaryDepositRetriesAmount; i++ {
c.log.Debug(logs.ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted)
Collaborator

ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted -> ClientAttemptToWaitForNotaryDepositTransactionToGetPersisted

ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted -> ClientAttemptToWaitForNotaryDepositTransactionToGetPersisted

cmd/frostfs-node/morph.go:161:20: undefined: logs.ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted

`cmd/frostfs-node/morph.go:161:20: undefined: logs.ClientAttemptToWaitForNotaryDepositTransactionGetsPersisted`
aarifullin force-pushed fix/963-control_svc_iface_down from c86e6582fe to 921a43fc10 2024-03-12 07:16:14 +00:00 Compare
acid-ant approved these changes 2024-03-12 07:18:03 +00:00
fyrchik approved these changes 2024-03-12 10:17:42 +00:00
fyrchik merged commit b4cb54e7ed into master 2024-03-12 10:18:18 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-core-developers
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1014
There is no content yet.