ir: Execute netmap.addPeerIR only for state online #841

Merged
fyrchik merged 2 commits from acid-ant/frostfs-node:bugfix/ir-fix-mm into master 2023-12-11 13:14:43 +00:00
3 changed files with 64 additions and 1 deletion

docs/epoch.md

@ -0,0 +1,43 @@
# Epoch

I like the doc! Also, I created an issue: https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/856
The main purpose of the `epoch` in a `frostfs` environment is to manage the `netmap`.
On each new epoch, the `ir` service revises the contents of the `netmap` by adding nodes to it or removing nodes from it.
The `node` service triggers several internal processes on each new epoch, for example, running GC.
The epoch is also used in the object lifecycle.
At startup, the `ir` service initializes an epoch timer that handles new epoch ticks.
The epoch timer is a block timer, which means that it ticks on every block or on every set of blocks.
The epoch duration, measured in blocks, is stored in the blockchain as the configurable parameter `EpochDuration`.
You can read it with `frostfs-adm`:
```shell
> frostfs-adm morph dump-config -c config.yml -r http://morph-chain.frostfs.devenv:30333
...
EpochDuration: 240 (int)
...
>
```
Once the epoch timer ticks, the `ir` service calls the [NewEpoch](https://git.frostfs.info/TrueCloudLab/frostfs-contract/src/commit/a1b61d3949581f4d65b0d32a33d98ba9c193dc2a/netmap/netmap_contract.go#L238) method
of the `netmap` contract. Every `ir` instance may do this at the same time, but this is not a problem,
because repeated calls of this method with the same parameters produce the same result.
The `frostfs-adm` utility has a command to force a new epoch:
```shell
> frostfs-adm morph force-new-epoch -c config.yml -r http://morph-chain.frostfs.devenv:30333
```
The command goes directly to the `netmap` contract and calls the `NewEpoch` method.
The method checks the alphabet witness and stores the candidate nodes that are not in the `OFFLINE` state as the current netmap.
Then it executes the `NewEpoch` method of the `balance` and `container` contracts.
Finally, it produces a `NewEpoch` notification, which is handled by the `node` and `ir` services.
The `ir` handler for `NewEpoch` updates the internal state of the netmap and, if necessary, updates the state of nodes or
marks them for exclusion from the netmap in the blockchain.
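The sequence performed by `NewEpoch` can be modeled as follows. This is a hypothetical sketch, not the contract: `witnessOK` stands in for the alphabet witness check, `hooks` for the calls into the `balance` and `container` contracts, and `notify` for emitting the notification; only the filtering rule (keep every candidate that is not `OFFLINE`) comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeState is a hypothetical enum for the node states named in this document.
type NodeState string

const (
	Online      NodeState = "ONLINE"
	Offline     NodeState = "OFFLINE"
	Maintenance NodeState = "MAINTENANCE"
)

// Candidate is a hypothetical netmap candidate record.
type Candidate struct {
	Key   string
	State NodeState
}

// newEpoch models the described steps; it is not the contract implementation.
func newEpoch(epoch uint64, witnessOK bool, candidates []Candidate,
	hooks []func(uint64), notify func(name string, epoch uint64)) ([]Candidate, error) {
	if !witnessOK {
		return nil, errors.New("alphabet witness check failed")
	}
	// Every candidate that is not OFFLINE becomes part of the current netmap.
	netmap := make([]Candidate, 0, len(candidates))
	for _, c := range candidates {
		if c.State != Offline {
			netmap = append(netmap, c)
		}
	}
	// Stand-ins for balance.NewEpoch and container.NewEpoch.
	for _, h := range hooks {
		h(epoch)
	}
	notify("NewEpoch", epoch)
	return netmap, nil
}

func main() {
	candidates := []Candidate{{"s01", Maintenance}, {"s02", Online}, {"s03", Offline}}
	var calls []string
	hook := func(name string) func(uint64) {
		return func(e uint64) { calls = append(calls, fmt.Sprintf("%s.NewEpoch(%d)", name, e)) }
	}
	netmap, err := newEpoch(24, true, candidates,
		[]func(uint64){hook("balance"), hook("container")},
		func(name string, e uint64) { calls = append(calls, fmt.Sprintf("notify %s(%d)", name, e)) })
	if err != nil {
		panic(err)
	}
	fmt.Println(netmap) // [{s01 MAINTENANCE} {s02 ONLINE}]
	fmt.Println(calls)  // [balance.NewEpoch(24) container.NewEpoch(24) notify NewEpoch(24)]
}
```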
The `node` handler for `NewEpoch` executes the `addPeer` method of the `netmap` contract.
This method does nothing except produce a notification, which is handled by the `ir` service.
In its `AddPeer` handler, the `ir` service may update the node state in the netmap if necessary.
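The notification round trip (`NewEpoch` -> `addPeer` -> `AddPeer`) can be simulated with a small in-process event bus. The `bus` and `event` types are hypothetical; they only illustrate which service reacts to which notification.

```go
package main

import "fmt"

// event is a hypothetical notification emitted by the netmap contract.
type event struct {
	Name string // "NewEpoch" or "AddPeer"
	Node string // node key for AddPeer, empty for NewEpoch
}

// bus is a toy, single-threaded stand-in for the chain notification stream.
type bus struct {
	queue    []event
	handlers map[string][]func(event)
}

func (b *bus) on(name string, h func(event)) {
	b.handlers[name] = append(b.handlers[name], h)
}

func (b *bus) emit(e event) { b.queue = append(b.queue, e) }

// run delivers queued events in FIFO order, including events emitted by handlers.
func (b *bus) run() {
	for len(b.queue) > 0 {
		e := b.queue[0]
		b.queue = b.queue[1:]
		for _, h := range b.handlers[e.Name] {
			h(e)
		}
	}
}

func main() {
	b := &bus{handlers: map[string][]func(event){}}
	var trace []string

	// node service: on NewEpoch, invoke addPeer, which only emits AddPeer.
	b.on("NewEpoch", func(event) {
		trace = append(trace, "node: NewEpoch -> addPeer")
		b.emit(event{Name: "AddPeer", Node: "s01"})
	})
	// ir service: on NewEpoch, revise its view of the netmap.
	b.on("NewEpoch", func(event) {
		trace = append(trace, "ir: NewEpoch -> update netmap state")
	})
	// ir service: on AddPeer, decide whether the node state must be updated.
	b.on("AddPeer", func(e event) {
		trace = append(trace, "ir: AddPeer from "+e.Node)
	})

	b.emit(event{Name: "NewEpoch"})
	b.run()
	for _, line := range trace {
		fmt.Println(line)
	}
}
```

The toy bus delivers notifications strictly in the order they were emitted. As the comment in `processAddPeer` notes, the real handlers have no such ordering guarantee, so the `AddPeer` handler must not assume that `NewEpoch` has already been processed.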
At startup, a node bootstraps with the `ONLINE` state. From `ONLINE`, it can move to `MAINTENANCE` or `OFFLINE`.
A node is moved to the `OFFLINE` state automatically when it has sent no bootstrap request for a number of epochs.
This number is stored in the `ir` config as `netmap_cleaner.threshold`.
Once it bootstraps again, a node in the `OFFLINE` state moves to `ONLINE`.
The `MAINTENANCE` state persists even if the node is rebooted or unavailable for a few epochs.
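The state rules above can be written down as a small state machine. This is a hypothetical sketch: `nodeRecord` and its methods are illustrative names, the threshold value is made up, and whether the cleaner compares with `>` or `>=` is an assumption. Only the transitions listed above are modeled.

```go
package main

import "fmt"

type NodeState string

const (
	Online      NodeState = "ONLINE"
	Offline     NodeState = "OFFLINE"
	Maintenance NodeState = "MAINTENANCE"
)

// nodeRecord is a hypothetical per-node record kept by ir.
type nodeRecord struct {
	State         NodeState
	LastBootstrap uint64 // epoch of the last bootstrap request
}

// bootstrap: a new or OFFLINE node becomes ONLINE; MAINTENANCE is kept.
func (n *nodeRecord) bootstrap(epoch uint64) {
	n.LastBootstrap = epoch
	if n.State == "" || n.State == Offline {
		n.State = Online
	}
}

// setStatus allows only ONLINE -> MAINTENANCE and ONLINE -> OFFLINE.
func (n *nodeRecord) setStatus(s NodeState) bool {
	if n.State != Online || (s != Maintenance && s != Offline) {
		return false
	}
	n.State = s
	return true
}

// cleanup models the netmap cleaner on a new epoch: an ONLINE node without a
// bootstrap request for more than threshold epochs becomes OFFLINE.
func (n *nodeRecord) cleanup(epoch, threshold uint64) {
	if n.State == Online && epoch > n.LastBootstrap+threshold {
		n.State = Offline
	}
}

func main() {
	const threshold = 3 // hypothetical netmap_cleaner.threshold
	a, m := &nodeRecord{}, &nodeRecord{}
	a.bootstrap(1)
	m.bootstrap(1)
	m.setStatus(Maintenance)

	for epoch := uint64(2); epoch <= 10; epoch++ {
		a.cleanup(epoch, threshold)
		m.cleanup(epoch, threshold)
	}
	fmt.Println(a.State, m.State) // OFFLINE MAINTENANCE

	a.bootstrap(11)
	fmt.Println(a.State) // ONLINE
}
```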


@ -144,6 +144,21 @@ func TestAddPeer(t *testing.T) {
		time.Sleep(10 * time.Millisecond)
	}
	require.Nil(t, nc.notaryInvokes, "invalid notary invokes")
	node.SetOnline()
	ev = netmapEvent.AddPeer{
		NodeBytes: node.Marshal(),
		Request: &payload.P2PNotaryRequest{
			MainTransaction: &transaction.Transaction{},
		},
	}
	proc.handleAddPeer(ev)
	for proc.pool.Running() > 0 {
		time.Sleep(10 * time.Millisecond)
	}
	require.EqualValues(t, []notaryInvoke{
		{
			contract: nc.contractAddress,


@ -57,7 +57,12 @@ func (np *Processor) processAddPeer(ev netmapEvent.AddPeer) bool {
 	updated := np.netmapSnapshot.touch(keyString, np.epochState.EpochCounter(), nodeInfoBinary)
-	if updated {
+	// `processAddPeer` reacts to the `AddPeer` notification, `processNewEpoch` to `NewEpoch`.
+	// These two notifications are produced in order: `NewEpoch` -> `AddPeer`.
+	// But there is no guarantee that the handlers will be executed in the same order.
+	// That is why we perform `addPeerIR` only when the node is online,
+	// because this contract method sets the node state to `ONLINE`.
+	if updated && nodeInfo.IsOnline() {

So what happens for `Maintenance`?

This code will be skipped, and the node will be updated in the `NewEpoch` handler. [This PR](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/757) fixed an issue where a node in maintenance mode (MM) was excluded from the netmap.

So if we update the node's attributes while we are in maintenance mode, they will only be applied after we go online?

Right. Otherwise, the node would be moved to the `online` state by [netmap.AddPeerIR](https://git.frostfs.info/TrueCloudLab/frostfs-contract/src/commit/bc3186575ff47b75a811a78ddef3b10869f2d4e2/netmap/netmap_contract.go#L143).

Could you test this for >20 epoch ticks? (We keep 15 snapshots in the contract by default, hence 20.)
The contract has some logic at https://git.frostfs.info/TrueCloudLab/frostfs-contract/src/branch/master/netmap/netmap_contract.go#L504, but I am not sure it will keep working if IR doesn't send updates.

Tested in dev-env. Set maintenance status:

```
@:~$ docker inspect s01 | grep Image
            "Image": "truecloudlab/frostfs-dirty-storage:0.37.0-rc.1-144-g05249e6c",
@:~$
@:~$ frostfs-cli control set-status --endpoint s01.frostfs.devenv:8081 --status maintenance --wallet services/storage/wallet01.json
Network status update request successfully sent.
@:~$
@:~$ frostfs-cli netmap snapshot -g -r s04.frostfs.devenv:8080 | grep "Node\|Epoch"
Epoch: 1
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 ONLINE /dns4/s01.frostfs.devenv/tcp/8080
...
```

Force new epoch:

```
@:~$ frostfs-adm morph force-new-epoch -c cnt_create_cfg.yml -r http://morph-chain.frostfs.devenv:30333
Current epoch: 23, increase to 24.
Waiting for transactions to persist...
@:~$
@:~$ frostfs-cli netmap snapshot -g -r s04.frostfs.devenv:8080 | grep "Node\|Epoch"
Epoch: 24
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 MAINTENANCE /dns4/s01.frostfs.devenv/tcp/8080
...
@:~$
```

Stop the s01 container and force a new epoch:

```
@:~$ docker stop s01
s01
@:~$
@:~$ frostfs-cli netmap snapshot -g -r s04.frostfs.devenv:8080 | grep "Node\|Epoch"
Epoch: 25
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 MAINTENANCE /dns4/s01.frostfs.devenv/tcp/8080
...
@:~$
@:~$ frostfs-adm morph force-new-epoch -c cnt_create_cfg.yml -r http://morph-chain.frostfs.devenv:30333
Current epoch: 50, increase to 51.
Waiting for transactions to persist...
@:~$ frostfs-cli netmap snapshot -g -r s04.frostfs.devenv:8080 | grep "Node\|Epoch"
Epoch: 51
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 MAINTENANCE /dns4/s01.frostfs.devenv/tcp/8080
...
@:~$
```

The node is still in maintenance mode.
Start the container and check the node status:

```
@:~$ docker start s01
s01
@:~$
@:~$ frostfs-adm morph force-new-epoch -c cnt_create_cfg.yml -r http://morph-chain.frostfs.devenv:30333
Current epoch: 55, increase to 56.
Waiting for transactions to persist...
@:~$ frostfs-cli netmap snapshot -g -r s04.frostfs.devenv:8080 | grep "Node\|Epoch"
Epoch: 56
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 MAINTENANCE /dns4/s01.frostfs.devenv/tcp/8080
...
@:~$ frostfs-cli control healthcheck --endpoint s01.frostfs.devenv:8081 --wallet /home/annikifa/workspace/frostfs-dev-env/services/storage/wallet01.json
Enter password >
Network status: MAINTENANCE
Health status: READY
@:~$
```
		np.log.Info(logs.NetmapApprovingNetworkMapCandidate,
			zap.String("key", keyString))
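The guard added to `processAddPeer` can be checked in isolation. The sketch below is a toy model, not the processor code: `fakeNetmap`, its `addPeerIR` method and this `processAddPeer` signature are hypothetical. The one behavior taken from the source is that `netmap.AddPeerIR` puts the node into the `ONLINE` state, which is why calling it for a node in `MAINTENANCE` would undo the maintenance status.

```go
package main

import "fmt"

type NodeState string

const (
	Online      NodeState = "ONLINE"
	Maintenance NodeState = "MAINTENANCE"
)

// fakeNetmap is a hypothetical stand-in for the netmap contract.
type fakeNetmap struct {
	states map[string]NodeState
}

// addPeerIR mirrors the behavior described in the review: the node becomes ONLINE.
func (c *fakeNetmap) addPeerIR(key string) { c.states[key] = Online }

// processAddPeer models only the guard from the diff: approve the candidate
// when the snapshot was updated and the node reports itself as ONLINE.
func processAddPeer(c *fakeNetmap, key string, reported NodeState, updated bool) bool {
	if updated && reported == Online {
		c.addPeerIR(key)
		return true
	}
	// MAINTENANCE (and any other non-ONLINE state) is left to the NewEpoch handler.
	return false
}

func main() {
	c := &fakeNetmap{states: map[string]NodeState{"s01": Maintenance}}

	approved := processAddPeer(c, "s01", Maintenance, true)
	fmt.Println(approved, c.states["s01"]) // false MAINTENANCE

	approved = processAddPeer(c, "s01", Online, true)
	fmt.Println(approved, c.states["s01"]) // true ONLINE
}
```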