not enough nodes to SELECT from answer on the dev-env #1448

Closed
opened 2024-10-24 07:23:16 +00:00 by fyrchik · 7 comments
Owner

This may be a frostfs-cli, frostfs-node or dev-env issue; it needs investigation.
Container commands do not seem to work, although object search does.

$ frostfs-cli container create --policy 'REP 1 CBF 1' -r grpc://s01.frostfs.devenv:8080 -w ../dev-env/wallets/wallet.json --await
Enter password >
CID: q3PEzhNExx2EgtsrpTvRobR8GQQcRwUJAYqtzQFm6HX
awaiting...
container has been persisted on sidechain
$ frostfs-cli container get -g --cid q3PEzhNExx2EgtsrpTvRobR8GQQcRwUJAYqtzQFm6HX -r s01.frostfs.devenv:8080
rpc error: status: code = 1024 message = not enough nodes to SELECT from
$ frostfs-cli netmap snapshot -g -r s01.frostfs.devenv:8080
Epoch: 1
Node 1: 022bb4041c50d607ff871dec7e4cd7778388e0ea6849d84ccbd9aa8f32e16a8131 ONLINE /dns4/s01.frostfs.devenv/tcp/8080
        Continent: Europe
        Country: Russia
        CountryCode: RU
        Location: Moskva
        Price: 22
        SubDiv: Moskva
        SubDivCode: MOW
        UN-LOCODE: RU MOW
        User-Agent: FrostFS/0.34
Node 2: 02ac920cd7df0b61b289072e6b946e2da4e1a31b9ab1c621bb475e30fa4ab102c3 ONLINE /dns4/s03.frostfs.devenv/tcp/8080
        Continent: Europe
        Country: Sweden
        CountryCode: SE
        Location: Stockholm
        Price: 11
        SubDiv: Stockholms län
        SubDivCode: AB
        UN-LOCODE: SE STO
        User-Agent: FrostFS/0.34
Node 3: 038c862959e56b43e20f79187c4fe9e0bc7c8c66c1603e6cf0ec7f87ab6b08dc35 ONLINE /dns4/s04.frostfs.devenv/tcp/8082/tls /dns4/s04.frostfs.devenv/tcp/8080
        Continent: Europe
        Country: Finland
        CountryCode: FI
        Location: Helsinki (Helsingfors)
        Price: 44
        SubDiv: Uusimaa
        SubDivCode: 18
        UN-LOCODE: FI HEL
        User-Agent: FrostFS/0.34
Node 4: 03ff65b6ae79134a4dce9d0d39d3851e9bab4ee97abf86e81e1c5bbc50cd2826ae ONLINE /dns4/s02.frostfs.devenv/tcp/8080
        Continent: Europe
        Country: Russia
        CountryCode: RU
        Location: Saint Petersburg (ex Leningrad)
        Price: 33
        SubDiv: Sankt-Peterburg
        SubDivCode: SPE
        UN-LOCODE: RU LED
        User-Agent: FrostFS/0.34
$ frostfs-cli container get -g --cid q3PEzhNExx2EgtsrpTvRobR8GQQcRwUJAYqtzQFm6HX -r s01.frostfs.devenv:8080
rpc error: status: code = 1024 message = not enough nodes to SELECT from
$ frostfs-cli object search --root --cid q3PEzhNExx2EgtsrpTvRobR8GQQcRwUJAYqtzQFm6HX -r s01.frostfs.devenv:8080 -w wallets/wallet.json
Enter password >
Found 0 objects.
fyrchik added this to the v0.44.0 milestone 2024-10-24 07:23:16 +00:00
fyrchik added the bug label 2024-10-24 07:23:16 +00:00
a-savchuk was assigned by fyrchik 2024-10-24 07:23:22 +00:00
fyrchik added the frostfs-cli, frostfs-node labels 2024-10-24 07:24:21 +00:00
Member

Steps to reproduce

  1. Run dev-env with make down clean up
  2. Run all the commands described above as fast as possible: they must be executed during the first epoch
Author
Owner

Why does the epoch matter? If I see the network map, container should be built.
Member

> Why does the epoch matter? If I see the network map, container should be built.

I'm not sure, but I think the problem is in this function, where we get the network map. I'm still working on the problem.

func (ac *apeChecker) isContainerKey(pk []byte, cnrID cid.ID, cont *containercore.Container) (bool, error) {
	binCnrID := make([]byte, sha256.Size)
	cnrID.Encode(binCnrID)

	nm, err := netmap.GetLatestNetworkMap(ac.nm)
	if err != nil {
		return false, err
	}

	in, err := isContainerNode(nm, pk, binCnrID, cont)
	if err != nil {
		return false, err
	} else if in {
		return true, nil
	}

	// then check previous netmap, this can happen in-between epoch change
	// when node migrates data from last epoch container
	nm, err = netmap.GetPreviousNetworkMap(ac.nm)
	if err != nil {
		return false, err
	}

	return isContainerNode(nm, pk, binCnrID, cont)
}

Source: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/bc8d79ddf949cb88546bc214d688a03f5ab9740e/pkg/services/container/ape.go#L531-L555
Member

Suppose we're on the 1st epoch and pk isn't in the current network map:

  1. netmap.GetLatestNetworkMap returns the current network map like the one above
  2. isContainerNode returns false and no error
  3. we continue
  4. netmap.GetPreviousNetworkMap returns an empty network map, because the previous epoch is the 0th one
  5. isContainerNode returns false and the error "not enough nodes to SELECT from"

In general, that error could occur not only on the 1st epoch but also whenever the network map has not had enough nodes during the previous two epochs (this needs fact-checking).
Member

I noticed that the problem occurs only when the -g flag is used
Author
Owner

Most likely because all the wallets you use belong either to a node or to the container owner.
Member

> In general, that error could occur not only on the 1st epoch but also when a network map has had not enough nodes on the previous two epochs (fact checking needed)

cnrVectors, _ := nm.ContainerNodes(cont.Value.PlacementPolicy(), binCnrID)

func (m NetMap) ContainerNodes(p PlacementPolicy, pivot []byte) ([][]NodeInfo, error) {

In fact, we can successfully get container nodes even when a selector doesn't have enough nodes for a replica. In that case, however, the network map mustn't be empty; otherwise an error is returned.

Ref. TrueCloudLab/frostfs-sdk-go#167

Sources:
https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/6c45a17af66843ac5757fcd4b8f8e6acd0bca087/pkg/services/container/ape.go#L558
https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/src/commit/56c4aaaaca2a124dc1e5002a1684867886b373a3/netmap/netmap.go#L243
https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/pulls/167