Validate advertised node addresses before adding to netmap #1497
Reference: TrueCloudLab/frostfs-node#1497
Is your feature request related to a problem? Please describe.
I've encountered all sorts of weird problems (OOMs, cryptic errors returned for PUT requests, etc.) due to a misconfiguration on my part: storage nodes were configured to advertise both their own and their neighbors' addresses in node.addresses[]:
node:
  addresses: # list of addresses announced by Storage node in the Network map
    - s01.frostfs.devenv:8080
    - /dns4/s02.frostfs.devenv/tcp/8081
    - grpc://127.0.0.1:8082
    - grpcs://localhost:8083
Describe the solution you'd like
Let's discuss whether the Inner Ring should intervene and handle such scenarios gracefully. This is especially relevant for public FrostFS deployments, where untrusted actors may intentionally add misconfigured storage nodes to the network.
An Inner Ring node could check (a) whether the advertised address is responsive and (b) whether the node replying on that address is the one advertising it. I think dropping unresponsive addresses from the netmap is a step too far (e.g. nodes may want to advertise their LAN address for local peering), but what about dropping addresses whose replies are signed with the wrong key? Theoretically, there is a chance of a false positive (a LAN address collision between different LANs), but is that significant enough?
Describe alternatives you've considered
Fixing every dysfunctional behavior caused by nodes advertising wrong addresses in the netmap would be quite an effort, but I guess it is still an alternative worth considering.
Additional context
I think that should be solved with the reputation system. The problem with a single validation event:
To prevent misconfiguration, we could enable some validation on the node itself, such as matching the node.addresses section against grpc.endpoints. However, this is not possible in the general case, and I am not sure any specific rule would be useful.
@potyarkin could you elaborate on what specific misconfiguration caused such an error message?
Each node has only 1 key and should already be validated during bootstrap.
My misconfigured nodes advertised addresses of all their neighbors as their own, i.e.
So when a client wanted to talk specifically to storage-node-01, it would in fact connect to a random node from the cluster: sometimes the correct one, but often not. Cryptic error messages are not the worst that could happen; I think (but cannot prove) that the OOMs in my cluster were also caused by this.
This was an honest mistake and I should have read the docs more carefully, but it can also be exploited maliciously. If untrusted actors are allowed to add storage nodes (as in the planet-wide FrostFS scenario), they can configure their nodes to knowingly advertise the addresses of other, well-behaved nodes, and this would wreak all sorts of havoc, not only on the misconfigured nodes but across the whole FrostFS network.