Should policer handle an object already removed error returned by a HEAD request? #1543

Closed
opened 2024-12-06 07:51:04 +00:00 by a-savchuk · 4 comments
Member

When the policer receives an object already removed error returned by a HEAD request, it logs the error and keeps iterating through the nodes. It seems an error of that type should be handled in some way. The main problem is that the policer clutters the logs with error messages.

The relevant code: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/7df3520d486555a0211f7e37ee3e0fa9a96cf92c/pkg/services/policer/check.go#L140-L160

    // For each node in the placement, HEAD the object to check policy compliance.
    _, err := p.remoteHeader(callCtx, nodes[i], addr, false)
    cancel()
    if err == nil {
        shortage--
        checkedNodes.submitReplicaHolder(nodes[i])
    } else {
        if client.IsErrObjectNotFound(err) {
            checkedNodes.submitReplicaCandidate(nodes[i])
            continue
        } else if client.IsErrNodeUnderMaintenance(err) {
            shortage, uncheckedCopies = p.handleMaintenance(ctx, nodes[i], checkedNodes, shortage, uncheckedCopies)
        } else {
            // Any other error, including "object already removed",
            // falls through to this ERROR-level log line.
            p.log.Error(ctx, logs.PolicerReceiveObjectHeaderToCheckPolicyCompliance,
                zap.Stringer("object", addr),
                zap.String("error", err.Error()),
            )
        }
    }
    }

dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.127/tcp/8080 /ip4/192.168.199.127/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.113/tcp/8080 /ip4/192.168.199.113/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.128/tcp/8080 /ip4/192.168.199.128/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
...

Describe the solution you'd like

Avoid logging errors of that type and handle them. For example, if the policer knows the object is already removed, it can skip replication and move on. This is just a suggestion and needs to be discussed.

Describe alternatives you've considered

Keep things as they are.

Additional context

This issue arose when debugging the placement of tombstones with a large number of inhumed objects.

a-savchuk added the
discussion
frostfs-node
triage
observability
labels 2024-12-06 07:51:04 +00:00
Owner

We cannot trust any single node, so other nodes still need to be checked.
However, avoiding replication on this node makes sense.

Author
Member

> However, avoiding replication on this node makes sense.

That's how the policer already works.

My point was: do we really need to log errors of that type with the ERROR log level?

Owner

I think such logical errors should be logged at DEBUG, since they are expected.
We should reserve ERROR for network errors.

Owner

Discussed, closed.

Reference: TrueCloudLab/frostfs-node#1543