Should policer handle an object already removed error returned by a HEAD request? #1543

Closed
opened 2024-12-06 07:51:04 +00:00 by a-savchuk · 4 comments
Member

When the policer receives an object already removed error returned by a HEAD request, it logs the error and keeps iterating through the nodes. It seems an error of that type should be handled in some way. The main problem is that the policer clutters the logs with error messages.

The relevant code: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/7df3520d486555a0211f7e37ee3e0fa9a96cf92c/pkg/services/policer/check.go#L140-L160

    // For each node in the placement, HEAD the object to check policy compliance.
    _, err := p.remoteHeader(callCtx, nodes[i], addr, false)
    cancel()
    if err == nil {
        shortage--
        checkedNodes.submitReplicaHolder(nodes[i])
    } else {
        if client.IsErrObjectNotFound(err) {
            checkedNodes.submitReplicaCandidate(nodes[i])
            continue
        } else if client.IsErrNodeUnderMaintenance(err) {
            shortage, uncheckedCopies = p.handleMaintenance(ctx, nodes[i], checkedNodes, shortage, uncheckedCopies)
        } else {
            // Any other error, including "object already removed",
            // falls through to this ERROR-level log line.
            p.log.Error(ctx, logs.PolicerReceiveObjectHeaderToCheckPolicyCompliance,
                zap.Stringer("object", addr),
                zap.String("error", err.Error()),
            )
        }
    }
    }

dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.127/tcp/8080 /ip4/192.168.199.127/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.113/tcp/8080 /ip4/192.168.199.113/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
dec 05 14:01:10 node2 frostfs-node[1400596]: error        policer/check.go:154        receive object header to check policy compliance        {"component": "Object Policer", "object": "CozR4eAC3XZj3JVwPZpDC2MSyxZEiQHSsiSEmfbfJgLb/7aDhHRhuNLv5LoakzibxnYjpXQ4SpkbMAV6q7JH7HFNu", "error": "(*object.RemoteReader) could not head object in [/ip4/192.168.198.128/tcp/8080 /ip4/192.168.199.128/tcp/8080]: read object header from FrostFS: status: code = 2052 message = object already removed"}
...

Describe the solution you'd like

Avoid logging errors of that type and handle them. For example, if the policer knows the object is already removed, it can skip replication and move on. This is just a suggestion and needs to be discussed.

Describe alternatives you've considered

Keep things as they are.

Additional context

This issue arose when debugging the placement of tombstones with a large number of inhumed objects.

a-savchuk added the
discussion
frostfs-node
triage
observability
labels 2024-12-06 07:51:04 +00:00
Owner

We cannot trust any single node, so other nodes still need to be checked.
However, avoiding replication on this node makes sense.

Author
Member

> However, avoiding replication on this node makes sense.

That's how the policer already works.

My point was: do we really need to log errors of that type with the ERROR log level?

Owner

I think such logical errors should be logged at DEBUG, since they are expected.
We should reserve ERROR for network errors.

Owner

Discussed, closed.

Reference: TrueCloudLab/frostfs-node#1543