Should policer handle an object already removed error returned by a HEAD request? #1543
Reference: TrueCloudLab/frostfs-node#1543
Is your feature request related to a problem? Please describe.
When the policer receives an object already removed error returned by a HEAD request, it logs the error and keeps iterating through nodes. It seems an error of that type should be handled in some way. The main problem is that the policer clutters the logs with error messages.

	_, err := p.remoteHeader(callCtx, nodes[i], addr, false)
	cancel()
	if err == nil {
		shortage--
		checkedNodes.submitReplicaHolder(nodes[i])
	} else {
		if client.IsErrObjectNotFound(err) {
			checkedNodes.submitReplicaCandidate(nodes[i])
			continue
		} else if client.IsErrNodeUnderMaintenance(err) {
			shortage, uncheckedCopies = p.handleMaintenance(ctx, nodes[i], checkedNodes, shortage, uncheckedCopies)
		} else {
			p.log.Error(ctx, logs.PolicerReceiveObjectHeaderToCheckPolicyCompliance,
				zap.Stringer("object", addr),
				zap.String("error", err.Error()),
			)
		}
	}
}
Describe the solution you'd like
Avoid logging errors of that type and handle them instead. For example, if the policer knows the object is already removed, it can skip replication and move on. This is just a suggestion and needs to be discussed.
Describe alternatives you've considered
Keep things as they are.
Additional context
This issue arose when debugging the placement of tombstones with a large number of inhumed objects.
We cannot trust any single node, so other nodes still need to be checked.
However, avoiding replication on this node makes sense.
That is how the policer works right now.
My point was: do we really need to log errors of that type with the ERROR log level?
I think such logical errors should be logged with DEBUG, since they are expected.
We should leave ERROR for network errors.
Discussed, closed.