writecache: Fix metric values #759
Reference: TrueCloudLab/frostfs-node#759
Closes #725
The main problem is that `c.metrics.Evict` (and also `c.metrics.Delete` and `c.metrics.Add`) changes the actual count metric value, but the same object can be flushed twice, so in that case the metric value can become negative.
Now only one place is left that updates the metric: `c.estimateCacheSize()`.
Thanks to @elebedeva for troubleshooting, reproducing the bug, and suggesting the fix.
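To illustrate the difference between the two approaches described above, here is a minimal sketch. The `gauge` and `store` types and both helper functions are hypothetical stand-ins, not the actual frostfs-node code; the sketch only assumes that the metric is either adjusted on every eviction call or recomputed from the stored state.

```go
package main

import "fmt"

// gauge is a hypothetical stand-in for the object-count metric.
type gauge struct{ value int64 }

func (g *gauge) Add(delta int64) { g.value += delta }
func (g *gauge) Set(v int64)     { g.value = v }

// store is a hypothetical stand-in for the cache DB.
type store struct{ objects map[string][]byte }

// evictIncremental models the old approach: the metric is decremented
// on every call, even if the object has already been removed.
func evictIncremental(s *store, g *gauge, key string) {
	delete(s.objects, key)
	g.Add(-1)
}

// estimate models the new approach: the metric is set from the
// actual number of stored objects, so repeated evictions cannot skew it.
func estimate(s *store, g *gauge) {
	g.Set(int64(len(s.objects)))
}

func main() {
	s := &store{objects: map[string][]byte{"a": nil}}
	g := &gauge{value: 1}

	// The same object is flushed (and evicted) twice.
	evictIncremental(s, g, "a")
	evictIncremental(s, g, "a")
	fmt.Println("incremental:", g.value) // prints "incremental: -1"

	estimate(s, g)
	fmt.Println("recomputed:", g.value) // prints "recomputed: 0"
}
```

With a single update point the metric can lag behind the real state between refreshes, but it can no longer drift or go negative.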
@ -60,6 +60,7 @@ func (c *cache) runFlushLoop(ctx context.Context) {
case <-tt.C:
c.flushSmallObjects(ctx)
tt.Reset(defaultFlushInterval)
c.estimateCacheSize()
Scenario
Cache DB contains two objects
Goroutine 1: flush and delete object 1, decrements counter, counter value = 1
Goroutine 2: flush and delete object 2, decrements counter, counter value = 0, set metric value to 0
Goroutine 1: set metric value to 1
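The interleaving above can be replayed deterministically. The sketch below is only an illustration: `gauge`, `replayInterleaving`, and `refresh` are hypothetical names, and the steps are executed sequentially in the order the scenario lists them rather than in real goroutines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for the exported metric.
type gauge struct{ value int64 }

func (g *gauge) Set(v int64) { atomic.StoreInt64(&g.value, v) }
func (g *gauge) Get() int64  { return atomic.LoadInt64(&g.value) }

// replayInterleaving executes the scenario's steps in a fixed order:
// both "goroutines" decrement the counter first, then publish the
// value they observed, with goroutine 1 publishing last.
func replayInterleaving(counter *int64, g *gauge) {
	v1 := atomic.AddInt64(counter, -1) // goroutine 1 deletes object 1, sees 1
	v2 := atomic.AddInt64(counter, -1) // goroutine 2 deletes object 2, sees 0
	g.Set(v2)                          // goroutine 2 sets the metric to 0
	g.Set(v1)                          // goroutine 1 sets the metric to its stale 1
}

// refresh models a periodic re-estimate that republishes the current counter.
func refresh(counter *int64, g *gauge) {
	g.Set(atomic.LoadInt64(counter))
}

func main() {
	counter := int64(2)
	g := &gauge{}

	replayInterleaving(&counter, g)
	fmt.Println("after interleaving:", g.Get()) // prints "after interleaving: 1"

	refresh(&counter, g)
	fmt.Println("after refresh:", g.Get()) // prints "after refresh: 0"
}
```

The counter itself is correct after both decrements; only the published value is stale, which is why a periodic refresh is enough to correct it.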
Now the metric will be updated every second when the cache is empty.
In your scenario we have metric=1 when it should be 0. What is the scenario for the negative metric?
It is described in the main comment.
When we flush the same object twice: we can put the same object in the channel twice if a worker hasn't deleted it yet.
`c.metrics.Evict` is called twice, so the metric counter was also decremented twice, because `recordDeleted` is not checked in `c.metrics.Evict`: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/writecache/writecachebbolt/storage.go#L81
d368a5aeba to d4b6ebe7e7
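For comparison, the guard discussed above (only counting a deletion when the record was actually present) could look roughly like the following. This is a sketch under assumptions: `cacheDB`, `deleteIfPresent`, and `flushTwice` are hypothetical and use an in-memory map instead of the real bbolt-backed storage.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheDB is a hypothetical in-memory stand-in for the cache database.
type cacheDB struct {
	mu      sync.Mutex
	objects map[string][]byte
}

// deleteIfPresent removes key and reports whether it was actually stored,
// so the caller can decide whether the counter should change.
func (db *cacheDB) deleteIfPresent(key string) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	_, ok := db.objects[key]
	delete(db.objects, key)
	return ok
}

// flushTwice simulates the same object being queued for flushing twice
// and returns the resulting object count.
func flushTwice(db *cacheDB, key string) int64 {
	count := int64(len(db.objects))
	for i := 0; i < 2; i++ {
		if db.deleteIfPresent(key) {
			count-- // decrement only when a record was really deleted
		}
	}
	return count
}

func main() {
	db := &cacheDB{objects: map[string][]byte{"obj": []byte("payload")}}
	fmt.Println(flushTwice(db, "obj")) // prints 0
}
```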
@ -73,7 +73,7 @@ func (c *cache) deleteFromDB(key string) {
err := c.db.Batch(func(tx *bbolt.Tx) error {
b := tx.Bucket(defaultBucket)
key := []byte(key)
recordDeleted = !recordDeleted && b.Get(key) != nil
Why this change?
We discussed it in the previous PR with metrics, so it was just not deleted.
We also need this for the support branch.