[#1648] writecache: Fix race condition when reporting cache size metrics

There is a race condition when multiple cache operations try to report
the cache size metrics simultaneously. Consider the following example:
- the initial total size of objects stored in the cache is 2
- worker X deletes an object and reads the cache size, which is 1
- worker Y deletes an object and reads the cache size, which is 0
- worker Y reports the cache size it learnt, which is 0
- worker X reports the cache size it learnt, which is 1

As a result, the observed cache size is 1 (i.e. one object remains
in the cache), which is incorrect because the actual cache size is 0.
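
The interleaving above can be replayed deterministically. The sketch below is
only an illustration: sizeCounter, gauge and replayInterleaving are
hypothetical names, not the writecache types, and the "workers" run
sequentially in the problematic order instead of as real goroutines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sizeCounter is a hypothetical stand-in for the cache's size counter.
type sizeCounter struct{ size atomic.Int64 }

// deleteOne removes one object and returns the size seen right after it.
func (s *sizeCounter) deleteOne() int64 { return s.size.Add(-1) }

// gauge is a hypothetical stand-in for the exported cache size metric.
type gauge struct{ value atomic.Int64 }

// set overwrites the metric with the given value (last writer wins).
func (g *gauge) set(v int64) { g.value.Store(v) }

// replayInterleaving replays the steps from the example in a fixed order
// and returns the reported metric together with the actual size.
func replayInterleaving() (reported, actual int64) {
	var c sizeCounter
	var g gauge
	c.size.Store(2) // initial size is 2

	seenByX := c.deleteOne() // worker X deletes and reads 1
	seenByY := c.deleteOne() // worker Y deletes and reads 0

	g.set(seenByY) // worker Y reports 0
	g.set(seenByX) // worker X reports its stale 1 last

	return g.value.Load(), c.size.Load()
}

func main() {
	reported, actual := replayInterleaving()
	fmt.Printf("reported=%d actual=%d\n", reported, actual)
	// prints "reported=1 actual=0"
}
```

Each individual read and write here is atomic, so the problem is not a data
race on the counter itself: the stale value wins because reading the size and
publishing it are two separate steps.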

To fix this, let's report the metrics periodically in the flush loop.
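
A minimal sketch of the idea, assuming a single goroutine owns the metric:
workers only change the counter, and one ticker-driven loop publishes it.
The names reportLoop, sizeCounter, gauge and runDemo are hypothetical, and
the interval is arbitrary; the actual change calls c.estimateCacheSize()
from the existing flush loop.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// sizeCounter and gauge are hypothetical stand-ins for the cache counter
// and the exported size metric.
type sizeCounter struct{ size atomic.Int64 }

type gauge struct{ value atomic.Int64 }

// reportLoop publishes the current size on every tick until ctx is done.
// It is the only writer of the gauge, so a stale reading taken by one
// worker can never overwrite a fresher one taken by another.
func reportLoop(ctx context.Context, c *sizeCounter, g *gauge, interval time.Duration) {
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-tick.C:
			g.value.Store(c.size.Load())
		case <-ctx.Done():
			return
		}
	}
}

// runDemo deletes two objects while the loop is running and returns the
// last reported value together with the actual size.
func runDemo() (reported, actual int64) {
	var c sizeCounter
	var g gauge
	c.size.Store(2)
	g.value.Store(-1) // sentinel: nothing has been reported yet

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		reportLoop(ctx, &c, &g, time.Millisecond)
	}()

	// Workers only modify the counter; they no longer touch the gauge.
	c.size.Add(-1)
	c.size.Add(-1)

	time.Sleep(50 * time.Millisecond) // let a few ticks pass
	cancel()
	<-done
	return g.value.Load(), c.size.Load()
}

func main() {
	reported, actual := runDemo()
	fmt.Printf("reported=%d actual=%d\n", reported, actual)
}
```

The trade-off is that the metric may lag behind the real size by up to one
interval, but it converges to the actual value instead of staying wrong.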

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Aleksey Savchuk 2025-02-18 10:51:43 +03:00
parent 9b29e7392f
commit 8f776b2f41
Signed by: a-savchuk
GPG key ID: 70C0A7FF6F9C4639
6 changed files with 3 additions and 11 deletions

@@ -94,7 +94,6 @@ func (c *cache) Open(_ context.Context, mod mode.Mode) error {
 	if err != nil {
 		return metaerr.Wrap(err)
 	}
-	c.initCounters()
 	return nil
 }

@@ -52,8 +52,6 @@ func (c *cache) Delete(ctx context.Context, addr oid.Address) error {
 			storagelog.OpField("fstree DELETE"),
 		)
 		deleted = true
-		// counter changed by fstree
-		c.estimateCacheSize()
 	}
 	return metaerr.Wrap(err)
 }

@@ -87,6 +87,9 @@ func (c *cache) pushToFlushQueue(ctx context.Context, fl *flushLimiter) {
 			}

 			c.modeMtx.RUnlock()
+
+			// counter changed by fstree
+			c.estimateCacheSize()
 		case <-ctx.Done():
 			return
 		}

@@ -73,8 +73,6 @@ func (c *cache) putBig(ctx context.Context, prm common.PutPrm) error {
 		storagelog.StorageTypeField(wcStorageType),
 		storagelog.OpField("fstree PUT"),
 	)
-	// counter changed by fstree
-	c.estimateCacheSize()
 	return nil
 }

@@ -18,7 +18,3 @@ func (c *cache) hasEnoughSpace(objectSize uint64) bool {
 	}
 	return c.maxCacheSize >= size+objectSize
 }
-
-func (c *cache) initCounters() {
-	c.estimateCacheSize()
-}

@@ -51,7 +51,5 @@ func (c *cache) deleteFromDisk(ctx context.Context, addr oid.Address, size uint6
 			storagelog.OpField("fstree DELETE"),
 		)
 		c.metrics.Evict(StorageTypeFSTree)
-		// counter changed by fstree
-		c.estimateCacheSize()
 	}
 }