frostfs-node/pkg/local_object_storage/writecache/metrics.go
Aleksey Savchuk b28e45f5a4
[#1648] writecache: Fix race condition when reporting cache size metrics
There is a race condition when multiple cache operations try to report
the cache size metrics simultaneously. Consider the following example:
- the initial total size of objects stored in the cache is 2
- worker X deletes an object and reads the cache size, which is 1
- worker Y deletes an object and reads the cache size, which is 0
- worker Y reports the cache size it learnt, which is 0
- worker X reports the cache size it learnt, which is 1

As a result, the observed cache size is 1 (i.e. one object remains
in the cache), which is incorrect because the actual cache size is 0.

To fix this, a separate worker that reports the cache size metrics has
been introduced. All operations request a report through a queue (a
buffered channel) consumed by that worker. Currently, all queue writes
are non-blocking.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
2025-02-18 13:39:39 +03:00
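
The enqueue side of the fix lives outside this file. Below is a minimal sketch of what a non-blocking write to the reporter queue might look like; the helper name requestCacheSizeReport and the empty-struct element type are assumptions, only the sizeMetricsReporterQueue field appears in the code further down.

func (c *cache) requestCacheSizeReport() {
	// Hypothetical sketch: ask the reporter worker to publish the current
	// cache size without blocking the calling operation. If the buffered
	// queue already holds a pending request, this one can be dropped safely,
	// since the worker will read a fresh size anyway.
	select {
	case c.sizeMetricsReporterQueue <- struct{}{}:
	default:
	}
}

Funneling the counter read into a single goroutine is what removes the interleaving described above: sizes are read and reported in one place, in order.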


package writecache

import (
	"time"

	"git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard/mode"
)

// StorageType identifies the underlying storage an object was served from.
type StorageType string

func (t StorageType) String() string {
	return string(t)
}

const (
	StorageTypeUndefined StorageType = "null"
	StorageTypeDB        StorageType = "db"
	StorageTypeFSTree    StorageType = "fstree"
)

// Metrics collects write-cache operation metrics.
type Metrics interface {
	SetShardID(string)
	Get(d time.Duration, success bool, st StorageType)
	Delete(d time.Duration, success bool, st StorageType)
	Put(d time.Duration, success bool, st StorageType)
	Flush(success bool, st StorageType)
	Evict(st StorageType)

	SetEstimateSize(uint64)
	SetMode(m mode.ComponentMode)
	SetActualCounters(uint64)
	SetPath(path string)
	Close()
}

// DefaultMetrics returns a no-op Metrics implementation.
func DefaultMetrics() Metrics { return metricsStub{} }

type metricsStub struct{}

func (metricsStub) SetShardID(string) {}

func (metricsStub) SetPath(string) {}

func (metricsStub) Get(time.Duration, bool, StorageType) {}

func (metricsStub) Delete(time.Duration, bool, StorageType) {}

func (metricsStub) Put(time.Duration, bool, StorageType) {}

func (metricsStub) SetEstimateSize(uint64) {}

func (metricsStub) SetMode(mode.ComponentMode) {}

func (metricsStub) SetActualCounters(uint64) {}

func (metricsStub) Flush(bool, StorageType) {}

func (metricsStub) Evict(StorageType) {}

func (metricsStub) Close() {}

// startCacheSizeReporter starts the dedicated goroutine that reports the
// cache size metrics. Requests arrive via sizeMetricsReporterQueue, so the
// counters are read and published by a single goroutine, which avoids the
// race between concurrent cache operations.
func (c *cache) startCacheSizeReporter() {
	go func() {
		defer close(c.sizeMetricsReporterStopped)
		for range c.sizeMetricsReporterQueue {
			count, size := c.counter.CountSize()
			c.metrics.SetActualCounters(count)
			c.metrics.SetEstimateSize(size)
		}
	}()
}

// stopCacheSizeReporter stops the reporter goroutine and waits for it to exit.
func (c *cache) stopCacheSizeReporter() {
	if c.sizeMetricsReporterQueue == nil {
		// Underlying storage was not initialized.
		return
	}
	close(c.sizeMetricsReporterQueue)
	<-c.sizeMetricsReporterStopped
	c.sizeMetricsReporterQueue = nil
	c.sizeMetricsReporterStopped = nil
}
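
The file above only contains the worker and its shutdown; the channels themselves have to be created before startCacheSizeReporter runs. A rough wiring sketch under the same assumptions (empty-struct elements, buffer size of 1, hypothetical initCacheSizeReporter call site):

func (c *cache) initCacheSizeReporter() {
	// Hypothetical sketch: allocate the request queue and the stop signal,
	// then start the single reporter goroutine. A buffer of 1 is enough when
	// extra requests are dropped on the enqueue side.
	c.sizeMetricsReporterQueue = make(chan struct{}, 1)
	c.sizeMetricsReporterStopped = make(chan struct{})
	c.startCacheSizeReporter()
}

stopCacheSizeReporter then closes the queue, waits for the goroutine to drain it and exit, and nils both fields so a repeated stop is a no-op.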