[#1248] node: Do not update cache twice #30

Merged
carpawell merged 1 commit from carpawell/fix/multiple-cache-update-requests-FROST into master 2023-04-21 09:45:55 +00:00
2 changed files with 56 additions and 9 deletions


@@ -69,6 +69,7 @@ Changelog for FrostFS Node
- Non-alphabet nodes do not try to handle alphabet events (#181)
- Failing SN and IR transactions because of incorrect scopes (#2230, #2263)
- Global scope used for some transactions (#2230, #2263)
- Concurrent morph cache misses (#30)
### Removed
### Updated


@@ -24,9 +24,19 @@ type valueWithTime[V any] struct {
e error
}
// valueInProgress is a struct that contains
// values that are being fetched/updated.
type valueInProgress[V any] struct {
m sync.RWMutex
v V
fyrchik commented 2023-01-27 08:25:38 +00:00 (Migrated from github.com)

We already have `valueWithTime` which can be extended. Can we avoid having a map and add a mutex to `valueWithTime` instead?
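
A minimal sketch of that suggestion, assuming `valueWithTime` keeps its existing `v`/`t`/`e` fields (illustrative only, not the merged code):

```go
package cache

import (
	"sync"
	"time"
)

// Hypothetical variant: extend valueWithTime with a mutex instead of keeping
// a separate in-progress map; the mutex would be held while this entry is
// being refreshed from the network.
type valueWithTime[V any] struct {
	m sync.Mutex
	v V
	t time.Time
	e error
}
```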
carpawell commented 2023-02-20 12:28:42 +00:00 (Migrated from github.com)

what if multiple routines come and request the same value that does not exist yet?
dstepanov-yadro commented 2023-03-01 08:21:44 +00:00 (Migrated from github.com)

I think it's about something like this: `map[key]{*value, keyMutex}`

First get the object from this map with a global mutex (it can be optimized by using a hashed key and shards), then lock `keyMutex` to get the value.
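
A hedged sketch of that two-level locking idea; the `entry`/`keyedCache` names and the `fetch` callback are illustrative, not taken from the PR:

```go
package cache

import "sync"

// entry pairs a value with a per-key mutex, as suggested above.
type entry[V any] struct {
	mu  sync.Mutex // guards val for this key only
	val *V
}

// keyedCache demonstrates the two-level locking: a global mutex for the map,
// a per-key mutex for the (possibly slow) value fetch.
type keyedCache[K comparable, V any] struct {
	mu sync.Mutex // global mutex: guards the map itself
	m  map[K]*entry[V]
}

func newKeyedCache[K comparable, V any]() *keyedCache[K, V] {
	return &keyedCache[K, V]{m: make(map[K]*entry[V])}
}

// get looks the entry up (or creates it) under the global lock, then performs
// the fetch under the per-key lock, so unrelated keys do not block each other
// and concurrent callers for the same key trigger only one fetch.
func (c *keyedCache[K, V]) get(k K, fetch func(K) (*V, error)) (*V, error) {
	c.mu.Lock()
	e, ok := c.m[k]
	if !ok {
		e = &entry[V]{}
		c.m[k] = e
	}
	c.mu.Unlock()

	e.mu.Lock()
	defer e.mu.Unlock()
	if e.val != nil {
		return e.val, nil
	}
	v, err := fetch(k)
	if err != nil {
		return nil, err
	}
	e.val = v
	return v, nil
}
```

As noted above, the global map lookup could additionally be sharded by a hash of the key to reduce contention.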
e error
ale64bit marked this conversation as resolved

not sure if it's a common practice, but I normally see the mutex declared right before the thing it guards.

do you mean field order? (if yes, fixed)
}
// entity that provides TTL cache interface.
type ttlNetCache[K comparable, V any] struct {
ttl time.Duration
m sync.RWMutex // protects progMap
progMap map[K]*valueInProgress[V] // contains fetch-in-progress keys

is it simpler to use https://pkg.go.dev/sync#Map and avoid all the mutexes?

> The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content.

Sure, but this usage seems to fall into the description in the next paragraph after the one you are quoting, doesn't it? And type safety is a non-concern given that the parent type is polymorphic.

> The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.

did not think about it cause I've never used it and we do not have it in our repos. added an example (separate commit), may be squashed or dropped @fyrchik @acid-ant @aarifullin

> (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow

There are `delete` calls.

> (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

This point is closer to how this cache will be used, but does not fully correspond to it.

ok, dropped
ttl time.Duration
sz int
@@ -41,32 +51,68 @@ func newNetworkTTLCache[K comparable, V any](sz int, ttl time.Duration, netRdr n
fatalOnErr(err)
return &ttlNetCache[K, V]{
ttl: ttl,
sz: sz,
cache: cache,
netRdr: netRdr,
ttl: ttl,
sz: sz,
cache: cache,
netRdr: netRdr,
progMap: make(map[K]*valueInProgress[V]),
}
}
func waitForUpdate[V any](vip *valueInProgress[V]) (V, error) {

can we replace `any` with a generic type?

hm, yeah, `actualValue` is useless with generic, in fact. dropped
vip.m.RLock()
defer vip.m.RUnlock()
return vip.v, vip.e
}
// reads value by the key.
//
// updates the value from the network on cache miss or by TTL.

key is redundant

Also, why is it a method of `ttlNetCache`?

used to be used, dropped
//
// returned value should not be modified.
func (c *ttlNetCache[K, V]) get(key K) (V, error) {
val, ok := c.cache.Peek(key)
valWithTime, ok := c.cache.Peek(key)
if ok {
if time.Since(val.t) < c.ttl {
return val.v, val.e
if time.Since(valWithTime.t) < c.ttl {
return valWithTime.v, valWithTime.e
}
c.cache.Remove(key)
}

if you happen to keep this version, it probably should be

```
val, ok := c.progMap.LoadOrStore(key, &valueInProgress[V]{})
val.m.Lock()
defer val.m.Unlock()
if ok {
(...)
```

well, it seems wrong to me: the whole PR is about restricting simultaneous network cache updates, so we _need_ an "updating" routine to fill the map with an already locked mutex, otherwise other routines are able to get it and `RLock` its mutex despite the fact that the value has not been fetched yet

so i think it should be either `LoadOrStore` with an already held mutex, or `Load` + `Store` if we don't want to have useless mutexes

I see. But what I had in mind is that if you are calling `LoadOrStore` and there happened to be a value already (locked or otherwise), the mutex you just created and locked is never actually unlocked, is it? (the one from the fresh `val` that didn't happen to be stored in the map).
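
For illustration, a self-contained sketch of the `sync.Map`/`LoadOrStore` ordering being debated here; the `syncMapCache` type is hypothetical and only the lock-before-publish ordering matters:

```go
package cache

import "sync"

// These two mirror the types from the diff above.
type valueInProgress[V any] struct {
	m sync.RWMutex
	v V
	e error
}

func waitForUpdate[V any](vip *valueInProgress[V]) (V, error) {
	vip.m.RLock()
	defer vip.m.RUnlock()
	return vip.v, vip.e
}

// syncMapCache is a hypothetical sync.Map-backed variant, shown only to
// illustrate the LoadOrStore ordering discussed above; it is not the merged code.
type syncMapCache[K comparable, V any] struct {
	progMap sync.Map // K -> *valueInProgress[V]
	netRdr  func(K) (V, error)
}

func (c *syncMapCache[K, V]) get(key K) (V, error) {
	vip := &valueInProgress[V]{}
	vip.m.Lock() // lock before publishing, so waiters block until the fetch finishes

	actual, loaded := c.progMap.LoadOrStore(key, vip)
	if loaded {
		// Another goroutine is already fetching this key. Our freshly locked
		// vip was never stored, so nobody else can observe it; leaving it
		// locked and dropping it is harmless.
		return waitForUpdate(actual.(*valueInProgress[V]))
	}

	// We won the race: do the network call, publish the result, then unlock
	// the waiters and remove the finished entry. (Updating the TTL cache
	// itself is omitted here for brevity.)
	v, err := c.netRdr(key)
	vip.v, vip.e = v, err
	vip.m.Unlock()
	c.progMap.Delete(key)
	return v, err
}
```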
v, err := c.netRdr(key)
c.m.RLock()
valInProg, ok := c.progMap[key]
c.m.RUnlock()
if ok {
return waitForUpdate(valInProg)
}
c.m.Lock()
valInProg, ok = c.progMap[key]
if ok {
c.m.Unlock()
return waitForUpdate(valInProg)
}
valInProg = &valueInProgress[V]{}
valInProg.m.Lock()
c.progMap[key] = valInProg
c.m.Unlock()
v, err := c.netRdr(key)
c.set(key, v, err)

Do we still need this line?

yes, that's a cache still. why did you doubt? got the value (not concurrently with that PR) -- updated the cache

Because below we explicitly set `valInProg` values

> below we explicitly set valInProg values

it is for the concurrent routines (that wait while the network job is done) only. all the others will take the val directly from the cache cause no cache miss will be faced after an update (well, for 30s only, but anyway)
valInProg.v = v
valInProg.e = err
valInProg.m.Unlock()
c.m.Lock()
delete(c.progMap, key)
fyrchik marked this conversation as resolved

Why do we unconditionally delete something from the map we have just written to?

if we are here (that line), we are the routine (the only one per value) that failed to get a val from the cache and decided to fetch it from the network. when the work is done, we remove the record from "the scheduler" (`progMap`) cause it is finished
c.m.Unlock()
return v, err
}
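
For readability, here is the merged `get` path reassembled from the hunks above into one piece (an approximation of the final code; the comments are added for this description):

```go
func (c *ttlNetCache[K, V]) get(key K) (V, error) {
	// Serve from the TTL cache when the entry is still fresh.
	valWithTime, ok := c.cache.Peek(key)
	if ok {
		if time.Since(valWithTime.t) < c.ttl {
			return valWithTime.v, valWithTime.e
		}
		c.cache.Remove(key)
	}

	// Fast path: someone is already fetching this key, just wait for them.
	c.m.RLock()
	valInProg, ok := c.progMap[key]
	c.m.RUnlock()
	if ok {
		return waitForUpdate(valInProg)
	}

	// Slow path: re-check under the write lock, then register ourselves as
	// the single fetcher with an already locked mutex.
	c.m.Lock()
	valInProg, ok = c.progMap[key]
	if ok {
		c.m.Unlock()
		return waitForUpdate(valInProg)
	}
	valInProg = &valueInProgress[V]{}
	valInProg.m.Lock()
	c.progMap[key] = valInProg
	c.m.Unlock()

	// Only one goroutine per key reaches the network reader.
	v, err := c.netRdr(key)

	// Update the TTL cache for future (non-concurrent) readers.
	c.set(key, v, err)

	// Publish the result to the goroutines currently waiting, then drop the
	// finished entry from the in-progress map.
	valInProg.v = v
	valInProg.e = err
	valInProg.m.Unlock()

	c.m.Lock()
	delete(c.progMap, key)
	c.m.Unlock()

	return v, err
}
```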