Fix blobovnicza cache #669
Labels
No labels
P0
P1
P2
P3
badger
frostfs-adm
frostfs-cli
frostfs-ir
frostfs-lens
frostfs-node
good first issue
triage
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-node#669
Loading…
Reference in a new issue
No description provided.
Delete branch "dstepanov-yadro/frostfs-node:fix/open_cache_capacity"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #536
Fixed and simplified work with individual databases in blobovnicza. Now the database opens at the first request, and closes after all users have released ownership. Active databases (those that are being written to) are not closed until they are full. The cache is also the usual owner of the database.
Tested on dev-env with cache capacity = 1 and max size = 1KB.
After all the databases are filled, the metrics show that there are no open databases.
After the Get requests were executed, 1 database remained open for each shard.
3717c7e1c5
toee2388a936
ee2388a936
tob7a7318af0
3fbc356e9d
to294fd35cb9
294fd35cb9
to0ed25be63d
0ed25be63d
to1961150c47
1961150c47
to1683ca1d5f
1683ca1d5f
to550bb40728
@ -99,3 +98,3 @@
}
// from now objects should not be saved
// blobovnizca accepts object event if full
Now the check for the fullness of the database is performed at the blobovnicza tree level.
@ -0,0 +91,4 @@
return nil, nil
}
blz, err := db.Open() //open db for usage, will be closed on activeDB.Close()
db 0 is active (refCount = 1)
goroutine 1: got and opened db 0 (refCount = 2)
goroutine 2: got db 0
goroutine 1: saved object to db 0, db 0 is full, closed (refCount = 1)
goroutine 3: replaced active db from db 0 to db 1, so db 0 was physically closed (refCount = 0)
goroutine 2: physically opens db 0 (refCount = 1)
To prevent this situation active db manager returns active db opened, so user has to close it.
550bb40728
toa6aaa518d0
@ -0,0 +12,4 @@
"github.com/stretchr/testify/require"
)
func TestBlobovniczaTree_Concurrency(t *testing.T) {
Test from task.
@ -59,16 +60,10 @@ func (b *Blobovniczas) Delete(ctx context.Context, prm common.DeletePrm) (res co
return res, err
}
activeCache := make(map[string]struct{})
It was a protection in case the active database was changed during the iteration. Now changing the active database does not affect the opening of the database.
@ -7,6 +7,8 @@ import (
apistatus "git.frostfs.info/TrueCloudLab/frostfs-sdk-go/client/status"
)
var errClosed = logicerr.New("blobvnicza is closed")
Error is logical to not to count as shard error.
@ -0,0 +196,4 @@
}
type openDBCounter struct {
cond *sync.Cond
Yes, it's not Go way. But it looks simpler than mutex-protected channel.
@ -170,4 +86,0 @@
if ok {
if old != nil {
if active.ind == b.blzLeafWidth-1 {
return active, logicerr.New("no more Blobovniczas")
In general, this is an invalid statement. Imagine the situation: the width of the leaf level is 3, the current active database has index = 2, but all objects were deleted from the database with index = 0. So database with index = 0 is not full and can be next active database.
a6aaa518d0
tob592dbc49e
@ -0,0 +57,4 @@
func (c *dbCache) getExisted(path string) *sharedDB {
c.cacheGuard.Lock()
defer c.cacheGuard.Unlock()
cache.Get
can rebuild cache, so exclusive lock.WIP: Fix blobovnicza cacheto Fix blobovnicza cacheb592dbc49e
tob7b77cee47
@ -0,0 +38,4 @@
func newActiveDBManager(dbManager *dbManager, leafWidth uint64) *activeDBManager {
return &activeDBManager{
levelToActiveDBGuard: &sync.RWMutex{},
levelToActiveDB: make(map[string]*sharedDB),
levelToActive
orlevelToShared
?Active DB of type
sharedDB
.Ok, for some reason I've though shared/active is a pair of mutually exclusive roles.
@ -0,0 +72,4 @@
defer m.levelToActiveDBGuard.Unlock()
for _, db := range m.levelToActiveDB {
db.Close()
Could you leave a comment about error handling here? We may found ourselves in a situation when the DB was not closed and later would not be opened because open FD leaked.
It is existed behaviour: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/blobstor/blobovniczatree/blobovnicza.go#L111
When closing DB we just log error: see
sharedDB.Close()
I don't argue, I think it is just something worth thinking about if you touch the code or investigate problems.
@ -0,0 +50,4 @@
require.NoError(t, err)
_, err = st.Get(context.Background(), common.GetPrm{Address: addr})
require.NoError(t, err) // fails very often, correlated to how many goroutines are started
Still fails?
Ups, comment removed
@ -23,59 +21,37 @@ func (b *Blobovniczas) Open(readOnly bool) error {
func (b *Blobovniczas) Init() error {
b.log.Debug(logs.BlobovniczatreeInitializingBlobovniczas)
b.openManagers()
Why do we do this both in
Open()
andInit()
?Fixed. Should only be in
Open()
.@ -0,0 +78,4 @@
b.guard.Lock()
defer b.guard.Unlock()
if b.refCount == 0 {
What is the expected scenario for this to happen?
Developer mistake.
It's worth some high-level logs then (I would panic, but it is not obvious why it couldn't happen)
Added error log
@ -0,0 +111,4 @@
result := &levelDbManager{
databases: make([]*sharedDB, width),
}
var idx uint64
It is not used outside the loop, why not go with
idx := 0
?Fixed
@ -0,0 +219,4 @@
c.cond.L.Lock()
defer c.cond.L.Unlock()
if c.count > 0 {
It there an expected situation when
c.count == 0
here?Only if there is a developer error.
@ -119,3 +62,1 @@
b.opened.Add(p, blz)
return blz, nil
func (b *Blobovniczas) openBlobovnicza(p string) *sharedDB {
Is the name and the comment still valid?
Yes, they are still valid.
Then why do we call
Open()
on the result at the callsites?Ok, then just return. Fixed.
b7b77cee47
toa537c653b4
a537c653b4
to996c73c34f
996c73c34f
to7cd96089b9
7cd96089b9
tof12feb272d
f12feb272d
to9ee22c255a
9ee22c255a
to7456c8556a