Fix blobovnicza cache #669

Merged
fyrchik merged 1 commit from dstepanov-yadro/frostfs-node:fix/open_cache_capacity into master 2024-09-04 19:51:03 +00:00

Closes #536

Fixed and simplified work with individual databases in blobovnicza. Now the database opens at the first request, and closes after all users have released ownership. Active databases (those that are being written to) are not closed until they are full. The cache is also the usual owner of the database.

Tested on dev-env with cache capacity = 1 and max size = 1KB.
After all the databases are filled, the metrics show that there are no open databases.
After the Get requests were executed, 1 database remained open for each shard.
image

Closes #536 Fixed and simplified work with individual databases in blobovnicza. Now the database opens at the first request, and closes after all users have released ownership. Active databases (those that are being written to) are not closed until they are full. The cache is also the usual owner of the database. Tested on dev-env with cache capacity = 1 and max size = 1KB. After all the databases are filled, the metrics show that there are no open databases. After the Get requests were executed, 1 database remained open for each shard. ![image](/attachments/6f5d4427-a5bc-439d-9f1f-28f08dccbb8b)
dstepanov-yadro force-pushed fix/open_cache_capacity from 3717c7e1c5 to ee2388a936 2023-08-30 21:37:58 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from ee2388a936 to b7a7318af0 2023-08-31 07:13:55 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 3fbc356e9d to 294fd35cb9 2023-08-31 08:50:53 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 294fd35cb9 to 0ed25be63d 2023-08-31 09:00:34 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 0ed25be63d to 1961150c47 2023-08-31 09:31:13 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 1961150c47 to 1683ca1d5f 2023-08-31 12:04:44 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 1683ca1d5f to 550bb40728 2023-08-31 14:23:39 +00:00 Compare
dstepanov-yadro reviewed 2023-08-31 14:33:23 +00:00
@ -99,3 +98,3 @@
}
// from now objects should not be saved
// blobovnizca accepts object event if full
Author
Member

Now the check for the fullness of the database is performed at the blobovnicza tree level.

Now the check for the fullness of the database is performed at the blobovnicza tree level.
dstepanov-yadro reviewed 2023-08-31 14:42:28 +00:00
@ -0,0 +91,4 @@
return nil, nil
}
blz, err := db.Open() //open db for usage, will be closed on activeDB.Close()
Author
Member

db 0 is active (refCount = 1)
goroutine 1: got and opened db 0 (refCount = 2)
goroutine 2: got db 0
goroutine 1: saved object to db 0, db 0 is full, closed (refCount = 1)
goroutine 3: replaced active db from db 0 to db 1, so db 0 was physically closed (refCount = 0)
goroutine 2: physically opens db 0 (refCount = 1)

To prevent this situation active db manager returns active db opened, so user has to close it.

db 0 is active (refCount = 1) goroutine 1: got and opened db 0 (refCount = 2) goroutine 2: got db 0 goroutine 1: saved object to db 0, db 0 is full, closed (refCount = 1) goroutine 3: replaced active db from db 0 to db 1, so db 0 was physically closed (refCount = 0) goroutine 2: physically opens db 0 (refCount = 1) To prevent this situation active db manager returns active db opened, so user has to close it.
dstepanov-yadro force-pushed fix/open_cache_capacity from 550bb40728 to a6aaa518d0 2023-08-31 14:49:31 +00:00 Compare
dstepanov-yadro reviewed 2023-08-31 14:51:30 +00:00
@ -0,0 +12,4 @@
"github.com/stretchr/testify/require"
)
func TestBlobovniczaTree_Concurrency(t *testing.T) {
Author
Member

Test from task.

Test from task.
dstepanov-yadro reviewed 2023-08-31 14:54:12 +00:00
@ -59,16 +60,10 @@ func (b *Blobovniczas) Delete(ctx context.Context, prm common.DeletePrm) (res co
return res, err
}
activeCache := make(map[string]struct{})
Author
Member

It was a protection in case the active database was changed during the iteration. Now changing the active database does not affect the opening of the database.

It was a protection in case the active database was changed during the iteration. Now changing the active database does not affect the opening of the database.
dstepanov-yadro reviewed 2023-08-31 14:55:29 +00:00
@ -7,6 +7,8 @@ import (
apistatus "git.frostfs.info/TrueCloudLab/frostfs-sdk-go/client/status"
)
var errClosed = logicerr.New("blobvnicza is closed")
Author
Member

Error is logical to not to count as shard error.

Error is logical to not to count as shard error.
dstepanov-yadro reviewed 2023-08-31 14:57:17 +00:00
@ -0,0 +196,4 @@
}
type openDBCounter struct {
cond *sync.Cond
Author
Member

Yes, it's not Go way. But it looks simpler than mutex-protected channel.

Yes, it's not Go way. But it looks simpler than mutex-protected channel.
dstepanov-yadro reviewed 2023-08-31 15:01:22 +00:00
@ -170,4 +86,0 @@
if ok {
if old != nil {
if active.ind == b.blzLeafWidth-1 {
return active, logicerr.New("no more Blobovniczas")
Author
Member

In general, this is an invalid statement. Imagine the situation: the width of the leaf level is 3, the current active database has index = 2, but all objects were deleted from the database with index = 0. So database with index = 0 is not full and can be next active database.

In general, this is an invalid statement. Imagine the situation: the width of the leaf level is 3, the current active database has index = 2, but all objects were deleted from the database with index = 0. So database with index = 0 is not full and can be next active database.
dstepanov-yadro force-pushed fix/open_cache_capacity from a6aaa518d0 to b592dbc49e 2023-08-31 15:11:16 +00:00 Compare
dstepanov-yadro reviewed 2023-08-31 15:14:36 +00:00
@ -0,0 +57,4 @@
func (c *dbCache) getExisted(path string) *sharedDB {
c.cacheGuard.Lock()
defer c.cacheGuard.Unlock()
Author
Member

cache.Get can rebuild cache, so exclusive lock.

`cache.Get` can rebuild cache, so exclusive lock.
dstepanov-yadro changed title from WIP: Fix blobovnicza cache to Fix blobovnicza cache 2023-08-31 15:28:31 +00:00
dstepanov-yadro requested review from storage-core-committers 2023-08-31 15:28:34 +00:00
dstepanov-yadro requested review from storage-core-developers 2023-08-31 15:28:40 +00:00
dstepanov-yadro force-pushed fix/open_cache_capacity from b592dbc49e to b7b77cee47 2023-09-01 06:15:47 +00:00 Compare
fyrchik reviewed 2023-09-01 06:39:17 +00:00
@ -0,0 +38,4 @@
func newActiveDBManager(dbManager *dbManager, leafWidth uint64) *activeDBManager {
return &activeDBManager{
levelToActiveDBGuard: &sync.RWMutex{},
levelToActiveDB: make(map[string]*sharedDB),
Owner

levelToActive or levelToShared?

`levelToActive` or `levelToShared`?
Author
Member

Active DB of type sharedDB.

Active DB of type `sharedDB`.
Owner

Ok, for some reason I've though shared/active is a pair of mutually exclusive roles.

Ok, for some reason I've though shared/active is a pair of mutually exclusive roles.
fyrchik marked this conversation as resolved
@ -0,0 +72,4 @@
defer m.levelToActiveDBGuard.Unlock()
for _, db := range m.levelToActiveDB {
db.Close()
Owner

Could you leave a comment about error handling here? We may found ourselves in a situation when the DB was not closed and later would not be opened because open FD leaked.

Could you leave a comment about error handling here? We may found ourselves in a situation when the DB was not closed and later would not be opened because open FD leaked.
Author
Member
It is existed behaviour: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/blobstor/blobovniczatree/blobovnicza.go#L111 When closing DB we just log error: see `sharedDB.Close()`
Owner

I don't argue, I think it is just something worth thinking about if you touch the code or investigate problems.

I don't argue, I think it is just something worth thinking about if you touch the code or investigate problems.
@ -0,0 +50,4 @@
require.NoError(t, err)
_, err = st.Get(context.Background(), common.GetPrm{Address: addr})
require.NoError(t, err) // fails very often, correlated to how many goroutines are started
Owner

Still fails?

Still fails?
Author
Member

Ups, comment removed

Ups, comment removed
fyrchik marked this conversation as resolved
@ -23,59 +21,37 @@ func (b *Blobovniczas) Open(readOnly bool) error {
func (b *Blobovniczas) Init() error {
b.log.Debug(logs.BlobovniczatreeInitializingBlobovniczas)
b.openManagers()
Owner

Why do we do this both in Open() and Init()?

Why do we do this both in `Open()` and `Init()`?
Author
Member

Fixed. Should only be in Open().

Fixed. Should only be in `Open()`.
fyrchik marked this conversation as resolved
@ -0,0 +78,4 @@
b.guard.Lock()
defer b.guard.Unlock()
if b.refCount == 0 {
Owner

What is the expected scenario for this to happen?

What is the expected scenario for this to happen?
Author
Member

Developer mistake.

Developer mistake.
Owner

It's worth some high-level logs then (I would panic, but it is not obvious why it couldn't happen)

It's worth some high-level logs then (I would panic, but it is not obvious why it couldn't happen)
Author
Member

Added error log

Added error log
fyrchik marked this conversation as resolved
@ -0,0 +111,4 @@
result := &levelDbManager{
databases: make([]*sharedDB, width),
}
var idx uint64
Owner

It is not used outside the loop, why not go with idx := 0?

It is not used outside the loop, why not go with `idx := 0`?
Author
Member

Fixed

Fixed
fyrchik marked this conversation as resolved
@ -0,0 +219,4 @@
c.cond.L.Lock()
defer c.cond.L.Unlock()
if c.count > 0 {
Owner

It there an expected situation when c.count == 0 here?

It there an expected situation when `c.count == 0` here?
Author
Member

Only if there is a developer error.

Only if there is a developer error.
fyrchik marked this conversation as resolved
fyrchik reviewed 2023-09-01 06:42:48 +00:00
@ -119,3 +62,1 @@
b.opened.Add(p, blz)
return blz, nil
func (b *Blobovniczas) openBlobovnicza(p string) *sharedDB {
Owner

Is the name and the comment still valid?

Is the name and the comment still valid?
Author
Member

Yes, they are still valid.

Yes, they are still valid.
Owner

Then why do we call Open() on the result at the callsites?

Then why do we call `Open()` on the result at the callsites?
Author
Member

Ok, then just return. Fixed.

Ok, then just return. Fixed.
dstepanov-yadro force-pushed fix/open_cache_capacity from b7b77cee47 to a537c653b4 2023-09-01 07:18:18 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from a537c653b4 to 996c73c34f 2023-09-01 07:26:34 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 996c73c34f to 7cd96089b9 2023-09-01 07:32:35 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from 7cd96089b9 to f12feb272d 2023-09-01 09:00:25 +00:00 Compare
dstepanov-yadro force-pushed fix/open_cache_capacity from f12feb272d to 9ee22c255a 2023-09-01 09:05:04 +00:00 Compare
acid-ant approved these changes 2023-09-01 09:59:31 +00:00
fyrchik approved these changes 2023-09-01 10:42:05 +00:00
dstepanov-yadro force-pushed fix/open_cache_capacity from 9ee22c255a to 7456c8556a 2023-09-01 10:53:19 +00:00 Compare
fyrchik merged commit 7456c8556a into master 2023-09-01 11:27:32 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-core-developers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#669
No description provided.