Index attributes for non-S3 containers #1412

Merged
fyrchik merged 1 commit from dstepanov-yadro/frostfs-node:fix/metabase_index_some_attr into master 2024-10-26 11:30:27 +00:00

Changes:

  • metabase.Put stores indexes for FilePath and S3-Access-Box-CRDT-Name user attributes for non S3 containers (containers with .s3-location-constraint attribute assumed as S3 containers)
  • metabase.Select uses those indexes for select for non S3 containers
  • evacuation checks container type when moves object to other local shard
  • metabase resync checks container type when saves object to metabase
  • frostfs-adm metabase upgrade connects to blockchain to check container type on metabase upgrade
  • frostfs-adm metabase upgrade works now only with config files, no --path argument could be specified as it requires also morph connection, which leads to a lot of arguments, and this is hard to maintain
  • unrelated metabase migration steps run concurrently
  • metabase upgrade checks if user attributes bucket and subbuckets are empty and drops their if yes: for containers deleted without deletion info (before support/v0.38 branch) a lot of junk could be stored in metabase
  • metabase.Delete deletes user attribute containers if they are empty

Perf tests:
image

About last 2 results: with FilePath attribute size of metabase is larger, so compaction performs longer

Changes: - `metabase.Put` stores indexes for `FilePath` and `S3-Access-Box-CRDT-Name` user attributes for non S3 containers (containers with `.s3-location-constraint` attribute assumed as S3 containers) - `metabase.Select` uses those indexes for select for non S3 containers - evacuation checks container type when moves object to other local shard - metabase resync checks container type when saves object to metabase - `frostfs-adm metabase upgrade` connects to blockchain to check container type on metabase upgrade - `frostfs-adm metabase upgrade` works now only with config files, no `--path` argument could be specified as it requires also morph connection, which leads to a lot of arguments, and this is hard to maintain - unrelated metabase migration steps run concurrently - metabase upgrade checks if user attributes bucket and subbuckets are empty and drops their if yes: for containers deleted without deletion info (before `support/v0.38` branch) a lot of junk could be stored in metabase - `metabase.Delete` deletes user attribute containers if they are empty Perf tests: ![image](/attachments/7858d615-47ad-4122-ad4c-1caedd61e2b8) About last 2 results: with FilePath attribute size of metabase is larger, so compaction performs longer
dstepanov-yadro added 7 commits 2024-10-03 08:23:57 +00:00
Use fields instead of methods.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
For non S3 containers it is expected to use attributes index for some
attributes.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
[#9999] adm: Resolve container type by metabase upgrade
Some checks failed
DCO action / DCO (pull_request) Successful in 1m29s
Tests and linters / Run gofumpt (pull_request) Successful in 1m32s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m10s
Vulncheck / Vulncheck (pull_request) Successful in 2m6s
Tests and linters / Staticcheck (pull_request) Failing after 2m18s
Tests and linters / Lint (pull_request) Failing after 2m24s
Build / Build Components (pull_request) Successful in 2m29s
Tests and linters / gopls check (pull_request) Successful in 2m42s
Tests and linters / Tests (pull_request) Successful in 4m19s
Tests and linters / Tests with -race (pull_request) Successful in 6m11s
2e91ec7e9d
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 2e91ec7e9d to b2d143aac6 2024-10-03 14:10:47 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from b2d143aac6 to a9bf076a7e 2024-10-03 14:12:53 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from a9bf076a7e to 0c014166e6 2024-10-03 14:16:18 +00:00 Compare
dstepanov-yadro reviewed 2024-10-03 14:23:46 +00:00
@ -16,4 +22,3 @@
)
const (
pathFlag = "path"
Author
Member

Now metabase upgrade requires morph connection to resolve container. It is required to specify a lot of flags/args to pass required morph parameters. So now for metabase paths and morph endpoints only config is used.

Now metabase upgrade requires morph connection to resolve container. It is required to specify a lot of flags/args to pass required morph parameters. So now for metabase paths and morph endpoints only config is used.
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 0c014166e6 to 896b667d83 2024-10-03 14:25:23 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 896b667d83 to 002b2d1cc3 2024-10-03 14:26:56 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 002b2d1cc3 to dea227c872 2024-10-03 14:28:59 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from dea227c872 to 4716edc6a4 2024-10-03 14:29:50 +00:00 Compare
dstepanov-yadro reviewed 2024-10-03 14:33:46 +00:00
@ -36,0 +41,4 @@
}
}
func testSelect2(t *testing.T, db *meta.DB, cnr cid.ID, fs objectSDK.SearchFilters, useAttrIndex bool, exp ...oid.Address) {
Author
Member

No imagination, sorry.

No imagination, sorry.
dstepanov-yadro reviewed 2024-10-03 14:35:37 +00:00
@ -418,0 +439,4 @@
return nil
}
if err := bkt.DeleteBucket(item.key); err != nil {
Author
Member

Differs from v2: bucket is now deleted if there are no more entries in it.

Differs from v2: bucket is now deleted if there are no more entries in it.
dstepanov-yadro reviewed 2024-10-03 14:38:23 +00:00
@ -384,0 +423,4 @@
// user specified attributes
for i := range attrs {
if _, index := indexedAttributes[attrs[i].Key()]; !index {
continue
Author
Member

Delete related: since the version of the metabase has been changed, there should be no indexes for other attributes.

`Delete` related: since the version of the metabase has been changed, there should be no indexes for other attributes.
dstepanov-yadro reviewed 2024-10-03 14:39:52 +00:00
@ -259,6 +279,81 @@ func bucketNamesForType(cnr cid.ID, mType objectSDK.SearchMatchType, typeVal str
return
}
func (db *DB) selectFromFKBT(
Author
Member

The same code as from support/v0.42

The same code as from `support/v0.42`
dstepanov-yadro added 2 commits 2024-10-04 07:50:13 +00:00
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
[#1412] metabase: Drop empty user attribute buckets on upgrade
All checks were successful
Tests and linters / Run gofumpt (pull_request) Successful in 1m26s
DCO action / DCO (pull_request) Successful in 1m37s
Vulncheck / Vulncheck (pull_request) Successful in 1m59s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m12s
Build / Build Components (pull_request) Successful in 2m26s
Tests and linters / gopls check (pull_request) Successful in 2m40s
Tests and linters / Tests (pull_request) Successful in 2m50s
Tests and linters / Staticcheck (pull_request) Successful in 3m3s
Tests and linters / Lint (pull_request) Successful in 3m25s
Tests and linters / Tests with -race (pull_request) Successful in 6m20s
2650d85186
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 2650d85186 to eac1f93c97 2024-10-04 10:59:20 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from eac1f93c97 to 6b5b44dc98 2024-10-04 11:14:38 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 6b5b44dc98 to 4519846f99 2024-10-04 11:23:45 +00:00 Compare
dstepanov-yadro requested review from storage-core-committers 2024-10-04 11:56:04 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-10-04 11:56:05 +00:00
dstepanov-yadro changed title from WIP: Index attributes for non-S3 containers to Index attributes for non-S3 containers 2024-10-04 11:56:11 +00:00
fyrchik reviewed 2024-10-04 12:45:33 +00:00
fyrchik left a comment
Owner

Honestly, I do not like the idea of carrying some bool flag together with the object through the whole data-path, the metabase is leaking all the way up to storage service.
I'd like to discuss possible alternatives.

I would rather have pkg/core/object.ObjectWIthIndex struct where such info could be stored (where we can convert between object and objectwithindex, but the latter always contains info about s3, being impossible to construct otherwise).

Another option is to have Put and PutIndexed(obj) (the latter may contain a list of attributes), leaving all the branching as high as possible.

My last suggestion is to have interface in the metabase itself. The dificulty here is to ensure we make no network requests during a transaction. It should be possible with some sort of pinning in our LRU cache and ensuring caller always fetches the container. This is the smallest solution in terms of files changed.

If there is no favourable solution, let's stick with the one in the PR.

My another concern is Select -- it seems unnecessary to provide an option there, because on Put we can save all necessary info about the container in the metabase. And if there are no objects, we can just return an empty result anyway. Am I missing something?

Honestly, I do not like the idea of carrying some bool flag together with the object through the whole data-path, the metabase is leaking all the way up to storage service. I'd like to discuss possible alternatives. I would rather have `pkg/core/object.ObjectWIthIndex` struct where such info could be stored (where we can convert between object and objectwithindex, but the latter always contains info about s3, being impossible to construct otherwise). Another option is to have `Put` and `PutIndexed(obj)` (the latter may contain a list of attributes), leaving all the branching as high as possible. My last suggestion is to have interface in the metabase itself. The dificulty here is to ensure we make no network requests during a transaction. It should be possible with some sort of pinning in our LRU cache and ensuring caller always fetches the container. This is the smallest solution in terms of files changed. If there is no favourable solution, let's stick with the one in the PR. My another concern is `Select` -- it seems unnecessary to provide an option there, because on Put we can save all necessary info about the container in the metabase. And if there are no objects, we can just return an empty result anyway. Am I missing something?
@ -384,0 +422,4 @@
// user specified attributes
for i := range attrs {
if _, index := indexedAttributes[attrs[i].Key()]; !index {
Owner

use IsAttributeIndexable?

use `IsAttributeIndexable`?
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -218,7 +223,13 @@ func (db *DB) selectFastFilter(
selectAllFromBucket(tx, primaryBucketName(cnr, bucketName), to, fNum)
selectAllFromBucket(tx, tombstoneBucketName(cnr, bucketName), to, fNum)
selectAllFromBucket(tx, bucketNameLockers(cnr, bucketName), to, fNum)
default:
Owner

How come we had no code here before, where is the attribute search?

How come we had no code here before, where is the attribute search?
Author
Member

In slow filters

In slow filters
fyrchik marked this conversation as resolved
@ -262,0 +288,4 @@
) { //
matchFunc, ok := db.matchers[f.Operation()]
if !ok {
db.log.Debug(logs.MetabaseMissingMatcher, zap.Uint32("operation", uint32(f.Operation())))
Owner

Doesn't f.Operation() have a string method?

Doesn't `f.Operation()` have a string method?
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -512,3 +607,3 @@
res.fastFilters = append(res.fastFilters, filters[i])
default:
res.slowFilters = append(res.slowFilters, filters[i])
if _, indexed := indexedAttributes[filters[i].Header()]; useAttributeIndex && indexed {
Owner

use IsIndexedAttribute?

use `IsIndexedAttribute`?
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -170,6 +169,28 @@ func smallBucketName(cnr cid.ID, key []byte) []byte {
return bucketName(cnr, smallPrefix, key)
}
// attributeBucketName returns <CID>_attr_<attributeKey>.
Owner

This comment seems wrong.

This comment seems wrong.
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -275,6 +275,27 @@ func (s *Shard) refillObject(ctx context.Context, data []byte, addr oid.Address,
return nil
}
var hasIndexableAttribute bool
Owner

How about helpers from slices package?
hasIndexableAttribute = slices.IndexFunc(obj.Attributes(), func(a) { meta.IsAttributeIndexable(a.Key()) }) > 0

How about helpers from `slices` package? `hasIndexableAttribute = slices.IndexFunc(obj.Attributes(), func(a) { meta.IsAttributeIndexable(a.Key()) }) > 0`
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -26,3 +27,3 @@
// SetContainerID is a Select option to set the container id to search in.
func (p *SelectPrm) SetContainerID(cnr cid.ID) {
func (p *SelectPrm) SetContainerID(cnr cid.ID, isS3Container bool) {
Owner

It is a bit strange to see S3 on the shard level, how about isIndexedContainer?

It is a bit strange to see `S3` on the shard level, how about `isIndexedContainer`?
Author
Member

ok, fixed

ok, fixed
fyrchik marked this conversation as resolved
@ -28,3 +29,3 @@
var errInvalidECPlacement = errors.New("invalid EC placement: EC placement must have one placement vector with at least one node")
func (p *Policer) processECContainerObject(ctx context.Context, objInfo objectcore.Info, policy netmap.PlacementPolicy) error {
func (p *Policer) processECContainerObject(ctx context.Context, objInfo objectcore.Info, policy netmap.PlacementPolicy, cnr containerSDK.Container) error {
Owner

If we pass the container, we can omit placement policy to reduce argument number and avoid duplication: policy == cnr.PlacementPolicy().

If we pass the container, we can omit placement policy to reduce argument number and avoid duplication: `policy == cnr.PlacementPolicy()`.
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -453,3 +454,3 @@
}
func (r *testReplicator) HandleLocalPutTask(ctx context.Context, task replicator.Task) {
func (r *testReplicator) HandleLocalPutTask(ctx context.Context, task replicator.Task, cnr containerSDK.Container) {
Owner

Why have you decided not to make it a part of replicator.Task?

Why have you decided not to make it a part of `replicator.Task`?
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
Author
Member

My another concern is Select -- it seems unnecessary to provide an option there, because on Put we can save all necessary info about the container in the metabase. And if there are no objects, we can just return an empty result anyway. Am I missing something?

I don't think that Search is the large used RPC call, so having one more network request about container is not so bad in comparison with storing addition info for each container in metabase, which is used for every user request.

> My another concern is `Select` -- it seems unnecessary to provide an option there, because on Put we can save all necessary info about the container in the metabase. And if there are no objects, we can just return an empty result anyway. Am I missing something? I don't think that `Search` is the large used RPC call, so having one more network request about container is not so bad in comparison with storing addition info for each container in metabase, which is used for every user request.
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 4519846f99 to 91a9480733 2024-10-04 14:22:32 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 91a9480733 to 05c1fe8d1a 2024-10-04 14:33:01 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 05c1fe8d1a to 1b9f5fb7eb 2024-10-04 14:42:19 +00:00 Compare
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 1b9f5fb7eb to d491494295 2024-10-07 06:49:55 +00:00 Compare
fyrchik reviewed 2024-10-07 12:22:12 +00:00
@ -262,0 +288,4 @@
) { //
matchFunc, ok := db.matchers[f.Operation()]
if !ok {
db.log.Debug(logs.MetabaseMissingMatcher, zap.Stringer("operation", f.Operation()))
Owner

Oh, we should not log inside the transaction, it can easily hang the db.
Do we do it anywhere else currently?
If no, let's remove.

Oh, we should not log inside the transaction, it can easily hang the db. Do we do it anywhere else currently? If no, let's remove.
Author
Member

Done.

Done.
dstepanov-yadro force-pushed fix/metabase_index_some_attr from d491494295 to 44f7bccb1e 2024-10-07 14:20:24 +00:00 Compare
fyrchik approved these changes 2024-10-08 08:15:47 +00:00
@ -1417,0 +1423,4 @@
return container.NewInfoProvider(func() (container.Source, error) {
// threadsafe: called on init or on sighup when morph initialized
if c.cfgMorph.client == nil {
initMorphComponents(ctx, c)
Owner

Do we call containerClient.NewFromMorph on each invocation instead of once?

Do we call `containerClient.NewFromMorph` on each invocation instead of once?
Author
Member

No: container.InfoProvider uses sync.Once to create container.Source

No: `container.InfoProvider` uses `sync.Once` to create `container.Source`
fyrchik marked this conversation as resolved
@ -0,0 +36,4 @@
func NewInfoProvider(sourceFactory func() (Source, error)) InfoProvider {
return &infoProvider{
mtx: &sync.RWMutex{},
cache: make(map[cid.ID]infoValue),
Owner

Does this cache have unrestricted size?

Does this cache have unrestricted size?
Author
Member

Yes: for 10 000 containers this requires only (32 bytes + 4 bytes) * 10 000 = 360 KB of memory, so cache capacity leads only for complexity. Adding TTL is also redundant, because the only case for TTL is if container was deleted, but this leads only to some extra IO operations, not to errors.

Yes: for 10 000 containers this requires only (32 bytes + 4 bytes) * 10 000 = 360 KB of memory, so cache capacity leads only for complexity. Adding TTL is also redundant, because the only case for TTL is if container was deleted, but this leads only to some extra IO operations, not to errors.
fyrchik marked this conversation as resolved
dstepanov-yadro force-pushed fix/metabase_index_some_attr from 44f7bccb1e to c065d55ca3 2024-10-08 08:47:38 +00:00 Compare
fyrchik merged commit c065d55ca3 into master 2024-10-08 11:22:37 +00:00
fyrchik referenced this pull request from a commit 2024-10-08 11:22:39 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-core-developers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1412
No description provided.