Speedup metabase search #1685

Open
opened 2025-03-18 11:58:14 +00:00 by fyrchik · 0 comments

After moving to the new metabase, container search by unindexed attributes scales linearly with the number of objects. While we cannot change the asymptotics, we can significantly reduce the constant factor. Work on this has already started in #1683. Other things to explore:

  1. `selectFastFilter` is called like this:

```go
for i := range group.fastFilters {
    db.selectFastFilter(tx, cnr, group.fastFilters[i], mAddr, i)
}
```

So, for each filter we fetch every object. The number of matched filters is then used to check whether all filters have matched. We could reorder the loops: iterate over objects first, then over filters. This enables three further optimizations: (1) accumulate in `mAddr` only objects that match all filters, (2) read each object exactly once (IO is expensive), and (3) drop `mAddr` entirely and process each object immediately. A sketch of the reordered loops is given after this list.

  2. Cache buckets. I have explored this idea before, but for `Get` the complexity it introduced wasn't justified. For `Select` the situation is better: `tx.Bucket` is clearly visible in profiles. The idea is to cache frequently accessed buckets, such as all the `objectStatus()` buckets (`primary`, `graveyard`, etc.), and have all functions retrieve these buckets from the cache. The lifetime of this cache is bounded by the lifetime of the transaction. See the bucket-cache sketch after this list.

  3. To filter by attributes, we do not need to unmarshal the object. Just use https://github.com/VictoriaMetrics/easyproto to traverse the byte slice without allocations and unmarshal the object only on a filter match. See the `easyproto` sketch after this list.

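Below is a minimal sketch of the reordered loops from item 1, written against plain bbolt rather than the actual metabase code; the bucket name and the `filterFn` representation of a fast filter are illustrative assumptions, not existing metabase APIs.

```go
package metabase

import (
	"bytes"

	"go.etcd.io/bbolt"
)

// filterFn stands in for a single fast filter: it reports whether the raw
// object value stored in the metabase matches the filter.
type filterFn func(value []byte) bool

// selectWithFilters iterates over the container bucket once and applies every
// filter to each object. Each object is read exactly once, and only the keys
// of objects that match all filters are accumulated, so no per-filter
// counters or a shared mAddr map are needed.
func selectWithFilters(tx *bbolt.Tx, bucketName []byte, filters []filterFn) ([][]byte, error) {
	bkt := tx.Bucket(bucketName)
	if bkt == nil {
		return nil, nil
	}

	var matched [][]byte
	err := bkt.ForEach(func(k, v []byte) error {
		for _, f := range filters {
			if !f(v) {
				return nil // the object fails one of the filters, skip it
			}
		}
		matched = append(matched, bytes.Clone(k)) // keep only matching objects
		return nil
	})
	return matched, err
}
```
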
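For item 2, a possible shape of a transaction-scoped bucket cache; the type and its methods are assumptions for illustration, not the existing metabase API. All lookups go through the cache, and the cache is discarded together with the transaction.

```go
package metabase

import "go.etcd.io/bbolt"

// bucketCache memoizes tx.Bucket lookups for the lifetime of a single
// transaction, so frequently used buckets (e.g. the ones consulted by
// objectStatus()) are resolved at most once per Select call.
type bucketCache struct {
	tx      *bbolt.Tx
	buckets map[string]*bbolt.Bucket
}

func newBucketCache(tx *bbolt.Tx) *bucketCache {
	return &bucketCache{
		tx:      tx,
		buckets: make(map[string]*bbolt.Bucket),
	}
}

// bucket returns the named bucket, querying the transaction only on the
// first request; a nil result (missing bucket) is cached as well.
func (c *bucketCache) bucket(name []byte) *bbolt.Bucket {
	if b, ok := c.buckets[string(name)]; ok {
		return b
	}
	b := c.tx.Bucket(name)
	c.buckets[string(name)] = b
	return b
}
```
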
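For item 3, a sketch of matching a single key/value attribute against the raw serialized object with easyproto, so the full unmarshal happens only for objects that pass the filter. The protobuf field numbers below are placeholders and must be replaced with the real ones from the FrostFS API definitions.

```go
package metabase

import (
	"fmt"

	"github.com/VictoriaMetrics/easyproto"
)

// Placeholder field numbers; take the real values from the FrostFS API
// protobuf definitions of Object, Header and Attribute.
const (
	objectHeaderFieldNum     = 3  // Object.header (assumed)
	headerAttributesFieldNum = 10 // Header.attributes (assumed)
	attributeKeyFieldNum     = 1  // Attribute.key (assumed)
	attributeValueFieldNum   = 2  // Attribute.value (assumed)
)

// hasAttribute reports whether the serialized object contains an attribute
// with the given key and value, traversing the raw bytes instead of
// unmarshaling the whole object.
func hasAttribute(rawObject []byte, key, value string) (bool, error) {
	var fc easyproto.FieldContext
	src := rawObject
	for len(src) > 0 {
		var err error
		src, err = fc.NextField(src)
		if err != nil {
			return false, fmt.Errorf("read object field: %w", err)
		}
		if fc.FieldNum != objectHeaderFieldNum {
			continue
		}
		header, ok := fc.MessageData()
		if !ok {
			return false, fmt.Errorf("object header is not a message")
		}
		return headerHasAttribute(header, key, value)
	}
	return false, nil
}

func headerHasAttribute(header []byte, key, value string) (bool, error) {
	var fc easyproto.FieldContext
	for len(header) > 0 {
		var err error
		header, err = fc.NextField(header)
		if err != nil {
			return false, fmt.Errorf("read header field: %w", err)
		}
		if fc.FieldNum != headerAttributesFieldNum {
			continue
		}
		attr, ok := fc.MessageData()
		if !ok {
			return false, fmt.Errorf("attribute is not a message")
		}
		k, v, err := readAttribute(attr)
		if err != nil {
			return false, err
		}
		if k == key && v == value {
			return true, nil
		}
	}
	return false, nil
}

// readAttribute extracts the key and value strings from a serialized
// Attribute message.
func readAttribute(attr []byte) (key, value string, err error) {
	var fc easyproto.FieldContext
	for len(attr) > 0 {
		attr, err = fc.NextField(attr)
		if err != nil {
			return "", "", fmt.Errorf("read attribute field: %w", err)
		}
		switch fc.FieldNum {
		case attributeKeyFieldNum:
			if s, ok := fc.String(); ok {
				key = s
			}
		case attributeValueFieldNum:
			if s, ok := fc.String(); ok {
				value = s
			}
		}
	}
	return key, value, nil
}
```
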
fyrchik added the frostfs-node, refactoring, perfomance, discussion labels 2025-03-18 11:59:02 +00:00
Reference: TrueCloudLab/frostfs-node#1685