Speedup metabase search #1685

Open
opened 2025-03-18 11:58:14 +00:00 by fyrchik · 0 comments

After moving to the new metabase, container search by unindexed attributes scales linearly with the number of objects. While we cannot change the asymptotics, we can significantly reduce the constant factor. Work on this has already started in #1683. Other things to explore:

  1. `selectFastFilter` is called like this:

```go
for i := range group.fastFilters {
    db.selectFastFilter(tx, cnr, group.fastFilters[i], mAddr, i)
}
```

So, for each filter we fetch every object. The number of matched filters is then used to check whether all filters have matched. We could reorder the loops: iterate over objects first, then over filters. This enables three further optimizations: (1) accumulate in `mAddr` only objects that match all filters, (2) read each object exactly once (IO is expensive), and (3) drop `mAddr` entirely and process each object immediately. A sketch of the reordered loops is given after this list.

  2. Cache buckets. I have explored this idea before, but for `Get` the complexity it introduced wasn't justified. For `Select` the situation is better: `tx.Bucket` is clearly visible in profiles. The idea is to cache frequently accessed buckets, such as all the `objectStatus()` buckets (`primary`, `graveyard`, etc.), and have all functions retrieve these buckets from the cache. The lifetime of this cache is bounded by the lifetime of the transaction. See the bucket-cache sketch after this list.

  3. To filter by attributes, we do not need to unmarshal the object. Just use https://github.com/VictoriaMetrics/easyproto to traverse the byte slice without allocations and unmarshal the object only on a filter match. See the `easyproto` sketch after this list.

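Below is a minimal sketch of the reordered loops from item 1, written against plain bbolt rather than the actual metabase code; the bucket name and the `filterFn` representation of a fast filter are illustrative assumptions, not existing metabase APIs.

```go
package metabase

import (
	"bytes"

	"go.etcd.io/bbolt"
)

// filterFn stands in for a single fast filter: it reports whether the raw
// object value stored in the metabase matches the filter.
type filterFn func(value []byte) bool

// selectWithFilters iterates over the container bucket once and applies every
// filter to each object. Each object is read exactly once, and only the keys
// of objects that match all filters are accumulated, so no per-filter
// counters or a shared mAddr map are needed.
func selectWithFilters(tx *bbolt.Tx, bucketName []byte, filters []filterFn) ([][]byte, error) {
	bkt := tx.Bucket(bucketName)
	if bkt == nil {
		return nil, nil
	}

	var matched [][]byte
	err := bkt.ForEach(func(k, v []byte) error {
		for _, f := range filters {
			if !f(v) {
				return nil // the object fails one of the filters, skip it
			}
		}
		matched = append(matched, bytes.Clone(k)) // keep only matching objects
		return nil
	})
	return matched, err
}
```
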
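For item 2, a possible shape of a transaction-scoped bucket cache; the type and its methods are assumptions for illustration, not the existing metabase API. All lookups go through the cache, and the cache is discarded together with the transaction.

```go
package metabase

import "go.etcd.io/bbolt"

// bucketCache memoizes tx.Bucket lookups for the lifetime of a single
// transaction, so frequently used buckets (e.g. the ones consulted by
// objectStatus()) are resolved at most once per Select call.
type bucketCache struct {
	tx      *bbolt.Tx
	buckets map[string]*bbolt.Bucket
}

func newBucketCache(tx *bbolt.Tx) *bucketCache {
	return &bucketCache{
		tx:      tx,
		buckets: make(map[string]*bbolt.Bucket),
	}
}

// bucket returns the named bucket, querying the transaction only on the
// first request; a nil result (missing bucket) is cached as well.
func (c *bucketCache) bucket(name []byte) *bbolt.Bucket {
	if b, ok := c.buckets[string(name)]; ok {
		return b
	}
	b := c.tx.Bucket(name)
	c.buckets[string(name)] = b
	return b
}
```
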
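For item 3, a sketch of matching a single key/value attribute against the raw serialized object with easyproto, so the full unmarshal happens only for objects that pass the filter. The protobuf field numbers below are placeholders and must be replaced with the real ones from the FrostFS API definitions.

```go
package metabase

import (
	"fmt"

	"github.com/VictoriaMetrics/easyproto"
)

// Placeholder field numbers; take the real values from the FrostFS API
// protobuf definitions of Object, Header and Attribute.
const (
	objectHeaderFieldNum     = 3  // Object.header (assumed)
	headerAttributesFieldNum = 10 // Header.attributes (assumed)
	attributeKeyFieldNum     = 1  // Attribute.key (assumed)
	attributeValueFieldNum   = 2  // Attribute.value (assumed)
)

// hasAttribute reports whether the serialized object contains an attribute
// with the given key and value, traversing the raw bytes instead of
// unmarshaling the whole object.
func hasAttribute(rawObject []byte, key, value string) (bool, error) {
	var fc easyproto.FieldContext
	src := rawObject
	for len(src) > 0 {
		var err error
		src, err = fc.NextField(src)
		if err != nil {
			return false, fmt.Errorf("read object field: %w", err)
		}
		if fc.FieldNum != objectHeaderFieldNum {
			continue
		}
		header, ok := fc.MessageData()
		if !ok {
			return false, fmt.Errorf("object header is not a message")
		}
		return headerHasAttribute(header, key, value)
	}
	return false, nil
}

func headerHasAttribute(header []byte, key, value string) (bool, error) {
	var fc easyproto.FieldContext
	for len(header) > 0 {
		var err error
		header, err = fc.NextField(header)
		if err != nil {
			return false, fmt.Errorf("read header field: %w", err)
		}
		if fc.FieldNum != headerAttributesFieldNum {
			continue
		}
		attr, ok := fc.MessageData()
		if !ok {
			return false, fmt.Errorf("attribute is not a message")
		}
		k, v, err := readAttribute(attr)
		if err != nil {
			return false, err
		}
		if k == key && v == value {
			return true, nil
		}
	}
	return false, nil
}

// readAttribute extracts the key and value strings from a serialized
// Attribute message.
func readAttribute(attr []byte) (key, value string, err error) {
	var fc easyproto.FieldContext
	for len(attr) > 0 {
		attr, err = fc.NextField(attr)
		if err != nil {
			return "", "", fmt.Errorf("read attribute field: %w", err)
		}
		switch fc.FieldNum {
		case attributeKeyFieldNum:
			if s, ok := fc.String(); ok {
				key = s
			}
		case attributeValueFieldNum:
			if s, ok := fc.String(); ok {
				value = s
			}
		}
	}
	return key, value, nil
}
```
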
fyrchik added the frostfs-node, refactoring, perfomance, discussion labels 2025-03-18 11:59:02 +00:00
Reference: TrueCloudLab/frostfs-node#1685