metabase: Add upgrade from v2 to v3 #1334
Reference: TrueCloudLab/frostfs-node#1334
Add an upgrade path from metabase v2 to v3.
Tested on a metabase file generated with `TestGenerateMetabaseFile` on the `support/v0.42` branch. Metabase file size before upgrade: 2.7 GB, after: 1.5 GB (4 million objects). Elapsed time on my laptop: 30 sec.
frostfs-adm output examples:
@ -0,0 +47,4 @@
}
updater, found := updates[version]
if !found {
return fmt.Errorf("unsupported version %d: no update available", version)
Why do we need to fail here? Maybe just print a debug message that the update is not necessary?
Because this means that the database schema is not supported by the code. For example, metabase v1 will not work with frostfs-node v0.42.
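A minimal sketch of the dispatch pattern discussed here, assuming a version-to-updater map named `updates` and an updater signature that takes the open database; neither is guaranteed to match the PR exactly:

```go
package meta

import (
	"context"
	"fmt"

	"go.etcd.io/bbolt"
)

// updates maps a known schema version to the function that migrates the
// database from that version to the next one (hypothetical layout).
var updates = map[uint64]func(ctx context.Context, db *bbolt.DB, log func(a ...any)) error{
	2: upgradeFromV2ToV3,
}

func upgrade(ctx context.Context, db *bbolt.DB, version uint64, log func(a ...any)) error {
	updater, found := updates[version]
	if !found {
		// An unknown version means the on-disk schema is not supported by
		// this code, so failing is safer than silently skipping the update.
		return fmt.Errorf("unsupported version %d: no update available", version)
	}
	return updater(ctx, db, log)
}

// upgradeFromV2ToV3 is a stub standing in for the real migration steps.
func upgradeFromV2ToV3(_ context.Context, _ *bbolt.DB, log func(a ...any)) error {
	log("upgrading metabase from v2 to v3")
	return nil
}
```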
@ -0,0 +68,4 @@
return fmt.Errorf("can't open new metabase to compact: %w", err)
}
if err := bbolt.Compact(dst, db.boltDB, 256<<20); err != nil {
return fmt.Errorf("failed to compact metabase: %w", errors.Join(err, dst.Close(), os.Remove(tmpFileName)))
Looks like we need to remove all files matching the pattern `db.info.Path + "." + "*"`, not only `tmpFileName`. Or how about doing the cleanup before starting the update?
I would prefer not to delete files by pattern, so as not to delete something needed (a manual backup, for example).
That will lead to growing garbage inside `db.info.Path` when the node is killed.
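For context, a sketch of the compaction step under discussion: compact into a temporary file next to the metabase and clean that file up on failure. The temp-file naming, the `compactMaxTxSize` constant, and the final rename are assumptions based on this thread, not the PR's exact code.

```go
package meta

import (
	"errors"
	"fmt"
	"os"
	"time"

	"go.etcd.io/bbolt"
)

const compactMaxTxSize = 256 << 20 // named instead of a magic 256<<20 literal

// compactDB copies the database into a temporary file and replaces the
// original with the compacted copy (hypothetical shape of the helper).
func compactDB(path string, src *bbolt.DB) error {
	tmpFileName := path + "." + time.Now().Format("20060102150405") + ".tmp"
	dst, err := bbolt.Open(tmpFileName, os.ModePerm, bbolt.DefaultOptions)
	if err != nil {
		return fmt.Errorf("open new metabase to compact: %w", err)
	}
	if err := bbolt.Compact(dst, src, compactMaxTxSize); err != nil {
		// Remove the half-written temporary file so it does not accumulate
		// as garbage next to the metabase.
		return fmt.Errorf("compact metabase: %w", errors.Join(err, dst.Close(), os.Remove(tmpFileName)))
	}
	if err := errors.Join(dst.Close(), src.Close()); err != nil {
		return fmt.Errorf("close databases: %w", err)
	}
	// Swap the compacted copy into place.
	return os.Rename(tmpFileName, path)
}
```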
Also, about idempotence and partial upgrades:
if the metabase is left in an inconsistent state, do we report the unfinished upgrade to the user?
@ -0,0 +26,4 @@
path, _ := cmd.Flags().GetString(pathFlag)
noCompact, _ := cmd.Flags().GetBool(noCompactFlag)
if _, err := os.Stat(path); err != nil {
return fmt.Errorf("failed to check file: %w", err)
What did you want to check? Doesn't `meta.Upgrade` return an error on a missing file?
Moved to `meta.Upgrade`.
Ah, I understand now. If the file doesn't exist, then `Upgrade` will create a new one, because we open with RW, and that is not what we want, right?
Exactly!
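A minimal sketch of the behaviour being discussed (the helper name is hypothetical): `bbolt.Open` happily creates a missing file when opened read-write, so the existence check has to come first.

```go
package meta

import (
	"fmt"
	"os"

	"go.etcd.io/bbolt"
)

// openExisting opens a metabase file for upgrade but refuses to create one:
// bbolt.Open with read-write access would otherwise silently create a brand
// new, empty database at a mistyped path and "upgrade" that instead.
func openExisting(path string) (*bbolt.DB, error) {
	if _, err := os.Stat(path); err != nil {
		return nil, fmt.Errorf("check metabase file: %w", err)
	}
	return bbolt.Open(path, os.ModePerm, bbolt.DefaultOptions)
}
```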
@ -189,6 +189,7 @@ The following table describes configuration for each shard.
| `mode` | `string` | `read-write` | Shard Mode.<br/>Possible values: `read-write`, `read-only`, `degraded`, `degraded-read-only`, `disabled` |
| `resync_metabase` | `bool` | `false` | Flag to enable metabase resync on start. |
| `resync_metabase_worker_count` | `int` | `1000` | Count of concurrent workers to resync metabase. |
| `skip_metabase_compact_on_upgrade` | `bool` | `false` | If `true` then metabase will not be compacted on upgrade. |
This is such a rarely used flag that I do not know whether it makes any sense.
What is the tradeoff we get by changing it?
Outdated
@ -0,0 +19,4 @@
)
const (
logEveryCount = 50_000
`logFrequency`?
Ok, done.
@ -0,0 +30,4 @@
func Upgrade(ctx context.Context, path string, compact bool, log func(msg string)) error {
db, err := bbolt.Open(path, os.ModePerm, bbolt.DefaultOptions)
if err != nil {
return fmt.Errorf("failed to open metabase: %w", err)
It will read as `failed to ..: failed to ..: failed to ..`. How about `open metabase: %w`?
Ok, fixed.
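A tiny, self-contained illustration of the point about `%w` chains (the messages are made up):

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	leaf := errors.New("read version: file is empty") // hypothetical leaf error

	// Prefixing every wrap with "failed to" produces a noisy chain:
	noisy := fmt.Errorf("failed to upgrade: %w", fmt.Errorf("failed to open metabase: %w", leaf))
	fmt.Println(noisy)
	// failed to upgrade: failed to open metabase: read version: file is empty

	// Wrapping without the prefix keeps the chain readable:
	clean := fmt.Errorf("upgrade: %w", fmt.Errorf("open metabase: %w", leaf))
	fmt.Println(clean)
	// upgrade: open metabase: read version: file is empty
}
```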
@ -0,0 +71,4 @@
if err != nil {
return fmt.Errorf("can't open new metabase to compact: %w", err)
}
if err := bbolt.Compact(dst, db, 256<<20); err != nil {
Magic constant.
Fixed.
@ -0,0 +130,4 @@
case <-ctx.Done():
return ctx.Err()
case obj := <-objects:
return db.Update(func(tx *bbolt.Tx) error {
Why parallelize but use `Update` instead of `Batch`?
Outdated
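A minimal sketch of the pattern this thread converged on: parallel workers writing through `bbolt.(*DB).Batch` so that concurrent writes share commits. The channel/item types, bucket handling, and worker count are assumptions, not the PR's code.

```go
package meta

import (
	"context"

	"go.etcd.io/bbolt"
	"golang.org/x/sync/errgroup"
)

// item stands in for the per-object data gathered during the upgrade.
type item struct {
	key, value []byte
}

// fillBuckets drains the objects channel with several workers and writes each
// item through db.Batch. db.Update gives every call its own write transaction
// (and fsync), while Batch groups concurrent calls into one commit, which is
// what makes parallelizing worthwhile.
func fillBuckets(ctx context.Context, db *bbolt.DB, objects <-chan item, bucket []byte, workers int) error {
	eg, ctx := errgroup.WithContext(ctx)
	for i := 0; i < workers; i++ {
		eg.Go(func() error {
			for {
				select {
				case <-ctx.Done():
					return ctx.Err()
				case obj, ok := <-objects:
					if !ok {
						return nil
					}
					if err := db.Batch(func(tx *bbolt.Tx) error {
						b, err := tx.CreateBucketIfNotExists(bucket)
						if err != nil {
							return err
						}
						return b.Put(obj.key, obj.value)
					}); err != nil {
						return err
					}
				}
			}
		})
	}
	return eg.Wait()
}
```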
@ -0,0 +137,4 @@
if err := db.Batch(func(tx *bbolt.Tx) error {
if err := putUniqueIndexItem(tx, namedBucketItem{
name: expEpochToObjectBucketName,
key: expirationEpochKey(obj.expirationEpoch, obj.containerID, obj.objectID),
Do we always `put` by some new key? Otherwise we might not behave idempotently, and that is a must for an upgrade.
For the same object, `expirationEpochKey` will produce the same key. Why would this lead to loss of idempotence?
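To make the idempotence argument concrete, a sketch under the assumption that the key is derived deterministically from the object (the actual key layout and bucket value in the PR may differ):

```go
package meta

import (
	"encoding/binary"

	"go.etcd.io/bbolt"
)

// expirationEpochKey sketches a deterministic key layout: the same
// (epoch, container, object) triple always serializes to the same bytes.
func expirationEpochKey(epoch uint64, cnr, obj [32]byte) []byte {
	key := make([]byte, 8+len(cnr)+len(obj))
	binary.BigEndian.PutUint64(key, epoch)
	copy(key[8:], cnr[:])
	copy(key[8+len(cnr):], obj[:])
	return key
}

// putExpirationIndex is idempotent: bbolt's Put overwrites the value stored
// under an existing key, so re-running the upgrade rewrites the same entry
// instead of creating a duplicate.
func putExpirationIndex(tx *bbolt.Tx, bucket []byte, epoch uint64, cnr, obj [32]byte) error {
	b, err := tx.CreateBucketIfNotExists(bucket)
	if err != nil {
		return err
	}
	return b.Put(expirationEpochKey(epoch, cnr, obj), []byte{})
}
```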
@ -0,0 +164,4 @@
log("expiration epoch buckets completed completed with error: " + err.Error())
return err
}
log(fmt.Sprintf("filling expiration epoch buckets completed successfully, total %d objects", count.Load()))
Can `log` have a signature similar to `fmt.Println`, to avoid `fmt.Sprintf` at the call sites?
Done.
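A small sketch of the `fmt.Println`-like callback agreed on here (names are hypothetical):

```go
package meta

import (
	"context"
	"fmt"
	"sync/atomic"
)

// A fmt.Println-like callback lets call sites pass values directly instead of
// pre-formatting every message with fmt.Sprintf.
func fillExpirationEpochBuckets(ctx context.Context, log func(a ...any)) error {
	var count atomic.Uint64
	// ... fill the buckets, incrementing count for every migrated object ...
	log("filling expiration epoch buckets completed successfully, total", count.Load(), "objects")
	return ctx.Err()
}

// exampleUpgradeLog shows how a caller could wire the callback to stdout.
func exampleUpgradeLog() {
	_ = fillExpirationEpochBuckets(context.Background(), func(a ...any) { fmt.Println(a...) })
}
```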
@ -0,0 +225,4 @@
continue
}
attributeKey := string(attrKey[1+cidSize:])
if attributeKey != objectV2.SysAttributeExpEpochNeoFS && attributeKey != objectV2.SysAttributeExpEpoch {
If we need only 2 attributes, we can have 2 separate `Seek` loops. It seems we are currently iterating over all attributes.
How? User attribute buckets have names with the pattern `{prefix}{cid}{attribute}`.
Oh, I see the problem: we need to iterate over a specific attribute, but across all containers. Still possible with seeks (remember the current seek position and seek for each container), but your implementation is more readable.
@ -0,0 +279,4 @@
if err := objectID.Decode(attrKeyValueItem); err != nil {
return fmt.Errorf("failed to decode object id from container '%s' expiration epoch %d: %w", containerID, expirationEpoch, err)
}
it.lastAttributeKey = bytes.Clone(attrKey)
Why do we clone them? They seem to be used only during tx.
No, `it` is used outside the transaction.
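For context on why the clone matters, a self-contained sketch (helper name and bucket are hypothetical): byte slices handed out by bbolt point into the memory-mapped file and are only valid while the transaction is open.

```go
package meta

import (
	"bytes"

	"go.etcd.io/bbolt"
)

// firstKey returns a copy of the first key in a bucket. The copy is required
// because slices returned by bbolt reference the mmap'ed database file and
// become invalid once the transaction ends.
func firstKey(db *bbolt.DB, bucket []byte) ([]byte, error) {
	var key []byte
	err := db.View(func(tx *bbolt.Tx) error {
		b := tx.Bucket(bucket)
		if b == nil {
			return nil
		}
		k, _ := b.Cursor().First()
		key = bytes.Clone(k) // without the clone, key would dangle after View returns
		return nil
	})
	return key, err
}
```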
@ -0,0 +61,4 @@
t.Skip("for generating db")
// This test generates a metabase file with 2 million objects for 10 000 containers,
// of which
// 500 000 are simple objects,
Could you move these numbers to constants and define them here instead of in the comment?
It would make it easy to adjust the test, instead of searching the whole function and wondering whether THIS 100_000 is the one I need.
Done.
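A sketch of the suggestion, using only the sizes already mentioned in the comment quoted above (the remaining categories are elided):

```go
package meta

// Named generator sizes make the test easy to tweak in one place instead of
// hunting for magic numbers in the function body (hypothetical names; values
// taken from the comment quoted above).
const (
	generateContainersCount    = 10_000
	generateTotalObjectsCount  = 2_000_000
	generateSimpleObjectsCount = 500_000
	// ... constants for the remaining object kinds would follow the same pattern ...
)
```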
@ -0,0 +68,4 @@
// 100 000 million are locked (total 200 000).
db := New(WithPath(upgradeFilePath), WithEpochState(epochState{e: 1000}), WithLogger(test.NewLogger(t)),
WithMaxBatchDelay(100*time.Millisecond), WithMaxBatchSize(1000))
Do you have any specific motivation to use something different from the defaults in tests?
No, a leftover from debugging.
@ -0,0 +70,4 @@
db := New(WithPath(upgradeFilePath), WithEpochState(epochState{e: 1000}), WithLogger(test.NewLogger(t)),
WithMaxBatchDelay(100*time.Millisecond), WithMaxBatchSize(1000))
require.NoError(t, db.Open(context.Background(), mode.ReadWrite))
db.boltDB.AllocSize = 128 << 20
Magic constant.
Fixed.
@ -0,0 +190,4 @@
lock := testutil.GenerateObjectWithCID(containers[i%len(containers)])
lock.SetType(objectSDK.TypeLock)
testutil.AddAttribute(lock, objectV2.SysAttributeExpEpoch, strconv.FormatUint(uint64(i%1000+1000), 10))
_, err = db.Put(ctx, PutPrm{
Isn't this broken? `db.Put` uses v3 logic, because that is what we want in our codebase, and the test needs to generate a v2 db.
I used this test on the `support/v0.42` branch.
No. `frostfs-node` of the current version will not work with a v2 metabase. `frostfs-node` of the previous version will start with a partially updated metabase, because it is impossible to add a check to the previous version.
@ -0,0 +22,4 @@
upgradeLogFrequency = 50_000
upgradeWorkersCount = 1_000
compactMaxTxSize = 256 << 20
upgradeTimeout = 1 * time.Second
We use 100ms in all other code, but no big deal.