Optimize tree service GetSubTree latency #957

Closed
fyrchik wants to merge 5 commits from fyrchik:optimize-list-latency into master
Owner

Background:

  1. Listing is slow, let's make it faster.
  2. Ideal solution is to store sorted items in the database, but this could be incompatible and affects PUT, which is already slow.

This PR picks up a low-hanging fruit.
From master to this PR:

goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/services/tree
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                        │     old      │                pairing                │
                        │    sec/op    │    sec/op      vs base                │
GetSubTree/latency-8      3.542m ± 12%    1.286m ± 21%  -63.69% (p=0.000 n=10)
GetSubTree/total_time-8   73.19m ±  9%   102.20m ±  7%  +39.65% (p=0.000 n=10)
geomean                   16.10m          11.47m        -28.79%

                        │     old      │               pairing                │
                        │     B/op     │     B/op      vs base                │
GetSubTree/latency-8      28.23Mi ± 0%   34.33Mi ± 0%  +21.62% (p=0.000 n=10)
GetSubTree/total_time-8   28.23Mi ± 0%   34.33Mi ± 0%  +21.62% (p=0.000 n=10)
geomean                   28.23Mi        34.33Mi       +21.62%

                        │     old     │               pairing               │
                        │  allocs/op  │  allocs/op   vs base                │
GetSubTree/latency-8      400.0k ± 0%   500.0k ± 0%  +25.00% (p=0.000 n=10)
GetSubTree/total_time-8   400.0k ± 0%   500.0k ± 0%  +25.00% (p=0.000 n=10)
geomean                   400.0k        500.0k       +25.00%

TBD:

  1. Use both implementations, fallback on Slice if the number of items is small.
  2. Try reduce memory consumption.
  3. Non-sorted implementations can be special and the simplest.
Background: 1. Listing is slow, let's make it faster. 2. Ideal solution is to store sorted items in the database, but this could be incompatible and affects PUT, which is already slow. This PR picks up a low-hanging fruit. From master to this PR: ``` goos: linux goarch: amd64 pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/services/tree cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz │ old │ pairing │ │ sec/op │ sec/op vs base │ GetSubTree/latency-8 3.542m ± 12% 1.286m ± 21% -63.69% (p=0.000 n=10) GetSubTree/total_time-8 73.19m ± 9% 102.20m ± 7% +39.65% (p=0.000 n=10) geomean 16.10m 11.47m -28.79% │ old │ pairing │ │ B/op │ B/op vs base │ GetSubTree/latency-8 28.23Mi ± 0% 34.33Mi ± 0% +21.62% (p=0.000 n=10) GetSubTree/total_time-8 28.23Mi ± 0% 34.33Mi ± 0% +21.62% (p=0.000 n=10) geomean 28.23Mi 34.33Mi +21.62% │ old │ pairing │ │ allocs/op │ allocs/op vs base │ GetSubTree/latency-8 400.0k ± 0% 500.0k ± 0% +25.00% (p=0.000 n=10) GetSubTree/total_time-8 400.0k ± 0% 500.0k ± 0% +25.00% (p=0.000 n=10) geomean 400.0k 500.0k +25.00% ``` TBD: 1. Use both implementations, fallback on `Slice` if the number of items is small. 2. Try reduce memory consumption. 3. Non-sorted implementations can be special and the simplest.
fyrchik force-pushed optimize-list-latency from 0b0940042d to ddbc95b859 2024-02-02 21:19:34 +00:00 Compare
fyrchik requested review from alexvanin 2024-02-02 21:19:37 +00:00
Author
Owner

@alexvanin An important problem is to pick up suitable implementation. If we knew the number of items to return in advance (first page?), more optimizations could've been possible. Does AWS CLI provides this info (so s3 can send it to us) or is all paging done on client?

@alexvanin An important problem is to pick up suitable implementation. If we knew the number of items to return in advance (first page?), more optimizations could've been possible. Does AWS CLI provides this info (so s3 can send it to us) or is all paging done on client?
fyrchik requested review from storage-core-committers 2024-02-02 21:21:53 +00:00
fyrchik requested review from storage-core-developers 2024-02-02 21:21:56 +00:00
Author
Owner

Without parent link:

goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/services/tree
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                        │      old      │               noparent               │
                        │    sec/op     │    sec/op     vs base                │
GetSubTree/latency-8      3542.1µ ± 12%   572.1µ ± 17%  -83.85% (p=0.000 n=10)
GetSubTree/total_time-8    73.19m ±  9%   86.02m ±  3%  +17.54% (p=0.000 n=10)
geomean                    16.10m         7.015m        -56.43%

                        │     old      │               noparent               │
                        │     B/op     │     B/op      vs base                │
GetSubTree/latency-8      28.23Mi ± 0%   32.81Mi ± 0%  +16.22% (p=0.000 n=10)
GetSubTree/total_time-8   28.23Mi ± 0%   32.81Mi ± 0%  +16.21% (p=0.000 n=10)
geomean                   28.23Mi        32.81Mi       +16.21%

                        │     old     │              noparent               │
                        │  allocs/op  │  allocs/op   vs base                │
GetSubTree/latency-8      400.0k ± 0%   500.0k ± 0%  +25.00% (p=0.000 n=10)
GetSubTree/total_time-8   400.0k ± 0%   500.0k ± 0%  +25.00% (p=0.000 n=10)
geomean                   400.0k        500.0k       +25.00%
Without parent link: ``` goos: linux goarch: amd64 pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/services/tree cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz │ old │ noparent │ │ sec/op │ sec/op vs base │ GetSubTree/latency-8 3542.1µ ± 12% 572.1µ ± 17% -83.85% (p=0.000 n=10) GetSubTree/total_time-8 73.19m ± 9% 86.02m ± 3% +17.54% (p=0.000 n=10) geomean 16.10m 7.015m -56.43% │ old │ noparent │ │ B/op │ B/op vs base │ GetSubTree/latency-8 28.23Mi ± 0% 32.81Mi ± 0% +16.22% (p=0.000 n=10) GetSubTree/total_time-8 28.23Mi ± 0% 32.81Mi ± 0% +16.21% (p=0.000 n=10) geomean 28.23Mi 32.81Mi +16.21% │ old │ noparent │ │ allocs/op │ allocs/op vs base │ GetSubTree/latency-8 400.0k ± 0% 500.0k ± 0% +25.00% (p=0.000 n=10) GetSubTree/total_time-8 400.0k ± 0% 500.0k ± 0% +25.00% (p=0.000 n=10) geomean 400.0k 500.0k +25.00% ```
fyrchik force-pushed optimize-list-latency from ddbc95b859 to 2fbdb0d281 2024-02-02 22:07:28 +00:00 Compare
fyrchik force-pushed optimize-list-latency from 2fbdb0d281 to 63a86d6e00 2024-02-06 08:21:25 +00:00 Compare
fyrchik force-pushed optimize-list-latency from 63a86d6e00 to 8cbf9a0b44 2024-02-06 11:15:19 +00:00 Compare
fyrchik changed title from WIP: Optimize tree service GetSubTree latency to Optimize tree service GetSubTree latency 2024-02-08 07:55:03 +00:00
dstepanov-yadro requested changes 2024-02-08 09:03:13 +00:00
@ -0,0 +8,4 @@
type MinPairingHeap struct {
head *phNode
size int

Looks like size is redundant.

Looks like size is redundant.
aarifullin reviewed 2024-02-09 09:09:59 +00:00
@ -0,0 +22,4 @@
return &MinPairingHeap{}
}
func (m *MinPairingHeap) Insert(infos ...pilorama.NodeInfo) {
Member

[Optionally]

Sorry, it seems I have gone far away 😄

Wouldn't you like to consider generic solution?

package heap

type MinPairingHeap[ValT, KeyT any] struct {
	head   *phNode[ValT, KeyT]
	getKey func(*ValT) KeyT
	cmp    func(KeyT, KeyT) bool
}

type phNode[ValT, KeyT any] struct {
	val     *ValT
	key     func(*ValT) KeyT
	child   *phNode[ValT, KeyT]
	sibling *phNode[ValT, KeyT]
}

func (p *phNode[ValT, KeyT]) Key() KeyT {
	return p.key(p.val)
}

func NewPairing[ValT, KeyT any](key func(*ValT) KeyT, cmp func(KeyT, KeyT) bool) *MinPairingHeap[ValT, KeyT] {
	return &MinPairingHeap[ValT, KeyT]{
		getKey: key,
		cmp:    cmp,
	}
}

func (m *MinPairingHeap[ValT, KeyT]) Insert(infos ...ValT) {
	for i := range infos {
		tmp := &phNode[ValT, KeyT]{val: &infos[i], key: m.getKey}
		m.head = meld(tmp, m.head, m.cmp)
	}
}

func meld[ValT, KeyT any](m1, m2 *phNode[ValT, KeyT], cmp func(KeyT, KeyT) bool) *phNode[ValT, KeyT] {
	if m1 == nil {
		return m2
	}
	if m2 == nil {
		return m1
	}
	if cmp(m1.Key(), m2.Key()) {
		m1.child, m2.sibling = m2, m1.child
		return m1
	}
	m2.child, m1.sibling = m1, m2.child
	return m2
}

For check

func TestMin(t *testing.T) {
	nodes := make([]pilorama.NodeInfo, 0, 10)

	for i := 1; i <= 10; i++ { // Start with 1 to avoid intersecting with RootID = 0.
		nodes = append(nodes, pilorama.NodeInfo{
			ParentID: pilorama.RootID,
			ID:       pilorama.Node(i),
			Meta: pilorama.Meta{
				Items: []pilorama.KeyValue{{
					Key:   pilorama.AttributeFilename,
					Value: []byte(uuid.New().String()),
				}},
			},
		})
	}

	pairing := NewPairing(func(vt *pilorama.NodeInfo) []byte {
		return vt.Meta.GetAttr(pilorama.AttributeFilename)
	}, func(lhs []byte, rhs []byte) bool {
		return bytes.Compare(lhs, rhs) == -1
	})

	pairing.Insert(nodes...)
}
[Optionally] Sorry, it seems I have gone far away 😄 Wouldn't you like to consider generic solution? ```go package heap type MinPairingHeap[ValT, KeyT any] struct { head *phNode[ValT, KeyT] getKey func(*ValT) KeyT cmp func(KeyT, KeyT) bool } type phNode[ValT, KeyT any] struct { val *ValT key func(*ValT) KeyT child *phNode[ValT, KeyT] sibling *phNode[ValT, KeyT] } func (p *phNode[ValT, KeyT]) Key() KeyT { return p.key(p.val) } func NewPairing[ValT, KeyT any](key func(*ValT) KeyT, cmp func(KeyT, KeyT) bool) *MinPairingHeap[ValT, KeyT] { return &MinPairingHeap[ValT, KeyT]{ getKey: key, cmp: cmp, } } func (m *MinPairingHeap[ValT, KeyT]) Insert(infos ...ValT) { for i := range infos { tmp := &phNode[ValT, KeyT]{val: &infos[i], key: m.getKey} m.head = meld(tmp, m.head, m.cmp) } } func meld[ValT, KeyT any](m1, m2 *phNode[ValT, KeyT], cmp func(KeyT, KeyT) bool) *phNode[ValT, KeyT] { if m1 == nil { return m2 } if m2 == nil { return m1 } if cmp(m1.Key(), m2.Key()) { m1.child, m2.sibling = m2, m1.child return m1 } m2.child, m1.sibling = m1, m2.child return m2 } ``` For check ```go func TestMin(t *testing.T) { nodes := make([]pilorama.NodeInfo, 0, 10) for i := 1; i <= 10; i++ { // Start with 1 to avoid intersecting with RootID = 0. nodes = append(nodes, pilorama.NodeInfo{ ParentID: pilorama.RootID, ID: pilorama.Node(i), Meta: pilorama.Meta{ Items: []pilorama.KeyValue{{ Key: pilorama.AttributeFilename, Value: []byte(uuid.New().String()), }}, }, }) } pairing := NewPairing(func(vt *pilorama.NodeInfo) []byte { return vt.Meta.GetAttr(pilorama.AttributeFilename) }, func(lhs []byte, rhs []byte) bool { return bytes.Compare(lhs, rhs) == -1 }) pairing.Insert(nodes...) } ```
Author
Owner

I have rejected it, because it looks complicated (hello, callbacks) and I don't see any other usecases.
Not to mention that even slight performance degradation because of genericness can become crucial for 4M elements listing.

I have rejected it, because it looks complicated (hello, callbacks) and I don't see any other usecases. Not to mention that even slight performance degradation because of genericness can become crucial for 4M elements listing.
aarifullin marked this conversation as resolved
aarifullin approved these changes 2024-02-09 09:10:31 +00:00
aarifullin left a comment
Member

Nice!

Nice!
acid-ant approved these changes 2024-02-09 14:08:21 +00:00
Author
Owner

Not worth the time now, there was another optimization in #1059 and we should have tree service replacement soon.

Not worth the time now, there was another optimization in #1059 and we should have tree service replacement soon.
fyrchik closed this pull request 2024-04-10 08:08:36 +00:00
All checks were successful
DCO action / DCO (pull_request) Successful in 2m13s
Required
Details
Build / Build Components (1.20) (pull_request) Successful in 2m36s
Required
Details
Vulncheck / Vulncheck (pull_request) Successful in 3m1s
Required
Details
Build / Build Components (1.21) (pull_request) Successful in 4m7s
Required
Details
Tests and linters / Staticcheck (pull_request) Successful in 4m1s
Required
Details
Tests and linters / Tests (1.20) (pull_request) Successful in 6m13s
Required
Details
Tests and linters / Lint (pull_request) Successful in 7m12s
Required
Details
Tests and linters / Tests with -race (pull_request) Successful in 7m27s
Required
Details
Tests and linters / Tests (1.21) (pull_request) Successful in 2m30s
Required
Details

Pull request closed

Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#957
No description provided.