[#371] ir: Add morph cache metrics #400

Merged
fyrchik merged 1 commit from aarifullin/frostfs-node:feature/371-morph_cache_metr into master 2023-06-13 10:10:16 +00:00
Member

Signed-off-by: Airat Arifullin a.arifullin@yadro.com

Signed-off-by: Airat Arifullin a.arifullin@yadro.com
aarifullin added the
enhancement
label 2023-05-26 09:17:06 +00:00
aarifullin changed title from [#371] ir: Add morph cache metrics: to WIP: [#371] ir: Add morph cache metrics: 2023-05-26 09:17:12 +00:00
aarifullin force-pushed feature/371-morph_cache_metr from bf42a35286 to 1bef474d37 2023-05-26 09:19:42 +00:00 Compare
aarifullin reviewed 2023-05-26 09:23:27 +00:00
@ -0,0 +64,4 @@
var _ MorphCacheMetrics = (*morphCacheMetrics)(nil)
func (m *morphCacheMetrics) IncGetNNSContractHash(success bool) {
m.getGroupKeyCounter.value.With(prometheus.Labels{
Author
Member

I was thinking is it worth it to add the nns contract hash label but after all I concluded that it's not helpful information from the point of metric calculation.

The second point - the events like getting NNS contract hash and group key are rare compared to getting tx height but I've added them anyway

I was thinking is it worth it to add the nns contract hash label but after all I concluded that it's not helpful information from the point of metric calculation. The second point - the events like getting NNS contract hash and group key are rare compared to getting tx height but I've added them anyway
aarifullin changed title from WIP: [#371] ir: Add morph cache metrics: to [#371] ir: Add morph cache metrics: 2023-05-26 09:23:48 +00:00
aarifullin requested review from storage-core-developers 2023-05-26 09:23:57 +00:00
aarifullin requested review from storage-core-committers 2023-05-26 09:23:57 +00:00
acid-ant reviewed 2023-05-26 11:49:37 +00:00
@ -0,0 +32,4 @@
type morphCacheMetrics struct {
getNNSContractHashCounter metric[*prometheus.CounterVec]
getNNSContractHashDuration metric[*prometheus.HistogramVec]
Member

According to this #395 - is it necessary for counter here too?

According to this #395 - is it necessary for counter here too?
Author
Member

Yeah, I think so. These will be removed

Yeah, I think so. These will be removed
Author
Member

I checked out @dstepanov-yadro 's PRs and his implementation for writeCache.

You can look at these lines and this writeCache' feature - you can see that cache uses db and fsTree.

I need to separate flags - cache_hit and found, because we can encounter these situations:

  1. cache hit -> object found (true, true)
  2. cache miss but this miss is being processed and object is found (false, true)
  3. cache miss, object has not been found at all - (false, false)

If I use only success flag like this, then we will never know was it cache miss or error during processing cache miss

@dstepanov-yadro , please, fix me, if I am not correct

I checked out @dstepanov-yadro 's PRs and his implementation for `writeCache`. You can look at these [lines](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/writecache/get.go#L39) and this `writeCache`' feature - you can see that cache uses `db` and `fsTree`. I need to separate flags - `cache_hit` and `found`, because we can encounter these situations: 1. cache hit -> object found (true, true) 2. cache miss but this miss is being processed and object is found (false, true) 3. cache miss, object has not been found at all - (false, false) If I use only `success` flag like [this](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/writecache/get.go#L39), then we will never know was it cache miss or error during processing cache miss @dstepanov-yadro , please, fix me, if I am not correct

I think than @acid-ant suggests using getNNSContractHashDuration metric[*prometheus.HistogramVec] with two headers: cache_hit and success.

I think than @acid-ant suggests using `getNNSContractHashDuration metric[*prometheus.HistogramVec]` with two headers: cache_hit and success.
Author
Member

with two headers: cache_hit and success

I've introduced cache_hit = true/false and found = true/false. I find the success label inconvenient because it's not verbose from my POV :)

P.S. Replaced it to success anyway

> with two headers: cache_hit and success I've introduced `cache_hit = true/false` and `found = true/false`. I find the `success` label inconvenient because it's not verbose from my POV :) **P.S. Replaced it to `success` anyway**
dstepanov-yadro marked this conversation as resolved
aarifullin changed title from [#371] ir: Add morph cache metrics: to WIP: [#371] ir: Add morph cache metrics: 2023-05-26 12:38:31 +00:00
aarifullin force-pushed feature/371-morph_cache_metr from 1bef474d37 to d3e5057045 2023-05-26 13:35:37 +00:00 Compare
aarifullin changed title from WIP: [#371] ir: Add morph cache metrics: to [#371] ir: Add morph cache metrics: 2023-05-26 13:47:06 +00:00
aarifullin force-pushed feature/371-morph_cache_metr from d3e5057045 to 380fdb3285 2023-05-29 08:54:52 +00:00 Compare
aarifullin changed title from [#371] ir: Add morph cache metrics: to [#371] ir: Add morph cache metrics 2023-05-29 08:54:57 +00:00
fyrchik reviewed 2023-05-29 10:10:41 +00:00
fyrchik left a comment
Owner

The task was also about caches in here https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/cache.go

Also, we would like to have the same metrics for all caches:

  1. Hit/miss counters (done).
  2. Speaking of time, I think it makes sense to have another duration metric for for all requests: we use cache to be fast, so cumulative duration metric (for both hit and miss) is also descriptive. Maybe we can even replace miss duration metric with this common one, let's discuss. The rationale is that we can have a separate duration metric just for morph client, irregardless of the cache.

Having this in mind, maybe we should create some generic structure for such usecase?
We can then provide a name to this cache ("nns_hash", "eacl" etc.) and not reflect it in methods. It seems easier both to read and maintain.

The task was also about caches in here https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-node/cache.go Also, we would like to have the same metrics for all caches: 1. Hit/miss counters (done). 2. Speaking of time, I think it makes sense to have another duration metric for for all requests: we use cache to be fast, so cumulative duration metric (for both hit and miss) is also descriptive. Maybe we can even replace `miss duration` metric with this common one, let's discuss. The rationale is that we can have a separate duration metric just for morph client, irregardless of the cache. Having this in mind, maybe we should create some generic structure for such usecase? We can then provide a name to this cache ("nns_hash", "eacl" etc.) and not reflect it in methods. It seems easier both to read and maintain.
@ -0,0 +14,4 @@
mcGetGroupKey = "get_group_key"
mcGetTxHeight = "get_tx_height"
mcNNSContractID = "nns_contract_id"
mcNNSContractHash = "nns_contract_hash"
Owner

Is it used anywhere?

Is it used anywhere?
fyrchik marked this conversation as resolved
@ -89,2 +90,4 @@
gKey *keys.PublicKey
txHeights *lru.Cache[util.Uint256, uint32]
morphCacheMetrics metrics.MorphCacheMetrics
Owner

It resides in morph/Client.cache, may be call the field just "metrics"?

It resides in `morph/Client.cache`, may be call the field just "metrics"?
fyrchik marked this conversation as resolved
@ -84,0 +88,4 @@
defer func() {
c.cache.morphCacheMetrics.IncGetNNSContractHash(cacheHit)
c.cache.morphCacheMetrics.AddGetNNSContractHashDuration(found, time.Since(startedAt))
Owner

The comment says // Duration of processing get nns contract hash request if cache miss happens, but this line is executed unconditionally. Whom to believe?

The comment says `// Duration of processing get nns contract hash request if cache miss happens`, but this line is executed unconditionally. Whom to believe?
fyrchik marked this conversation as resolved
@ -221,7 +233,15 @@ func (c *Client) SetGroupSignerScope() error {
// contractGroupKey returns public key designating FrostFS contract group.
func (c *Client) contractGroupKey() (*keys.PublicKey, error) {
var cacheHit, found bool
Owner

We use this found as a success indicator, what about success?

We use this `found` as a success indicator, what about `success`?
Author
Member

Alright. I give up :) It seems the majority wants to see success flag - I'll fix it.

For me success label doesn't fit well with metric name.

Let's consider get_tx_height/tx_height metric. How does it look in the histogram with label success:

  • Was it success = true because cache has hit OR because cache has missed and we processed this situation and successfully found the requested?
  • Was it success = false because cache has missed OR because cache has missed and we processed this situation and unsuccessfully found the requested?

These two points were in my mind...

P.S. Okay. Agreed. The same can bee applied for found label

Alright. I give up :) It seems the majority wants to see `success` flag - I'll fix it. For me `success` label doesn't fit well with metric name. Let's consider `get_tx_height`/`tx_height` metric. How does it look in the histogram with label `success`: - Was it `success = true` because cache has hit **OR** because cache has missed and we processed this situation and *successfully* `found` the requested? - Was it `success = false` because cache has missed **OR** because cache has missed and we processed this situation and *unsuccessfully* `found` the requested? These two points were in my mind... P.S. Okay. Agreed. The same can bee applied for `found` label

Let's look at the other side. This is a metric for the cache. I think it's not particularly important here whether we found the object in the end or not. So found is morph client metric, not morph cache.
@fyrchik @aarifullin what do you think?

Let's look at the other side. This is a metric for the cache. I think it's not particularly important here whether we found the object in the end or not. So `found` is `morph client` metric, not `morph cache`. @fyrchik @aarifullin what do you think?
Author
Member

From my POV - it's better to keep morph cache because we count hits/misses and the duration of the miss processing

From my POV - it's better to keep `morph cache` because we count hits/misses and the duration of the miss processing
Author
Member

maybe we should create some generic structure for such usecase?

I've posted above the difference between metrics for morphCache and writeCache. So, we can make "some generic structure" only for "in-memory" cases:

type cache struct {
	m *sync.RWMutex

	nnsHash   *util.Uint160
	gKey      *keys.PublicKey
	txHeights *lru.Cache[util.Uint256, uint32]
}
> maybe we should create some generic structure for such usecase? I've posted above the difference between metrics for `morphCache` and `writeCache`. So, we can make "some generic structure" only for "in-memory" cases: ``` type cache struct { m *sync.RWMutex nnsHash *util.Uint160 gKey *keys.PublicKey txHeights *lru.Cache[util.Uint256, uint32] } ```
aarifullin force-pushed feature/371-morph_cache_metr from 380fdb3285 to fbb6252ba8 2023-05-29 13:50:05 +00:00 Compare
acid-ant approved these changes 2023-05-30 07:39:55 +00:00
aarifullin changed title from [#371] ir: Add morph cache metrics to WIP: [#371] ir: Add morph cache metrics 2023-05-30 15:05:46 +00:00
aarifullin force-pushed feature/371-morph_cache_metr from fbb6252ba8 to 0aa32cc936 2023-05-31 10:31:39 +00:00 Compare
aarifullin reviewed 2023-05-31 10:32:46 +00:00
@ -9,12 +9,20 @@ import (
const treeServiceLabelSuccess = "success"
type TreeMetricsRegister interface {
Author
Member

tree.MetricsRegister has been moved from pkg/services/tree/metrics.go to here and renamed because this caused import cycle

`tree.MetricsRegister` has been moved from `pkg/services/tree/metrics.go` to here and renamed because this caused import cycle
aarifullin changed title from WIP: [#371] ir: Add morph cache metrics to [#371] ir: Add morph cache metrics 2023-05-31 10:38:08 +00:00
aarifullin requested review from acid-ant 2023-05-31 10:38:12 +00:00
Author
Member

maybe we should create some generic structure for such usecase?

I've posted above the difference between metrics for morphCache and writeCache. So, we can make "some generic structure" only for "in-memory" cases:

type cache struct {
	m *sync.RWMutex

	nnsHash   *util.Uint160
	gKey      *keys.PublicKey
	txHeights *lru.Cache[util.Uint256, uint32]
}

I have given more detailed explanation in this post

> > maybe we should create some generic structure for such usecase? > > I've posted above the difference between metrics for `morphCache` and `writeCache`. So, we can make "some generic structure" only for "in-memory" cases: > > ``` > type cache struct { > m *sync.RWMutex > > nnsHash *util.Uint160 > gKey *keys.PublicKey > txHeights *lru.Cache[util.Uint256, uint32] > } > ``` > > I have given more detailed explanation in this [post](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/371#issuecomment-11721)
aarifullin changed title from [#371] ir: Add morph cache metrics to WIP: [#371] ir: Add morph cache metrics 2023-06-01 14:12:36 +00:00
aarifullin force-pushed feature/371-morph_cache_metr from 0aa32cc936 to 050ff1d56d 2023-06-05 07:58:45 +00:00 Compare
aarifullin changed title from WIP: [#371] ir: Add morph cache metrics to [#371] ir: Add morph cache metrics 2023-06-05 07:59:01 +00:00
aarifullin requested review from dstepanov-yadro 2023-06-05 08:00:14 +00:00
acid-ant approved these changes 2023-06-05 12:03:58 +00:00
dstepanov-yadro reviewed 2023-06-06 06:57:30 +00:00
@ -0,0 +58,4 @@
Name: "tx_height",
Help: "Transaction height in the morph cache.",
},
[]string{mcTxHash, mcSuccess}),

Is mcTxHash countable? If not we will get a lot of metric records.

Is mcTxHash countable? If not we will get a lot of metric records.
Author
Member

Alright, probably I misunderstood the meaing of the labels. I've fixed!

Alright, probably I misunderstood the meaing of the labels. I've fixed!
aarifullin force-pushed feature/371-morph_cache_metr from 050ff1d56d to a05637712e 2023-06-06 09:00:03 +00:00 Compare
aarifullin requested review from acid-ant 2023-06-06 09:53:58 +00:00
acid-ant approved these changes 2023-06-06 12:32:47 +00:00
dstepanov-yadro reviewed 2023-06-07 07:45:18 +00:00
@ -0,0 +40,4 @@
nnsContractHashDuration: metrics.NewHistogramVec(prometheus.HistogramOpts{
Namespace: namespace,
Subsystem: mcSubsystem,
Name: "nns_contract_hash",

See https://prometheus.io/docs/practices/naming/

A metric name... ...should have a suffix describing the unit, in plural form. ...
See https://prometheus.io/docs/practices/naming/ ``` A metric name... ...should have a suffix describing the unit, in plural form. ... ```
Author
Member

Oh, sorry... Fixed!

Oh, sorry... Fixed!
aarifullin force-pushed feature/371-morph_cache_metr from a05637712e to 1e8e5296fb 2023-06-07 08:02:23 +00:00 Compare
aarifullin requested review from acid-ant 2023-06-07 08:03:01 +00:00
dstepanov-yadro approved these changes 2023-06-07 08:23:40 +00:00
acid-ant approved these changes 2023-06-13 07:20:29 +00:00
aarifullin requested review from storage-core-developers 2023-06-13 07:52:48 +00:00
aarifullin requested review from storage-core-committers 2023-06-13 07:52:54 +00:00
dstepanov-yadro approved these changes 2023-06-13 09:33:32 +00:00
fyrchik approved these changes 2023-06-13 10:10:04 +00:00
fyrchik merged commit 90e9247b69 into master 2023-06-13 10:10:16 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#400
No description provided.