Improve debug logging for APE check failures #1655

Open
aarifullin wants to merge 6 commits from aarifullin/frostfs-node:feat/improve_ape_log into master
Member

This pull request is a compromise between two suggestions, [#1](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/1574#issue-4876) and [#2](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/1574#issuecomment-63238):

  1. `Request` and `Resource` types expose their properties via `Properties()`.
  2. `ChainRouterError` is extended: the error message is unchanged, but the type now provides more fields (a rough sketch follows below).
  3. The object service logs all APE-check-related information when a `ChainRouterError` occurs.
  4. The tree service logs all APE-check-related information when a `ChainRouterError` occurs.

Close #1574

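For context, a rough sketch of what the extended error type could look like. The field names and the `Status`/`Request` stand-ins below are illustrative assumptions; the actual definitions live in `pkg/services/common/ape` and may differ.

```go
// Illustrative sketch only: field names and the Status/Request stand-ins are
// assumptions, not the PR's actual definitions.
package ape

import "fmt"

// Status stands in for the APE chain status type.
type Status string

// Request stands in for the APE request type; Properties() is the helper
// mentioned in point 1.
type Request struct {
	op    string
	props map[string]string
}

func (r Request) Operation() string             { return r.op }
func (r Request) Properties() map[string]string { return r.props }

// ChainRouterError keeps its original message but now carries the details
// needed for debug logging.
type ChainRouterError struct {
	operation string
	status    Status
	request   Request
	target    string
}

// Error is unchanged: the client-facing message stays concise.
func (e *ChainRouterError) Error() string {
	return fmt.Sprintf("access to operation %s is denied by access policy engine: %s", e.operation, e.status)
}

// The new getters expose the fields for the services' debug logs.
func (e *ChainRouterError) Status() Status   { return e.status }
func (e *ChainRouterError) Request() Request { return e.request }
func (e *ChainRouterError) Target() string   { return e.target }
```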
aarifullin added 4 commits 2025-02-24 16:53:54 +00:00
* Both methods are helpers only; they are not required for interface
  implementation.

Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
* Introduce new fields and getters for them;
* Fix `CheckAPE` in `checkerCoreImpl` at `newChainRouterError`.

Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
[#1574] tree: Introduce debug logging for APE check failures
All checks were successful
060e4b2f76
Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
requested reviews from storage-core-developers, storage-core-committers, dkirillov 2025-02-24 16:54:11 +00:00
dkirillov approved these changes 2025-02-25 06:48:25 +00:00
Dismissed
dkirillov left a comment
Member

LGTM, but what about container operations?

Author
Member

@dkirillov wrote in [#1655 (comment)](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1655#issuecomment-68378):

LGTM, but what about container operations?

Oh, true. Forgot about this service. Let me fix this

fyrchik reviewed 2025-02-25 09:20:43 +00:00
@ -31,0 +38,4 @@
logger.Debug(ctx, logs.APECheckDeniedRequest,
zap.Stringer("status", chainRouterErr.Status()),
zap.String("operation", chainRouterErr.Request().Operation()),
zap.Any("target", chainRouterErr.Target()),
Owner

It might be easier to implement a custom encoder for the `*apecommon.ChainRouterError` type (or another custom type defined in this package).
Something like `"request": {"properties": {...}, "target": ...}, "resource": {"name": ..., "properties": ...}`.
Have you considered this? https://github.com/uber-go/zap/blob/6d482535bdd97f4d97b2f9573ac308f1cf9b574e/zapcore/marshaler.go#L30

Why?

  1. Because `.Any` or `.Stringer` allocates (likely) even when debug is enabled.
  2. Because variadic parameter passing may allocate too.
  3. Because `.Request().Properties()` may allocate too.
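For illustration, a minimal sketch of the encoder approach, building on the illustrative type from the description above; the real field layout in the PR may differ. `zapcore.ObjectMarshalerFunc` and `zap.Object` are the relevant zap primitives.

```go
// Hedged sketch: implementing zapcore.ObjectMarshaler defers field
// serialization until the entry is actually encoded. Field layout is illustrative.
package ape

import "go.uber.org/zap/zapcore"

func (e *ChainRouterError) MarshalLogObject(enc zapcore.ObjectEncoder) error {
	enc.AddString("status", string(e.status))
	return enc.AddObject("request", zapcore.ObjectMarshalerFunc(e.marshalRequest))
}

func (e *ChainRouterError) marshalRequest(enc zapcore.ObjectEncoder) error {
	enc.AddString("operation", e.request.Operation())
	enc.AddString("target", e.target)
	return enc.AddObject("properties", zapcore.ObjectMarshalerFunc(func(enc zapcore.ObjectEncoder) error {
		for k, v := range e.request.Properties() {
			enc.AddString(k, v)
		}
		return nil
	}))
}

// At the call site a single zap.Object field then carries the whole structure:
//
//	logger.Debug(ctx, logs.APECheckDeniedRequest, zap.Object("details", chainRouterErr))
```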
Author
Member

I have implemented an encoder for `*apecommon.ChainRouterError`. Please check.

I agree with your suggestion.

Owner

Looks neat!

fyrchik marked this conversation as resolved
fyrchik reviewed 2025-02-25 09:22:41 +00:00
@ -181,2 +220,4 @@
if err := p.apeChecker.CheckAPE(ctx, prm); err != nil {
var chainRouterErr *apecommon.ChainRouterError
if errors.As(err, &chainRouterErr) {
Owner

I think this is a bad practice:
Logging an error is one way to _handle_ an error.
At the same time we _return_ this error too, so it may be handled twice.
Do you have any other solutions in mind?

Author
Member

Logging an error is one way to handle an error.

Are you talking about auto-tests? Because I have no idea who else might be processing errors from debug logs.

Do you have any other solutions in mind?

A client should receive a status along with a concise reason. We can't make the error message contain thorough details about the failure: it could cause degradation, as the server would send big messages to the client on unsuccessful checks.

The idea "the client gets the chain router check error; we go to the server logs to find out the true reason" is absolutely OK with me.

Owner

We go to server logs to find out the true reason

Logging an error is perfectly fine.
Logging it twice is not.
In your example it is logged, then returned (on the caller it can also be logged (again), then returned etc.).
The easiest rule of thumb to solve this problem is to handle errors once.
Thus, my comment.

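A tiny illustration of the double handling being described, with placeholder names (`check`, `inner`, `outer` and the `log` helper are hypothetical):

```go
// The same error is logged at each layer and still returned,
// so it ends up in the log twice for a single failed operation.
func inner(ctx context.Context) error {
	if err := check(ctx); err != nil {
		log.Debug(ctx, "check failed", zap.Error(err)) // handled here by logging...
		return err                                     // ...and returned to the caller anyway
	}
	return nil
}

func outer(ctx context.Context) error {
	if err := inner(ctx); err != nil {
		log.Error(ctx, "operation finished with error", zap.Error(err)) // logged a second time
		return err
	}
	return nil
}
```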
Owner

As an example of how it might be solved: provide details for all requests and hide them behind a logging tag.
This way it is not an error and no garbage is seen.

Author
Member

In your example it is logged, then returned (on the caller it can also be logged (again), then returned etc.).

Now I see your point. Yes, I agree; this point is absolutely fair.

Author
Member

Let me explain why this is hardly possible and justify the current approach.

At the same time we return this error too, so it may be handled twice.

  1. `ChainRouterError` won't ever be processed on the calling side. For instance, the object service request executor (`analyzeStatus`) gets an `ObjectAccessDenied` error with a reason. It still gets the same message (`"access to operation <method> is denied by access policy engine: <status>"`). So, `ChainRouterError` is kind of an intermediate error, and its encoder is used to log messages only in the APE middleware. Please check out how `ChainRouterError` implements [Error()](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/ab3922488a671e0e5d342036eff9d73b2ef19e01/pkg/services/common/ape/error.go#L21) and how it goes to [apeErr](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/ab3922488a671e0e5d342036eff9d73b2ef19e01/pkg/services/common/ape/error.go#L21).
  2. There is no way to carry the error as a [logging tag](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1655#issuecomment-68578). We should keep some information within the context, but this won't work for methods like `Put` and `Patch` (gRPC streaming and all).
  3. The operations that log `"operation finished with error"` are `Get`, `Delete` and `Search`.
    `Put` and `Patch` are logged by the `audit` middleware at `Info` level.
  4. I suggest logging the message only when an error occurs. Logging every operation will blow up `journalctl`.
  5. So, taking 1), 2), 3) and 4) into account, I conclude that the current logging is helpful and fine.

I'd like to leave this as it is and we'll be fine.

Owner

That we don't see how to do better doesn't imply the current approach is either helpful or fine.

`ChainRouterError` won't ever be processed on the calling side.

Agree. But should it be an error then? What we log is the operation context.

There is no way to carry the error as a logging tag. We should keep some information within the context

We shouldn't: it specifies a place in the code and is not `context.Context`-dependent.
`log.WithTag(TagAPEOperationContext).Debug()` will do the job.
See #1619.

I suggest logging the message only when an error occurs.

Why, though? #1574 talks about understanding "what goes wrong". But unexpected ALLOWs could also happen, and they also need to be debugged. Having someone access my private container is definitely a bug, and an even more dangerous one.

Logging every operation will blow up `journalctl`

As enabling the debug level already does. Having a separate tag looks like a perfectly valid use case for tags to me.
That PR is in review, but if we had a _single_ function that logs the verbose info, reused between the container and object services, that would help tag adoption. And, of course, it depends on the load; consider a node living in the old epoch spamming HEAD requests to all nodes in the container.

Currently, the container and object services do the same thing here, but in different ways.
The common denominator seems to be the `router` (used directly in `container` and indirectly via `CheckAPE` in `object`).
Maybe have a custom router that logs everything?

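A hedged sketch of the "custom router that logs everything" idea; the `router` interface and the `Request`/`Status` types below are local stand-ins for illustration, not the real policy-engine API.

```go
// Illustrative decorator: every APE decision (ALLOW and DENY alike) is logged
// once, in a single place shared by the container and object services.
// A dedicated logging tag (see #1619) could be attached to this logger.
package ape

import "go.uber.org/zap"

type router interface {
	IsAllowed(chainName, namespace string, req Request) (Status, bool, error)
}

type loggingRouter struct {
	next router
	log  *zap.Logger
}

func (r *loggingRouter) IsAllowed(chainName, namespace string, req Request) (Status, bool, error) {
	status, found, err := r.next.IsAllowed(chainName, namespace, req)
	r.log.Debug("ape decision",
		zap.String("chain", chainName),
		zap.String("namespace", namespace),
		zap.String("operation", req.Operation()),
		zap.String("status", string(status)),
		zap.Bool("rule_found", found),
		zap.Error(err),
	)
	return status, found, err
}
```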
fyrchik reviewed 2025-02-25 09:24:57 +00:00
@ -68,1 +84,3 @@
return &Service{
func NewService(apeChecker Checker, next objectSvc.ServiceServer, opts ...func(s *Service)) *Service {
s := &Service{
logger: logger.NewLoggerWrapper(zap.L()),
Owner

The default should be `zap.NewNop()`, because we don't want to log sensitive data just anywhere; we want to precisely control all the sinks.

Owner

Or, even better, make this argument explicit and remove the variadic `opts` array.

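A sketch of the explicit-logger variant being asked for; type and field names follow the excerpt above where possible and are otherwise assumptions, so the final signature in the PR may differ.

```go
// NewService takes the logger explicitly instead of a variadic option: there is
// no hidden default sink, and a caller that really wants no output can still
// pass a logger backed by zap.NewNop().
func NewService(log *logger.Logger, apeChecker Checker, next objectSvc.ServiceServer) *Service {
	return &Service{
		logger:     log,
		apeChecker: apeChecker,
		next:       next,
	}
}
```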
Author
Member

Fixed

fyrchik marked this conversation as resolved
aarifullin force-pushed feat/improve_ape_log from 060e4b2f76 to bd30e4838f 2025-02-25 13:28:48 +00:00 Compare
aarifullin dismissed dkirillov's review 2025-02-25 13:28:49 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

@dkirillov wrote in [#1655 (comment)](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1655#issuecomment-68378):

LGTM, but what about container operations?

Added for the container service.

aarifullin force-pushed feat/improve_ape_log from bd30e4838f to ab3922488a 2025-02-25 14:07:03 +00:00 Compare
fyrchik reviewed 2025-02-26 12:31:45 +00:00
@ -34,0 +85,4 @@
func (e *ChainRouterError) marshalRequest(enc zapcore.ObjectEncoder) error {
enc.AddString("operation", e.request.Operation())
if err := enc.AddObject("target", zapcore.ObjectMarshalerFunc(
Owner

enc.AddObject("target", zapcore.ObjectMarshalerFunc(e.marshalTarget)) should work too and takes less space.

Owner

Also, in some other places this could be simplified too.

Author
Member

fixed

fyrchik marked this conversation as resolved
dkirillov reviewed 2025-02-27 08:22:22 +00:00
@ -110,1 +123,4 @@
if err := g.apeChecker.CheckAPE(g.Context(), prm); err != nil {
var chainRouterErr *apecommon.ChainRouterError
if errors.As(err, &chainRouterErr) {
logChainRouterError(g.Context(), g.logger, chainRouterErr)
Member

Why don't we log the error just once in [`g.apeChecker.CheckAPE`](https://git.frostfs.info/aarifullin/frostfs-node/src/commit/ab3922488a671e0e5d342036eff9d73b2ef19e01/pkg/services/object/ape/checker.go#L77)?

func (c *checkerImpl) CheckAPE(ctx context.Context, prm Prm) error {

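A hedged sketch of that suggestion: do the `errors.As` unwrapping and the debug log once, inside `CheckAPE` itself, so individual call sites don't repeat it. The inner `checkAPE` helper and the `logger` field are assumptions.

```go
// Sketch only: centralize the unwrapping and logging of ChainRouterError here.
func (c *checkerImpl) CheckAPE(ctx context.Context, prm Prm) error {
	err := c.checkAPE(ctx, prm) // hypothetical inner check doing the actual chain routing
	if err == nil {
		return nil
	}
	var chainRouterErr *apecommon.ChainRouterError
	if errors.As(err, &chainRouterErr) {
		c.logger.Debug(ctx, logs.APECheckDeniedRequest, zap.Object("details", chainRouterErr))
	}
	return err
}
```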
aarifullin force-pushed feat/improve_ape_log from ab3922488a to 86a0292145 2025-02-27 10:28:42 +00:00 Compare
acid-ant approved these changes 2025-03-04 12:20:02 +00:00
All checks were successful
This pull request has changes conflicting with the target branch.
  • internal/logs/logs.go
  • pkg/services/object/ape/service.go

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u feat/improve_ape_log:aarifullin-feat/improve_ape_log
git checkout aarifullin-feat/improve_ape_log
Reference: TrueCloudLab/frostfs-node#1655