Copies numbers validity checker #45

Open
opened 2023-03-30 13:03:37 +00:00 by alexvanin · 7 comments
Owner

New copies numbers parameter of object.PUT request is related to placement policy of the container. We should have validation function to ignore invalid copies numbers parameters before and during request processing.

Describe the solution you'd like

Add validation function which takes container policy and new copies numbers value and validates this tuple. Function should return some specific errors, so calling functions may use error.Is to define exact error type.

## Is your feature request related to a problem? Please describe. New `copies numbers` parameter of object.PUT request is related to placement policy of the container. We should have validation function to ignore invalid `copies numbers` parameters before and during request processing. ## Describe the solution you'd like Add validation function which takes container policy and new `copies numbers` value and validates this tuple. Function should return some specific errors, so calling functions may use `error.Is` to define exact error type.
Member

@alexvanin, will such a func be general enough for SDK? Func's body will just compare len? What new custom errors do you see?

@alexvanin, will such a func be general enough for SDK? Func's body will just compare `len`? What new custom errors do you see?
Author
Owner

As far as I understand, there is not only len requirement for copies numbers. Numbers should be less or equal than REP values?

Take 'REP 2 IN A REP 3 IN B' policy. Copies numbers {10, 5} are invalid, while {1, 2} are valid.

So I consider these errors, for example:

  • ErrCopiesNumbersTooShort
  • ErrCopiesNumbersTooLong
  • ErrCopiesNumbersImproperValue
As far as I understand, there is not only `len` requirement for `copies numbers`. Numbers should be less or equal than `REP` values? > Take 'REP 2 IN A REP 3 IN B' policy. Copies numbers {10, 5} are invalid, while {1, 2} are valid. So I consider these errors, for example: - ErrCopiesNumbersTooShort - ErrCopiesNumbersTooLong - ErrCopiesNumbersImproperValue
Member

Numbers should be less or equal than REP values?

TrueCloudLab/frostfs-api#17 says nothing about REP number limitation.

Who gonna use that func? The node only?

Also, such a policy is correct: "REP 4 SELECT 1 FROM *" (well, ok, currently with --force flag only in frostfs-cli but "REP 3 SELECT 1 FROM *" works even without --force (in dev-env with 3+ nodes currently available, of course)), and only a PUT can says that smth wrong (but an object will be persisted). copies_number has not been used before and, therefore, it is hard to get inspired by the previous implementation (copies_number 12345 was ok for REP 3 but it looks more like a bug).

/cc @TrueCloudLab/storage-core-committers

> Numbers should be less or equal than REP values? https://git.frostfs.info/TrueCloudLab/frostfs-api/pulls/17 says nothing about REP number limitation. Who gonna use that func? The node only? Also, such a policy is correct: `"REP 4 SELECT 1 FROM *"` (well, ok, currently with `--force` flag only in `frostfs-cli` but `"REP 3 SELECT 1 FROM *"` works even without `--force` (in dev-env with 3+ nodes _currently available_, of course)), and only a `PUT` can says that smth wrong (but an object will be persisted). `copies_number` has not been used before and, therefore, it is hard to get inspired by the previous implementation (`copies_number` 12345 was ok for `REP 3` but it looks more like a bug). /cc @TrueCloudLab/storage-core-committers
Author
Owner

TrueCloudLab/frostfs-api#17 says nothing about REP number limitation.

Let's clarify and discuss it. I originally thought, that we are implying that copies number value is less or equal than REP value.

There might be the case, when client wants to store temporary more copies (if possible) than REP defines. If so, then API description is okay. If not, let's clarify that in specification.

However both cases are naturally limited by SELECT * CBF value, so we can use it as a hard limit for copies numbers?

Who gonna use that func? The node only?

This function is going to be used in back-ends for visual policy constructors. While front-end can create valid JSON of placement policy and copies numbers, it definetley can't validate complex logic of policies. So verification function with distinct error types can provide good feedback to the front-end.

Also, such a policy is correct: "REP 4 SELECT 1 FROM *"

In the context of front-end policy constructors, policy validation function would be great too, but we have to create separate issue on that 😃.

> TrueCloudLab/frostfs-api#17 says nothing about REP number limitation. Let's clarify and discuss it. I originally thought, that we are implying that `copies number` value is less or equal than `REP` value. There might be the case, when client wants to store temporary *more* copies (if possible) than `REP` defines. If so, then API description is okay. If not, let's clarify that in specification. However both cases are naturally limited by `SELECT * CBF` value, so we can use it as a hard limit for copies numbers? > Who gonna use that func? The node only? This function is going to be used in back-ends for visual policy constructors. While front-end can create valid JSON of placement policy and `copies numbers`, it definetley can't validate complex logic of policies. So verification function with distinct error types can provide good feedback to the front-end. > Also, such a policy is correct: "REP 4 SELECT 1 FROM *" In the context of front-end policy constructors, policy validation function would be great too, but we have to create separate issue on that 😃.
Owner

Actually, copies_number can be bigger than the REP value, this way we could temporarily store e.g. 2 copies in one REP 1 group, replicating it later on another node.

Actually, `copies_number` _can_ be bigger than the REP value, this way we could temporarily store e.g. 2 copies in one `REP 1` group, replicating it later on another node.
Member

@fyrchik, does that mean that we do not need request validation at all and only PUT requests' result can say smth about IncompleteObjectPut? Or PUT requests must not have more copies than SELECT * CBF?

@fyrchik, does that mean that we do not need request validation at all and only `PUT` requests' result can say smth about `IncompleteObjectPut`? Or `PUT` requests must not have more copies than `SELECT` * `CBF`?
Owner

I believe that the problem that we are trying to solve here is to avoid attaching incorrect vectors with S3 (it is written once and then attached to every PUT). So a separate function makes sense (we can reuse it in CLI, for example).

Validating this in the storage node seems error-prone, especially considering backwards compatibility and possible placement policy improvements. It is not a part of a protocol (like incomplete object put has node wasn't able to put some copy semantics, not node decided that it won't be able to do this in future).

We can always add it in future.

I believe that the problem that we are trying to solve _here_ is to avoid attaching incorrect vectors with S3 (it is written once and then attached to every PUT). So a separate function makes sense (we can reuse it in CLI, for example). Validating this in the storage node seems error-prone, especially considering backwards compatibility and possible placement policy improvements. It is not a part of a protocol (like `incomplete object put` has `node wasn't able to put some copy` semantics, not `node decided that it won't be able to do this in future`). We can always add it in future.
fyrchik added reference master 2023-05-02 07:51:26 +00:00
fyrchik removed reference master 2023-05-02 07:51:33 +00:00
snegurochka added the
discussion
enhancement
labels 2023-05-03 17:15:15 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-sdk-go#45
No description provided.