Add chain serializers #42

Merged
dstepanov-yadro merged 1 commit from dstepanov-yadro/policy-engine:feat/marshal into master 2024-09-04 19:51:23 +00:00

Closes #1

For binary marshalling added simple binary marshaller based on encoding/binary package. It could be optimized with unsafe string marshal/unmarshal methods, but it looks error-prone in public package.

For json marshalling added json marshaller based on easyjson. For enum-types added custom marshal/unmarshal methods to save to json human-readable strings.

Also changed ID's type from string to bytes (see discussion below).

Closes #1 For binary marshalling added simple binary marshaller based on `encoding/binary` package. It could be optimized with unsafe string marshal/unmarshal methods, but it looks error-prone in public package. For json marshalling added json marshaller based on `easyjson`. For enum-types added custom marshal/unmarshal methods to save to json human-readable strings. Also changed `ID`'s type from string to bytes (see discussion below).
dstepanov-yadro force-pushed feat/marshal from a5818ce3d0 to 0191c88935 2024-01-17 14:59:53 +00:00 Compare
dstepanov-yadro changed title from Add chain marshallers to Add chain serializers 2024-01-17 15:00:22 +00:00
dstepanov-yadro force-pushed feat/marshal from 0191c88935 to 232bed56b8 2024-01-17 15:15:21 +00:00 Compare
dstepanov-yadro force-pushed feat/marshal from 232bed56b8 to 53022608ed 2024-01-17 15:28:59 +00:00 Compare
dstepanov-yadro reviewed 2024-01-17 15:35:03 +00:00
@ -0,0 +11,4 @@
"github.com/stretchr/testify/require"
)
func TestID(t *testing.T) {
Author
Member

Test from #1 (comment)

I think if ID is defined as string, it means that it should have bytes from utf-8 encoding. So as ID is public, so developer has to set valid value to this field.

Test from https://git.frostfs.info/TrueCloudLab/policy-engine/issues/1#issuecomment-30422 I think if ID is defined as string, it means that it should have bytes from utf-8 encoding. So as ID is public, so developer has to set valid value to this field.
Member

Probably yes, but we started use chainID as name in policy contract so we have to fit length limitation and have deterministic id.

Probably yes, but we started use chainID as [name](https://git.frostfs.info/TrueCloudLab/frostfs-contract/src/commit/f2a82aa635aa57d9b05092d8cf15b170b53cc324/policy/policy_contract.go#L89) in policy contract so we have to fit length limitation and have deterministic id.
Author
Member

Looks like then we need to change ID type to []byte.

Looks like then we need to change `ID` type to `[]byte`.
Author
Member

Replaced ID marshalling as base64 bytes, so now your test passes.
String has one advantage: it can't be nil.

Replaced `ID` marshalling as `base64` bytes, so now your test passes. String has one advantage: it can't be nil.
dstepanov-yadro requested review from storage-core-committers 2024-01-17 15:35:20 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-01-17 15:35:21 +00:00
dstepanov-yadro requested review from storage-services-committers 2024-01-17 15:35:29 +00:00
dstepanov-yadro requested review from storage-services-developers 2024-01-17 15:35:29 +00:00
dstepanov-yadro force-pushed feat/marshal from 53022608ed to 8ec573f89b 2024-01-17 18:43:56 +00:00 Compare
dstepanov-yadro force-pushed feat/marshal from 8ec573f89b to a214196c87 2024-01-18 07:35:26 +00:00 Compare
aarifullin reviewed 2024-01-18 08:58:35 +00:00
@ -0,0 +85,4 @@
{
const prefix string = ",\"ID\":"
out.RawString(prefix[1:])
out.Base64Bytes(in.ID)
Member

The file's header says

// Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT.

I looked at the last commit and I guess you have edited this on your own. Please, could you leave some warning comment like "this line is hand-written". Because anyone who is going to re-generate it, of course, will forget about Base64 :)

The file's header says ```go // Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT. ``` I looked at the last commit and I guess you have edited this on your own. Please, could you leave some warning comment like "this line is hand-written". Because anyone who is going to re-generate it, of course, will forget about `Base64` :)
Author
Member

Right, this file was edited with easyjson tool ran with make generate.
I have deleted ID's MarshalEasyJSON and UnmarshalEasyJSON methods, so easyjson changed chain marshal/unmarshal implementation.

Right, this file was edited with easyjson tool ran with `make generate`. I have deleted `ID`'s `MarshalEasyJSON` and `UnmarshalEasyJSON` methods, so easyjson changed chain marshal/unmarshal implementation.
dkirillov approved these changes 2024-01-18 08:59:22 +00:00
aarifullin approved these changes 2024-01-18 10:48:24 +00:00
aarifullin left a comment
Member

Excellent work! 👍

Excellent work! 👍
acid-ant approved these changes 2024-01-23 07:14:05 +00:00
Makefile Outdated
@ -63,0 +65,4 @@
easyjson-install:
@rm -rf $(EASYJSON_DIR)
@mkdir $(EASYJSON_DIR)
Member

-p required.

`-p` required.
Author
Member

Done

Done
acid-ant marked this conversation as resolved
fyrchik reviewed 2024-01-23 08:10:21 +00:00
@ -0,0 +1,463 @@
// Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT.
Owner

What about adding /**/*_easyjson.go -diff -merge in .gitattributes like we do for .pb.go in node?

What about adding `/**/*_easyjson.go -diff -merge` in `.gitattributes` like we do for `.pb.go` in node?
Owner

What about adding /**/*_easyjson.go -diff -merge in .gitattributes like we do for .pb.go in node?

What about adding `/**/*_easyjson.go -diff -merge` in `.gitattributes` like we do for `.pb.go` in node?
Author
Member

Done

Done
@ -0,0 +91,4 @@
return offset, nil
}
func SliceUnmarshal[T any](buf []byte, offset int, unmarshalT func(buf []byte, offset int) (T, int, error)) ([]T, int, error) {
Owner

Why have you decided to implement this package instead of reusing https://github.com/nspcc-dev/neo-go/tree/master/pkg/io ? Speed?

Why have you decided to implement this package instead of reusing https://github.com/nspcc-dev/neo-go/tree/master/pkg/io ? Speed?
Author
Member

Counter question: Why did I have to use it? And why not one of the https://github.com/alecthomas/go_serialization_benchmarks#results :)

The real answer is that I don't want to make some additional deps. Also I don't use any or reflect.

Counter question: Why did I have to use it? And why not one of the https://github.com/alecthomas/go_serialization_benchmarks#results :) The real answer is that I don't want to make some additional deps. Also I don't use `any` or `reflect`.
Owner

Why did I have to use it?

You didn't have to, it was a question. My argument: because it is common in our ecosystem, has a nice streaming interface and doesn't count as "new" format we all need to learn.

The real answer is that I don't want to make some additional deps.

That's perfect, I don't mind leaving it as is. My problems with all marshalers is that they seem to be impossible to write correctly (without panics) from the first try. Maybe add some fuzz tests here? It would be quite simple, we just want to be sure that unmarshal doesn't panic.

>Why did I have to use it? You didn't have to, it was a question. My argument: because it is common in our ecosystem, has a nice streaming interface and doesn't count as "new" format we all need to learn. >The real answer is that I don't want to make some additional deps. That's perfect, I don't mind leaving it as is. My problems with all marshalers is that they seem to be impossible to write correctly (without panics) from the first try. Maybe add some fuzz tests here? It would be quite simple, we just want to be sure that unmarshal doesn't panic.
Author
Member

I have added tests for marshal.go, now test coverage is 93.9%.
Tests have been added to the chain marshalling that test a variety of options. Of course, this is not a guarantee, but still.
Fuzzing for a chain looks very complicated: there are too many parameters.
Panics are possible, for example, when allocating memory for a slice, but this panic depends not only on the number of elements, but also on the size of the element, so it is difficult to check.

I have added tests for `marshal.go`, now test coverage is 93.9%. Tests have been added to the chain marshalling that test a variety of options. Of course, this is not a guarantee, but still. Fuzzing for a chain looks very complicated: there are too many parameters. Panics are possible, for example, when allocating memory for a slice, but this panic depends not only on the number of elements, but also on the size of the element, so it is difficult to check.
Owner

No-no, we fuzz just unmarshaling -- it is raw bytes, so should be easy. Marshaling is more believable to not panic.
An example with similar test ef99a7a9e3/pkg/core/transaction/fuzz_test.go (L11)

No-no, we fuzz just unmarshaling -- it is raw bytes, so should be easy. Marshaling is more believable to not panic. An example with similar test https://github.com/nspcc-dev/neo-go/blob/ef99a7a9e33fcae86daaf3a64ce1e9b5ab12c223/pkg/core/transaction/fuzz_test.go#L11
Owner

For example, it seems we do not have any size checks in SliceUnmarshal(): what happens if I use 9999999 length which is followed by just 3 bytes. It is invalid, but I allocate 9999999 bytes (add more nines and it is an easy OOM).
Am I missing something?
In neo-go the problem is solved by restricting array size, I think sth similar is applicable here (chains go in contract, so we may restrict them to 64k in size).

For example, it seems we do not have any size checks in `SliceUnmarshal()`: what happens if I use `9999999` length which is followed by just 3 bytes. It is invalid, but I allocate `9999999` bytes (add more nines and it is an easy OOM). Am I missing something? In neo-go the problem is solved by restricting array size, I think sth similar is applicable here (chains go in contract, so we may restrict them to 64k in size).
Author
Member

problem is solved by restricting array size

Yes, it is. But total slice size depends on item size, so this check is not strict, so OOM may happen even if size limited.

what happens if I use 9999999 length which is followed by just 3 bytes.

Here is the code:

	result := make([]T, size)
	for idx := 0; idx < len(result); idx++ {
		result[idx], offset, err = unmarshalT(buf, offset)
		if err != nil {
			return nil, 0, err
		}
	}

Every unmarshal checks offset and buffer size, so err will be returned. See corrupted slice size and other negative unit test.

> problem is solved by restricting array size Yes, it is. But total slice size depends on item size, so this check is not strict, so OOM may happen even if size limited. > what happens if I use 9999999 length which is followed by just 3 bytes. Here is the code: ``` result := make([]T, size) for idx := 0; idx < len(result); idx++ { result[idx], offset, err = unmarshalT(buf, offset) if err != nil { return nil, 0, err } } ``` Every `unmarshal` checks offset and buffer size, so `err` will be returned. See `corrupted slice size` and other negative unit test.
Owner

result := make([]T, size) -- this will panic, right?

But total slice size depends on item size

Yes, but the size of unmarshaled chain is comparable to the size of the slice of bytes (and our slice fits in memory).

We just cannot allow having panic in unmarshal, it works with untrusted data.

`result := make([]T, size)` -- this will panic, right? >But total slice size depends on item size Yes, but the size of unmarshaled chain is comparable to the size of the slice of bytes (and our slice fits in memory). We just cannot allow having panic in unmarshal, it works with untrusted data.
Author
Member

Agree. Added fuzz test and slice size check.

Agree. Added fuzz test and slice size check.
dstepanov-yadro force-pushed feat/marshal from 19d12b3b27 to dc0e63ddbb 2024-01-23 08:30:09 +00:00 Compare
acid-ant approved these changes 2024-01-23 08:33:40 +00:00
dstepanov-yadro force-pushed feat/marshal from dc0e63ddbb to d56d88c638 2024-01-23 16:38:01 +00:00 Compare
dstepanov-yadro force-pushed feat/marshal from d56d88c638 to 0a28f0a992 2024-01-24 08:19:13 +00:00 Compare
dkirillov approved these changes 2024-01-24 09:36:59 +00:00
fyrchik approved these changes 2024-01-24 09:55:14 +00:00
dstepanov-yadro merged commit 0a28f0a992 into master 2024-01-26 08:57:17 +00:00
dstepanov-yadro deleted branch feat/marshal 2024-01-26 08:57:23 +00:00
Sign in to join this conversation.
No description provided.