Add chain serializers #42

dstepanov-yadro · 2024-01-17T14:55:58Z

dstepanov-yadro commented

2024-01-17 14:55:58 +00:00

Closes #1

For binary marshalling added simple binary marshaller based on encoding/binary package. It could be optimized with unsafe string marshal/unmarshal methods, but it looks error-prone in public package.

For json marshalling added json marshaller based on easyjson. For enum-types added custom marshal/unmarshal methods to save to json human-readable strings.

Also changed ID's type from string to bytes (see discussion below).

Closes #1 For binary marshalling added simple binary marshaller based on `encoding/binary` package. It could be optimized with unsafe string marshal/unmarshal methods, but it looks error-prone in public package. For json marshalling added json marshaller based on `easyjson`. For enum-types added custom marshal/unmarshal methods to save to json human-readable strings. Also changed `ID`'s type from string to bytes (see discussion below).

dstepanov-yadro force-pushed feat/marshal from a5818ce3d0 to 0191c88935

2024-01-17 14:59:53 +00:00

Compare

dstepanov-yadro changed title from ~~Add chain marshallers~~ to Add chain serializers

2024-01-17 15:00:22 +00:00

dstepanov-yadro force-pushed feat/marshal from 0191c88935 to 232bed56b8

2024-01-17 15:15:21 +00:00

Compare

dstepanov-yadro force-pushed feat/marshal from 232bed56b8 to 53022608ed

2024-01-17 15:28:59 +00:00

Compare

dstepanov-yadro reviewed 2024-01-17 15:35:03 +00:00

pkg/chain/marshal_json_test.go Outdated

					
				@ -0,0 +11,4 @@

					"github.com/stretchr/testify/require"

				)

				func TestID(t *testing.T) {

dstepanov-yadro commented

2024-01-17 15:35:03 +00:00

Test from #1 (comment)

I think if ID is defined as string, it means that it should have bytes from utf-8 encoding. So as ID is public, so developer has to set valid value to this field.

Test from https://git.frostfs.info/TrueCloudLab/policy-engine/issues/1#issuecomment-30422 I think if ID is defined as string, it means that it should have bytes from utf-8 encoding. So as ID is public, so developer has to set valid value to this field.

dkirillov commented

2024-01-18 06:31:38 +00:00

Probably yes, but we started use chainID as name in policy contract so we have to fit length limitation and have deterministic id.

Probably yes, but we started use chainID as [name](https://git.frostfs.info/TrueCloudLab/frostfs-contract/src/commit/f2a82aa635aa57d9b05092d8cf15b170b53cc324/policy/policy_contract.go#L89) in policy contract so we have to fit length limitation and have deterministic id.

dstepanov-yadro commented

2024-01-18 06:40:15 +00:00

Looks like then we need to change ID type to []byte.

Looks like then we need to change `ID` type to `[]byte`.

👍 1

dstepanov-yadro commented

2024-01-18 07:38:03 +00:00

Replaced ID marshalling as base64 bytes, so now your test passes.
String has one advantage: it can't be nil.

Replaced `ID` marshalling as `base64` bytes, so now your test passes. String has one advantage: it can't be nil.

requested reviews from storage-core-committers, storage-core-developers, storage-services-committers, storage-services-developers

2024-01-17 15:35:29 +00:00

dstepanov-yadro force-pushed feat/marshal from 53022608ed to 8ec573f89b

2024-01-17 18:43:56 +00:00

Compare

dstepanov-yadro force-pushed feat/marshal from 8ec573f89b to a214196c87

2024-01-18 07:35:26 +00:00

Compare

aarifullin reviewed 2024-01-18 08:58:35 +00:00

pkg/chain/chain_easyjson.go Outdated

					
				@ -0,0 +85,4 @@

					{

						const prefix string = ",\"ID\":"

						out.RawString(prefix[1:])

						out.Base64Bytes(in.ID)

aarifullin commented

2024-01-18 08:58:35 +00:00

The file's header says

// Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT.

I looked at the last commit and I guess you have edited this on your own. Please, could you leave some warning comment like "this line is hand-written". Because anyone who is going to re-generate it, of course, will forget about Base64 :)

The file's header says ```go // Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT. ``` I looked at the last commit and I guess you have edited this on your own. Please, could you leave some warning comment like "this line is hand-written". Because anyone who is going to re-generate it, of course, will forget about `Base64` :)

dstepanov-yadro commented

2024-01-18 09:33:26 +00:00

Right, this file was edited with easyjson tool ran with make generate.
I have deleted ID's MarshalEasyJSON and UnmarshalEasyJSON methods, so easyjson changed chain marshal/unmarshal implementation.

Right, this file was edited with easyjson tool ran with `make generate`. I have deleted `ID`'s `MarshalEasyJSON` and `UnmarshalEasyJSON` methods, so easyjson changed chain marshal/unmarshal implementation.

👍 1

dkirillov approved these changes 2024-01-18 08:59:22 +00:00

aarifullin approved these changes 2024-01-18 10:48:24 +00:00

aarifullin left a comment

Excellent work! 👍

acid-ant approved these changes 2024-01-23 07:14:05 +00:00

Makefile Outdated

					
				@ -63,0 +65,4 @@

				easyjson-install:

					@rm -rf $(EASYJSON_DIR)

					@mkdir $(EASYJSON_DIR)

acid-ant commented

2024-01-23 07:06:57 +00:00

-p required.

`-p` required.

dstepanov-yadro commented

2024-01-23 08:30:32 +00:00

Done

acid-ant marked this conversation as resolved

fyrchik reviewed 2024-01-23 08:10:21 +00:00

pkg/chain/chain_easyjson.go Outdated

					
				@ -0,0 +1,463 @@

				// Code generated by easyjson for marshaling/unmarshaling. DO NOT EDIT.

fyrchik commented

2024-01-23 07:27:09 +00:00

What about adding /**/*_easyjson.go -diff -merge in .gitattributes like we do for .pb.go in node?

What about adding `/**/*_easyjson.go -diff -merge` in `.gitattributes` like we do for `.pb.go` in node?

fyrchik commented

2024-01-23 07:27:10 +00:00

What about adding /**/*_easyjson.go -diff -merge in .gitattributes like we do for .pb.go in node?

What about adding `/**/*_easyjson.go -diff -merge` in `.gitattributes` like we do for `.pb.go` in node?

dstepanov-yadro commented

2024-01-23 08:30:21 +00:00

Done

pkg/marshal/marshal.go Outdated

					
				@ -0,0 +91,4 @@

					return offset, nil

				}

				func SliceUnmarshal[T any](buf []byte, offset int, unmarshalT func(buf []byte, offset int) (T, int, error)) ([]T, int, error) {

fyrchik commented

2024-01-23 07:29:01 +00:00

Why have you decided to implement this package instead of reusing https://github.com/nspcc-dev/neo-go/tree/master/pkg/io ? Speed?

dstepanov-yadro commented

2024-01-23 08:26:26 +00:00

Counter question: Why did I have to use it? And why not one of the https://github.com/alecthomas/go_serialization_benchmarks#results :)

The real answer is that I don't want to make some additional deps. Also I don't use any or reflect.

Counter question: Why did I have to use it? And why not one of the https://github.com/alecthomas/go_serialization_benchmarks#results :) The real answer is that I don't want to make some additional deps. Also I don't use `any` or `reflect`.

👍 1

fyrchik commented

2024-01-23 13:54:41 +00:00

Why did I have to use it?

You didn't have to, it was a question. My argument: because it is common in our ecosystem, has a nice streaming interface and doesn't count as "new" format we all need to learn.

The real answer is that I don't want to make some additional deps.

That's perfect, I don't mind leaving it as is. My problems with all marshalers is that they seem to be impossible to write correctly (without panics) from the first try. Maybe add some fuzz tests here? It would be quite simple, we just want to be sure that unmarshal doesn't panic.

>Why did I have to use it? You didn't have to, it was a question. My argument: because it is common in our ecosystem, has a nice streaming interface and doesn't count as "new" format we all need to learn. >The real answer is that I don't want to make some additional deps. That's perfect, I don't mind leaving it as is. My problems with all marshalers is that they seem to be impossible to write correctly (without panics) from the first try. Maybe add some fuzz tests here? It would be quite simple, we just want to be sure that unmarshal doesn't panic.

dstepanov-yadro commented

2024-01-23 16:50:51 +00:00

I have added tests for marshal.go, now test coverage is 93.9%.
Tests have been added to the chain marshalling that test a variety of options. Of course, this is not a guarantee, but still.
Fuzzing for a chain looks very complicated: there are too many parameters.
Panics are possible, for example, when allocating memory for a slice, but this panic depends not only on the number of elements, but also on the size of the element, so it is difficult to check.

I have added tests for `marshal.go`, now test coverage is 93.9%. Tests have been added to the chain marshalling that test a variety of options. Of course, this is not a guarantee, but still. Fuzzing for a chain looks very complicated: there are too many parameters. Panics are possible, for example, when allocating memory for a slice, but this panic depends not only on the number of elements, but also on the size of the element, so it is difficult to check.

fyrchik commented

2024-01-23 18:02:11 +00:00

No-no, we fuzz just unmarshaling -- it is raw bytes, so should be easy. Marshaling is more believable to not panic.
An example with similar test ef99a7a9e3/pkg/core/transaction/fuzz_test.go (L11)

No-no, we fuzz just unmarshaling -- it is raw bytes, so should be easy. Marshaling is more believable to not panic. An example with similar test https://github.com/nspcc-dev/neo-go/blob/ef99a7a9e33fcae86daaf3a64ce1e9b5ab12c223/pkg/core/transaction/fuzz_test.go#L11

fyrchik commented

2024-01-23 18:05:44 +00:00

For example, it seems we do not have any size checks in SliceUnmarshal(): what happens if I use 9999999 length which is followed by just 3 bytes. It is invalid, but I allocate 9999999 bytes (add more nines and it is an easy OOM).
Am I missing something?
In neo-go the problem is solved by restricting array size, I think sth similar is applicable here (chains go in contract, so we may restrict them to 64k in size).

For example, it seems we do not have any size checks in `SliceUnmarshal()`: what happens if I use `9999999` length which is followed by just 3 bytes. It is invalid, but I allocate `9999999` bytes (add more nines and it is an easy OOM). Am I missing something? In neo-go the problem is solved by restricting array size, I think sth similar is applicable here (chains go in contract, so we may restrict them to 64k in size).

dstepanov-yadro commented

2024-01-24 07:17:01 +00:00

problem is solved by restricting array size

Yes, it is. But total slice size depends on item size, so this check is not strict, so OOM may happen even if size limited.

what happens if I use 9999999 length which is followed by just 3 bytes.

Here is the code:

	result := make([]T, size)
	for idx := 0; idx < len(result); idx++ {
		result[idx], offset, err = unmarshalT(buf, offset)
		if err != nil {
			return nil, 0, err
		}
	}

Every unmarshal checks offset and buffer size, so err will be returned. See corrupted slice size and other negative unit test.

> problem is solved by restricting array size Yes, it is. But total slice size depends on item size, so this check is not strict, so OOM may happen even if size limited. > what happens if I use 9999999 length which is followed by just 3 bytes. Here is the code: ``` result := make([]T, size) for idx := 0; idx < len(result); idx++ { result[idx], offset, err = unmarshalT(buf, offset) if err != nil { return nil, 0, err } } ``` Every `unmarshal` checks offset and buffer size, so `err` will be returned. See `corrupted slice size` and other negative unit test.

fyrchik commented

2024-01-24 07:39:23 +00:00

result := make([]T, size) -- this will panic, right?

But total slice size depends on item size

Yes, but the size of unmarshaled chain is comparable to the size of the slice of bytes (and our slice fits in memory).

We just cannot allow having panic in unmarshal, it works with untrusted data.

`result := make([]T, size)` -- this will panic, right? >But total slice size depends on item size Yes, but the size of unmarshaled chain is comparable to the size of the slice of bytes (and our slice fits in memory). We just cannot allow having panic in unmarshal, it works with untrusted data.

dstepanov-yadro commented

2024-01-24 08:20:23 +00:00

Agree. Added fuzz test and slice size check.