Support zstd training mode #105

Open

opened 2023-03-08 23:34:29 +00:00 by snegurochka · 0 comments

Original issue: https://github.com/nspcc-dev/neofs-node/issues/2120

Zstd can use pre-calculated dictionaries to compress data faster and with a better ratio: https://facebook.github.io/zstd/#small-data

Use-cases:

1. Compress really small objects. Most of such an object's data is taken up by the header, so we can use a dictionary trained on headers with common attributes/fields.
2. Compress specific content types. Here the dictionary can act as a poor man's deduplication.
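
To get a rough feel for the first use case without pulling in a zstd binding, here is a minimal sketch that uses DEFLATE preset dictionaries from the Go standard library (`compress/flate`) as a stand-in for zstd dictionaries. It is not zstd and the numbers will differ; it only illustrates how a shared dictionary helps on header-sized payloads. The attribute names and values in `sampleDict` and in `main` are made-up samples, and `compressedSize` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// sampleDict is a hypothetical dictionary built from attribute strings
// that are assumed to be common across object headers.
var sampleDict = []byte("FileName=;Content-Type=text/plain;Content-Type=image/jpeg;Timestamp=16")

// compressedSize returns the DEFLATE-compressed size of data.
// If dict is non-empty, it is used as a preset dictionary.
func compressedSize(data, dict []byte) (int, error) {
	var buf bytes.Buffer
	var w *flate.Writer
	var err error
	if len(dict) == 0 {
		w, err = flate.NewWriter(&buf, flate.BestCompression)
	} else {
		w, err = flate.NewWriterDict(&buf, flate.BestCompression, dict)
	}
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed bytes into buf.
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// A made-up, header-like payload that shares substrings with sampleDict.
	payload := []byte("FileName=cat.jpg;Content-Type=image/jpeg;Timestamp=1678318469")

	plain, err := compressedSize(payload, nil)
	if err != nil {
		panic(err)
	}
	withDict, err := compressedSize(payload, sampleDict)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d plain=%d with-dict=%d\n", len(payload), plain, withDict)
}
```

Repeating the same measurement over growing payload sizes would show where the dictionary stops paying off. A real evaluation should use zstd itself and headers collected from actual gateways.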

A possible problem is that if we lose the dictionary, we also lose the compressed data.
In this task I suggest discussing a possible config file format and checking the following:

1. It should be possible to determine whether we need a dictionary based on the zstd header.
2. Check the validity of the 1st use case (really small objects). We can take common object attributes from the S3/HTTP gateways and determine the maximum object size that still benefits from compression with a dictionary.
3. If our library does not support this case, see whether another one matches the speed of our current implementation.
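
For the first check, a minimal sketch of reading the `Dictionary_ID` field directly from a zstd frame header, following the frame layout in RFC 8878. It uses only the standard library; `dictionaryID` is a hypothetical helper, not part of any zstd package, and the byte sequences in `main` are hand-built headers rather than real compressed data.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// zstdMagic is the magic number that starts every Zstandard frame (RFC 8878).
const zstdMagic = 0xFD2FB528

// dictIDSizes maps the 2-bit Dictionary_ID_flag to the field size in bytes.
var dictIDSizes = [4]int{0, 1, 2, 4}

// dictionaryID returns the Dictionary_ID recorded in a zstd frame header.
// A result of 0 means the header does not name a dictionary.
func dictionaryID(frame []byte) (uint32, error) {
	if len(frame) < 5 {
		return 0, errors.New("frame too short")
	}
	if binary.LittleEndian.Uint32(frame) != zstdMagic {
		return 0, errors.New("not a zstd frame")
	}
	desc := frame[4] // Frame_Header_Descriptor
	pos := 5
	if desc&0x20 == 0 {
		// Single_Segment_flag is unset, so a Window_Descriptor byte follows.
		pos++
	}
	size := dictIDSizes[desc&0x03]
	if len(frame) < pos+size {
		return 0, errors.New("truncated frame header")
	}
	// Dictionary_ID is stored little-endian in 0, 1, 2 or 4 bytes.
	var id uint32
	for i := 0; i < size; i++ {
		id |= uint32(frame[pos+i]) << (8 * i)
	}
	return id, nil
}

func main() {
	// Hand-built header: magic, descriptor with a 1-byte dictionary ID,
	// one Window_Descriptor byte, then the ID itself.
	withDict := []byte{0x28, 0xB5, 0x2F, 0xFD, 0x01, 0x00, 0x2A}
	id, err := dictionaryID(withDict)
	fmt.Println(id, err)
}
```

One caveat: zstd encoders can be configured to omit the dictionary ID, and raw-content dictionaries have no ID, so a zero here is a hint rather than proof that no dictionary was used. If we rely on the header, we would likely need to require that the ID is always written.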
