Support zstd training mode #105

Open
opened 2023-03-08 23:34:29 +00:00 by snegurochka · 0 comments

Original issue: https://github.com/nspcc-dev/neofs-node/issues/2120

Zstd can use pre-trained dictionaries to compress small data faster and with a better ratio: https://facebook.github.io/zstd/#small-data

Use-cases:

  1. Compress really small objects. Most of the object data is taken up by the header, so we can use a dictionary trained on headers with common attributes/fields.
  2. Compress specific content types. Here the dictionary can act as a poor man's deduplication.
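
To get a feel for the first use case without pulling in a zstd library, here is a rough sketch that uses DEFLATE from the Go standard library as a stand-in, since `compress/flate` also supports preset dictionaries. It is not zstd and the absolute numbers will differ; it only shows how the comparison could be measured. The helper name `compressedSize` and the attribute strings are made up for illustration.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// compressedSize returns the DEFLATE-compressed size of data.
// If dict is non-empty, the compressor is primed with it as a preset dictionary.
// NOTE: DEFLATE is only a stand-in for zstd here.
func compressedSize(data, dict []byte) (int, error) {
	var (
		buf bytes.Buffer
		w   *flate.Writer
		err error
	)
	if len(dict) == 0 {
		w, err = flate.NewWriter(&buf, flate.BestCompression)
	} else {
		w, err = flate.NewWriterDict(&buf, flate.BestCompression, dict)
	}
	if err != nil {
		return 0, err
	}
	if _, err = w.Write(data); err != nil {
		return 0, err
	}
	if err = w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// Hypothetical dictionary built from attribute keys that many headers share.
	dict := []byte("FileName=;Content-Type=application/json;Timestamp=;FilePath=")
	// Hypothetical small header-like payload.
	header := []byte("FileName=a.json;Content-Type=application/json;Timestamp=1678318469")

	plain, err := compressedSize(header, nil)
	if err != nil {
		panic(err)
	}
	withDict, err := compressedSize(header, dict)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d plain=%d dict=%d\n", len(header), plain, withDict)
}
```

Running the same comparison over payloads of growing size, with a real zstd dictionary, would show where the benefit of the dictionary levels off.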

The possible problem here is that if we lose the dictionary, we also lose the compressed data.
In this task I suggest discussing a possible config file format and checking three things:

  1. It should be possible to determine whether we need a dictionary based on the zstd header.
  2. Check the validity of the first use case (really small objects). We can take common object attributes from the S3/HTTP gateways and determine the maximum object size that benefits from compression with a dictionary.
  3. If our library does not support this case, see whether another one matches the speed of our current implementation.
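
For the first check, a minimal sketch of reading the dictionary ID straight from the frame header, following the zstd frame format (RFC 8878), could look like the code below. The function name `dictIDFromFrame` is hypothetical. One caveat: the `Dictionary_ID` field is optional, so a frame without it only means that no dictionary was declared, not necessarily that none is needed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// zstdMagic is the magic number of a regular zstd frame (stored little-endian).
const zstdMagic = 0xFD2FB528

// dictIDFromFrame reports the Dictionary_ID declared in a zstd frame header.
// present is false when the header carries no Dictionary_ID field.
func dictIDFromFrame(frame []byte) (id uint32, present bool, err error) {
	if len(frame) < 5 {
		return 0, false, errors.New("frame too short")
	}
	if binary.LittleEndian.Uint32(frame) != zstdMagic {
		return 0, false, errors.New("not a zstd frame")
	}

	desc := frame[4] // Frame_Header_Descriptor
	pos := 5
	if desc&0x20 == 0 {
		// Single_Segment_flag is unset, so a 1-byte Window_Descriptor follows.
		pos++
	}

	// Dictionary_ID_flag (lowest two bits) selects the field size.
	var size int
	switch desc & 0x03 {
	case 0:
		return 0, false, nil
	case 1:
		size = 1
	case 2:
		size = 2
	case 3:
		size = 4
	}
	if len(frame) < pos+size {
		return 0, false, errors.New("truncated frame header")
	}

	// Dictionary_ID is stored little-endian.
	for i := 0; i < size; i++ {
		id |= uint32(frame[pos+i]) << (8 * i)
	}
	return id, true, nil
}

func main() {
	// Hand-built header: magic, descriptor 0x21 (single segment, 1-byte dictionary ID),
	// dictionary ID 7, 1-byte frame content size.
	frame := []byte{0x28, 0xB5, 0x2F, 0xFD, 0x21, 0x07, 0x00}
	id, present, err := dictIDFromFrame(frame)
	fmt.Println(id, present, err)
}
```

If the ID is present, it could be used to pick the right dictionary from the config; if it is absent, the node would have to rely on some other marker.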
fyrchik added this to the vNext milestone 2023-05-18 08:41:33 +00:00
Reference: TrueCloudLab/frostfs-node#105