container: Add ListStream method #1453

Merged
fyrchik merged 2 commits from elebedeva/frostfs-node:feat/stream-for-list into master 2024-12-17 13:31:34 +00:00
Member

#1452

Before:

$ frostfs-cli container list -c work-cfg.yml --timeout 300s 
rpc error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4214277 vs. 4194304)

After:

$ frostfs-cli container list -c work-cfg.yml --timeout 300s 
11LPQTHbmiRUnsoJhfgnsUZjWjEuiDc1T8myNSyDMq2
11QiSyAn5Rkgdqo1ey6iiqktxEADX2gTBDSvNE4MAVu
11RRxjfQBRFikgkAyQV6Dx6hozLqFUbryn5VZCXPXm6
...
zzKDnULizt6S4ZsQ5RpMiJFWwJSag2LMoPK7RT35Py3
zzcNxWky9HJFerWjBMUusxgqWhT8F1QgMohbRBizKh2
zzrqmcPFste7iTU2eSqthzBEwGVKi6nmzYqNCd7SdcS
$ frostfs-cli container list -c work-cfg.yml --timeout 300s | wc -l
117054

Signed-off-by: Ekaterina Lebedeva ekaterina.lebedeva@yadro.com

#1452 Before: ``` $ frostfs-cli container list -c work-cfg.yml --timeout 300s rpc error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4214277 vs. 4194304) ``` After: ``` $ frostfs-cli container list -c work-cfg.yml --timeout 300s 11LPQTHbmiRUnsoJhfgnsUZjWjEuiDc1T8myNSyDMq2 11QiSyAn5Rkgdqo1ey6iiqktxEADX2gTBDSvNE4MAVu 11RRxjfQBRFikgkAyQV6Dx6hozLqFUbryn5VZCXPXm6 ... zzKDnULizt6S4ZsQ5RpMiJFWwJSag2LMoPK7RT35Py3 zzcNxWky9HJFerWjBMUusxgqWhT8F1QgMohbRBizKh2 zzrqmcPFste7iTU2eSqthzBEwGVKi6nmzYqNCd7SdcS ``` ``` $ frostfs-cli container list -c work-cfg.yml --timeout 300s | wc -l 117054 ``` Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
elebedeva force-pushed feat/stream-for-list from 989336df37 to d816e508ab 2024-11-26 02:07:48 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from d816e508ab to fe3225bbae 2024-11-27 13:49:33 +00:00 Compare
elebedeva changed title from WIP: container: Add ListStream method to container: Add ListStream method 2024-11-27 13:50:26 +00:00
elebedeva added the
frostfs-cli
frostfs-node
blocked
labels 2024-11-27 13:51:00 +00:00
elebedeva force-pushed feat/stream-for-list from fe3225bbae to f24b837331 2024-11-27 14:10:19 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from f24b837331 to 9439cee550 2024-11-27 14:58:37 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 9439cee550 to c66e3b11a2 2024-11-27 15:35:04 +00:00 Compare
Author
Member

Will remove module replacement in go.mod after TrueCloudLab/frostfs-sdk-go#291 is merged.

Will remove module replacement in `go.mod` after TrueCloudLab/frostfs-sdk-go#291 is merged.
elebedeva requested review from storage-core-committers 2024-11-27 17:21:22 +00:00
elebedeva requested review from storage-core-developers 2024-11-27 17:21:23 +00:00
acid-ant reviewed 2024-11-28 12:33:58 +00:00
@ -88,0 +119,4 @@
for {
n, ok = rdr.Read(buf)
for i := range n {
list = append(list, buf[i])
Member

How about to use callback here instead of append?

How about to use callback here instead of append?
Author
Member

Done

Done
a-savchuk reviewed 2024-11-28 13:37:22 +00:00
@ -88,0 +132,4 @@
}
sort.Slice(list, func(i, j int) bool {
lhs, rhs := list[i].EncodeToString(), list[j].EncodeToString()
Member

Can we compare IDs using their byte representation instead of as strings?

Can we compare IDs using their byte representation instead of as strings?
Member

I think, it's better to use slices.SortFunc instead of sort.Slice according to 15102e6dfd

I think, it's better to use `slices.SortFunc` instead of `sort.Slice` according to 15102e6dfd
Author
Member

Replaced sort.Slice with slices.SortFunc.

About comparing IDs as byte arrays: we do have cid.ID.Encode(dst []byte), but IMO string comparison is easier to read.

Replaced `sort.Slice` with `slices.SortFunc`. About comparing IDs as byte arrays: we do have `cid.ID.Encode(dst []byte)`, but IMO string comparison is easier to read.
Owner
@a-savchuk we can after https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/issues/294
a-savchuk marked this conversation as resolved
@ -842,2 +843,4 @@
fatalOnErr(err)
amount := config.UintSafe(appCfg.Sub("morph"), "container_batch_size")
if amount <= 0 {
Member

It can't be less than zero, if amount == 0 is enough

It can't be less than zero, `if amount == 0` is enough
Author
Member

Removed this entire section as you suggested here.

Removed this entire section as you suggested [here](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1453#issuecomment-59491).
a-savchuk marked this conversation as resolved
@ -843,1 +844,4 @@
amount := config.UintSafe(appCfg.Sub("morph"), "container_batch_size")
if amount <= 0 {
amount = 1000
Member

How about reading this value in a separate function in cmd/frostfs-node/config/morph.go, similar to how it's done for container_cache_size (examples below)? You can also define the default value as a constant there

// ContainerCacheSizeDefault represents the default size for the container cache.
ContainerCacheSizeDefault = 100


// ContainerCacheSize returns the value of "container_cache_size" config parameter
// from "morph" section.
//
// Returns 0 if the value is not positive integer.
// Returns ContainerCacheSizeDefault if the value is missing.
func ContainerCacheSize(c *config.Config) uint32 {
if c.Sub(subsection).Value("container_cache_size") == nil {
return ContainerCacheSizeDefault
}
return config.Uint32Safe(c.Sub(subsection), "container_cache_size")
}


What do you think?

How about reading this value in a separate function in `cmd/frostfs-node/config/morph.go`, similar to how it's done for `container_cache_size` (examples below)? You can also define the default value as a constant there https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/01acec708fa8f0c536ebc79e8f03ee2420ef3731/cmd/frostfs-node/config/morph/config.go#L34-L35 https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/01acec708fa8f0c536ebc79e8f03ee2420ef3731/cmd/frostfs-node/config/morph/config.go#L109-L119 What do you think?
Member

Moreover, I think you should write a test for reading this new configuration option. You can find an example in cmd/frostfs-node/config/morph/config_test.go.

This test will read the new option from the config file in different formats, so you may need to replicate this option in those formats as mentioned in one of my comments:

We have configuration examples in YAML, JSON, and dotenv formats, which mostly duplicate each other. It'd be very helpful if you could replicate this new configuration option in the other config files we have

Moreover, I think you should write a test for reading this new configuration option. You can find an example in [cmd/frostfs-node/config/morph/config_test.go](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/01acec708fa8f0c536ebc79e8f03ee2420ef3731/cmd/frostfs-node/config/morph/config_test.go). This test will read the new option from the config file in different formats, so you may need to replicate this option in those formats as mentioned in one of my comments: > We have configuration examples in YAML, JSON, and dotenv formats, which mostly duplicate each other. It'd be very helpful if you could replicate this new configuration option in the other config files we have
Member

From your recent commit 591479e92b:

// Returns 0 if the value is not positive integer.
// Returns ContainerBatchSizeDefault if the value is missing.

I think, we expect container_batch_size either to be set in config or have the default value; zero batch size has no sense. What about that?

func ContainerBatchSize(c *config.Config) uint32 {
	size := config.Uint32Safe(c.Sub(subsection), "container_batch_size")
	if size == 0 {
	    return ContainerBatchSizeDefault
	}
	return size
}
From your recent commit 591479e92b: > // Returns 0 if the value is not positive integer. // Returns ContainerBatchSizeDefault if the value is missing. I think, we expect `container_batch_size` either to be set in config or have the default value; zero batch size has no sense. What about that? ```go func ContainerBatchSize(c *config.Config) uint32 { size := config.Uint32Safe(c.Sub(subsection), "container_batch_size") if size == 0 { return ContainerBatchSizeDefault } return size } ```
Author
Member

Done!

Done!
a-savchuk marked this conversation as resolved
a-savchuk reviewed 2024-11-28 13:41:39 +00:00
@ -88,0 +91,4 @@
}
// SortedIDList returns sorted identifiers of the matched containers.
func (x ContainerListStreamRes) SortedIDList() []cid.ID {
Member

I didn't quite catch why we need to sort it here. I see that we already sort it at the end of ListContainersStream. Could you please explain your idea?

I didn't quite catch why we need to sort it here. I see that we already sort it at the end of `ListContainersStream`. Could you please explain your idea?
Author
Member

Oh, I forgot to remove sorting from internal.ListContainersStream(). The idea was that sorting of ContainerListStreamRes looks similar to ContainerListRes. Now it's irrelevant though, as internal.ListContainersStream() returns only error and prints cid directly to cmd.

Oh, I forgot to remove sorting from `internal.ListContainersStream()`. The idea was that sorting of `ContainerListStreamRes` looks similar to `ContainerListRes`. Now it's irrelevant though, as `internal.ListContainersStream()` returns only error and prints `cid` directly to `cmd`.
Member

Thank you. Do we still need ContainerListStreamRes structure?

Thank you. Do we still need `ContainerListStreamRes` structure?
Author
Member

No, we don't. Removed

No, we don't. Removed
a-savchuk marked this conversation as resolved
a-savchuk reviewed 2024-11-28 13:55:03 +00:00
@ -83,6 +83,7 @@ morph:
# Default value: block time. It is recommended to have this value less or equal to block time.
# Cached entities: containers, container lists, eACL tables.
container_cache_size: 100 # container_cache_size is is the maximum number of containers in the cache.
container_batch_size: 1000 # container_batch_size is the maximum amount of containers to send via stream at once
Member

We have configuration examples in YAML, JSON, and dotenv formats, which mostly duplicate each other. It'd be very helpful if you could replicate this new configuration option in the other config files we have

We have configuration examples in YAML, JSON, and dotenv formats, which mostly duplicate each other. It'd be very helpful if you could replicate this new configuration option in the other config files we have
Author
Member

Fixed

Fixed
a-savchuk marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from c66e3b11a2 to 9a61583e3a 2024-12-10 11:23:36 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 9a61583e3a to 360d230c09 2024-12-10 11:28:50 +00:00 Compare
aarifullin reviewed 2024-12-11 09:46:24 +00:00
@ -58,0 +62,4 @@
if e, ok := status.FromError(err); ok {
switch e.Code() {
case codes.Unimplemented:
resV1, err := internalclient.ListContainers(cmd.Context(), prm)
Member

Could you, please, use another naming for the variable? I mean, ...V1 suffix seems inconvenient as it may be mistaken for old api-go version.
resNonStream could be absolutely fine

Could you, please, use another naming for the variable? I mean, `...V1` suffix seems inconvenient as it may be mistaken for old api-go version. `resNonStream` could be absolutely fine
Author
Member

Renamed it to just res as internal.ListContainersStream() returns only error now.

Renamed it to just `res` as `internal.ListContainersStream()` returns only error now.
aarifullin marked this conversation as resolved
fyrchik reviewed 2024-12-11 10:23:13 +00:00
@ -58,0 +59,4 @@
var containerIDs []cid.ID
res, err := internalclient.ListContainersStream(cmd.Context(), prm)
if err != nil {
if e, ok := status.FromError(err); ok {
Owner

How about

if e, ok := status.FromError(err); ok && e.Code() == codes.Unimplemented {
 ...
}

instead of

if e, ok := status.FromError(err); ok {
  switch e.Code()  {
  case codes.Unimplemented:
  ...
  default:
  ...
  }
}
How about ```go if e, ok := status.FromError(err); ok && e.Code() == codes.Unimplemented { ... } ``` instead of ```go if e, ok := status.FromError(err); ok { switch e.Code() { case codes.Unimplemented: ... default: ... } } ```
Author
Member

Fixed

Fixed
fyrchik marked this conversation as resolved
aarifullin reviewed 2024-12-11 11:33:56 +00:00
@ -203,0 +230,4 @@
r := new(container.ListStreamResponse)
r.SetBody(resBody)
return stream.Send(r)
Member

Could you explain, please, what is difference between ListStream and List then?
The response is being sent via stream and it seems it sends the only message. I don't see any point to use streaming then.
I assumed we would fetch containers by batches and send them via stream - apparently, I am missing something

Could you explain, please, what is difference between `ListStream` and [List](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/360d230c093ff114e59cf698b62cf24866eed18a/pkg/services/container/morph/executor.go#L175-L202) then? The response is being sent via `stream` and it seems it sends the only message. I don't see any point to use streaming then. I assumed we would fetch containers by batches and send them via stream - apparently, I am missing something
Member

Nevermind. I thought the problem is that we can't receive the entire container list on the server side (kind of OOM or something) and I expected we'd receive containers by batches and stream these batches to the client.
I glanced at the issue - the problem is basically with List handler that causes the error for the client side. So, stream.Send gracefully handles this 👍

Nevermind. I thought the problem is that we can't receive the entire container list on the server side (kind of OOM or something) and I expected we'd receive containers by batches and stream these batches to the client. I glanced at the issue - the problem is basically with `List` handler that causes the error for the client side. So, `stream.Send` gracefully handles this 👍
aarifullin marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from 360d230c09 to 12d8706da2 2024-12-11 15:16:47 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 12d8706da2 to 591479e92b 2024-12-12 12:43:23 +00:00 Compare
fyrchik reviewed 2024-12-12 12:46:40 +00:00
@ -88,0 +92,4 @@
// SortedIDList returns sorted identifiers of the matched containers.
func (x ContainerListStreamRes) SortedIDList() []cid.ID {
list := x.ids
Owner

What do you achieve by introducing a variable list?

What do you achieve by introducing a variable `list`?
Author
Member

This function is removed as it is no longer needed.

This function is removed as it is no longer needed.
fyrchik marked this conversation as resolved
@ -88,0 +110,4 @@
return res, fmt.Errorf("init container list: %w", err)
}
buf := make([]cid.ID, 10)
Owner

What is 10?

What is 10?
Author
Member

It was hardcoded buffer size. Doesn't matter anymore since rdr.Read() that needed buffer to read to was replaced with rdr.Iterate().

It was hardcoded buffer size. Doesn't matter anymore since `rdr.Read()` that needed buffer to read to was replaced with `rdr.Iterate()`.
fyrchik marked this conversation as resolved
@ -88,0 +117,4 @@
var ok bool
for {
n, ok = rdr.Read(buf)
Owner

Why not reuse rdr.Iterate(...)?

Why not reuse `rdr.Iterate(...)`?
Author
Member

Replaced rdr.Read() with rdr.Iterate().

Replaced `rdr.Read()` with `rdr.Iterate()`.
fyrchik marked this conversation as resolved
@ -52,3 +55,3 @@
var prm internalclient.ListContainersPrm
prm.SetClient(cli)
prm.Account = idUser
prm.OwnerID = idUser
Owner

This change should be in a separate commit (unless there are some problems with this).

This change should be in a separate commit (unless there are some problems with this).
Author
Member

There is an issue with moving this one change to a separate commit: in TrueCloudLab/frostfs-sdk-go#291 this field was renamed after ListStream was added.

I can add a line about this change in the commit description.

There is an issue with moving this one change to a separate commit: in TrueCloudLab/frostfs-sdk-go#291 this field was [renamed](https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/commit/c4463df8d467a49d496ebd9816bcae57ecbf1fbd) **after** `ListStream` was added. I can add a line about this change in the commit description.
Owner

The mere presense of the ListStream in the SDK should affect nothing, right?

The mere presense of the `ListStream` in the SDK should affect nothing, right?
Owner

The problem is that server interface needs to be implemented, thats why

The problem is that server interface needs to be implemented, thats why
fyrchik marked this conversation as resolved
@ -839,9 +840,15 @@ func initContainer(appCfg *config.Config) cfgContainer {
containerWorkerPool, err := ants.NewPool(notificationHandlerPoolSize)
fatalOnErr(err)
amount := config.UintSafe(appCfg.Sub("morph"), "container_batch_size")
Owner

Why have you chosen morph section?
We have object section where put and get section may be configured.
SImilarly we could have container section with list_stream configuration.

Why have you chosen `morph` section? We have `object` section where `put` and `get` section may be configured. SImilarly we could have `container` section with `list_stream` configuration.
Author
Member

Moved from morph to container.

Moved from `morph` to `container`.
fyrchik marked this conversation as resolved
@ -50,6 +50,7 @@ func (c *cfg) initMorphComponents(ctx context.Context) {
var netmapSource netmap.Source
c.cfgMorph.containerCacheSize = morphconfig.ContainerCacheSize(c.appCfg)
c.cfgMorph.containerBatchSize = morphconfig.ContainerBatchSize(c.appCfg)
Owner

Again, why is it in cfgMorph?

Again, why is it in `cfgMorph`?
fyrchik marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from 591479e92b to bd4c4828f0 2024-12-12 14:16:36 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from bd4c4828f0 to 64b9bf1d7f 2024-12-12 14:42:15 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 64b9bf1d7f to 75ac7278dd 2024-12-12 14:50:50 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 75ac7278dd to 5be8806e4a 2024-12-12 15:06:18 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 5be8806e4a to d477a4acf1 2024-12-12 15:13:03 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from d477a4acf1 to 2bd0ab4cb8 2024-12-12 15:40:49 +00:00 Compare
achuprov reviewed 2024-12-13 09:42:09 +00:00
@ -88,0 +95,4 @@
}
if rdr.Iterate(printCnr) != nil {
return fmt.Errorf("read container list: %w", err)
Member

It seems that err will always be nil here.

It seems that `err` will always be `nil` here.
Author
Member

You're right, fixed

You're right, fixed
achuprov marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from 2bd0ab4cb8 to 7ffc82a066 2024-12-13 13:08:58 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 7ffc82a066 to 03d5a1d5ad 2024-12-13 14:38:05 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 03d5a1d5ad to ec0fc52e56 2024-12-13 14:42:05 +00:00 Compare
aarifullin approved these changes 2024-12-16 06:40:22 +00:00
Dismissed
aarifullin left a comment
Member

LGTM

LGTM
acid-ant reviewed 2024-12-16 07:43:28 +00:00
@ -85,3 +83,4 @@
return list
}
func ListContainersStream(ctx context.Context, prm ListContainersPrm, printCnr func(id cid.ID) bool) (err error) {
Member

In common, printCnr can do anything, not only print.

In common, `printCnr` can do anything, not only `print`.
Author
Member

Renamed to processCnr.

Renamed to `processCnr`.
fyrchik reviewed 2024-12-16 07:46:39 +00:00
@ -62,0 +61,4 @@
var containerIDs []cid.ID
err := internalclient.ListContainersStream(cmd.Context(), prm, func(id cid.ID) bool {
if flagVarListName == "" && !flagVarListPrintAttr {
Owner

It looks like the code below for "normal" listing.
Could you move it to a separate function? It may be a lambda defined locally.
But both printing routines are the same, I see no situation where one would be changed without the other.

It looks like the code below for "normal" listing. Could you move it to a separate function? It may be a lambda defined locally. But both printing routines _are_ the same, I see no situation where one would be changed without the other.
Owner

Another approach would be to reuse loop and define iterator over the slice.

Another approach would be to reuse loop and define iterator over the slice.
Author
Member

Moved container printing to a separate function.

Moved container printing to a separate function.
fyrchik marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from ec0fc52e56 to c1627c8eac 2024-12-17 10:15:38 +00:00 Compare
elebedeva dismissed aarifullin's review 2024-12-17 10:15:38 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

elebedeva force-pushed feat/stream-for-list from c1627c8eac to e9971d73c8 2024-12-17 10:33:55 +00:00 Compare
fyrchik reviewed 2024-12-17 12:10:40 +00:00
@ -84,2 +84,4 @@
FROSTFS_REPLICATOR_POOL_SIZE=10
# Container service section
FROSTFS_CONTAINER_LIST_STREAM_BATCH_SIZE=1000
Owner

The value you use in the example is the default. The test is somewhat useless then.
I suggest using some other value, e.g. 500.

The value you use in the example is the default. The test is somewhat useless then. I suggest using some other value, e.g. 500.
Author
Member

Fixed.

Fixed.
Owner

Tests are failing

Tests are failing
Author
Member

Forgot to change the value in TestContainerSection, should be fine now.

Forgot to change the value in `TestContainerSection`, should be fine now.
fyrchik marked this conversation as resolved
elebedeva force-pushed feat/stream-for-list from e9971d73c8 to 06be16b716 2024-12-17 12:58:11 +00:00 Compare
elebedeva force-pushed feat/stream-for-list from 06be16b716 to df05057ed4 2024-12-17 13:22:52 +00:00 Compare
aarifullin approved these changes 2024-12-17 13:29:33 +00:00
aarifullin left a comment
Member

Very nice! 👍

Very nice! 👍
a-savchuk approved these changes 2024-12-17 13:30:02 +00:00
Owner

Exquisite! 🥇

Exquisite! 🥇
fyrchik merged commit df05057ed4 into master 2024-12-17 13:31:34 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-core-committers
No milestone
No project
No assignees
6 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1453
No description provided.