Performance drop during Failover (disk fail, node reboot) #286

Closed
opened 2023-04-26 13:33:36 +00:00 by a.bogatyrev · 2 comments
Owner

When a disk fails, performance degrades on all nodes and errors appear.

This causes client errors. At the same time, the s3-gate logs errors reporting that the corresponding trees are not found on other nodes. The root cause may lie in the tree service.

When a node is turned off, performance drops to zero on 3 out of 4 nodes and stays there until the node comes back.

апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.379Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "localhost:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.381Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.22:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.387Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.23:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.392Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.24:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.379Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "localhost:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.381Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.22:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.382Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.23:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.384Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.24:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
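
To see how the "tree not found" errors are spread across the addresses the gateway tries, the log excerpt above can be tallied per address. This is a minimal sketch, assuming the line format shown in the excerpt; the function name `count_tree_errors` is hypothetical and not part of frostfs-s3-gw.

```python
import json
import re
from collections import Counter

# Captures the JSON payload that ends a "tree request error" log line.
# Assumption: lines follow the format shown in the excerpt above.
_PAYLOAD_RE = re.compile(r"tree request error\s+(\{.*\})\s*$")


def count_tree_errors(lines):
    """Count 'tree not found' errors per target address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = _PAYLOAD_RE.search(line)
        if match is None:
            continue  # not a tree request error line
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # truncated or malformed payload
        if "tree not found" in payload.get("error", ""):
            counts[payload.get("address", "unknown")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: journalctl -u <unit> | python this_script.py
    for address, n in sorted(count_tree_errors(sys.stdin).items()):
        print(address, n)
```

If every address, including `localhost:8080`, shows the same count, the tree is missing on all nodes the gateway queried rather than on a single failed one.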

image

marker 1 - disk failure
marker 2 - node reboot

Load pattern: s3, 1Mb, 200 threads

a.bogatyrev added the triage label 2023-04-26 13:33:36 +00:00
snegurochka added the bug label 2023-05-03 17:14:40 +00:00
fyrchik added this to the v0.37.0 milestone 2023-05-18 08:31:42 +00:00
alexvanin self-assigned this 2023-05-25 07:55:11 +00:00
Owner

This seems more like an S3 gateway issue, similar to TrueCloudLab/frostfs-s3-gw#110

It needs a retest with the fix from the support branch: TrueCloudLab/frostfs-s3-gw#115

Owner

Closing following the retest.

Reference: TrueCloudLab/frostfs-node#286