Performance drop during Failover (disk fail, node reboot) #286

Closed

opened 2023-04-26 13:33:36 +00:00 by a.bogatyrev · 2 comments

a.bogatyrev commented

2023-04-26 13:33:36 +00:00

Owner

When a disk fails, performance degrades on all nodes and errors occur.

This causes client errors. At the same time, the s3-gate logs errors saying that the corresponding trees are not found on other nodes. The root cause may be in the tree service.

When a node is turned off, performance drops to zero on 3 out of 4 nodes until the node is brought back.

апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.379Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "localhost:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.381Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.22:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.387Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.23:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.392Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.24:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.379Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "localhost:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.381Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.22:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.382Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.23:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:50 az frostfs-s3-gw[196406]: 2023-04-26T11:50:50.384Z        debug        services/tree_client_grpc.go:341        tree request error        {"address": "192.168.204.24:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}

marker 1 - disk failure
marker 2 - node reboot

Load pattern: s3, 1Mb, 200 threads
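
For retests, it may help to tally how many `tree not found` errors each tree-service endpoint returned. The following is a minimal sketch in Go, assuming the journald line format shown in the log excerpt above (a text prefix followed by a JSON payload). `treeErr` and `countTreeNotFound` are hypothetical names introduced for illustration and are not part of frostfs-s3-gw.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// treeErr mirrors the JSON payload at the end of each debug line.
// Hypothetical helper type, not taken from frostfs-s3-gw.
type treeErr struct {
	Address string `json:"address"`
	Error   string `json:"error"`
}

// countTreeNotFound returns, per endpoint address, the number of log lines
// whose JSON payload reports "tree not found". Lines without a parsable
// JSON payload are skipped.
func countTreeNotFound(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		// Assumption: the first '{' on the line starts the JSON payload.
		i := strings.Index(line, "{")
		if i < 0 {
			continue
		}
		var e treeErr
		if err := json.Unmarshal([]byte(line[i:]), &e); err != nil {
			continue
		}
		if strings.Contains(e.Error, "tree not found") {
			counts[e.Address]++
		}
	}
	return counts
}

func main() {
	// Two lines copied from the excerpt above.
	sample := `апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.379Z debug services/tree_client_grpc.go:341 tree request error {"address": "localhost:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}
апр 26 11:50:48 az frostfs-s3-gw[196406]: 2023-04-26T11:50:48.381Z debug services/tree_client_grpc.go:341 tree request error {"address": "192.168.204.22:8080", "error": "not found: rpc error: code = Unknown desc = tree not found"}`

	for addr, n := range countTreeNotFound(sample) {
		fmt.Printf("%s: %d\n", addr, n)
	}
}
```

If every configured endpoint shows roughly the same count, as in the excerpt, the tree is missing on all of them rather than on a single failed node.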

image.png

127 KiB

a.bogatyrev added the triage label 2023-04-26 13:33:36 +00:00

snegurochka added the bug label 2023-05-03 17:14:40 +00:00

fyrchik added this to the v0.37.0 milestone 2023-05-18 08:31:42 +00:00

alexvanin self-assigned this 2023-05-25 07:55:11 +00:00

alexvanin commented

2023-05-25 08:12:14 +00:00

Owner

This looks more like an S3 gateway issue, similar to TrueCloudLab/frostfs-s3-gw#110 (https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/issues/110).

It needs to be retested with the fix in the support branch: TrueCloudLab/frostfs-s3-gw#115 (https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/pulls/115).

alexvanin commented

2023-06-29 07:35:52 +00:00

Owner

Closing following the retest.

alexvanin closed this issue

2023-06-29 07:35:52 +00:00