Different epoch on nodes and log has a lot of errors after big load #344

Closed
opened 2023-05-12 12:38:06 +00:00 by anikeev-yadro · 1 comment
Member

Steps to Reproduce (for bugs)

1. Started two loads in parallel to prefill the nodes:

```
/home/service/xk6-frostfs-alex/k6 run \
-e DURATION=43200 \
-e WRITERS=100 \
-e REGISTRY_FILE=/var/data/tmp/s3_reg_prefill-64-1-110523.json \
-e WRITE_OBJ_SIZE=64 \
-e S3_ENDPOINTS=10.78.70.148:8084,10.78.70.149:8084,10.78.70.150:8084,10.78.70.151:8084 \
-e PREGEN_JSON=/var/data/tmp/s3_110523-prefill-64.json \
-e STREAM_TIMEOUT=60 \
-e SLEEP_READ=0.2 \
-e SLEEP_WRITE=0.2 \
scenarios/s3.js &
```
```
/home/service/xk6-frostfs-alex/k6 run \
-e DURATION=43200 \
-e WRITERS=200 \
-e REGISTRY_FILE=/var/data/tmp/grpc_reg_prefill-s3-100000-1-110523.json \
-e WRITE_OBJ_SIZE=100000 \
-e S3_ENDPOINTS=10.78.70.148:8084,10.78.70.149:8084,10.78.70.150:8084,10.78.70.151:8084 \
-e PREGEN_JSON=/var/data/tmp/s3_110523-prefill-100000.json \
-e STREAM_TIMEOUT=60 \
-e SLEEP_READ=0.2 \
-e SLEEP_WRITE=0.5 \
scenarios/s3.js &
```
2. After the load completed, I saw different epochs on the nodes:
```
service@az:~$ frostfs-cli netmap snapshot -r localhost:8080 -g
Epoch: 281

service@buky:~$ frostfs-cli netmap snapshot -r localhost:8080 -g
Epoch: 281

service@vedi:~$ frostfs-cli netmap snapshot -r localhost:8080 -g
Epoch: 181

service@glagoli:~$ frostfs-cli netmap snapshot -r localhost:8080 -g
Epoch: 181
```
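A quick way to spot this kind of divergence is to collect the `Epoch:` line from each node's `frostfs-cli netmap snapshot` output and compare. A minimal Go sketch (the `parseEpoch` helper is hypothetical; the snapshot outputs are the ones reported above):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEpoch extracts N from the "Epoch: N" line printed by
// `frostfs-cli netmap snapshot`.
func parseEpoch(out string) (uint64, bool) {
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "Epoch: ") {
			n, err := strconv.ParseUint(strings.TrimPrefix(line, "Epoch: "), 10, 64)
			return n, err == nil
		}
	}
	return 0, false
}

func main() {
	// Snapshot outputs as reported on the four nodes above.
	snapshots := map[string]string{
		"az":      "Epoch: 281",
		"buky":    "Epoch: 281",
		"vedi":    "Epoch: 181",
		"glagoli": "Epoch: 181",
	}
	// Group nodes by the epoch they report; more than one
	// group means the netmap views have diverged.
	byEpoch := map[uint64][]string{}
	for node, out := range snapshots {
		if e, ok := parseEpoch(out); ok {
			byEpoch[e] = append(byEpoch[e], node)
		}
	}
	if len(byEpoch) > 1 {
		fmt.Printf("epoch divergence: %d distinct epochs\n", len(byEpoch))
	}
}
```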
3. At the same time, the log on vedi showed that epochs were ticking:
```
service@vedi:~$ journalctl -S '2h ago' -u frostfs-ir|grep -i 'new epoch'
May 12 07:45:45 vedi frostfs-ir[20940]: 2023-05-12T07:45:45.527Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 270}
May 12 07:55:53 vedi frostfs-ir[20940]: 2023-05-12T07:55:53.919Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 271}
May 12 08:06:02 vedi frostfs-ir[20940]: 2023-05-12T08:06:02.310Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 272}
May 12 08:16:10 vedi frostfs-ir[20940]: 2023-05-12T08:16:10.700Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 273}
May 12 08:26:19 vedi frostfs-ir[20940]: 2023-05-12T08:26:19.099Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 274}
May 12 08:36:27 vedi frostfs-ir[20940]: 2023-05-12T08:36:27.495Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 275}
May 12 08:46:35 vedi frostfs-ir[20940]: 2023-05-12T08:46:35.890Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 276}
May 12 08:56:44 vedi frostfs-ir[20940]: 2023-05-12T08:56:44.276Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 277}
May 12 09:06:52 vedi frostfs-ir[20940]: 2023-05-12T09:06:52.670Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 278}
May 12 09:17:01 vedi frostfs-ir[20940]: 2023-05-12T09:17:01.056Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 279}
May 12 09:27:09 vedi frostfs-ir[20940]: 2023-05-12T09:27:09.445Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 280}
May 12 09:37:17 vedi frostfs-ir[20940]: 2023-05-12T09:37:17.854Z        info        netmap/handlers.go:29        notification        {"type": "new epoch", "value": 281}
```
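Note that vedi's inner-ring log sees "new epoch" notifications up through 281 (one roughly every 10 minutes), even though its storage node still reports epoch 181. The epoch numbers can be pulled out of the journal output and checked for gaps with a small Go sketch (the regex is an assumption based on the log format above):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// epochRe matches the "value" field of the IR's "new epoch" notification.
var epochRe = regexp.MustCompile(`"type": "new epoch", "value": (\d+)`)

// epochsFromJournal pulls all epoch numbers out of journalctl output.
func epochsFromJournal(log string) []uint64 {
	var out []uint64
	for _, m := range epochRe.FindAllStringSubmatch(log, -1) {
		if n, err := strconv.ParseUint(m[1], 10, 64); err == nil {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Two sample lines in the notification format shown above.
	log := `notification {"type": "new epoch", "value": 280}
notification {"type": "new epoch", "value": 281}`
	es := epochsFromJournal(log)
	// Report any skipped epochs between consecutive ticks.
	for i := 1; i < len(es); i++ {
		if es[i] != es[i-1]+1 {
			fmt.Println("gap between epochs", es[i-1], "and", es[i])
		}
	}
	fmt.Println("last epoch seen:", es[len(es)-1])
}
```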

And a lot of errors like:

```
May 11 16:44:36 vedi frostfs-node[21270]: 2023-05-11T16:44:36.982Z        error        policer/check.go:202        receive object header to check policy compliance        {"component": "Object Policer", "object": "BsN3bfgbvMv9JqrPurTU4xYS6MPRhRijuedcbRg3hxJt/BFtbgtaQJkh1AexbKmmgCHQdnNBe3v1ZxqCy476f6sJk", "error": "(*headsvc.RemoteHeader) could not head object in [/ip4/192.168.201.50/tcp/8080 /ip4/192.168.201.150/tcp/8080 /ip4/10.78.70.150/tcp/8080 /ip4/10.78.71.150/tcp/8080]: read object header from FrostFS: status: code = 1024 message = connection lost before registering response channel"}
```
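For triage, the status code and message can be extracted from such policer lines to group the errors. A minimal Go sketch (the error format is assumed from the single line above):

```go
package main

import (
	"fmt"
	"regexp"
)

// statusRe pulls the FrostFS status code and message out of the
// policer error string, as formatted in the log line above.
var statusRe = regexp.MustCompile(`status: code = (\d+) message = ([^"]+)`)

func main() {
	line := `"error": "read object header from FrostFS: status: code = 1024 message = connection lost before registering response channel"}`
	if m := statusRe.FindStringSubmatch(line); m != nil {
		// m[1] is the numeric status code, m[2] the message text.
		fmt.Printf("code=%s message=%s\n", m[1], m[2])
	}
}
```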

Version

```
FrostFS Storage node
Version: v0.0.1-414-gc60029d3
GoVersion: go1.18.4
```

Your Environment

HW

anikeev-yadro added the bug and triage labels 2023-05-12 12:38:06 +00:00
Owner

Could be fixed by #343, needs testing.

fyrchik added this to the v0.37.0 milestone 2023-05-18 10:47:57 +00:00
Reference: TrueCloudLab/frostfs-node#344