Dev env produces a lot of zombie processes #69

Open
opened 2024-06-27 07:22:30 +00:00 by Phaseant · 7 comments

Dev env produces a lot of zombie `frostfs-cli` and `grep` processes

Expected Behavior

No zombie processes should be created

Current Behavior

A lot of zombie (defunct) processes accumulate, waiting to be reaped

Possible Solution

Steps to Reproduce (for bugs)

  1. `make up`
  2. list zombie processes: `ps -ef | grep defunct`
  3. count them: `ps -ef | grep defunct | wc -l` (a more precise variant is sketched below)
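
Since `grep defunct` also matches the `grep` process itself, filtering on the process state column is a bit more precise. A minimal sketch, assuming a typical Linux `ps`:

```sh
# List zombie (defunct) processes together with their parent PIDs.
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Count them; the state of a zombie process starts with "Z".
ps -eo stat | grep -c '^Z'
```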

Context

This issue affects the performance of the whole server: average CPU usage is at 100% because of it

Phaseant added the bug label 2024-06-27 07:22:30 +00:00

Parents of these processes are `frostfs-node --config /etc/frostfs/storage/config.yml`.
It could be a healthcheck: https://git.frostfs.info/TrueCloudLab/frostfs-dev-env/src/commit/2b6122192a4c26a84e6f8faa5533ab52fb2a88ef/services/storage/healthcheck.sh#L3
fyrchik self-assigned this 2024-06-27 10:51:11 +00:00

So is it expected behaviour?

No, it is not, I was just trying to find out where these processes were spawned.

It likely happens when the healthcheck hasn't been able to finish within the required timeout.
Here we have 1s: https://git.frostfs.info/TrueCloudLab/frostfs-dev-env/src/commit/2b6122192a4c26a84e6f8faa5533ab52fb2a88ef/services/storage/docker-compose.yml#L41
And the default `frostfs-cli` timeout is 15 seconds.

So what happens is that we execute a shell, which spawns subprocesses; then the parent is killed (because of the timeout) and all the children are left behind.
The solution is to not spawn any subprocesses. For this to work we need to support it in `frostfs-cli` directly; I will create a task shortly.

Meanwhile, a workaround could be to increase this timeout (e.g. from 1s to 30s) while also providing a `--timeout 1s` argument to `frostfs-cli` (so that the internal `healthcheck.sh` timeout is always less than the timeout given to docker compose).
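
As a rough illustration of how those two timeouts relate (the actual contents of `healthcheck.sh` and the exact `frostfs-cli` invocation are assumptions here, not copied from the repo):

```sh
#!/bin/sh
# Hypothetical sketch, not the real healthcheck.sh: the docker-compose
# healthcheck timeout is raised (e.g. from 1s to 30s), while frostfs-cli
# gets its own shorter deadline via --timeout, so it always returns on its
# own before docker kills the parent shell and orphans its children.
# The subcommand, remaining flags and grep pattern are placeholders for
# whatever the real script uses.
frostfs-cli control healthcheck --timeout 1s "$@" | grep -q READY
```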

Depends on https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/1209

@Phaseant given that 1s was not enough on your machine, you might want to have `--timeout 5s` in `healthcheck.sh` and e.g. `10s` in `docker-compose.yml`.
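
For what it's worth, a quick way to confirm the tuned timeouts help once the environment is back up (the `watch` interval is arbitrary):

```sh
# With the adjusted timeouts the zombie count should stay near zero
# instead of steadily growing.
watch -n 5 "ps -eo stat | grep -c '^Z'"
```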

ok, great! Awaiting your solution :)
fyrchik added the blocked label 2024-06-27 12:36:30 +00:00