Add OnReStartFailed which makes the health plugin stay up if the Corefile is corrupt and we revert to the previous version. Also needs a fix for the channel handling.

See #2659

Testing it will log the following when restarting with a corrupted Corefile:

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
health
Name
health - enables a health check endpoint.
Description
Enables a process-wide health endpoint. When CoreDNS is up and running this returns a 200 OK HTTP status code. The health is exported, by default, on port 8080 at /health.
Syntax
health [ADDRESS]
Optionally takes an address; the default is :8080. The health path is fixed to /health. The health endpoint returns a 200 response code and the word "OK" when this server is healthy.
An extra option can be set with this extended syntax:
health [ADDRESS] {
    lameduck DURATION
}
- Where lameduck will make the process unhealthy then wait for DURATION before the process shuts down.
If you have multiple Server Blocks, health can only be enabled in one of them (as it is process wide). If you really need multiple endpoints, you must run health endpoints on different ports:
com {
    whoami
    health :8080
}
net {
    erratic
    health :8081
}
Doing this is supported, but both endpoints ":8080" and ":8081" will export the exact same health.
Metrics
If monitoring is enabled (via the prometheus directive) then the following metric is exported:
coredns_health_request_duration_seconds{} - duration to process an HTTP query to the local /health endpoint. As this is a local operation it should be fast. A (large) increase in this duration indicates the CoreDNS process is having trouble keeping up with its query load.
Note that this metric does not have a server label, because being overloaded is a symptom of the running process, not a specific server.
Examples
Run another health endpoint on http://localhost:8091.
. {
    health localhost:8091
}
Set a lameduck duration of 1 second:
. {
    health localhost:8092 {
        lameduck 1s
    }
}