plugin/health: remove ability to poll other plugins (#2547)

* plugin/health: remove ability to poll other plugins

This mechanism defeats its purpose: even when a plugin reports unhealthy,
(mostly) caching can still be alive and we can probably still forward
queries. Don't poll plugins, just tell the world we're up and running.

It was only actually used in *kubernetes*, and there specifically it meant
that any network hiccup would NACK the entire server's health.

Fixes: #2534

Signed-off-by: Miek Gieben <miek@miek.nl>

* update docs based on feedback

Signed-off-by: Miek Gieben <miek@miek.nl>
Miek Gieben 2019-03-07 22:13:47 +00:00 committed by GitHub
parent db0b16b615
commit c778b3a67c
10 changed files with 18 additions and 173 deletions

@ -59,10 +59,7 @@ When exporting metrics the *Namespace* should be `plugin.Namespace` (="coredns")
*Subsystem* should be the name of the plugin. The README.md for the plugin should then also contain
a *Metrics* section detailing the metrics.
-If the plugin supports dynamic health reporting it should also have *Health* section detailing on
-some of its inner workings.
-If the plugins supports signalling readiness it should have a *Ready* section detailing how it
+If the plugin supports signalling readiness it should have a *Ready* section detailing how it
works.
## Documentation

@ -33,10 +33,6 @@ erratic {
In case of a zone transfer and truncate the final SOA record *isn't* added to the response.
-## Health
-This plugin implements dynamic health checking. For every dropped query it turns unhealthy.
## Ready
This plugin reports readiness to the ready plugin.

@ -1,14 +0,0 @@
package erratic

import (
	"sync/atomic"
)

// Health implements the health.Healther interface.
func (e *Erratic) Health() bool {
	q := atomic.LoadUint64(&e.q)
	if e.drop > 0 && q%e.drop == 0 {
		return false
	}
	return true
}
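With the *Health* hook gone, erratic only signals readiness (see the *Ready* section in its README above). Below is a minimal, self-contained sketch of that style of hook, assuming the *ready* plugin consumes a `Ready() bool` method; the stand-in type and the always-true result are illustrative, not taken from this diff:

~~~ go
package main

import "fmt"

// Erratic is a stand-in for the plugin struct; only what the sketch needs.
type Erratic struct{}

// Ready answers "can this plugin serve?" once, instead of being polled every
// second and flipping the whole server unhealthy on a dropped query.
func (e *Erratic) Ready() bool { return true }

func main() {
	e := &Erratic{}
	fmt.Println("ready:", e.Ready())
}
~~~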

@ -6,9 +6,8 @@
## Description
-By enabling *health* any plugin that implements
-[health.Healther interface](https://godoc.org/github.com/coredns/coredns/plugin/health#Healther)
-will be queried for it's health. The combined health is exported, by default, on port 8080/health .
+Enabled process wide health endpoint. When CoreDNS is up and running this returns a 200 OK http
+status code. The health is exported, by default, on port 8080/health .
## Syntax
@ -17,12 +16,9 @@ health [ADDRESS]
~~~
Optionally takes an address; the default is `:8080`. The health path is fixed to `/health`. The
-health endpoint returns a 200 response code and the word "OK" when this server is healthy. It returns
-a 503. *health* periodically (1s) polls plugins that exports health information. If any of the
-plugins signals that it is unhealthy, the server will go unhealthy too. Each plugin that supports
-health checks has a section "Health" in their README.
+health endpoint returns a 200 response code and the word "OK" when this server is healthy.
-More options can be set with this extended syntax:
+An extra option can be set with this extended syntax:
~~~
health [ADDRESS] {
@ -33,8 +29,8 @@ health [ADDRESS] {
* Where `lameduck` will make the process unhealthy then *wait* for **DURATION** before the process
shuts down.
-If you have multiple Server Blocks and need to export health for each of the plugins, you must run
-health endpoints on different ports:
+If you have multiple Server Blocks, *health* should only be enabled in one of them (as it is process
+wide). If you really need multiple endpoints, you must run health endpoints on different ports:
~~~ corefile
com {
@ -48,21 +44,6 @@ net {
}
~~~
-Note that if you format this in one server block you will get an error on startup, that the second
-server can't setup the health plugin (on the same port).
-~~~ txt
-com net {
-    whoami
-    erratic
-    health :8080
-}
-~~~
-## Plugins
-Any plugin that implements the Healther interface will be used to report health.
## Metrics
If monitoring is enabled (via the *prometheus* directive) then the following metric is exported:
@ -96,7 +77,7 @@ Set a lameduck duration of 1 second:
## Bugs
-When reloading, the Health handler is stopped before the new server instance is started.
-If that new server fails to start, then the initial server instance is still available and DNS queries still served,
-but Health handler stays down.
-Health will not reply HTTP request until a successful reload or a complete restart of CoreDNS.
+When reloading, the health handler is stopped before the new server instance is started. If that
+new server fails to start, then the initial server instance is still available and DNS queries still
+served, but health handler stays down. Health will not reply HTTP request until a successful reload
+or a complete restart of CoreDNS.
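The simplified endpoint is easy to check by hand, e.g. `curl -i http://localhost:8080/health`. Here is a minimal Go sketch of the same probe, assuming a local CoreDNS with *health* on the default `:8080` (the address and timeout are illustrative):

~~~ go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 2 * time.Second}

	// The path is fixed to /health; the port defaults to :8080 (see above).
	resp, err := client.Get("http://localhost:8080/health")
	if err != nil {
		log.Fatalf("health check failed: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// After this change the endpoint answers 200 "OK" whenever the process is
	// up; there is no 503 path anymore.
	fmt.Printf("status=%d body=%q\n", resp.StatusCode, body)
}
~~~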

@ -5,7 +5,6 @@ import (
"io" "io"
"net" "net"
"net/http" "net/http"
"sync"
"time" "time"
clog "github.com/coredns/coredns/plugin/pkg/log" clog "github.com/coredns/coredns/plugin/pkg/log"
@ -22,18 +21,12 @@ type health struct {
	nlSetup bool
	mux     *http.ServeMux
-	// A slice of Healthers that the health plugin will poll every second for their health status.
-	h []Healther
-	sync.RWMutex
-	ok bool // ok is the global boolean indicating an all healthy plugin stack
-	stop     chan bool
-	pollstop chan bool
+	stop chan bool
}

// newHealth returns a new initialized health.
func newHealth(addr string) *health {
-	return &health{Addr: addr, stop: make(chan bool), pollstop: make(chan bool)}
+	return &health{Addr: addr, stop: make(chan bool)}
}

func (h *health) OnStartup() error {
@ -51,12 +44,10 @@ func (h *health) OnStartup() error {
	h.nlSetup = true

	h.mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
-		if h.Ok() {
+		// We're always healthy.
		w.WriteHeader(http.StatusOK)
		io.WriteString(w, ok)
		return
-		}
-		w.WriteHeader(http.StatusServiceUnavailable)
	})

	go func() { http.Serve(h.ln, h.mux) }()
@ -72,11 +63,6 @@ func (h *health) OnFinalShutdown() error {
		return nil
	}

-	// Stop polling plugins
-	h.pollstop <- true
-	// NACK health
-	h.SetOk(false)

	if h.lameduck > 0 {
		log.Infof("Going into lameduck mode for %s", h.lameduck)
		time.Sleep(h.lameduck)
@ -84,8 +70,8 @@ func (h *health) OnFinalShutdown() error {
	h.ln.Close()

-	h.stop <- true
	h.nlSetup = false
+	close(h.stop)
	return nil
}
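Taken out of the plugin, the remaining lifecycle is small: serve a fixed 200 "OK" on `/health`, and on shutdown optionally sleep for the lameduck duration before closing the listener. The following is a self-contained sketch of that pattern only; the `healthServer` type and its method names are illustrative, not CoreDNS API:

~~~ go
package main

import (
	"io"
	"net"
	"net/http"
	"time"
)

// healthServer mirrors the simplified plugin: a listener, an optional
// lameduck duration, and nothing to poll.
type healthServer struct {
	ln       net.Listener
	lameduck time.Duration
}

func (h *healthServer) start(addr string) error {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	h.ln = ln

	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		// We're always healthy: being able to answer at all is the signal.
		w.WriteHeader(http.StatusOK)
		io.WriteString(w, "OK")
	})
	go func() { http.Serve(ln, mux) }()
	return nil
}

func (h *healthServer) shutdown() {
	if h.lameduck > 0 {
		// Keep serving for a bit so load balancers can drain us.
		time.Sleep(h.lameduck)
	}
	h.ln.Close()
}

func main() {
	h := &healthServer{lameduck: 250 * time.Millisecond}
	if err := h.start(":8080"); err != nil {
		panic(err)
	}
	time.Sleep(time.Second) // answer /health for a moment
	h.shutdown()
}
~~~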

@ -6,43 +6,23 @@ import (
"net/http" "net/http"
"testing" "testing"
"time" "time"
"github.com/coredns/coredns/plugin/erratic"
) )
func TestHealth(t *testing.T) { func TestHealth(t *testing.T) {
h := newHealth(":0") h := newHealth(":0")
h.h = append(h.h, &erratic.Erratic{})
if err := h.OnStartup(); err != nil { if err := h.OnStartup(); err != nil {
t.Fatalf("Unable to startup the health server: %v", err) t.Fatalf("Unable to startup the health server: %v", err)
} }
defer h.OnFinalShutdown() defer h.OnFinalShutdown()
go func() {
<-h.pollstop
return
}()
// Reconstruct the http address based on the port allocated by operating system. // Reconstruct the http address based on the port allocated by operating system.
address := fmt.Sprintf("http://%s%s", h.ln.Addr().String(), path) address := fmt.Sprintf("http://%s%s", h.ln.Addr().String(), path)
// Nothing set should return unhealthy
response, err := http.Get(address) response, err := http.Get(address)
if err != nil { if err != nil {
t.Fatalf("Unable to query %s: %v", address, err) t.Fatalf("Unable to query %s: %v", address, err)
} }
if response.StatusCode != 503 {
t.Errorf("Invalid status code: expecting '503', got '%d'", response.StatusCode)
}
response.Body.Close()
h.poll()
response, err = http.Get(address)
if err != nil {
t.Fatalf("Unable to query %s: %v", address, err)
}
if response.StatusCode != 200 { if response.StatusCode != 200 {
t.Errorf("Invalid status code: expecting '200', got '%d'", response.StatusCode) t.Errorf("Invalid status code: expecting '200', got '%d'", response.StatusCode)
} }
@ -60,18 +40,10 @@ func TestHealth(t *testing.T) {
func TestHealthLameduck(t *testing.T) {
	h := newHealth(":0")
	h.lameduck = 250 * time.Millisecond
-	h.h = append(h.h, &erratic.Erratic{})
	if err := h.OnStartup(); err != nil {
		t.Fatalf("Unable to startup the health server: %v", err)
	}

-	// Both these things are behind a sync.Once, fake reading from the channels.
-	go func() {
-		<-h.pollstop
-		<-h.stop
-		return
-	}()
-
	h.OnFinalShutdown()
}

@ -1,36 +0,0 @@
package health

// Healther interface needs to be implemented by each plugin willing to provide
// healthhceck information to the health plugin. Note this method should return
// quickly, i.e. just checking a boolean status, as it is called every second
// from the health plugin.
type Healther interface {
	// Health returns a boolean indicating the health status of a plugin.
	// False indicates unhealthy.
	Health() bool
}

// Ok returns the global health status of all plugin configured in this server.
func (h *health) Ok() bool {
	h.RLock()
	defer h.RUnlock()
	return h.ok
}

// SetOk sets the global health status of all plugin configured in this server.
func (h *health) SetOk(ok bool) {
	h.Lock()
	defer h.Unlock()
	h.ok = ok
}

// poll polls all healthers and sets the global state.
func (h *health) poll() {
	for _, m := range h.h {
		if !m.Health() {
			h.SetOk(false)
			return
		}
	}
	h.SetOk(true)
}

@ -5,7 +5,6 @@ import (
"net" "net"
"time" "time"
"github.com/coredns/coredns/core/dnsserver"
"github.com/coredns/coredns/plugin" "github.com/coredns/coredns/plugin"
"github.com/coredns/coredns/plugin/metrics" "github.com/coredns/coredns/plugin/metrics"
@ -28,32 +27,6 @@ func setup(c *caddy.Controller) error {
	h := newHealth(addr)
	h.lameduck = lame

-	c.OnStartup(func() error {
-		plugins := dnsserver.GetConfig(c).Handlers()
-		for _, p := range plugins {
-			if x, ok := p.(Healther); ok {
-				h.h = append(h.h, x)
-			}
-		}
-		return nil
-	})
-
-	c.OnStartup(func() error {
-		// Poll all middleware every second.
-		h.poll()
-		go func() {
-			for {
-				select {
-				case <-time.After(1 * time.Second):
-					h.poll()
-				case <-h.pollstop:
-					return
-				}
-			}
-		}()
-		return nil
-	})
-
	c.OnStartup(func() error {
		metrics.MustRegister(c, HealthDuration)
		return nil

@ -107,11 +107,6 @@ kubernetes [ZONES...] {
This allows the querying pod to continue searching for the service in the search path.
The search path could, for example, include another Kubernetes cluster.
-## Health
-This plugin implements dynamic health checking. Currently this is limited to reporting healthy when
-the API has synced.
## Examples
Handle all queries in the `cluster.local` zone. Connect to Kubernetes in-cluster. Also handle all
@ -144,7 +139,6 @@ kubernetes cluster.local {
}
~~~
## stubDomains and upstreamNameservers
Here we use the *forward* plugin to implement a stubDomain that forwards `example.local` to the nameserver `10.100.0.10:53`.

@ -1,4 +0,0 @@
package kubernetes

// Health implements the health.Healther interface.
func (k *Kubernetes) Health() bool { return k.APIConn.HasSynced() }
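The kubernetes plugin was the only in-tree user of this hook, so its "API has synced" signal has to surface through readiness instead, as the commit message notes. A hedged sketch of that shape, assuming a `Ready() bool` hook consumed by the *ready* plugin; the method name mirrors the removed `Health()` but is an assumption here, not part of this diff:

~~~ go
package main

import "fmt"

// APIConnector is a stand-in for the plugin's API connection; only HasSynced
// matters for this sketch.
type APIConnector interface {
	HasSynced() bool
}

// Kubernetes is a minimal stand-in for the plugin struct.
type Kubernetes struct {
	APIConn APIConnector
}

// Ready gates readiness on the API sync, while the server's /health now
// stays 200 regardless of any single plugin's state.
func (k *Kubernetes) Ready() bool { return k.APIConn.HasSynced() }

// fakeConn lets the example run without a real Kubernetes API.
type fakeConn struct{ synced bool }

func (f fakeConn) HasSynced() bool { return f.synced }

func main() {
	k := &Kubernetes{APIConn: fakeConn{synced: true}}
	fmt.Println("ready:", k.Ready())
}
~~~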