Commit graph

16 commits

Author SHA1 Message Date
W. Trevor King
0063d7a80c
plugin/health: Poll localhost by default (#5934)
defaulting to localhost makes things explicit in CoreDNS code, and will give us valid URIs in
the logs

Signed-off-by: W. Trevor King <wking@tremily.us>
2023-03-29 09:57:54 -04:00
Chris O'Haver
d903a963ee
dont lameduck when reloading (#5472)
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
2022-07-06 13:52:18 -04:00
Ondřej Benkovský
a929b0b1ec
plugin/health : rework overloaded goroutine to support graceful shutdown (#5244)
Signed-off-by: Ondřej Benkovský <ondrej.benkovsky@jamf.com>
2022-04-13 13:09:03 -04:00
Zou Nengren
768ca99c57 Fix reloading in health and ready (#3473)
Signed-off-by: zouyee <zounengren@cmss.chinamobile.com>
2019-11-20 12:14:37 +00:00
Guangming Wang
081e45afa3 cleanup: remove redundant return statement (#3297)
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-09-23 14:40:14 +01:00
xieyanker
9fe7fb95c6 return standardized text for ready and health endpoint (#3195) 2019-08-26 10:31:24 +00:00
Miek Gieben
076b8d4fba plugin/health: add OnRestartFailed (#2812)
Add OnReStartFailed which makes the health plugin stay up if the
Corefile is corrupt and we revert to the previous version.

Also needs a fix for the channel handling

See #2659

Testing it will log the following when restarting with a corrupted
Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics
go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-05-04 16:06:25 -04:00
Miek Gieben
890cdb5cab plugin/health: cleanups (#2811)
Small, trivial cleanup: got triggered because I saw a comment on how
health plugins polls other plugins which isn't true.

* Remove useless newHealth function
* healthParse -> parse
* Remove useless constants

Net deletion of code.

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-05-04 16:06:04 -04:00
Miek Gieben
c778b3a67c
plugin/health: remove ability to poll other plugins (#2547)
* plugin/health: remove ability to poll other plugins

This mechanism defeats the purpose any plugin (mostly) caching can still
be alive, we can probably forward queries still. Don't poll plugins,
just tell the world we're up and running.

It was only actually used in kubernetes; and there specifically would
mean any network hiccup would NACK the entire server health.

Fixes: #2534

Signed-off-by: Miek Gieben <miek@miek.nl>

* update docs based on feedback

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-03-07 22:13:47 +00:00
Miek Gieben
12b2ff9740
Use logging (#1718)
* update docs

* plugins: use plugin specific logging

Hooking up pkg/log also changed NewWithPlugin to just take a string
instead of a plugin.Handler as that is more flexible and for instance
the Root "plugin" doesn't implement it fully.

Same logging from the reload plugin:

.:1043
2018/04/22 08:56:37 [INFO] CoreDNS-1.1.1
2018/04/22 08:56:37 [INFO] linux/amd64, go1.10.1,
CoreDNS-1.1.1
linux/amd64, go1.10.1,
2018/04/22 08:56:37 [INFO] plugin/reload: Running configuration MD5 = ec4c9c55cd19759ea1c46b8c45742b06
2018/04/22 08:56:54 [INFO] Reloading
2018/04/22 08:56:54 [INFO] plugin/reload: Running configuration MD5 = 9e2bfdd85bdc9cceb740ba9c80f34c1a
2018/04/22 08:56:54 [INFO] Reloading complete

* update docs

* better doc
2018-04-22 21:40:33 +01:00
Miek Gieben
acbcad7b4e
reload: use OnRestart (#1709)
* reload: use OnRestart

Close the listener on OnRestart for health and metrics so the default
setup function can setup the listener when the plugin is "starting up".

Lightly test with some SIGUSR1-ing. Also checked the reload plugin with
this, seems fine:

.com.:1043
.:1043
2018/04/20 15:01:25 [INFO] CoreDNS-1.1.1
2018/04/20 15:01:25 [INFO] linux/amd64, go1.10,
CoreDNS-1.1.1
linux/amd64, go1.10,
2018/04/20 15:01:25 [INFO] Running configuration MD5 = aa8b3f03946fb60546ca1f725d482714
2018/04/20 15:02:01 [INFO] Reloading
2018/04/20 15:02:01 [INFO] Running configuration MD5 = b34a96d99e01db4015a892212560155f
2018/04/20 15:02:01 [INFO] Reloading complete
^C2018/04/20 15:02:06 [INFO] SIGINT: Shutting down

With this corefile:
.com {
  proxy . 127.0.0.1:53
  prometheus :9054
  whoami
  reload
}

. {
  proxy . 127.0.0.1:53
  prometheus :9054
  whoami
  reload
}

The prometheus port was 9053, changed that to 54 so reload would pick it
up.

From a cursory look it seems this also fixes:
Fixes #1604 #1618 #1686 #1492

* At least make it test

* Use onfinalshutdown

* reload: add reload test

This test #1604 adn right now fails.

* Address review comments

* Add bug section explaining things a bit

* compile tests

* Fix tests

* fixes

* slightly less crazy

* try to make prometheus setup less confusing

* Use ephermal port for test

* Don't use the listener

* These are shared between goroutines, just use the boolean in the main
  structure.
* Fix text in the reload README,
* Set addr to TODO once stopping it
* Morph fturb's comment into test, to test reload and scrape health and
  metric endpoint
2018-04-21 17:43:02 +01:00
Miek Gieben
26d1432ae6
Update all plugins to use plugin/pkg/log (#1694)
* Update all plugins to use plugin/pkg/log

I wish this could have been done with sed. Alas manually changed all
callers to use the new plugin/pkg/log package.

* Error -> Info

* Add docs to debug plugin as well
2018-04-19 07:41:56 +01:00
Miek Gieben
804f745951
plugin/health: make reload work (#1585)
* plugin/health: make reload work

Remove the once.Do from the startup, so we can re-bind the HTTP
listener. Also clarify the usage of health in multiple server blocks
(this is not the best approach - but there isn't a generic solution at
this point).

Manual tested as we lack testing infra, i.e kill -SIGUSR1 and some
CURLing of the health endpoint.

* Readme test fix

* update

* dont need this
2018-03-02 21:40:14 -08:00
Miek Gieben
c39e5cd014
plugin/health: add lameduck mode (#1379)
* plugin/health: add lameduck mode

Add a way to configure lameduck more, i.e. set health to false, stop
polling plugins. Then wait for a duration before shutting down. As the
health middleware is configured early on in the plugin list, it will
hold up all other shutdown, meaning we still answer queries.

* Add New

* More tests

* golint

* remove confusing text
2018-01-18 10:40:09 +00:00
Miek Gieben
48059a6c3e
Overloaded (#1364)
* plugin/health: add 'overloaded metrics'

Query our on health endpoint and record (and export as a metric) the
time it takes. The Get has a 5s timeout, that, when reached, will set
the metric duration to 5s. The actually call "I'm I overloaded" is left
to an external entity.

* README

* golint and govet

* and the tests
2018-01-10 11:41:22 +00:00
Miek Gieben
d8714e64e4 Remove the word middleware (#1067)
* Rename middleware to plugin

first pass; mostly used 'sed', few spots where I manually changed
text.

This still builds a coredns binary.

* fmt error

* Rename AddMiddleware to AddPlugin

* Readd AddMiddleware to remain backwards compat
2017-09-14 09:36:06 +01:00
Renamed from middleware/health/health.go (Browse further)