Commit graph

5 commits

Author SHA1 Message Date
Miek Gieben
a7590897fb
plugin/proxy: max the number of upstreams (#1359)
* plugin/proxy: max the number of upstreams

Put a max of 15 on the number of upstreams.
2018-01-08 15:03:42 +00:00
Miek Gieben
e34e2c251f plugin/proxy: kick of HC on every 3rd failure (#1110)
* healthchecks: check on every 3rd failure

Check on every third failure and some cleanups to make this possible. A
failed healthcheck will never increase Fails, a successfull healthceck
will reset Fails to 0. This is a chance this counter now drops below 0,
making the upstream super? healthy.

This removes the okUntil smartness and condences everything back to 1
metrics: Fails; so it's simpler in that regard.

Timout errors are *not* attributed to the local upstream, and don't get
counted into the Fails anymore. Meaning the 'dig any isc.org' won't kill
your upstream.

Added extra test the see if the Fails counter gets reset after 3 failed
connection.

There is still a disconnect beween HTTP healthceck working the proxy (or
lookup) not being able to connect to the upstream.

* Fix tests
2017-10-15 19:38:39 +02:00
Miek Gieben
2a32cd4159 plugin/proxy: decrease health timeouts (#1107)
Turn down the timeouts and numbers a bit:
FailTimeout 10s -> 5s
Future 60s -> 12s
TryDuration 60s -> 16s
The timeout for decrementing the fails in a host: 10s -> 2s

And the biggest change: don't set fails when the error is Timeout(),
meaning we loop for a bit and may try the same server again, but we
don't mark our upstream as bad, see comments in proxy.go. Testing this
with "ANY isc.org" and "MX miek.nl" we see:

~~~
::1 - [24/Sep/2017:08:06:17 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.001621221s
24/Sep/2017:08:06:17 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:37420->8.8.8.8:53: i/o timeout

::1 - [24/Sep/2017:08:06:17 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 35.957284ms

127.0.0.1 - [24/Sep/2017:08:06:18 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.002051726s
24/Sep/2017:08:06:18 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:54901->8.8.8.8:53: i/o timeout

::1 - [24/Sep/2017:08:06:19 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 56.848416ms
127.0.0.1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 48.118349ms
::1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 1.055172915s
~~~

So the ANY isc.org queries show up twice, because we retry internally -
this is I think WAI.

The `miek.nl MX` queries are just processed normally as no backend is
marked as unreachable.

May fix #1035 #486
2017-09-24 20:05:36 +01:00
Miek Gieben
148a99442d healhcheck: various cleanups (#1106)
* healhcheck: various cleanups

Network wasn't used. IgnorePaths wasn't used. Move checkdown function to
common function shared between proxy protocols. And some naming fixed.

Also reset the Fails on a succesful healthcheck back to 0.

remove newlines from log

* compile

* fix test
2017-09-24 19:37:43 +01:00
Miek Gieben
d8714e64e4 Remove the word middleware (#1067)
* Rename middleware to plugin

first pass; mostly used 'sed', few spots where I manually changed
text.

This still builds a coredns binary.

* fmt error

* Rename AddMiddleware to AddPlugin

* Readd AddMiddleware to remain backwards compat
2017-09-14 09:36:06 +01:00
Renamed from middleware/proxy/upstream.go (Browse further)