* plugin/forward: convert the specified domain of health_check to Fqdn
* plugin/forward: update readme for health check
Signed-off-by: vanceli <vanceli@tencent.com>
* trap unsupported FROM cidr notations
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
* make is a warning
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
* plugin/forward Add rcode and rtype to request_duration_seconds metric
Signed-off-by: Maxime Ginters <maxime.ginters@shopify.com>
* Control the cardinality of query type
Signed-off-by: Maxime Ginters <maxime.ginters@shopify.com>
* forward: add example with multiple DoT upstreams
Remove Bugs section as this is a nice work around.
h/t https://twitter.com/mholt6/status/1284250606673080321
Signed-off-by: Miek Gieben <miek@miek.nl>
* Actually remove bugs section
Signed-off-by: Miek Gieben <miek@miek.nl>
sed -i 's/Also See/See Also/' plugin/**/README.md
Some plugins did already use 'See Also', so it's all consistent now.
Fixes: #4196
Signed-off-by: Miek Gieben <miek@miek.nl>
* tweak language
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
* tweak language
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
* typo
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
we hc every 0.5s, doing exp backoff will create a large gap in the
ability to re-use an upstream. Doing a exp. backoff up to (say) 3s,
isn't really exp backoff either.
Remove the wording from the documentation.
Signed-off-by: Miek Gieben <miek@miek.nl>
Cleanup a variety of metric issues.
* Eliminate department of redundancy "count_total" naming.
* Use the plural of the unit when appropriate. (ex, "requests")
* Remove label names from metric names where appropriate. (ex, "rcode")
* Simplify request metrics by consolidating type label in to the base
request counter.
* Re-generate man pages.
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Make the RD-flag in health-checks in the Forward-plugin configurable
Introduces a new configuration flag; `health_check_non_recursive`. This
flag makes the health-checker do non-recursive requests when checking
the health of upstream servers.
Signed-off-by: Geir Haugom <ghagit@haugom.org>
Signed-off-by: Christian Tryti <ctryti@gmail.com>
* Changes after feedback from reviewer
* Better tests of health-checks with and without recursion
* Removed the health_check_non_recursive configuration in favor of
extending the existing health_check configuration. Now supports an
optional `no_rec` argument.
Signed-off-by: Christian Tryti <ctryti@gmail.com>
* Add new test that checks setup of health_check.
Signed-off-by: Christian Tryti <ctryti@gmail.com>
* plugin/pkg/up: make default intervals shorter
I think 15 min is too high, make this lower to react faster.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Update README
Signed-off-by: Miek Gieben <miek@miek.nl>
Move exponential backoff initialization to Start()
Signed-off-by: RickyRajinder <singh.sangh@gmail.com>
Move comment
Increase max interval and update README
Remove trailing whitespace
Change Start() param name back to interval
Caught my eye, we name things directive still, esp when talking about
the prometheus *plugin*. Rename everything that needs to be plugin to
'plugin'. Also make sure Metrics is a H2 section (not H1).
Signed-off-by: Miek Gieben <miek@miek.nl>
* plugin/forward: may Yield not block
Yield may block when we're super busy with creating (and looking) for
connection. Set a small timeout on Yield, to skip putting the connection
back in the queue.
Use persistentConn troughout the socket handling code to be more
consistent.
Signed-off-by: Miek Gieben <miek@miek.nl>
Dont do
Signed-off-by: Miek Gieben <miek@miek.nl>
* Set used in Yield
This gives one central place where we update used in the persistConns
Signed-off-by: Miek Gieben <miek@miek.nl>
After several experiments at SoundCloud we found that the current
minimum read timeout of 10ms is too low. A single request against a
slow/unavailable authoritative server can cause all TCP connections to
get closed. We record a 50th percentile forward/proxy latency of <5ms,
and a 99th percentile latency of 60ms. Using a minimum timeout of 200ms
seems to be a fair trade-off between avoiding unnecessary high
connection churn and reacting to upstream failures in a timely manner.
This change also renames hcDuration to hcInterval to reflect its usage,
and removes the duplicated timeout constant to make code comprehension
easier.
* plugins: Return error for multiple use of some
Return plugin.ErrOnce when a plugin that doesn't support it, is called
mutliple times.
This now adds it for: cache, dnssec, errors, forward, hosts, nsid.
And changes it slightly in kubernetes, pprof, reload, root.
* more tests
* plugin/forward: on demand healtchecking
Only start doing health checks when we encouner an error (any error).
This uses the new pluing/pkg/up package to abstract away the actual
checking. This reduces the LOC quite a bit; does need more testing, unit
testing and tcpdumping a bit.
* fix tests
* Fix readme
* Use pkg/up for healthchecks
* remove unused channel
* more cleanups
* update readme
* * Again do go generate and go build; still referencing the wrong forward
repo? Anyway fixed.
* Use pkg/up for doing the healtchecks to cut back on unwanted queries
* Change up.Func to return an error instead of a boolean.
* Drop the string target argument as it doesn't make sense.
* Add healthcheck test on failing to get an upstream answer.
TODO(miek): double check Forward and Lookup and how they interact with
HC, and if we correctly call close() on those
* actual test
* Tests here
* more tests
* try getting rid of host
* Get rid of the host indirection
* Finish removing hosts
* moar testing
* import fmt
* field is not used
* docs
* move some stuff
* bring back health_check
* maxfails=0 test
* git and merging, bah
* review
* plugin/forward: add it
This moves coredns/forward into CoreDNS. Fixes as a few bugs, adds a
policy option and more tests to the plugin.
Update the documentation, test IPv6 address and add persistent tests.
* Always use random policy when spraying
* include scrub fix here as well
* use correct var name
* Code review
* go vet
* Move logging to metrcs
* Small readme updates
* Fix readme