* Speed up testing
* Make notification run in the background; this reduces the test_readme
time from 18s to 0.10s
* reduce time for zone reload
* TestServeDNSConcurrent: removed entirely. This took a whopping 58s for
... ? A few minutes of staring didn't reveal what it is actually
testing, and making the values smaller revealed race conditions in the
tests. Removed entirely.
* Move many interval values to variables so we can reset them to short
values for the tests.
* test_large_axfr: make the zone smaller. The number used, 64K, has no
rationale; make it 64K/10 to speed things up.
* TestProxyThreeWay: use client with shorter timeout
A few random tidbits in other tests.
Total time saved: 177s (almost 3m) - which makes it worthwhile again to
run the tests locally:
this branch:
~~~
ok github.com/coredns/coredns/test 10.437s
cd plugin; time go t ./...
5,51s user 7,51s system 11,15s elapsed 744%CPU (
~~~
master:
~~~
ok github.com/coredns/coredns/test 35.252s
cd plugin; time go t ./...
157,64s user 15,39s system 50,05s elapsed 345%CPU ()
~~~
tests/ -25s
plugins/ -40s
This brings the total to 20s, and another 10s can be saved by fixing
dnstapio. Getting this to 5s would be even better, but 10s is also nice.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Also 0.01
Signed-off-by: Miek Gieben <miek@miek.nl>
* Make the RD flag in health checks in the forward plugin configurable
Introduces a new configuration flag: `health_check_non_recursive`. This
flag makes the health checker do non-recursive requests when checking
the health of upstream servers.
Signed-off-by: Geir Haugom <ghagit@haugom.org>
Signed-off-by: Christian Tryti <ctryti@gmail.com>
* Changes after feedback from reviewer
* Better tests of health-checks with and without recursion
* Removed the health_check_non_recursive configuration in favor of
extending the existing health_check configuration. Now supports an
optional `no_rec` argument.
Signed-off-by: Christian Tryti <ctryti@gmail.com>
* Add new test that checks setup of health_check.
Signed-off-by: Christian Tryti <ctryti@gmail.com>
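For illustration, a minimal sketch of such a non-recursive probe using
github.com/miekg/dns; the root NS question, timeout and address are
assumptions, not necessarily the plugin's exact probe:
~~~
package main

import (
	"fmt"
	"time"

	"github.com/miekg/dns"
)

func main() {
	c := &dns.Client{Net: "udp", Timeout: 500 * time.Millisecond}

	m := new(dns.Msg)
	m.SetQuestion(".", dns.TypeNS) // assumed probe question
	m.RecursionDesired = false     // what no_rec does: clear the RD flag

	if _, _, err := c.Exchange(m, "127.0.0.1:53"); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}
~~~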
* plugin/forward: make Yield not block
Yield may block when we're super busy with creating (and looking for)
connections. Set a small timeout on Yield, and skip putting the
connection back in the queue if it expires.
Use persistentConn throughout the socket handling code to be more
consistent.
Signed-off-by: Miek Gieben <miek@miek.nl>
Don't do
Signed-off-by: Miek Gieben <miek@miek.nl>
* Set used in Yield
This gives one central place where we update 'used' in the persistConns
(see the sketch below).
Signed-off-by: Miek Gieben <miek@miek.nl>
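Roughly, the non-blocking Yield looks like this sketch; persistConn,
yieldTimeout and the yield channel are assumed names, not necessarily
the real ones:
~~~
package forward

import (
	"time"

	"github.com/miekg/dns"
)

const yieldTimeout = 25 * time.Millisecond // assumed value

type persistConn struct {
	c    *dns.Conn
	used time.Time
}

type Transport struct {
	yield chan *persistConn
}

// Yield returns the connection to the pool, but never blocks for long.
func (t *Transport) Yield(pc *persistConn) {
	pc.used = time.Now() // the one central place where 'used' is updated

	select {
	case t.yield <- pc: // hand the connection back to the manager
	case <-time.After(yieldTimeout):
		// the manager is busy dialing and handing out connections;
		// skip putting this one back rather than block the caller
	}
}
~~~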
* Run gostaticcheck
Run gostaticcheck on the codebase and fix almost all flagged items.
Only keep:
* coremain/run.go:192:2: var appVersion is unused (U1000)
* plugin/chaos/setup.go:54:3: the surrounding loop is unconditionally terminated (SA4004)
* plugin/etcd/setup.go:103:3: the surrounding loop is unconditionally terminated (SA4004)
* plugin/pkg/replacer/replacer.go:274:13: argument should be pointer-like to avoid allocations (SA6002)
* plugin/route53/setup.go:124:28: session.New is deprecated: Use NewSession functions to create sessions instead. NewSession has the same functionality as New except an error can be returned when the func is called instead of waiting to receive an error until a request is made. (SA1019)
* test/grpc_test.go:25:69: grpc.WithTimeout is deprecated: use DialContext and context.WithTimeout instead. Will be supported throughout 1.x. (SA1019)
The first one isn't true, as this is set via ldflags. The rest are
minor. The deprecations should be fixed at some point; I'll file some
issues.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Make sure to plug in the plugins
Import the plugins; the file that did this was removed. Put it in the
reload test, as that requires an almost complete CoreDNS server.
Signed-off-by: Miek Gieben <miek@miek.nl>
* plugin/forward: remove dynamic read timeout
We care about an upstream being there, so we still have a dynamic dial
timeout of 1s (way higher than 200ms); this should be fairly stable
for an upstream. The read timeout is more variable because of cached and
non-cached responses. As such, remove this logic entirely.
Drop to a 2s read timeout.
Fixes #2306
Signed-off-by: Miek Gieben <miek@miek.nl>
Create plugin/pkg/transport that holds the transport related functions.
This needed to be a new pkg to prevent cyclic import errors.
This cleans up a bunch of duplicated code in core/dnsserver that also
tried to parse a transport (now all done in transport.Parse).
Signed-off-by: Miek Gieben <miek@miek.nl>
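A sketch of what transport.Parse does; the constants and exact
signature here are assumptions:
~~~
package transport

import "strings"

// Assumed scheme constants.
const (
	DNS   = "dns"
	TLS   = "tls"
	GRPC  = "grpc"
	HTTPS = "https"
)

// Parse splits an argument like "tls://example.org" into the transport
// scheme and the bare address; names without a scheme default to DNS.
func Parse(s string) (trans string, addr string) {
	for _, t := range []string{TLS, GRPC, HTTPS, DNS} {
		if strings.HasPrefix(s, t+"://") {
			return t, s[len(t)+len("://"):]
		}
	}
	return DNS, s
}
~~~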
* plugin/forward: add HealthChecker interface
Make the HealthChecker interface and morph the current DNS health
checker into that interface.
Remove a whole bunch of methods on Forward that didn't make sense.
This is done in preparation for adding a DoH client to forward - which
requires a completely different health check implementation (and more,
but let's start here).
Signed-off-by: Miek Gieben <miek@miek.nl>
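A sketch of the interface's shape, with assumed method names and a
trimmed-down Proxy:
~~~
package forward

import (
	"time"

	"github.com/miekg/dns"
)

// HealthChecker: one probe; nil means the upstream is healthy.
type HealthChecker interface {
	Check(*Proxy) error
}

type Proxy struct {
	addr string
}

// dnsHc is the existing DNS health checker morphed into the interface;
// a future DoH checker would implement Check very differently.
type dnsHc struct {
	c *dns.Client
}

func (h *dnsHc) Check(p *Proxy) error {
	m := new(dns.Msg)
	m.SetQuestion(".", dns.TypeNS)
	_, _, err := h.c.Exchange(m, p.addr)
	return err
}

var _ HealthChecker = &dnsHc{c: &dns.Client{Timeout: time.Second}}
~~~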
* Use protocol
Signed-off-by: Miek Gieben <miek@miek.nl>
* Dial doesn't need to be a method on Forward either
Signed-off-by: Miek Gieben <miek@miek.nl>
* Address comments
Address various comments on the PR.
Signed-off-by: Miek Gieben <miek@miek.nl>
After several experiments at SoundCloud we found that the current
minimum read timeout of 10ms is too low. A single request against a
slow/unavailable authoritative server can cause all TCP connections to
get closed. We record a 50th percentile forward/proxy latency of <5ms,
and a 99th percentile latency of 60ms. Using a minimum timeout of 200ms
seems to be a fair trade-off between avoiding unnecessary high
connection churn and reacting to upstream failures in a timely manner.
This change also renames hcDuration to hcInterval to reflect its usage,
and removes the duplicated timeout constant to make code comprehension
easier.
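In sketch form, the clamp this implies (the names are assumptions):
~~~
package forward

import "time"

const (
	minTimeout = 200 * time.Millisecond
	maxTimeout = 2 * time.Second
)

// limitTimeout clamps a dynamically computed read timeout so a single
// slow upstream answer cannot drive the timeout below 200ms (causing
// TCP connection churn) or above the overall default.
func limitTimeout(t time.Duration) time.Duration {
	if t < minTimeout {
		return minTimeout
	}
	if t > maxTimeout {
		return maxTimeout
	}
	return t
}
~~~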
* plugin/forward: erase expired connections by timer
- in the previous implementation, expired connections resided in the
cache until a new request to the same upstream/protocol came. If the
upstream was unhealthy, a new request might come a long time later or
not at all. All this time the expired connections held system
resources (file descriptors, ephemeral ports). In my fix the expired
connections and related resources are released by a timer
- decreased the complexity of taking a connection from the cache. The
list of connections is treated as a stack (LIFO queue): a connection
is taken from the end of the queue (the freshest connection) and
returned to the end (as was implemented before). The remarkable thing
is that all connections in the stack turn out to be ordered by the
'used' field
- the cleanup() method finds the first good (not expired) connection
in the stack with a binary search, since all connections are ordered
by the 'used' field (see the sketch below)
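A sketch of that binary-search cleanup under assumed types (persistConn,
a conns slice ordered by 'used'), using sort.Search:
~~~
package forward

import (
	"sort"
	"time"

	"github.com/miekg/dns"
)

type persistConn struct {
	c    *dns.Conn
	used time.Time
}

type transport struct {
	conns  []*persistConn
	expire time.Duration
}

// cleanup runs from a timer and closes connections not used within
// expire. Because conns is ordered by 'used' (oldest first), a binary
// search finds the first still-fresh connection.
func (t *transport) cleanup() {
	stale := time.Now().Add(-t.expire)
	good := sort.Search(len(t.conns), func(i int) bool {
		return t.conns[i].used.After(stale)
	})
	for _, pc := range t.conns[:good] {
		pc.c.Close() // release the fd/ephemeral port right away
	}
	t.conns = t.conns[good:]
}
~~~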
* fix race conditions
* minor enhancement
* add comments
- connManager() goroutine will stop when Proxy is about to be
garbage collected. This means that no queries are in progress,
and no queries are going to come
Rework TestProxyClose: close the proxy in the *same* goroutine
where we started it. Close channels as long as we don't get data races
(this may need another fix).
Move the Dial goroutine out of the connManager - this simplifies things
*and* makes another goroutine go away, and removes the need for the
connErr channels; these can now just be dns.Conn.
Also:
Revert "plugin/forward: gracefull stop (#1701)"
This reverts commit 135377bf77.
Revert "rework TestProxyClose (#1735)"
This reverts commit 9e8893a0b5.
* plugin/forward: graceful stop
- stop the connection manager only when no queries are in progress
* minor improvement
* prevent healthcheck on stopped proxy
* revert closing channels
* use standard context
- each proxy stores the average RTT (round trip time) of the last
rttCount queries. For now, I assigned the value 4 to rttCount
- the read timeout is calculated as double the average RTT, but it
cannot exceed the default timeout (see the sketch below)
- the initial avg RTT is set to half of the default timeout, so the
initial timeout is equal to the default timeout
- the RTT for a failed read is considered equal to the default timeout,
so any failed read will increase the average RTT (up to the default
timeout)
- dynamic timeouts will let us react faster to lost UDP packets
- in the future, we may develop a low-latency forward policy based on
the collected RTT values of the proxies
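Roughly, under assumed names (avgRtt, rttCount, defaultTimeout), the
arithmetic looks like this:
~~~
package forward

import (
	"sync/atomic"
	"time"
)

const (
	rttCount       = 4
	defaultTimeout = 2 * time.Second
)

type proxy struct {
	avgRtt int64 // nanoseconds, accessed atomically; starts at defaultTimeout/2
}

// updateRtt folds a measurement into a moving average over roughly the
// last rttCount queries; callers pass defaultTimeout for a failed read.
func (p *proxy) updateRtt(newRtt time.Duration) {
	avg := time.Duration(atomic.LoadInt64(&p.avgRtt))
	atomic.AddInt64(&p.avgRtt, int64((newRtt-avg)/rttCount))
}

// readTimeout is double the average RTT, capped at defaultTimeout, so
// the initial avgRtt of defaultTimeout/2 yields exactly defaultTimeout.
func (p *proxy) readTimeout() time.Duration {
	avg := time.Duration(atomic.LoadInt64(&p.avgRtt))
	if t := 2 * avg; t < defaultTimeout {
		return t
	}
	return defaultTimeout
}
~~~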
* plugin/forward: on-demand health checking
Only start doing health checks when we encounter an error (any error).
This uses the new plugin/pkg/up package to abstract away the actual
checking. This reduces the LOC quite a bit; it does need more testing,
unit testing and a bit of tcpdumping.
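A hypothetical usage sketch of pkg/up, assuming its Probe API (New,
Start, Do, Stop) and that a kicked probe keeps retrying on the interval
until the function returns nil:
~~~
package main

import (
	"errors"
	"fmt"
	"time"

	"github.com/coredns/coredns/plugin/pkg/up"
)

func main() {
	pr := up.New()
	pr.Start(1 * time.Second) // probe interval while unhealthy
	defer pr.Stop()

	tries := 0
	// Do is only called after a query error; the probe then keeps
	// running the function on the interval until it returns nil.
	pr.Do(func() error {
		tries++
		if tries < 3 {
			return errors.New("upstream still down")
		}
		fmt.Println("upstream healthy again")
		return nil
	})

	time.Sleep(5 * time.Second)
}
~~~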
* fix tests
* Fix readme
* Use pkg/up for healthchecks
* remove unused channel
* more cleanups
* update readme
* Again do go generate and go build; still referencing the wrong forward
repo? Anyway, fixed.
* Use pkg/up for doing the health checks to cut back on unwanted queries
* Change up.Func to return an error instead of a boolean.
* Drop the string target argument as it doesn't make sense.
* Add healthcheck test on failing to get an upstream answer.
TODO(miek): double check Forward and Lookup and how they interact with
HC, and if we correctly call close() on those
* actual test
* Tests here
* more tests
* try getting rid of host
* Get rid of the host indirection
* Finish removing hosts
* moar testing
* import fmt
* field is not used
* docs
* move some stuff
* bring back health_check
* maxfails=0 test
* git and merging, bah
* review
* plugin/forward: add it
This moves coredns/forward into CoreDNS. Fixes a few bugs, adds a
policy option and more tests to the plugin.
Update the documentation, test IPv6 address and add persistent tests.
* Always use random policy when spraying
* include scrub fix here as well
* use correct var name
* Code review
* go vet
* Move logging to metrics
* Small readme updates
* Fix readme