Add a NativeHistogramBucketFactor parameter to the use of
`NewHistogramVec` in order to enable use of Prometheus Native
Histograms.
This will store automatically computed sparse buckets in CoreDNS.
If a compatible Prometeus requests native histograms this data will
returned instead of the static buckets.
The default factor of 1.05 should provide high quality resolution data.
Signed-off-by: SuperQ <superq@gmail.com>
* introduce new interface "dnsserver.Viewer", that allows a plugin implementing it to decide if a query should be routed into its server block.
* add new plugin "view", that uses the new interface to enable a user to define expression based conditions that must be met for a query to be routed to its server block.
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
* Metrics: expand coredns_dns_responses_total with plugin label
This adds (somewhat hacky?) code to add a plugin label to the
coredns_dns_responses_total metric. It's completely obvlious to the
plugin as we just check who called the *recorder.WriteMsg method. We use
runtime.Caller( 1 2 3) to get multiple levels of callers, this should be
deep enough, but it depends on the dns.ResponseWriter wrapping that's
occuring.
README.md of metrics updates and test added in test/metrics_test.go to
check for the label being set.
I went through the plugin to see what metrics could be removed, but
actually didn't find any, the plugin push out metrics that make sense.
Due to the path fiddling to figure out the plugin name I doubt this
works (out-of-the-box) for external plugins, but I haven't tested that.
Signed-off-by: Miek Gieben <miek@miek.nl>
* better comment
Signed-off-by: Miek Gieben <miek@miek.nl>
* Metrics: expand coredns_dns_responses_total with plugin label
This adds (somewhat hacky?) code to add a plugin label to the
coredns_dns_responses_total metric. It's completely obvlious to the
plugin as we just check who called the *recorder.WriteMsg method. We use
runtime.Caller( 1 2 3) to get multiple levels of callers, this should be
deep enough, but it depends on the dns.ResponseWriter wrapping that's
occuring.
README.md of metrics updates and test added in test/metrics_test.go to
check for the label being set.
I went through the plugin to see what metrics could be removed, but
actually didn't find any, the plugin push out metrics that make sense.
Due to the path fiddling to figure out the plugin name I doubt this
works (out-of-the-box) for external plugins, but I haven't tested that.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Update core/dnsserver/server.go
Co-authored-by: dilyevsky <ilyevsky@gmail.com>
* Use [3]string
Signed-off-by: Miek Gieben <miek@miek.nl>
* imports
Signed-off-by: Miek Gieben <miek@miek.nl>
* remove dnstest changes
Signed-off-by: Miek Gieben <miek@miek.nl>
* revert
Signed-off-by: Miek Gieben <miek@miek.nl>
* Add some sleeps to make it less flaky
Signed-off-by: Miek Gieben <miek@miek.nl>
* Revert "Add some sleeps to make it less flaky"
This reverts commit b5c6655196.
* Remove forward when not needed
Signed-off-by: Miek Gieben <miek@miek.nl>
* remove newline
Signed-off-by: Miek Gieben <miek@miek.nl>
Co-authored-by: dilyevsky <ilyevsky@gmail.com>
* when no response is written, fallback to status of next plugin in prometheus plugin
Signed-off-by: Ondrej Benkovsky <ondrej.benkovsky@wandera.com>
* fixup! when no response is written, fallback to status of next plugin in prometheus plugin
Signed-off-by: Ondrej Benkovsky <ondrej.benkovsky@wandera.com>
Make normalize return multiple "hosts" (= reverse zones) when a
non-octet boundary cidr is given.
Added pkg/cidr package that holds the cidr calculation routines; felt
they didn't really fit dnsutil.
This change means the IPNet return parameter isn't needed, the hosts are
all correct. The tests that tests this is also removed: TestSplitHostPortReverse
The fallout was that zoneAddr _also_ doesn't need the IPNet member, that
in turn make it visible that zoneAddr in address.go duplicated a bunch
of stuff from register.go; removed/refactored that too.
Created a plugin.OriginsFromArgsOrServerBlock to help plugins do the
right things, by consuming ZONE arguments; this now expands reverse
zones correctly. This is mostly mechanical.
Remove the reverse test in plugin/kubernetes which is a copy-paste from
a core test (which has since been fixed).
Remove MustNormalize as it has no plugin users.
This change is not backwards compatible to plugins that have a ZONE
argument that they parse in the setup util.
All in-tree plugins have been updated.
Signed-off-by: Miek Gieben <miek@miek.nl>
To combat label cardinality explosions remove the type from metrics.
This was most severe in the histogram for request duration, remove it
there.
It's also highlighted difference between grpc and forward code, where
forward did use type and grpc didn't; getting rid of all that "fixes"
that discrepancy
Move monitor.go back into the vars directory and make it private again.
Also name it slightly better
Fixes: #4507
Signed-off-by: Miek Gieben <miek@miek.nl>
* plugin/forward Add rcode and rtype to request_duration_seconds metric
Signed-off-by: Maxime Ginters <maxime.ginters@shopify.com>
* Control the cardinality of query type
Signed-off-by: Maxime Ginters <maxime.ginters@shopify.com>
* Speed up testing
* make notification run in the background, this recudes the test_readme
time from 18s to 0.10s
* reduce time for zone reload
* TestServeDNSConcurrent remove entirely. This took a whopping 58s for
... ? A few minutes staring didn't reveal wth it is actually testing.
Making values smaller revealed race conditions in the tests. Remove
entirely.
* Move many interval values to variables so we can reset them to short
values for the tests.
* test_large_axfr: make the zone smaller. The number used 64K has no
rational, make it 64/10 to speed up.
* TestProxyThreeWay: use client with shorter timeout
A few random tidbits in other tests.
Total time saved: 177s (almost 3m) - which makes it worthwhile again to
run the test locally:
this branch:
~~~
ok github.com/coredns/coredns/test 10.437s
cd plugin; time go t ./...
5,51s user 7,51s system 11,15s elapsed 744%CPU (
~~~
master:
~~~
ok github.com/coredns/coredns/test 35.252s
cd plugin; time go t ./...
157,64s user 15,39s system 50,05s elapsed 345%CPU ()
~~~
tests/ -25s
plugins/ -40s
This brings the total on 20s, and another 10s can be saved by fixing
dnstapio. Moving this to 5s would be even better, but 10s is also nice.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Also 0.01
Signed-off-by: Miek Gieben <miek@miek.nl>
* For caddy v1 in our org
This RP changes all imports for caddyserver/caddy to coredns/caddy. This
is the v1 code of caddy.
For the coredns/caddy repo the following changes have been made:
* anything not needed by us is deleted
* all `telemetry` stuff is deleted
* all its import paths are also changed to point to coredns/caddy
* the v1 branch has been moved to the master branch
* a v1.1.0 tag has been added to signal the latest release
Signed-off-by: Miek Gieben <miek@miek.nl>
* Fix imports
Signed-off-by: Miek Gieben <miek@miek.nl>
* Group coredns/caddy with out plugins
Signed-off-by: Miek Gieben <miek@miek.nl>
* remove this file
Signed-off-by: Miek Gieben <miek@miek.nl>
* Relax import ordering
github.com/coredns is now also a coredns dep, this makes
github.com/coredns/caddy fit more natural in the list.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Fix final import
Signed-off-by: Miek Gieben <miek@miek.nl>
Cleanup a variety of metric issues.
* Eliminate department of redundancy "count_total" naming.
* Use the plural of the unit when appropriate. (ex, "requests")
* Remove label names from metric names where appropriate. (ex, "rcode")
* Simplify request metrics by consolidating type label in to the base
request counter.
* Re-generate man pages.
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
* Move to CODEOWNERS
No change in who own what; just a move to CODEOWNERS. This allows
dreck cleanups.
Added .dreck.yaml for alias and exec.
Fixes: #3486
Signed-off-by: Miek Gieben <miek@miek.nl>
* stickler bot
Signed-off-by: Miek Gieben <miek@miek.nl>
* sort the file
Signed-off-by: Miek Gieben <miek@miek.nl>
Abstract the caddy call and make it simpler.
See #3261 for some part of the discussion.
Go from:
~~~ go
func init() {
caddy.RegisterPlugin("any", caddy.Plugin{
ServerType: "dns",
Action: setup,
})
}
~~~
To:
~~~ go
func init() { plugin.Register("any", setup) }
~~~
This requires some external documents in coredns.io to be updated as
well; the old way still works, so it's backwards compatible.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Update Caddy to 1.0.1, and update import path
This fix updates caddy to 1.0.1 and also
updates the import path to github.com/caddyserver/caddy
This fix fixes 2959
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Also update plugin.cfg
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
* Update and bump zplugin.go
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
This fixes a data race on the listener(s) that get started in the
metrics plugins.
It also restore pkg/uniq to its former glory and removes and state being
carried in there; this means for metrics that registry.go was to
replicate that behavior *with* locking (as pkg/uniq doesn't do, or need
that).
Also renamed uniqAddr to just u, to make it slightly shorter.
Signed-off-by: Miek Gieben <miek@miek.nl>
Fix metrics endpoint on a failed reload, follows the same lines as the
previous PRs, see for e.g. 076b8d4f. Test with a Corefile with 2 server
blocks and metrics enabled and then introducing a syntax error:
~~~
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'jfkdjk'
~~~
And then curl-ing the metrics endpoint.
See #2659 and as this is the last one.
Fixes: #2659
Getting this all right turns out to be tricky, also it's not easy
testable which is something I should fix.
Signed-off-by: Miek Gieben <miek@miek.nl>
* Don't double report metrics on error
When there is an error use a different function to report the metrics,
in case the plugin chain handled the request the metrics are already
reported.
Fixes: #2717
Signed-off-by: Miek Gieben <miek@miek.nl>
* Compile again
Signed-off-by: Miek Gieben <miek@miek.nl>
* more
Signed-off-by: Miek Gieben <miek@miek.nl>
* Remove server addr from the context
This was added twice, just leave the server which also holds the
address.
Conflicts with #2719 but should be easy to fix.
Signed-off-by: Miek Gieben <miek@miek.nl>
* doesn't need server context
Signed-off-by: Miek Gieben <miek@miek.nl>
* Add a GaugeVec for enabled plugins monitoring.
Signed-off-by: Jiacheng Xu <xjcmaxwellcjx@gmail.com>
* Add server label and zone label for enable_plugin matric.
* Add a test for PluginEnabled metric
* Add description for enabledPlugin metric.
* Change the description for the enabledPlugin metric.
* Reset the enabledPlugin metric when restart the server.
* Add the bug session for enabledPlugin metric.
* Remove the resolveTCPAddr
The [ADDRESS] field in the metrics plugin is not explained in a manner
that makes it immediately obvious, that what we are talking about here
is a listening address.
* Stop importing testing in the main binary
Stop importing "testing" into the main binary:
* test/helpers.go imported it; remote that and change function signature
* update all tests that use this
Signed-off-by: Miek Gieben <miek@miek.nl>
* Drop import testing from metrics plugin
Signed-off-by: Miek Gieben <miek@miek.nl>
* more fiddling
Signed-off-by: Miek Gieben <miek@miek.nl>
This clear out the remaining map[x]bool usage and moves the bool to an
empty struct.
Two note worthy other changes:
* EnableChaos in the server is now also exported to make it show up in
the documentation.
* The auto plugin is left as is, because there the boolean is
explicitaly set to false to signal 'to-be-deleted' and the key is left
as-is.
Signed-off-by: Miek Gieben <miek@miek.nl>
* - UT on metrics verifying that all plugins of all blocs have their metrics collectors declared
* - fix error msg
* - redirect Registry of metric to the one that handle the listener
- allow duplicate of metrics collector on the same Registry (case of same plugin in 2 blocs listening metrics on the same address)
* - fix change of signature
* - ensure cleaning metrics before starting the test (metrics collectors are global vars .. and re-used by several tests)
* - I think I fixed this test. Ensure correct mn of hits and clean metrics before test.
* - fix typo in error msg - proposed at review
* - fix typo in comment
* - remove ResetMetrics functions
- change a way to test the numeric metrics : get the diff between begin and end of test
* - oops. removing debug logs
* Create test to verify correct listener behavior
* Create Unset function to remove todo items
* Reset address for prometheus listener before restarting
* Add inline documentation for Unset function
* Make shutdownTimeout a constant and change to five seconds
* Revert ForEach behavior in uniq package
* Clean up tests logging
This cleans up the travis logs so you can see the failures better.
Older tests in tests/ would call log.SetOutput(ioutil.Discard) in
a haphazard way. This add log.Discard and put an `init` function in each
package's dir (no way to do this globally). The cleanup in tests/ is
clear.
All plugins also got this init function to have some uniformity and kill
any (future) logging there in the tests as well.
There is a one-off in pkg/healthcheck because that does log.
Signed-off-by: Miek Gieben <miek@miek.nl>
* bring back original log_test.go
Signed-off-by: Miek Gieben <miek@miek.nl>
* suppress logging here as well
Signed-off-by: Miek Gieben <miek@miek.nl>
Prevent future; "remove trailing whitespace" PR, but adding a simple
presubmit that checks for this.
This presubmit flagged quite some offenders, remove all trailing
whitespace from. Apart from that there aren't any other changes.
Signed-off-by: Miek Gieben <miek@miek.nl>
* update docs
* plugins: use plugin specific logging
Hooking up pkg/log also changed NewWithPlugin to just take a string
instead of a plugin.Handler as that is more flexible and for instance
the Root "plugin" doesn't implement it fully.
Same logging from the reload plugin:
.:1043
2018/04/22 08:56:37 [INFO] CoreDNS-1.1.1
2018/04/22 08:56:37 [INFO] linux/amd64, go1.10.1,
CoreDNS-1.1.1
linux/amd64, go1.10.1,
2018/04/22 08:56:37 [INFO] plugin/reload: Running configuration MD5 = ec4c9c55cd19759ea1c46b8c45742b06
2018/04/22 08:56:54 [INFO] Reloading
2018/04/22 08:56:54 [INFO] plugin/reload: Running configuration MD5 = 9e2bfdd85bdc9cceb740ba9c80f34c1a
2018/04/22 08:56:54 [INFO] Reloading complete
* update docs
* better doc
* reload: use OnRestart
Close the listener on OnRestart for health and metrics so the default
setup function can setup the listener when the plugin is "starting up".
Lightly test with some SIGUSR1-ing. Also checked the reload plugin with
this, seems fine:
.com.:1043
.:1043
2018/04/20 15:01:25 [INFO] CoreDNS-1.1.1
2018/04/20 15:01:25 [INFO] linux/amd64, go1.10,
CoreDNS-1.1.1
linux/amd64, go1.10,
2018/04/20 15:01:25 [INFO] Running configuration MD5 = aa8b3f03946fb60546ca1f725d482714
2018/04/20 15:02:01 [INFO] Reloading
2018/04/20 15:02:01 [INFO] Running configuration MD5 = b34a96d99e01db4015a892212560155f
2018/04/20 15:02:01 [INFO] Reloading complete
^C2018/04/20 15:02:06 [INFO] SIGINT: Shutting down
With this corefile:
.com {
proxy . 127.0.0.1:53
prometheus :9054
whoami
reload
}
. {
proxy . 127.0.0.1:53
prometheus :9054
whoami
reload
}
The prometheus port was 9053, changed that to 54 so reload would pick it
up.
From a cursory look it seems this also fixes:
Fixes#1604#1618#1686#1492
* At least make it test
* Use onfinalshutdown
* reload: add reload test
This test #1604 adn right now fails.
* Address review comments
* Add bug section explaining things a bit
* compile tests
* Fix tests
* fixes
* slightly less crazy
* try to make prometheus setup less confusing
* Use ephermal port for test
* Don't use the listener
* These are shared between goroutines, just use the boolean in the main
structure.
* Fix text in the reload README,
* Set addr to TODO once stopping it
* Morph fturb's comment into test, to test reload and scrape health and
metric endpoint
* global: move to context
Move from golang.org/x/net/context to std lib's context.
Change done with:
for i in $(grep -l '/context' **/*.go); do sed -e 's|golang.org/x/net/context|context|' -i $i; echo $i; done
for i in **/*.go; do goimports -w $i; done
* drop from dns.pb.go as well
* Update all plugins to use plugin/pkg/log
I wish this could have been done with sed. Alas manually changed all
callers to use the new plugin/pkg/log package.
* Error -> Info
* Add docs to debug plugin as well
* plugin/metrics: add 'server' label
This uses the new WithServer(ctx) to get the current server from the
context.
First in a larger refactor to make all plugins do this.
* compile
* compile
* lala test
* compile and test
* typos
* Dont duplicate the code