From 91413606a31c858f495592ca8f45a57b471c0de6 Mon Sep 17 00:00:00 2001
From: Stephen J Day
Date: Thu, 2 Apr 2015 22:09:25 -0700
Subject: [PATCH] Document event notification system

An overview of the notification system is provided, covering topics on
architecture, configuration, implementation and formats. This may need to be
filled out with further details covering format specifications.

Signed-off-by: Stephen J Day
---
 doc/notifications.md | 308 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 306 insertions(+), 2 deletions(-)

diff --git a/doc/notifications.md b/doc/notifications.md
index 5dbd8f347..450f17b21 100644
--- a/doc/notifications.md
+++ b/doc/notifications.md
@@ -1,4 +1,308 @@
 # Notifications
 
-**TODO(stevvooe)** Cover use and deployment of webhook notifications. Link to
-description in architecture documentation.
\ No newline at end of file
+> **TODO:** Link out to the architecture document on notification support.
+
+The Registry supports sending webhook notifications in response to events
+happening within the registry. Notifications are sent in response to manifest
+pushes and pulls and layer pushes and pulls. These actions are serialized into
+events. The events are queued into a registry-internal broadcast system which
+queues and dispatches events to [_Endpoints_](#endpoints).
+
+> **TODO:** Insert diagram of event system.
+
+## Endpoints
+
+Notifications are sent to _endpoints_ via HTTP requests. Each configured
+endpoint has isolated queues, retry configuration and HTTP targets within each
+instance of a registry. When an action happens within the registry, it is
+converted into an event which is dropped into an in-memory queue. When the
+event reaches the end of the queue, an HTTP request is made to the endpoint
+until the request succeeds. The events are sent serially to each endpoint but
+order is not guaranteed.
+
+## Configuration
+
+To set up a registry instance to send notifications to endpoints, one must add
+them to the configuration. A simple example follows:
+
+```yaml
+notifications:
+  endpoints:
+    - name: alistener
+      url: https://mylistener.example.com/event
+      headers:
+        Authorization: [Bearer ]
+      timeout: 500ms
+      threshold: 5
+      backoff: 1s
+```
+
+The above would configure the registry with an endpoint to send events to
+"https://mylistener.example.com/event", with the header "Authorization: Bearer
+". The request would time out after 500 milliseconds. If
+5 failures happen consecutively, the registry will back off for 1 second before
+trying again.
+
+For details on the fields, please see the [configuration documentation](configuration.md#notifications).
+
+A properly configured endpoint should lead to a log message from the registry
+upon startup:
+
+```
+INFO[0000] configuring endpoint alistener (https://mylistener.example.com/event), timeout=500ms, headers=map[Authorization:[Bearer ]] app.id=812bfeb2-62d6-43cf-b0c6-152f541618a3 environment=development service=registry
+```
+
+## Events
+
+Events have a well-defined JSON structure and are sent as the body of
+notification requests. One or more events are sent in a structure called an
+envelope. Each event has a unique id that can be used to deduplicate incoming
+requests, if required. Along with that, an _action_ is provided with a
+_target_, identifying the object mutated during the event.
+
+The fields available in an event are described in detail in the
+[godoc](http://godoc.org/github.com/docker/distribution/notifications#Event).
+
+> **TODO:** Let's break out the fields here rather than rely on the godoc.
+
+The following is an example of a JSON event, sent in response to the push of a
+manifest:
+
+```json
+{
+  "id": "asdf-asdf-asdf-asdf-0",
+  "timestamp": "2006-01-02T15:04:05Z",
+  "action": "push",
+  "target": {
+    "mediaType": "application/vnd.docker.distribution.manifest.v1+json",
+    "length": 1,
+    "digest": "sha256:0123456789abcdef0",
+    "repository": "library/test",
+    "url": "http://example.com/v2/library/test/manifests/latest"
+  },
+  "request": {
+    "id": "asdfasdf",
+    "addr": "client.local",
+    "host": "registrycluster.local",
+    "method": "PUT",
+    "useragent": "test/0.1"
+  },
+  "actor": {
+    "name": "test-actor"
+  },
+  "source": {
+    "addr": "hostname.local:port"
+  }
+}
+```
+
+## Envelope
+
+The envelope contains one or more events, with the following JSON structure:
+
+```json
+{
+  "events": [ ... ]
+}
+```
+
+While events may be sent in the same envelope, the set of events within that
+envelope has no implied relationship. For example, the registry may choose to
+group unrelated events and send them in the same envelope to reduce the total
+number of requests.
+
+The full package has the media type
+"application/vnd.docker.distribution.events.v1+json", which will be set on the
+request coming to an endpoint.
+
+An example of a full event may look as follows:
+
+```json
+POST /callback
+Host: localhost:5003
+Authorization: Bearer
+Content-Type: application/vnd.docker.distribution.events.v1+json
+
+{
+  "events": [
+    {
+      "id": "asdf-asdf-asdf-asdf-0",
+      "timestamp": "2006-01-02T15:04:05Z",
+      "action": "push",
+      "target": {
+        "mediaType": "application/vnd.docker.distribution.manifest.v1+json",
+        "length": 1,
+        "digest": "sha256:0123456789abcdef0",
+        "repository": "library/test",
+        "url": "http://example.com/v2/library/test/manifests/latest"
+      },
+      "request": {
+        "id": "asdfasdf",
+        "addr": "client.local",
+        "host": "registrycluster.local",
+        "method": "PUT",
+        "useragent": "test/0.1"
+      },
+      "actor": {
+        "name": "test-actor"
+      },
+      "source": {
+        "addr": "hostname.local:port"
+      }
+    },
+    {
+      "id": "asdf-asdf-asdf-asdf-1",
+      "timestamp": "2006-01-02T15:04:05Z",
+      "action": "push",
+      "target": {
+        "mediaType": "application/vnd.docker.container.image.rootfs.diff+x-gtar",
+        "length": 2,
+        "digest": "tarsum.v2+sha256:0123456789abcdef1",
+        "repository": "library/test",
+        "url": "http://example.com/v2/library/test/manifests/latest"
+      },
+      "request": {
+        "id": "asdfasdf",
+        "addr": "client.local",
+        "host": "registrycluster.local",
+        "method": "PUT",
+        "useragent": "test/0.1"
+      },
+      "actor": {
+        "name": "test-actor"
+      },
+      "source": {
+        "addr": "hostname.local:port"
+      }
+    },
+    {
+      "id": "asdf-asdf-asdf-asdf-2",
+      "timestamp": "2006-01-02T15:04:05Z",
+      "action": "push",
+      "target": {
+        "mediaType": "application/vnd.docker.container.image.rootfs.diff+x-gtar",
+        "length": 3,
+        "digest": "tarsum.v2+sha256:0123456789abcdef2",
+        "repository": "library/test",
+        "url": "http://example.com/v2/library/test/manifests/latest"
+      },
+      "request": {
+        "id": "asdfasdf",
+        "addr": "client.local",
+        "host": "registrycluster.local",
+        "method": "PUT",
+        "useragent": "test/0.1"
+      },
+      "actor": {
+        "name": "test-actor"
+      },
+      "source": {
+        "addr": "hostname.local:port"
+      }
+    }
+  ]
+}
+```
+
+## Responses
+
+The registry is fairly accepting of the response codes from endpoints. If an
+endpoint responds with any 2xx or 3xx response code (after following
+redirects), the message will be considered delivered and discarded.
+
+In turn, it is recommended that endpoints are accepting of incoming requests,
+as well. While the format of event envelopes is standardized by media type,
+any "pickiness" about validation may cause the queue to back up on the
+registry.
+
+## Monitoring
+
+The state of the endpoints is reported via the debug/vars HTTP interface,
+usually configured to "http://localhost:5001/debug/vars". Information such as
+configuration and metrics are available by endpoint.
+
+The following provides an example of a few endpoints that have experienced
+several failures and have since recovered:
+
+```json
+"notifications":{
+  "endpoints":[
+    {
+      "name":"local-8082",
+      "url":"http://localhost:5003/callback",
+      "Headers":{
+        "Authorization":[
+          "Bearer \u003can example token\u003e"
+        ]
+      },
+      "Timeout":1000000000,
+      "Threshold":10,
+      "Backoff":1000000000,
+      "Metrics":{
+        "Pending":76,
+        "Events":76,
+        "Successes":0,
+        "Failures":0,
+        "Errors":46,
+        "Statuses":{
+
+        }
+      }
+    },
+    {
+      "name":"local-8083",
+      "url":"http://localhost:8083/callback",
+      "Headers":null,
+      "Timeout":1000000000,
+      "Threshold":10,
+      "Backoff":1000000000,
+      "Metrics":{
+        "Pending":0,
+        "Events":76,
+        "Successes":76,
+        "Failures":0,
+        "Errors":28,
+        "Statuses":{
+          "202 Accepted":76
+        }
+      }
+    }
+  ]
+}
+```
+
+If using notifications as part of a larger application, it is _critical_ to
+monitor the size ("Pending" above) of the endpoint queues. If failures or
+queue sizes are increasing, it can indicate a larger problem.
+
+The logs are also a valuable resource for monitoring problems. A failing
+endpoint will lead to messages similar to the following:
+
+```
+ERRO[0340] retryingsink: error writing events: httpSink{http://localhost:5003/callback}: error posting: Post http://localhost:5003/callback: dial tcp 127.0.0.1:5003: connection refused, retrying
+WARN[0340] httpSink{http://localhost:5003/callback} encountered too many errors, backing off
+```
+
+The above indicates that several errors have led to a backoff and the registry
+will wait before retrying.
+
+## Considerations
+
+Currently, the queues are in-memory, so endpoints should be _reasonably
+reliable_. They are designed to make a best effort to send the messages but if
+an instance is lost, messages may be dropped. If an endpoint goes down, care
+should be taken to ensure that the registry instance is not terminated before
+the endpoint comes back up or messages will be lost.
+
+This can be mitigated by running endpoints in close proximity to the registry
+instances. One could run an endpoint that pages to disk and then forwards a
+request to provide better durability.
+
+The notification system is designed around a series of interchangeable _sinks_
+which can be wired up to achieve interesting behavior. If this system doesn't
+provide acceptable guarantees, adding a transactional `Sink` to the registry
+is a possibility, although it may have an effect on request service time.
+Please see the
+[godoc](http://godoc.org/github.com/docker/distribution/notifications#Sink)
+for more information.
+
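
The durability suggestion in Considerations (an endpoint that pages to disk before acknowledging) can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the registry: `spool_events`, `SpoolHandler`, `serve`, the spool directory and the listen address are all hypothetical names and values chosen for illustration. It relies only on what the document states: envelopes carry an `events` list, each event has a unique `id`, and any 2xx or 3xx reply counts as delivered.

```python
import hashlib
import json
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical spool location; any durable local directory would do.
SPOOL_DIR = Path(tempfile.gettempdir()) / "registry-events"


def spool_events(body: bytes, spool_dir: Path) -> int:
    """Persist each event in an envelope; return how many were newly written."""
    envelope = json.loads(body)
    spool_dir.mkdir(parents=True, exist_ok=True)
    stored = 0
    for event in envelope.get("events", []):
        # Hash the id so an unexpected id can never escape the spool directory.
        name = hashlib.sha256(event["id"].encode()).hexdigest()
        path = spool_dir / f"{name}.json"
        if path.exists():
            continue  # same id already on disk: treat as a redelivery
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(event))
        tmp.replace(path)  # rename into place so readers never see partial files
        stored += 1
    return stored


class SpoolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumes the request carries a Content-Length header.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            spool_events(body, SPOOL_DIR)
            status = 202
        except OSError:
            status = 500  # could not persist: a reply outside 2xx/3xx is not "delivered"
        except (ValueError, KeyError, TypeError, AttributeError):
            status = 202  # unparseable body: retrying cannot help, so accept it
        self.send_response(status)
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8083) -> None:
    """Run the endpoint until interrupted (the address is an example value)."""
    HTTPServer((host, port), SpoolHandler).serve_forever()
```

Calling `serve()` and pointing an endpoint `url` at that address would exercise the handler. Accepting malformed bodies with a 202 follows the advice in Responses that a picky endpoint can back up the registry queue; a separate process could forward the spooled files downstream and delete them once they are handled.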