distribution/doc/SPEC.md
2014-12-18 16:09:35 -08:00

27 KiB

Proposal: JSON Registry API V2.1

Abstract

The docker registry is a service to manage information about docker images and enable their distribution. While the current registry is usable, there are several problems with the architecture that have led to this proposal. For relevant details, please see the following issues:

The main driver of this proposal are changes to the docker the image format, covered in docker/docker#8093. The new, self-contained image manifest simplifies the image definition and the underlying backend layout. To reduce bandwidth usage, the new registry will be architected to avoid uploading existing layers and will support resumable layer uploads.

While out of scope for this specification, the URI layout of the new API will be structured to support a rich Authentication and Authorization model by leveraging namespaces.

Furthermore, to bring docker registry in line with docker core, the registry is written in Go.

Scope

This proposal covers the URL layout and protocols of the Docker Registry V2 JSON API. This will affect the docker core registry API and the rewrite of docker-registry.

This includes the following features:

  • Namespace-oriented URI Layout
  • PUSH/PULL registry server for V2 image manifest format
  • Resumable layer PUSH support
  • V2 Client library implementation

While authentication and authorization support will influence this specification, details of the protocol will be left to a future specification. Other features marked as next generation will be incorporated when the initial support is complete. Please see the road map for details.

Use Cases

For the most part, the use cases of the former registry API apply to the new version. Differentiating uses cases are covered below.

Resumable Push

Company X's build servers lose connectivity to docker registry before completing an image layer transfer. After connectivity returns, the build server attempts to re-upload the image. The registry notifies the build server that the upload has already been partially attempted. The build server responds by only sending the remaining data to complete the image file.

Resumable Pull

Company X is having more connectivity problems but this time in their deployment datacenter. When downloading an image, the connection is interrupted before completion. The client keeps the partial data and uses http Range requests to avoid downloading repeated data.

Layer Upload De-duplication

Company Y's build system creates two identical docker layers from build processes A and B. Build process A completes uploading the layer before B. When process B attempts to upload the layer, the registry indicates that its not necessary because the layer is already known.

If process A and B upload the same layer at the same time, both operations will proceed and the first to complete will be stored in the registry (Note: we may modify this to prevent dogpile with some locking mechanism).

Access Control

Company X would like to control which developers can push to which repositories. By leveraging the URI format of the V2 registry, they can control who is able to access which repository, who can pull images and who can push layers.

Dependencies

Initially, a V2 client will be developed in conjunction with the new registry service to facilitate rich testing and verification. Once this is ready, the new client will be used in docker to communicate with V2 registries.

Proposal

This section covers proposed client flows and details of the proposed API endpoints. All endpoints will be prefixed by the API version and the repository name:

/v2/<name>/

For example, an API endpoint that will work with the library/ubuntu repository, the URI prefix will be:

/v2/library/ubuntu/

This scheme will provide rich access control over various operations and methods using the URI prefix and http methods that can be controlled in variety of ways.

Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this. The rules for a repository name are as follows:

  1. A repository name is broken up into path components. A component of a repository name must be at least two characters, optionally separated by periods, dashes or underscores. More strictly, it must match the regular expression [a-z0-9]+(?:[._-][a-z0-9]+)* and the matched result must be 2 or more characters in length.
  2. The name of a repository must have at least two path components, separated by a forward slash.
  3. The total length of a repository name, including slashes, must be less the 256 characters.

These name requirements only apply to the registry API and should accept a superset of what is supported by other docker community components.

API Methods

A detailed list of methods and URIs are covered in the table below:

Method Path Entity Description
GET /v2/ Check Check that the endpoint implements Docker Registry API V2.
GET /v2/<name>/tags/list Tags Fetch the tags under the repository identified by name.
GET /v2/<name>/manifests/<tag> Manifest Fetch the manifest identified by name and tag.
PUT /v2/<name>/manifests/<tag> Manifest Put the manifest identified by name and tag.
DELETE /v2/<name>/manifests/<tag> Manifest Delete the manifest identified by name and tag.
GET /v2/<name>/blobs/<digest> Blob Retrieve the blob from the registry identified by digest.
HEAD /v2/<name>/blobs/<digest> Blob Check if the blob is known to the registry.
POST /v2/<name>/blobs/uploads/ Blob Upload Initiate a resumable blob upload. If successful, an upload location will be provided to complete the upload. Optionally, if the digest parameter is present, the request body will be used to complete the upload in a single request.
GET /v2/<name>/blobs/uploads/<uuid> Blob Upload Retrieve status of upload identified by uuid. The primary purpose of this endpoint is to resolve the current status of a resumable upload.
HEAD /v2/<name>/blobs/uploads/<uuid> Blob Upload Retrieve status of upload identified by uuid. This is identical to the GET request.
PATCH /v2/<name>/blobs/uploads/<uuid> Blob Upload Upload a chunk of data for the specified upload.
PUT /v2/<name>/blobs/uploads/<uuid> Blob Upload Complete the upload specified by uuid, optionally appending the body as the final chunk.
DELETE /v2/<name>/blobs/uploads/<uuid> Blob Upload Cancel outstanding upload processes, releasing associated resources. If this is not called, the unfinished uploads will eventually timeout.

All endpoints should support aggressive http caching, compression and range headers, where appropriate. Details of each method are covered in the following sections.

The new API will attempt to leverage HTTP semantics where possible but may break from standards to implement targeted features.

Errors

Actionable failure conditions, covered in detail in their relevant sections, will be reported as part of 4xx responses, in a json response body. One or more errors will be returned in the following format:

{
    "errors:" [{
            "code": <error identifier>,
            "message": <message describing condition>,
            "detail": <unstructured>
        },
        ...
    ]
}

The code field will be a unique identifier, all caps with underscores by convention. The message field will be a human readable string. The optional detail field may contain arbitrary json data providing information the client can use to resolve the issue.

The error codes encountered via the API are enumerated in the following table:

Code Message Description HTTPStatusCodes
UNKNOWN unknown error Generic error returned when the error does not have an API classification. Any
DIGEST_INVALID provided digest did not match uploaded content When a blob is uploaded, the registry will check that the content matches the digest provided by the client. The error may include a detail structure with the key "digest", including the invalid digest string. This error may also be returned when a manifest includes an invalid layer digest. 400, 404
SIZE_INVALID provided length did not match content length When a layer is uploaded, the provided size will be checked against the uploaded content. If they do not match, this error will be returned. 400
NAME_INVALID manifest name did not match URI During a manifest upload, if the name in the manifest does not match the uri name, this error will be returned. 400, 404
TAG_INVALID manifest tag did not match URI During a manifest upload, if the tag in the manifest does not match the uri tag, this error will be returned. 400, 404
NAME_UNKNOWN repository name not known to registry This is returned if the name used during an operation is unknown to the registry. 404
MANIFEST_UNKNOWN manifest unknown This error is returned when the manifest, identified by name and tag is unknown to the repository. 404
MANIFEST_INVALID manifest invalid During upload, manifests undergo several checks ensuring validity. If those checks fail, this error may be returned, unless a more specific error is included. The detail will contain information the failed validation. 400
MANIFEST_UNVERIFIED manifest failed signature verification During manifest upload, if the manifest fails signature verification, this error will be returned. 400
BLOB_UNKNOWN blob unknown to registry This error may be returned when a blob is unknown to the registry in a specified repository. This can be returned with a standard get or if a manifest references an unknown layer during upload. 400, 404
BLOB_UPLOAD_UNKNOWN blob upload unknown to registry If a blob upload has been cancelled or was never started, this error code may be returned. 404

While the client can take action on certain error codes, the registry may add new error codes over time. All client implementations should treat unknown error codes as UNKNOWN, allowing future error codes to be added without breaking API compatibility. For the purposes of the specification error codes will only be added and never removed.

API Version Check

A minimal endpoint, mounted at /v2/ will provide version support information based on its response statuses. The request format is as follows:

GET /v2/

If a 200 OK response is returned, the registry implements the V2(.1) registry API and the client may proceed safely with other V2 operations. Optionally, the response may contain information about the supported paths in the response body. The client should be prepared to ignore this data.

If a 401 Unauthorized response is returned, the client should take action based on the contents of the "WWW-Authenticate" header and try the endpoint again. Depending on access control setup, the client may still have to authenticate against different resources, even if this check succeeds.

If 404 Not Found response status, or other unexpected status, is returned, the client should proceed with the assumption that the registry does not implement V2 of the API.

Pulling An Image

An "image" is a combination of a JSON manifest and individual layer files. The process of pulling an image centers around retrieving these two components.

The first step in pulling an image is to retrieve the manifest. For reference, the relevant manifest fields for the registry are the following:

field description
name The name of the image.
tag The tag for this version of the image.
fsLayers A list of layer descriptors (including tarsum)
signature A JWS used to verify the manifest content

For more information about the manifest format, please see docker/docker#8093.

When the manifest is in hand, the client must verify the signature to ensure the names and layers are valid. Once confirmed, the client will then use the tarsums to download the individual layers. Layers are stored in as blobs in the V2 registry API, keyed by their tarsum digest.

The API details follow.

Pulling an Image Manifest

The image manifest can be fetched with the following url:

GET /v2/<name>/manifests/<tag>

The "name" and "tag" parameter identify the image and are required.

A 404 Not Found response will be returned if the image is unknown to the registry. If the image exists and the response is successful, the image manifest will be returned, with the following format (see docker/docker#8093 for details):

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
    ]
   ],
   "history": <v1 images>,
   "signature": <JWS>
}

The client should verify the returned manifest signature for authenticity before fetching layers.

Pulling a Layer

Layers are stored in the blob portion of the registry, keyed by tarsum digest. Pulling a layer is carried out by a standard http request. The URL is as follows:

GET /v2/<name>/blobs/<tarsum>

Access to a layer will be gated by the name of the repository but is identified uniquely in the registry by tarsum. The tarsum parameter is an opaque field, to be interpreted by the tarsum library.

This endpoint may issue a 307 (302 for <HTTP 1.1) redirect to another service for downloading the layer and clients should be prepared to handle redirects.

This endpoint should support aggressive HTTP caching for image layers. Support for Etags, modification dates and other cache control headers should be included. To allow for incremental downloads, Range requests should be supported, as well.

Pushing An Image

Pushing an image works in the opposite order as a pull. After assembling the image manifest, the client must first push the individual layers. When the layers are fully pushed into the registry, the client should upload the signed manifest.

The details of each step of the process are covered in the following sections.

Pushing a Layer

All layer uploads use two steps to manage the upload process. The first step starts the upload in the registry service, returning a url to carry out the second step. The second step uses the upload url to transfer the actual data. Uploads are started with a POST request which returns a url that can be used to push data and check upload status.

The Location header will be used to communicate the upload location after each request. While it won't change in the this specification, clients should use the most recent value returned by the API.

Starting An Upload

To begin the process, a POST request should be issued in the following format:

POST /v2/<name>/blobs/uploads/

The parameters of this request are the image namespace under which the layer will be linked. Responses to this request are covered below.

Existing Layers

The existence of a layer can be checked via a HEAD request to the blob store API. The request should be formatted as follows:

HEAD /v2/<name>/blobs/<digest>

If the layer with the tarsum specified in digest is available, a 200 OK response will be received, with no actual body content (this is according to http specification). The response will look as follows:

200 OK
Content-Length: <length of blob>

When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer. Note that the binary digests may differ for the existing registry layer, but the tarsums will be guaranteed to match.

Uploading the Layer

If the POST request is successful, a 202 Accepted response will be returned with the upload URL in the Location header:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0

The rest of the upload process can be carried out with the returned url, called the "Upload URL" from the Location header. All responses to the upload url, whether sending data or getting status, will be in this format. Though the URI format (/v2/<name>/blobs/uploads/<uuid>) for the Location header is specified, clients should treat it as an opaque url and should never try to assemble the it. While the uuid parameter may be an actual UUID, this proposal imposes no constraints on the format and clients should never impose any.

Upload Progress

The progress and chunk coordination of the upload process will be coordinated through the Range header. While this is a non-standard use of the Range header, there are examples of [similar approaches](https://developers.google.c om/youtube/v3/guides/using_resumable_upload_protocol) in APIs with heavy use. For an upload that just started, for an example with a 1000 byte layer file, the Range header would be as follows:

Range: bytes=0-0

To get the status of an upload, issue a GET request to the upload URL:

GET /v2/<name>/blobs/uploads/<uuid>
Host: <registry host>

The response will be similar to the above, except will return 204 status:

204 No Content
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>

Note that the HTTP Range header byte ranges are inclusive and that will be honored, even in non-standard use cases.

Monolithic Upload

A monolithic upload is simply a chunked upload with a single chunk and may be favored by clients that would like to avoided the complexity of chunking. To carry out a "monolithic" upload, one can simply put the entire content blob to the provided URL:

PUT /v2/<name>/blobs/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream

<Layer Binary Data>

The "digest" parameter must be included with the PUT request. Please see the Completed Upload section for details on the parameters and expected responses.

Additionally, the download can be completed with a single POST request to the uploads endpoint, including the "size" and "digest" parameters:

POST /v2/<name>/blobs/uploads/?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream
  
<Layer Binary Data>

On the registry service, this should allocate a download, accept and verify the data and return the same response as the final chunk of an upload. If the POST request fails collecting the data in any way, the registry should attempt to return an error response to the client with the Location header providing a place to continue the download.

The single POST method is provided for convenience and most clients should implement POST + PUT to support reliable resume of uploads.

Chunked Upload

To carry out an upload of a chunk, the client can specify a range header and only include that part of the layer file:

PATCH /v2/<name>/blobs/uploads/<uuid>
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Layer Chunk Binary Data>

There is no enforcement on layer chunk splits other than that the server must receive them in order. The server may enforce a minimum chunk size. If the server cannot accept the chunk, a 416 Requested Range Not Satisfiable response will be returned and will include a Range header indicating the current status:

416 Requested Range Not Satisfiable
Location: /v2/<name>/blobs/uploads/<uuid>
Range: 0-<last valid range>
Content-Length: 0

If this response is received, the client should resume from the "last valid range" and upload the subsequent chunk. A 416 will be returned under the following conditions:

  • Invalid Content-Range header format
  • Out of order chunk: the range of the next chunk must start after the "last valid range" from the last response.

When a chunk is accepted as part of the upload, a 202 Accepted response will be returned, including a Range header with the current upload status:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0
Completed Upload

For an upload to be considered complete, the client must submit a PUT request on the upload endpoint with a digest parameter. If it is not provided, the download will not be considered complete. The format for the final chunk will be as follows:

PUT /v2/<name>/blob/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Last Layer Chunk Binary Data>

Optionally, if all chunks have already been uploaded, a PUT request with a digest parameter and zero-length body may be sent to complete and validated the upload. Multiple "digest" parameters may be provided with different digests. The server may verify none or all of them but must notify the client if the content is rejected.

When the last chunk is received and the layer has been validated, the client will receive a 201 Created response:

201 Created
Location: /v2/<name>/blobs/<tarsum>
Content-Length: 0

The Location header will contain the registry URL to access the accepted layer file.

Digest Parameter

The "digest" parameter is designed as an opaque parameter to support verification of a successful transfer. The initial version of the registry API will support a tarsum digest, in the standard tarsum format. For example, a HTTP URI parameter might be as follows:

tarsum.v1+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Given this parameter, the registry will verify that the provided content does result in this tarsum. Optionally, the registry can support other other digest parameters for non-tarfile content stored as a layer. A regular hash digest might be specified as follows:

sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Such a parameter would be used to verify that the binary content (as opposed to the tar content) would be verified at the end of the upload process.

For the initial version, registry servers are only required to support the tarsum format.

Canceling an Upload

An upload can be cancelled by issuing a DELETE request to the upload endpoint. The format will be as follows:

DELETE /v2/<name>/blobs/uploads/<uuid>

After this request is issued, the upload uuid will no longer be valid and the registry server will dump all intermediate data. While uploads will time out if not completed, clients should issue this request if they encounter a fatal error but still have the ability to issue an http request.

Errors

If an 502, 503 or 504 error is received, the client should assume that the download can proceed due to a temporary condition, honoring the appropriate retry mechanism. Other 5xx errors should be treated as terminal.

If there is a problem with the upload, a 4xx error will be returned indicating the problem. After receiving a 4xx response (except 416, as called out above), the upload will be considered failed and the client should take appropriate action.

The following table covers the various error conditions that may be returned after completing a layer upload:

Code Message
DIGEST_INVALID provided digest did not match uploaded content
SIZE_INVALID provided size did not match content size

Note that the upload url will not be available forever. If the upload uuid is unknown to the registry, a 404 Not Found response will be returned and the client must restart the upload process.

Pushing an Image Manifest

Once all of the layers for an image are uploaded, the client can upload the image manifest. An image can be pushed using the following request formats:

PUT /v2/<name>/manifests/<tag>

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
    ]
   ],
   "history": <v1 images>,
   "signature": <JWS>,
   ...
}

The name and tag fields of the response body must match those specified in the URL.

If there is a problem with pushing the manifest, a relevant 4xx response will be returned with a JSON error message. The following table covers the various error conditions and their corresponding codes:

Code Message
NAME_INVALID Manifest name did not match URI
TAG_INVALID Manifest tag did not match URI
MANIFEST_INVALID Returned when an invalid manifest is received
MANIFEST_UNVERIFIED Manifest failed signature validation
BLOB_UNKNOWN Referenced layer not available

For the UNKNOWN_LAYER error, the detail field of the error response will have an "unknown" field with information about the missing layer. For now, that will just be the tarsum. There will be an error returned for each unknown blob. The response format will be as follows:

{
    "errors:" [{
            "code": "UNKNOWN_LAYER",
            "message": "Referenced layer not available",
            "detail": {
                "unknown": {
                    "blobSum": <tarsum>
                 }
            }
        },
        ...
    ]
}

Listing Image Tags

It may be necessary to list all of the tags under a given repository. The tags for an image repository can be retrieved with the following request:

GET /v2/<name>/tags/list

The response will be in the following format:

200 OK
Content-Type: application/json

{
    "name": <name>,
    "tags": [
        <tag>,
        ...
    ]
}

For repositories with a large number of tags, this response may be quite large, so care should be taken by the client when parsing the response to reduce copying.

Deleting an Image

An image may be deleted from the registry via its name and tag. A delete may be issued with the following request format:

DELETE /v2/<name>/manifests/<tag>

If the image exists and has been successfully deleted, the following response will be issued:

202 Accepted
Content-Length: None

If the image had already been deleted or did not exist, a 404 Not Found response will be issued instead.

Roadmap

  • Write Registry REST API V2 proposal
    • Solicit feedback
  • Implement V2 API server
    • Basic Layer API
    • Basic Image API
    • Resumable upload support
  • Implement V2 API client
  • Implement API compliance tests
  • Port docker core to use client from registry project for v2 pushes

Reviewers

  • @dmp42
  • @dmcgowan
  • @jlhawn
  • Docker Community