From a8dd59ac444720a830e006cbf41b9b1d47f05e5a Mon Sep 17 00:00:00 2001
From: Stephen J Day <stephen.day@docker.com>
Date: Mon, 13 Jul 2015 19:40:47 -0700
Subject: [PATCH] Add goals and feature discussion to ROADMAP

Signed-off-by: Stephen J Day <stephen.day@docker.com>
---
 ROADMAP.md | 233 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 207 insertions(+), 26 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 806aaff67..cbf53881e 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,11 +1,17 @@
 # Roadmap
 
-The Distribution Project consists of several components, some of which are still being defined. This document defines the high-level goals of the project, identifies the current components, and defines the release-relationship to the Docker Platform.
+The Distribution Project consists of several components, some of which are
+still being defined. This document defines the high-level goals of the
+project, identifies the current components, and defines the release-
+relationship to the Docker Platform.
 
 * [Distribution Goals](#distribution-goals)
 * [Distribution Components](#distribution-components)
 * [Project Planning](#project-planning): release-relationship to the Docker Platform.
 
+This road map is a living document, providing an overview of the goals and
+considerations made in respect of the future of the project.
+
 ## Distribution Goals
 
 - Replace the existing [docker registry](github.com/docker/docker-registry)
@@ -30,41 +36,216 @@ implementation.
 
 ### Registry
 
-Registry 2.0 is the first release of the next-generation registry. This is primarily
-focused on implementing the [new registry
-API](https://github.com/docker/distribution/blob/master/docs/spec/api.md), with
-a focus on security and performance.
+The new Docker registry is the main portion of the distribution repository.
+Registry 2.0 is the first release of the next-generation registry. This was
+primarily focused on implementing the [new registry
+API](https://github.com/docker/distribution/blob/master/docs/spec/api.md),
+with a focus on security and performance. 
 
-#### Registry 2.0
+Following from the Distribution project goals above, we have a set of goals
+for registry v2 that we would like to follow in the design. New features
+should be compared against these goals.
 
-Features:
+#### Data Storage and Distribution First
 
-- Faster push and pull
-- New, more efficient implementation
-- Simplified deployment
-- Full API specification for V2 protocol
-- Pluggable storage system (s3, azure, filesystem and inmemory supported)
-- Immutable manifest references ([#46](https://github.com/docker/distribution/issues/46))
-- Webhook notification system ([#42](https://github.com/docker/distribution/issues/42))
-- Native TLS Support ([#132](https://github.com/docker/distribution/pull/132))
-- Pluggable authentication system
-- Health Checks ([#230](https://github.com/docker/distribution/pull/230))
+The registry's first goal is to provide a reliable, consistent storage
+location for Docker images. The registry should only provide the minimal
+amount of indexing required to fetch image data and no more.
 
-#### Registry 2.1
+This means we should be selective in new features and API additions, including
+those that may require expensive, ever growing indexes. Requests should be
+servable in "constant time".
 
-Planned Features:
+#### Content Addressability
 
-> **NOTE:** This feature list is incomplete at this time.
+All data objects used in the registry API should be content addressable.
+Content identifiers should be secure and verifiable. This provides a secure,
+reliable base from which to build more advanced content distribution systems.
 
-- Support for Manifest V2, Schema 2 and explicit tagging objects ([#62](https://github.com/docker/distribution/issues/62), [#173](https://github.com/docker/distribution/issues/173))
-- Mirroring ([#19](https://github.com/docker/distribution/issues/19))
-- Flexible client package based on distribution interfaces ([#193](https://github.com/docker/distribution/issues/193)
+#### Content Agnostic
 
-#### Registry 2.2
+In the past, changes to the image format would require large changes in Docker
+and the Registry. By decoupling the distribution and image format, we can
+allow the formats to progress without having to coordinate between the two.
+This means that we should be focused on decoupling Docker from the registry
+just as much as decoupling the registry from Docker. Such an approach will
+allow us to unlock new distribution models that haven't been possible before.
 
-TBD
+We can take this further by saying that the new registry should be content
+agnostic. The registry provides a model of names, tags, manifests and content
+addresses and that model can be used to work with content.
 
-***
+#### Simplicity
+
+The new registry should be closer to a microservice component than its
+predecessor. This means it should have a narrower API and a low number of
+service dependencies. It should be easy to deploy.
+
+This means that other solutions should be explored before changing the API or
+adding extra dependencies. If functionality is required, can it be added as an
+extension or companion service.
+
+#### Extensibility
+
+The registry should provide extension points to add functionality. By keeping
+the scope narrow, but providing the ability to add functionality.
+
+Features like search, indexing, synchronization and registry explorers fall
+into this category. No such feature should be added unless we've found it
+impossible to do through an extension.
+
+#### Active Feature Discussions
+
+The following are feature discussions that are currently active.
+
+If you don't see your favorite, unimplemented feature, feel free to contact us
+via IRC or the mailing list and we can talk about adding it. The goal here is
+to make sure that new features go through a rigid design process before
+landing in the registry.
+
+##### Mirroring and Pull-through Caching
+
+Mirroring and pull-through caching are related but slight different. We've
+adopted the term _mirroring_ to be a proper mirror of a registry, meaning it
+has all the content the upstream would have. Providing such mirrors in the
+Docker ecosystem is dependent on a solid trust system, which is still in the
+works.
+
+The more commonly helpful feature is _pull-through caching_, where data is
+fetched from an upstream when not available in a local registry instance.
+
+Please see the following issues:
+
+- https://github.com/docker/distribution/issues/459
+
+##### Peer to Peer transfer
+
+Discussion has started here: https://docs.google.com/document/d/1rYDpSpJiQWmCQy8Cuiaa3NH-Co33oK_SC9HeXYo87QA/edit
+
+##### Indexing, Search and Discovery
+
+The original registry provided some implementation of search for use with
+private registries. Support has been elided from V2 since we'd like to both
+decouple search functionality from the registry. The makes the registry
+simpler to deploy, especially in use cases where search is not needed, and
+let's us decouple the image format from the registry.
+
+There are explorations into using the catalog API and notification system to
+build external indexes. The current line of thought is that we will define a
+common search API to index and query docker images. Such a system could be run
+as a companion to a registry or set of registries to power discovery.
+
+The main issue with search and discovery is that there are so many ways to
+accomplish it. There are two aspects to this project. The first is deciding on
+how it will be done, including an API definition that can work with changing
+data formats. The second is the process of integrating with `docker search`.
+We expect that someone attempts to address the problem with the existing tools
+and propose it as a standard search API or uses it to inform a standardization
+process. Once this has been explored, we integrate with the docker client.
+
+Please see the following for more detail:
+
+- https://github.com/docker/distribution/issues/206
+
+##### Deletes
+
+> __NOTE:__ Deletes are a much asked for feature. Before requesting this
+feature or participating in discussion, we ask that you read this section in
+full and understand the problems behind deletes.
+
+While, at first glance, implementing deleting seems simple, there are a number
+mitigating factors that make many solutions not ideal or even pathological in
+the context of a registry. The following paragraph discuss the background and
+approaches that could be applied to a arrive at a solution.
+
+The goal of deletes in any system is to remove unused or unneeded data. Only
+data requested for deletion should be removed and no other data. Removing
+unintended data is worse than _not_ removing data that was requested for
+removal but ideally, both are supported. Generally, according to this rule, we
+err on holding data longer than needed, ensuring that it is only removed when
+we can be certain that it can be removed. With the current behavior, we opt to
+hold onto the data forever, ensuring that data cannot be incorrectly removed.
+
+To understand the problems with implementing deletes, one must understand the
+data model. All registry data is stored in a filesystem layout, implemented on
+a "storage driver", effectively a _virtual file system_ (VFS). The storage
+system must assume that this VFS layer will be eventually consistent and has
+poor read- after-write consistency, since this is the lower common denominator
+among the storage drivers. This is mitigated by writing values in reverse-
+dependent order, but makes wider transactional operations unsafe.
+
+Layered on the VFS model is a content-addressable _directed, acyclic graph_
+(DAG) made up of blobs. Manifests reference layers. Tags reference manifests.
+Since the same data can be referenced by multiple manifests, we only store
+data once, even if it is in different repositories. Thus, we have a set of
+blobs, referenced by tags and manifests. If we want to delete a blob we need
+to be certain that it is no longer referenced by another manifest or tag. When
+we delete a manifest, we also can try to delete the referenced blobs. Deciding
+whether or not a blob has an active reference is the crux of the problem.
+
+Conceptually, deleting a manifest and its resources is quite simple. Just find
+all the manifests, enumerate the referenced blobs and delete the blobs not in
+that set. An astute observer will recognize this as a garbage collection
+problem. As with garbage collection in programming languages, this is very
+simple when one always has a consistent view. When one adds parallelism and an
+inconsistent view of data, it becomes very challenging.
+
+A simple example can demonstrate this. Let's say we are deleting a manifest
+_A_ in one process. We scan the manifest and decide that all the blobs are
+ready for deletion. Concurrently, we have another process accepting a new
+manifest _B_ referencing one or more blobs from the manifest _A_. Manifest _B_
+is accepted and all the blobs are considered present, so the operation
+proceeds. The original process then deletes the referenced blobs, assuming
+they were unreferenced. The manifest _B_, which we thought had all of its data
+present, can no longer be served by the registry, since the dependent data has
+been deleted.
+
+Deleting data from the registry safely requires some way to coordinate this
+operation. The following approaches are being considered:
+
+- _Reference Counting_ - Maintain a count of references to each blob. This is
+  challenging for a number of reasons: 1. maintaining a consistent consensus
+  of reference counts across a set of Registries and 2. Building the initial
+  list of reference counts for an existing registry. These challenges can be
+  met with a consensus protocol like Paxos or Raft in the first case and a
+  necessary but simple scan in the second..
+- _Lock the World GC_ - Halt all writes to the data store. Walk the data store
+  and find all blob references. Delete all unreferenced blobs. This approach
+  is very simple but requires disabling writes for a period of time while the
+  service reads all data. This is slow and expensive but very accurate and
+  effective.
+- _Generational GC_ - Do something similar to above but instead of blocking
+  writes, writes are sent to another storage backend while reads are broadcast
+  to the new and old backends. GC is then performed on the read-only portion.
+  Because writes land in the new backend, the data in the read-only section
+  can be safely deleted. The main drawbacks of this approach are complexity
+  and coordination.
+- _Centralized Oracle_ - Using a centralized, transactional database, we can
+  know exactly which data is referenced at any given time. This avoids
+  coordination problem by managing this data in a single location. We trade
+  off metadata scalability for simplicity and performance. This is a very good
+  option for most registry deployments. This would create a bottleneck for
+  registry metadata. However, metadata is generally not the main bottleneck
+  when serving images.
+
+Please let us know if other solutions exist that we have yet to enumerate.
+Note that for any approach, implementation is a massive consideration. For
+example, a mark-sweep based solution may seem simple but the amount of work in
+coordination offset the extra work it might take to build a _Centralized
+Oracle_. We'll accept proposals for any solution but please coordinate with us
+before dropping code.
+
+At this time, we have traded off simplicity and ease of deployment for disk
+space. Simplicity and ease of deployment tend to reduce developer involvement,
+which is currently the most expensive resource in software engineering. Taking
+on any solution for deletes will greatly effect these factors, trading off
+very cheap disk space for a complex deployment and operational story.
+
+Please see the following issues for more detail:
+
+- https://github.com/docker/distribution/issues/422
+- https://github.com/docker/distribution/issues/461
+- https://github.com/docker/distribution/issues/462
 
 ### Distribution Package