When uploading segments to Swift, the registry generates a random file,
by taking the hash of the container path and 32-bytes of random data.
The registry attempts to shard across multiple directory paths, by
taking the first three hex characters as leader.
The implementation in registry, unfortunately, takes the hash of
nothing, and appends it to the path and random data. This results in all
segments being created in one directory.
Fixes: #2407Fixes: #2311
Signed-off-by: Terin Stock <terinjokes@gmail.com>
Radosgw does not support S3 `GET Bucket` API v2 API but v1.
This API has backward compatibility, so most of this API is working
correctly but we can not get `KeyCount` in v1 API and which is only
for v2 API.
Signed-off-by: Eohyung Lee <liquidnuker@gmail.com>
Add an interface alongside TagStore that provides API to retreive
digests of all manifests that a tag historically pointed to. It also
includes currently linked tag.
Signed-off-by: Manish Tomar <manish.tomar@docker.com>
It's possible to run into a race condition in which the enumerator lists
lots of repositories and then starts the long process of enumerating through
them. In that time if someone deletes a repo, the enumerator may error out.
Signed-off-by: Ryan Abrams <rdabrams@gmail.com>
by having another interface RepositoryRemover that is implemented by
registry instance and is injected in app context for event tracking
Signed-off-by: Manish Tomar <manish.tomar@docker.com>
OCI Image manifests and indexes are supported both with and without
an embeded MediaType (the field is reserved according to the spec).
Test storing and retrieving both types from the manifest store.
Signed-off-by: Owen W. Taylor <otaylor@fishsoup.net>
In the OCI image specification, the MediaType field is reserved
and otherwise undefined; assume that manifests without a media
in storage are OCI images or image indexes, and determine which
by looking at what fields are in the JSON. We do keep a check
that when unmarshalling an OCI image or image index, if it has
a MediaType field, it must match that media type of the upload.
Signed-off-by: Owen W. Taylor <otaylor@fishsoup.net>
According golang documentation [1]: no goroutine should expect to be
able to acquire a read lock until the initial read lock is released.
[1] https://golang.org/pkg/sync/#RWMutex
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
at the first iteration, only the following metrics are collected:
- HTTP metrics of each API endpoint
- cache counter for request/hit/miss
- histogram of storage actions, including:
GetContent, PutContent, Stat, List, Move, and Delete
Signed-off-by: tifayuki <tifayuki@gmail.com>
This removes the old global walk function, and changes all
the code to use the per-driver walk functions.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This changes the Walk Method used for catalog enumeration. Just to show
how much an effect this has on our s3 storage:
Original:
List calls: 6839
real 3m16.636s
user 0m0.000s
sys 0m0.016s
New:
ListObjectsV2 Calls: 1805
real 0m49.970s
user 0m0.008s
sys 0m0.000s
This is because it no longer performs a list and stat per item, and instead
is able to use the metadata gained from the list as a replacement to stat.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Move the Walk types into registry/storage/driver, and add a Walk method to each
storage driver. Although this is yet another API to implement, there is a fall
back implementation that relies on List and Stat. For some filesystems this is
very slow.
Also, this WalkDir Method conforms better do a traditional WalkDir (a la filepath).
This change is in preparation for refactoring.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
If tenant or tenantid are passed as env variables, we systematically use Sprint to make sure they are string and not integer as it would make mapstructure fail.
Signed-off-by: Raphaël Enrici <raphael@root-42.com>
To simplify the vendoring story for the client, we have now removed the
requirement for `logrus` and the forked `context` package (usually
imported as `dcontext`). We inject the logger via the metrics tracker
for the blob cache and via options on the token handler. We preserve
logs on the proxy cache for that case. Clients expecting these log
messages may need to be updated accordingly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Back in the before time, the best practices surrounding usage of Context
weren't quite worked out. We defined our own type to make usage easier.
As this packaged was used elsewhere, it make it more and more
challenging to integrate with the forked `Context` type. Now that it is
available in the standard library, we can just use that one directly.
To make usage more consistent, we now use `dcontext` when referring to
the distribution context package.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Under certain circumstances, the use of `StorageDriver.GetContent` can
result in unbounded memory allocations. In particualr, this happens when
accessing a layer through the manifests endpoint.
This problem is mitigated by setting a 4MB limit when using to access
content that may have been accepted from a user. In practice, this means
setting the limit with the use of `BlobProvider.Get` by wrapping
`StorageDriver.GetContent` in a helper that uses `StorageDriver.Reader`
with a `limitReader` that returns an error.
When mitigating this security issue, we also noticed that the size of
manifests uploaded to the registry is also unlimited. We apply similar
logic to the request body of payloads that are full buffered.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
In some conditions, regulator.exit may not send a signal to blocked
regulator.enter.
Let's assume we are in the critical section of regulator.exit and r.available
is equal to 0. And there are three more gorotines. One goroutine also executes
regulator.exit and waits for the lock. Rest run regulator.enter and wait for
the signal.
We send the signal, and after releasing the lock, there will be lock
contention:
1. Wait from regulator.enter
2. Lock from regulator.exit
If the winner is Lock from regulator.exit, we will not send another signal to
unlock the second Wait.
Signed-off-by: Oleg Bulatov <obulatov@redhat.com>
A previous inspection of the code surrounding zero-length blobs led to
some interesting question. After inspection, it was found that the hash
was indeed for the empty string (""), and not an empty tar, so the code
was correct. The variable naming and comments have been updated
accordingly.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
The registry uses partial Named values which the named parsers
no longer support. To allow the registry service to continue
to operate without canonicalization, switch to use WithName.
In the future, the registry should start using fully canonical
values on the backend and WithName should no longer support
creating partial values.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
`app.driver.List` on `"/"` is very expensive if registry contains significant amount of images. And the result isn't used anyways.
In most (if not all) storage drivers, `Stat` has a cheaper implementation, so use it instead to achieve the same goal.
Signed-off-by: yixi zhang <yixi@memsql.com>
Modify manifest builder so it can be used to build
manifests with different configuration media types.
Rename config media type const to image config.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
See #2077 for background.
The PR #1438 which was not reviewed by azure folks basically introduced
a race condition around uploads to the same blob by multiple clients
concurrently as it used the "writer" type for PutContent(), introduced in #1438.
This does chunked upload of blobs using "AppendBlob" type, which was not atomic.
Usage of "writer" type and thus AppendBlobs on metadata files is currently not
concurrency-safe and generally, they are not the right type of blob for the job.
This patch fixes PutContent() to use the atomic upload operation that works
for uploads smaller than 64 MB and creates blobs with "BlockBlob" type. To be
backwards compatible, we query the type of the blob first and if it is not
a "BlockBlob" we delete the blob first before doing an atomic PUT. This
creates a small inconsistency/race window "only once". Once the blob is made
"BlockBlob", it is overwritten with a single PUT atomicallly next time.
Therefore, going forward, PutContent() will be producing BlockBlobs and it
will silently migrate the AppendBlobs introduced in #1438 to BlockBlobs with
this patch.
Tested with existing code side by side, both registries with and without this
patch work fine without breaking each other. So this should be good from a
backwards/forward compatiblity perspective, with a cost of doing an extra
HEAD checking the blob type.
Fixes#2077.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
Golint now checks for new lines at the end of go error strings,
remove these unneeded new lines.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Updating to a recent version of Azure Storage SDK to be
able to patch some memory leaks through configurable HTTP client
changes which were made possible by recent patches to it.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
The current code determines the header order for the
"string-to-sign" payload by sorting on the concatenation
of headers and values, whereas it should only happen on the
key.
During multipart uploads, since `x-amz-copy-source-range` and
`x-amz-copy-source` headers are present, V2 signatures fail to
validate since header order is swapped.
This patch reverts to the expected behavior.
Signed-off-by: Pierre-Yves Ritschard <pyr@spootnik.org>
Driver was passing connections by copying. Storing
`swift.Connection` as pointer to fix the warnings.
Ref: #2030.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
In GetContent() we read the bytes from a blob but do not close
the underlying response body.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
To allow generic manifest walking, we define an interface method of
`References` that returns the referenced items in the manifest. The
current implementation does not return the config target from schema2,
making this useless for most applications.
The garbage collector has been modified to show the utility of this
correctly formed `References` method. We may be able to make more
generic traversal methods with this, as well.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Context should use type values instead of strings.
Updated direct calls to WithValue, but still other uses of string keys.
Update Acl to ACL in s3 driver.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
The Redis tests were failing with a "connection pool exhausted" error
from Redigo. Closing the connection used for FLUSHDB fixes the problem.
Signed-off-by: Noah Treuhaft <noah.treuhaft@docker.com>
This change to the S3 Move method uses S3's multipart upload API to copy
objects whose size exceeds a threshold. Parts are copied concurrently.
The level of concurrency, part size, and threshold are all configurable
with reasonable defaults.
Using the multipart upload API has two benefits.
* The S3 Move method can now handle objects over 5 GB, fixing #886.
* Moving most objects, and espectially large ones, is faster. For
example, moving a 1 GB object averaged 30 seconds but now averages 10.
Signed-off-by: Noah Treuhaft <noah.treuhaft@docker.com>
This is already supported by ncw/swift, so we just need to pass the
parameters from the storage driver.
Signed-off-by: Stefan Majewsky <stefan.majewsky@sap.com>
Use the much faster math/rand.Read function where cryptographic
guarantees are not required. The unit test suite should speed up a
little bit but we've already optimized around this, so it may not
matter.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Previous component-wise path comparison is recursive and generates a
large amount of garbage. This more efficient version simply replaces the
path comparison with the zero-value to sort before everything. We do
this by replacing the byte-wise comparison that swaps a single character
inline for the separator comparison, such that separators sort first.
The resulting implementation provides component-wise path comparison
with no cost incurred for allocation or stack frame.
Direction of the comparison is also reversed to match Go style.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
* Allow precomputed stats on cross-mounted blobs
Signed-off-by: Michal Minář <miminar@redhat.com>
* Extended cross-repo mount tests
Signed-off-by: Michal Minář <miminar@redhat.com>
* Add Object ACL Support to the S3 Storage Backend
Signed-off-by: Frank Chen <frankchn@gmail.com>
* Made changes per @RichardScothern's comments
Signed-off-by: Frank Chen <frankchn@gmail.com>
* Fix Typos
Signed-off-by: Frank Chen <frankchn@gmail.com>
Pass the manifestURL directly into the schema2 manifest handler instead of
accessing through the repository as it has since the reference is now an
interface.
Signed-off-by: Richard Scothern <richard.scothern@docker.com>
Until we have some experience hosting foreign layer manifests, the Hub
operators wish to limit foreign layers on Hub. To that end, this change
adds registry configuration options to restrict the URLs that may appear
in pushed manifests.
Signed-off-by: Noah Treuhaft <noah.treuhaft@docker.com>
This fixes errors other than io.EOF from being dropped when a storage driver
lists repositories. For example, filesystem driver may point to a missing
directory and errors, which then gets subsequently dropped.
Signed-off-by: Edgar Lee <edgar.lee@docker.com>
This is similar to waitForSegmentsToShowUp which is called during
Close/Commit. Intuitively, you wouldn't expect missing segments to be a
problem during read operations, since the previous Close/Commit
confirmed that all segments are there.
But due to the distributed nature of Swift, the read request could be
hitting a different storage node of the Swift cluster, where the
segments are still missing.
Load tests on my team's staging Swift cluster have shown this to occur
about once every 100-200 layer uploads when the Swift proxies are under
high load. The retry logic, borrowed from waitForSegmentsToShowUp, fixes
this temporary inconsistency.
Signed-off-by: Stefan Majewsky <stefan.majewsky@sap.com>
This commit refactors base.regulator into the 2.4 interfaces and adds a
filesystem configuration option `maxthreads` to configure the regulator.
By default `maxthreads` is set to 100. This means the FS driver is
limited to 100 concurrent blocking file operations. Any subsequent
operations will block in Go until previous filesystem operations
complete.
This ensures that the registry can never open thousands of simultaneous
threads from os filesystem operations.
Note that `maxthreads` can never be less than 25.
Add test case covering parsable string maxthreads
Signed-off-by: Tony Holdstock-Brown <tony@docker.com>
subsequent close.
When a blob upload is cancelled close the blobwriter before removing
upload state to ensure old hashstates don't persist.
Signed-off-by: Richard Scothern <richard.scothern@docker.com>
It's easily possible for a flood of requests to trigger thousands of
concurrent file accesses on the storage driver. Each file I/O call creates
a new OS thread that is not reaped by the Golang runtime. By limiting it
to only 100 at a time we can effectively bound the number of OS threads
in use by the storage driver.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Signed-off-by: Tony Holdstock-Brown <tony@docker.com>
Not just when Commit()ing the result. This fixes some errors I observed
when the layer (i.e. the DLO) is Stat()ed immediately after closing,
and reports the wrong file size because the container listing is not
yet up-to-date.
Signed-off-by: Stefan Majewsky <stefan.majewsky@sap.com>
If a schema 1 manifest is uploaded with the `disablesignaturestore` option set
to true, then no signatures will exist. Handle this case.
If a schema 1 manifest is pushed, deleted, garbage collected and pushed again, the
repository will contain signature links from the first version, but the blobs will
not exist. Disable the signature store in the garbage-collect command so
signatures are not fetched.
Signed-off-by: Richard Scothern <richard.scothern@docker.com>
In 326c3a9c49, which was only intended to
be a refactoring commit, the behavior of this block subtly changed so
that unknown types of errors would be swallowed instead of propagated.
I noticed this while investigating an error similar to #1539 aka
docker/docker#21290. It appears that during GetContent() for a
hashstate, the Swift proxy produces an error. Since this error was
silently swallowed, an empty []byte is used to restart the hash, then
producing the digest of the empty string instead of the layer's digest.
This PR will not fix the issue, but it should make the actual error more
visible by propagating it into `blobWriter#resumeDigest' and
'blobWriter#validateBlob', respectively.
Signed-off-by: Stefan Majewsky <stefan.majewsky@sap.com>
This commit adds context-specific documentation on StorageDriver,
StorageDriverFactory, and the factory’s Register func, explaining how
the internal registration mechanism should be used.
This documentation follows from the thread starting at
https://github.com/deis/builder/pull/262/files#r56720200.
cc/ @stevvooe
Signed-off-by: Aaron Schlesinger <aschlesinger@deis.com>
Updates registry storage code to use this for better resumable writes.
Implements this interface for the following drivers:
+ Inmemory
+ Filesystem
+ S3
+ Azure
Signed-off-by: Brian Bland <brian.bland@docker.com>
The Move operation is only used to move uploaded blobs
to their final destination. There is no point in implementing
Move on "folders". Apart from simplifying the code, this also
saves an HTTP request.
Signed-off-by: Arthur Baars <arthur@semmle.com>
- Includes a change in the command to run the registry. The registry
server itself is now started up as a subcommand.
- Includes changes to the high level interfaces to support enumeration
of various registry objects.
Signed-off-by: Andrew T Nguyen <andrew.nguyen@docker.com>