This commit updates (writer).Writer() method in S3 storage driver to
handle the case where an append is attempted to a zer-size content.
S3 does not allow appending to already committed content, so we are
optiing to provide the following case as a narrowed down behaviour:
Writer can only append to zero byte content - in that case, a new S3
MultipartUpload is created that will be used for overriding the already
committed zero size content.
Appending to non-zero size content fails with error.
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Several storage drivers and storage middlewares need to introspect the
client HTTP request in order to construct content-redirect URLs. The
request is indirectly passed into the driver interface method URLFor()
through the context argument, which is bad practice. The request should
be passed in as an explicit argument as the method is only called from
request handlers.
Replace the URLFor() method with a RedirectURL() method which takes an
HTTP request as a parameter instead of a context. Drop the options
argument from URLFor() as in practice it only ever encoded the request
method, which can now be fetched directly from the request. No URLFor()
callers ever passed in an "expiry" option, either.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Our context package predates the establishment of current best practices
regarding context usage and it shows. It encourages bad practices such
as using contexts to propagate non-request-scoped values like the
application version and using string-typed keys for context values. Move
the package internal to remove it from the API surface of
distribution/v3@v3.0.0 so we are free to iterate on it without being
constrained by compatibility.
Signed-off-by: Cory Snider <csnider@mirantis.com>
This commit make the S3 driver chunk size constants more straightforward
to understand -- instead of remembering the bit shifts we make this more
explicit.
We are also updating append parameter to the `(writer).Write` to follow
the new convention we are trying to establish.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
This commit changes storagedriver.Filewriter interface
by adding context.Context as an argument to its Commit
func.
We pass the context appropriately where need be throughout
the distribution codebase to all the writers and tests.
S3 driver writer unfortunately must maintain the context
passed down to it from upstream so it contnues to
implement io.Writer and io.Closer interfaces which do not
allow accepting the context in any of their funcs.
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
This commit cleans up and attempts to optimise the performance of image push in S3 driver.
There are 2 main changes:
* we refactor the S3 driver Writer where instead of using separate bytes
slices for ready and pending parts which get constantly appended data
into them causing unnecessary allocations we use optimised bytes
buffers; we make sure these are used efficiently when written to.
* we introduce a memory pool that is used for allocating the byte
buffers introduced above
These changes should alleviate high memory pressure on the push path to S3.
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Only some of the S3 storage driver calls were propagating context to the
S3 API calls. This commit updates the S3 storage drivers so the context
is propagated to all the S3 API calls.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Storage drivers may be able to take advantage of the hint to start
their walk more efficiently.
For S3: The API takes a start-after parameter. Registries with many
repositories can drastically reduce calls to s3 by telling s3 to only
list results lexographically after the last parameter.
For the fallback: We can start deeper in the tree and avoid statting
the files and directories before the hint in a walk. For a filesystem
this improves performance a little, but many of the API based drivers
are currently treated like a filesystem, so this drastically improves
the performance of GCP and Azure blob.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
Other storage drivers will only return children and below, s3 should do
the same. The only reason it was returning was because of the addition
of a / to ensure we treat the from as a directory.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
Microsoft has updated the golang Azure SDK significantly. Update the
azure storage driver to use the new SDK. Add support for client
secret and MSI authentication schemes in addition to shared key
authentication.
Implement rootDirectory support for the azure storage driver to mirror
the S3 driver.
Signed-off-by: Kirat Singh <kirat.singh@beacon.io>
Co-authored-by: Cory Snider <corhere@gmail.com>
This is an edge case when we are trying to upload an empty chunk of data using
a MultiPart upload. As a result we are trying to complete the MultipartUpload
with an empty slice of `completedUploadedParts` which will always lead to 400
being returned from S3 See: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#CompletedMultipartUpload
Solution: we upload an empty i.e. 0 byte part as a single part and then append it
to the completedUploadedParts slice used to complete the Multipart upload.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
The loop that iterates over paginated lists of S3 multipart upload parts
appears to be using the wrong variable in its loop condition. Nothing
inside the loop affects the value of `resp.IsTruncated`, so this loop
will either be wrongly skipped or loop forever.
It looks like this is a regression caused by commit
7736319f2e. The return value of
`ListMultipartUploads` used to be assigned to a variable named `resp`,
but it was renamed to `partsList` without updating the for loop
condition.
I believe this is causing an error we're seeing with large layer uploads
at commit time:
upload resumed at wrong offset: 5242880000 != 5815706782
Missing parts of the multipart S3 upload would cause an incorrect size
calculation in `newWriter`.
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
Previously we used a custom Transport in order to modify the user agent header.
This prevented the AWS SDK from being able to customize SSL and other client TLS
parameters since it could not understand the Transport type.
Instead we can simply use the SDK function MakeAddToUserAgentFreeFormHandler to
customize the UserAgent if necessary and leave all the TLS configuration to the
AWS SDK.
The only exception being SkipVerify which we have to handle, but we can set it
onto the standard http.Transport which does not interfere with the SDKs ability
to set other options.
Signed-off-by: Kirat Singh <kirat.singh@gmail.com>
gofumpt (https://github.com/mvdan/gofumpt) provides a supserset of `gofmt` / `go fmt`,
and addresses various formatting issues that linters may be checking for.
We can consider enabling the `gofumpt` linter to verify the formatting in CI, although
not every developer may have it installed, so for now this runs it once to get formatting
in shape.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Instead of first collecting all keys and then batch deleting them,
we will do the incremental delete _online_ per max allowed batch.
Doing this prevents frequent allocations for large S3 keyspaces
and OOM-kills that might happen as a result of those.
This commit introduces storagedriver.Errors type that allows to return
multierrors as a single error from any storage driver implementation.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
If s3accelerate is set to true then we turn on S3 Transfer
Acceleration via the AWS SDK. It defaults to false since this is an
opt-in feature on the S3 bucket.
Signed-off-by: Kirat Singh <kirat.singh@wsq.io>
Signed-off-by: Simone Locci <simonelocci88@gmail.com>
Allow the storage driver to optionally use AWS SDK's dualstack mode.
This allows the registry to communicate with S3 in IPv6 environments.
Signed-off-by: Adam Kaplan <adam.kaplan@redhat.com>
Optimized S3 Walk impl by no longer listing files recursively. Overall gives a huge performance increase both in terms of runtime and S3 calls (up to ~500x).
Fixed a bug in WalkFallback where ErrSkipDir for was not handled as documented for non-directory.
Signed-off-by: Collin Shoop <cshoop@digitalocean.com>
Delete was not working when the subpath immediately followed the given path started with an ascii lower than "/" such as dash "-" and underscore "_" and requests no files to be deleted.
(cherry picked from commit 5d8fa0ce94b68cce70237805db92cdd8d40de282)
Signed-off-by: Collin Shoop <cshoop@digitalocean.com>
Go 1.13 and up enforce import paths to be versioned if a project
contains a go.mod and has released v2 or up.
The current v2.x branches (and releases) do not yet have a go.mod,
and therefore are still allowed to be imported with a non-versioned
import path (go modules add a `+incompatible` annotation in that case).
However, now that this project has a `go.mod` file, incompatible
import paths will not be accepted by go modules, and attempting
to use code from this repository will fail.
This patch uses `v3` for the import-paths (not `v2`), because changing
import paths itself is a breaking change, which means that the
next release should increment the "major" version to comply with
SemVer (as go modules dictate).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
When a given prefix is empty and we attempt to list its content AWS
returns that the prefix contains one object with key defined as the
prefix with an extra "/" at the end.
e.g.
If we call ListObjects() passing to it an existing but empty prefix,
say "my/empty/prefix", AWS will return that "my/empty/prefix/" is an
object inside "my/empty/prefix" (ListObjectsOutput.Contents).
This extra "/" causes the upload purging process to panic. On normal
circunstances we never find empty prefixes on S3 but users may touch
it.
Signed-off-by: Ricardo Maraschini <rmarasch@redhat.com>