This commit cleans up and attempts to optimise the performance of image push in S3 driver.
There are 2 main changes:
* we refactor the S3 driver Writer where instead of using separate bytes
slices for ready and pending parts which get constantly appended data
into them causing unnecessary allocations we use optimised bytes
buffers; we make sure these are used efficiently when written to.
* we introduce a memory pool that is used for allocating the byte
buffers introduced above
These changes should alleviate high memory pressure on the push path to S3.
Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
In case drvr.PutContent fails and returns error we'd have
some extra memory allocated, though in this case
(test with known size of the slice being iterated), that's fine.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Only some of the S3 storage driver calls were propagating context to the
S3 API calls. This commit updates the S3 storage drivers so the context
is propagated to all the S3 API calls.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Storage drivers may be able to take advantage of the hint to start
their walk more efficiently.
For S3: The API takes a start-after parameter. Registries with many
repositories can drastically reduce calls to s3 by telling s3 to only
list results lexographically after the last parameter.
For the fallback: We can start deeper in the tree and avoid statting
the files and directories before the hint in a walk. For a filesystem
this improves performance a little, but many of the API based drivers
are currently treated like a filesystem, so this drastically improves
the performance of GCP and Azure blob.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
This commit removes `oss` storage driver from distribution as well as
`alicdn` storage middleware which only works with the `oss` driver.
There are several reasons for it:
* no real-life expertise among the maintainers
* oss is compatible with S3 API operations required by S3 storage driver
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
The Azure tests fail if there is no Azure configuration available,
instead they should be skipped.
Also, one of the Azure tests is wrong and doesn't match the code.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
Other storage drivers will only return children and below, s3 should do
the same. The only reason it was returning was because of the addition
of a / to ensure we treat the from as a directory.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
This test will only work on an s3 bucket on an s3 outpost. Most
developers won't have access to one of these.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
If we haven't set a storage class there's no point in checking the
storage class applied to the object - s3 will choose one.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
This commit removes swift storage driver from distribution.
There are several reasons for it:
* no real life expertise among the maintainers
* swift is compatible with S3 API operations required by S3 storage driver
This will also remove depedencies that are also hard to keep up with.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
This enables go build tags so the GCS and OSS driver support is
available in the binary distributed via the image build by Dockerfile.
This led to quite a few fixes in the GCS and OSS packages raised as
warning by golang-ci linter.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Something seems broken on azure/azure sdk side - it is currently not
possible to copy a blob of type AppendBlob using `CopyFromURL`.
Using the AppendBlob client via NewAppendBlobClient does not work
either.
According to Azure the correct way to do this is by using
StartCopyFromURL. Because this is an async operation, we need to do
polling ourselves. A simple backoff mechanism is used, where during each
iteration, the configured delay is multiplied by the retry number.
Also introduces two new config options for the Azure driver:
copy_status_poll_max_retry, and copy_status_poll_delay.
Signed-off-by: Flavian Missi <fmissi@redhat.com>
both oss and gcs driver were missing the context parameter that is
required to satisfy the storagedriver.FileWriter interface.
Signed-off-by: Flavian Missi <fmissi@redhat.com>
Stat(ctx, "/") is called by the registry healthcheck.
Also fixes blob name building in the Azure driver so it no longer
returns empty blob names. This was causing errors in the healthcheck
call to Stat for Azure.
Signed-off-by: Flavian Missi <fmissi@redhat.com>
Dot-imports were only used in a couple of places, and replacing them
makes it more explicit what's imported.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Microsoft has updated the golang Azure SDK significantly. Update the
azure storage driver to use the new SDK. Add support for client
secret and MSI authentication schemes in addition to shared key
authentication.
Implement rootDirectory support for the azure storage driver to mirror
the S3 driver.
Signed-off-by: Kirat Singh <kirat.singh@beacon.io>
Co-authored-by: Cory Snider <corhere@gmail.com>
This is an edge case when we are trying to upload an empty chunk of data using
a MultiPart upload. As a result we are trying to complete the MultipartUpload
with an empty slice of `completedUploadedParts` which will always lead to 400
being returned from S3 See: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#CompletedMultipartUpload
Solution: we upload an empty i.e. 0 byte part as a single part and then append it
to the completedUploadedParts slice used to complete the Multipart upload.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
The loop that iterates over paginated lists of S3 multipart upload parts
appears to be using the wrong variable in its loop condition. Nothing
inside the loop affects the value of `resp.IsTruncated`, so this loop
will either be wrongly skipped or loop forever.
It looks like this is a regression caused by commit
7736319f2e. The return value of
`ListMultipartUploads` used to be assigned to a variable named `resp`,
but it was renamed to `partsList` without updating the for loop
condition.
I believe this is causing an error we're seeing with large layer uploads
at commit time:
upload resumed at wrong offset: 5242880000 != 5815706782
Missing parts of the multipart S3 upload would cause an incorrect size
calculation in `newWriter`.
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
Previously we used a custom Transport in order to modify the user agent header.
This prevented the AWS SDK from being able to customize SSL and other client TLS
parameters since it could not understand the Transport type.
Instead we can simply use the SDK function MakeAddToUserAgentFreeFormHandler to
customize the UserAgent if necessary and leave all the TLS configuration to the
AWS SDK.
The only exception being SkipVerify which we have to handle, but we can set it
onto the standard http.Transport which does not interfere with the SDKs ability
to set other options.
Signed-off-by: Kirat Singh <kirat.singh@gmail.com>
`registry/storage/driver/inmemory/driver_test.go` times out after ~10min. The slow test is `testsuites.go:TestWriteReadLargeStreams()` which writes a 5GB file.
Root cause is inefficient slice reallocation algorithm. The slice holding file bytes grows only 32K on each allocation. To fix it, this PR grows slice exponentially.
Signed-off-by: Wei Meng <wemeng@microsoft.com>