Storage drivers may be able to take advantage of the hint to start
their walk more efficiently.
For S3: The API takes a start-after parameter. Registries with many
repositories can drastically reduce calls to S3 by telling S3 to only
list results that sort lexicographically after the given key.
For the fallback: We can start deeper in the tree and avoid statting
the files and directories before the hint in a walk. For a filesystem
this improves performance a little, but many of the API-based drivers
are currently treated like a filesystem, so this drastically improves
the performance of GCP and Azure blob.
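
A minimal sketch of the fallback idea, not the actual driver code: the
tree type, walkAfter, collect and exampleTree are hypothetical names,
and an in-memory map stands in for the driver's List and Stat calls.
Entries that sort before the hint at each level are skipped without
being descended into, which is where the saved calls come from.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tree is a hypothetical stand-in for a storage driver. It maps a
// directory path to the names of its children; any path that is not
// a key is treated as a file.
type tree map[string][]string

// walkAfter visits every path under dir that comes after the hint in
// walk order. hint holds the remaining components of the start-after
// path relative to dir; an empty hint means "visit everything".
func walkAfter(t tree, dir string, hint []string, visit func(path string)) {
	names := append([]string(nil), t[dir]...)
	sort.Strings(names)
	for _, name := range names {
		path := strings.TrimSuffix(dir, "/") + "/" + name
		_, isDir := t[path]
		if len(hint) > 0 {
			switch {
			case name < hint[0]:
				// Entirely before the hint: skip without descending.
				continue
			case name == hint[0]:
				// On the hint's own path: do not visit it, but its
				// descendants may still come after the hint.
				if isDir {
					walkAfter(t, path, hint[1:], visit)
				}
				hint = nil
				continue
			}
			// name sorts after the hint: everything from here is visited.
			hint = nil
		}
		visit(path)
		if isDir {
			walkAfter(t, path, nil, visit)
		}
	}
}

// collect walks the whole tree and returns the paths visited after
// startAfter. An empty startAfter disables the hint.
func collect(t tree, startAfter string) []string {
	var hint []string
	if trimmed := strings.Trim(startAfter, "/"); trimmed != "" {
		hint = strings.Split(trimmed, "/")
	}
	var out []string
	walkAfter(t, "/", hint, func(p string) { out = append(out, p) })
	return out
}

// exampleTree builds a small made-up layout for illustration.
func exampleTree() tree {
	return tree{
		"/":  {"a", "b", "c"},
		"/a": {"1", "2"},
		"/b": {"1", "2", "3"},
		"/c": {"1"},
	}
}

func main() {
	for _, p := range collect(exampleTree(), "/b/1") {
		fmt.Println(p)
	}
}
```

With the hint "/b/1" the sketch never lists "/a", so a driver that pays
one API call per listed directory would save that call.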
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
Optimized the S3 Walk implementation by no longer listing files recursively. Overall this gives a large performance increase in both runtime and number of S3 calls (up to ~500x).
Fixed a bug in WalkFallback where ErrSkipDir was not handled as documented for non-directories.
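
For illustration only, assuming ErrSkipDir is meant to follow the same
convention as fs.SkipDir in the Go standard library: when the callback
returns it for a non-directory, the remaining entries of the containing
directory are skipped and the walk continues with the next sibling of
that directory. The sketch below shows the standard library behavior
with fs.WalkDir; visited and exampleFS are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// visited walks fsys from the root and records every path the
// callback sees. The callback returns fs.SkipDir when it reaches
// stopAt.
func visited(fsys fs.FS, stopAt string) []string {
	var paths []string
	_ = fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		paths = append(paths, path)
		if path == stopAt {
			// On a file this skips the rest of the parent directory.
			return fs.SkipDir
		}
		return nil
	})
	return paths
}

// exampleFS is a small in-memory file system with two directories.
func exampleFS() fstest.MapFS {
	return fstest.MapFS{
		"a/1": {},
		"a/2": {},
		"a/3": {},
		"b/1": {},
	}
}

func main() {
	// Returning fs.SkipDir at the file "a/1" drops "a/2" and "a/3"
	// from the walk, while "b" and "b/1" are still visited.
	fmt.Println(visited(exampleFS(), "a/1"))
}
```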
Signed-off-by: Collin Shoop <cshoop@digitalocean.com>
It's possible to run into a race condition in which the enumerator lists
lots of repositories and then starts the long process of enumerating through
them. If someone deletes a repo in that time, the enumerator may error out.
Signed-off-by: Ryan Abrams <rdabrams@gmail.com>
Move the Walk types into registry/storage/driver, and add a Walk method to each
storage driver. Although this is yet another API to implement, there is a
fallback implementation that relies on List and Stat. For some filesystems
this is very slow.
Also, this WalkDir method conforms better to a traditional WalkDir (a la filepath).
This change is in preparation for refactoring.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>