Spool layer uploads to remote storage

To smooth initial implementation, uploads were spooled to local file storage,
validated, then pushed to remote storage. That approach was flawed in that it
prevented easy clustering of registry services that share a remote storage
backend. The original plan was to implement resumable hashes, then implement
remote upload storage. After some thought, it was found to be better to get
remote spooling working, then optimize with resumable hashes.

Moving to this approach has tradeoffs: after storing the complete upload
remotely, the node must fetch the content and validate it before moving it to
the final location. This can double bandwidth usage to the remote backend.
Modifying the verification and upload code to store intermediate hashes should
be trivial once the layer digest format has settled.

The largest changes for users of the storage package (mostly the registry app)
are the LayerService interface and the LayerUpload interface. The LayerService
now takes qualified repository names to start and resume uploads. As a
corollary, the concept of LayerUploadState has been completely removed,
exposing all aspects of that state as part of the LayerUpload object. The
LayerUpload object has been modified to work as an io.WriteSeeker and includes
a StartedAt time, to allow for upload timeout policies. Finish now only
requires a digest, eliding the requirement for a size parameter.

Resource cleanup has taken a turn for the better. Resources are cleaned up
after successful uploads and during a cancel call. Admittedly, this is probably
not completely where we want to be. It's recommended that we bolster this with
a periodic driver utility script that scans for partial uploads and deletes the
underlying data. As a small benefit, we can leave these around to better
understand how and why these uploads are failing, at the cost of some extra
disk space.

Many other changes follow from the changes above. The webapp needs to be
updated to meet the new interface requirements.

Signed-off-by: Stephen J Day <stephen.day@docker.com>
2015-01-08 22:24:02 +00:00

package storage

import (
	"fmt"
	"io"
	"os"
	"path"
	"strconv"
	"time"

	"github.com/Sirupsen/logrus"
	"github.com/docker/distribution"
	ctxu "github.com/docker/distribution/context"
	"github.com/docker/distribution/digest"
	storagedriver "github.com/docker/distribution/registry/storage/driver"
)

var _ distribution.LayerUpload = &layerWriter{}

// layerWriter is used to control the various aspects of resumable
// layer upload. It implements the LayerUpload interface.
type layerWriter struct {
	layerStore *layerStore

	uuid              string
	startedAt         time.Time
	resumableDigester digest.ResumableDigester

	// implements io.WriteSeeker, io.ReaderFrom and io.Closer to satisfy
	// the LayerUpload interface
	bufferedFileWriter
}

// UUID returns the identifier for this upload.
func (lw *layerWriter) UUID() string {
	return lw.uuid
}

// StartedAt returns the time at which this upload was started.
func (lw *layerWriter) StartedAt() time.Time {
	return lw.startedAt
}

// Finish marks the upload as completed, returning a valid handle to the
// uploaded layer. The provided digest is validated against the contents of
// the uploaded layer. The digest should be provided in the format
// <algorithm>:<hex digest>.
func (lw *layerWriter) Finish(dgst digest.Digest) (distribution.Layer, error) {
	ctxu.GetLogger(lw.layerStore.repository.ctx).Debug("(*layerWriter).Finish")

	if err := lw.bufferedFileWriter.Close(); err != nil {
		return nil, err
	}

	var (
		canonical digest.Digest
		err       error
	)

	// HACK(stevvooe): To deal with s3's lack of consistency, attempt to retry
	// validation on failure. Validation is retried up to three times,
	// sleeping 100ms between attempts.
	for retries := 0; ; retries++ {
		canonical, err = lw.validateLayer(dgst)
		if err == nil {
			break
		}

		ctxu.GetLoggerWithField(lw.layerStore.repository.ctx, "retries", retries).
			Errorf("error validating layer: %v", err)

		if retries < 3 {
			time.Sleep(100 * time.Millisecond)
			continue
		}

		return nil, err
	}

	if err := lw.moveLayer(canonical); err != nil {
		// TODO(stevvooe): Cleanup?
		return nil, err
	}

	// Link the layer blob into the repository.
	if err := lw.linkLayer(canonical, dgst); err != nil {
		return nil, err
	}

	if err := lw.removeResources(); err != nil {
		return nil, err
	}

	return lw.layerStore.Fetch(canonical)
}

// Cancel the layer upload process.
func (lw *layerWriter) Cancel() error {
	ctxu.GetLogger(lw.layerStore.repository.ctx).Debug("(*layerWriter).Cancel")
	if err := lw.removeResources(); err != nil {
		return err
	}

	lw.Close()
	return nil
}

// Write writes p to the upload. When a resumable digester is in use, the
// digester is first brought in line with the current write offset, then p is
// written to both the file writer and the digester.
func (lw *layerWriter) Write(p []byte) (int, error) {
	if lw.resumableDigester == nil {
		return lw.bufferedFileWriter.Write(p)
	}

	// Ensure that the current write offset matches how many bytes have been
	// written to the digester. If not, we need to update the digest state to
	// match the current write position.
	if err := lw.resumeHashAt(lw.offset); err != nil {
		return 0, err
	}

	return io.MultiWriter(&lw.bufferedFileWriter, lw.resumableDigester).Write(p)
}

// ReadFrom copies r into the upload. When a resumable digester is in use,
// the data read from r is also fed to the digester.
func (lw *layerWriter) ReadFrom(r io.Reader) (n int64, err error) {
	if lw.resumableDigester == nil {
		return lw.bufferedFileWriter.ReadFrom(r)
	}

	// Ensure that the current write offset matches how many bytes have been
	// written to the digester. If not, we need to update the digest state to
	// match the current write position.
	if err := lw.resumeHashAt(lw.offset); err != nil {
		return 0, err
	}

	return lw.bufferedFileWriter.ReadFrom(io.TeeReader(r, lw.resumableDigester))
}

// Close stores the current hash state, if a resumable digester is in use,
// and closes the underlying file writer. If the writer has already recorded
// an error, that error is returned instead.
func (lw *layerWriter) Close() error {
	if lw.err != nil {
		return lw.err
	}

	if lw.resumableDigester != nil {
		if err := lw.storeHashState(); err != nil {
			return err
		}
	}

	return lw.bufferedFileWriter.Close()
}

// hashStateEntry describes a stored hash state: the upload offset it was
// taken at and the backend path it is stored under.
type hashStateEntry struct {
	offset int64
	path   string
}

// getStoredHashStates returns a slice of hashStateEntries for this upload.
func (lw *layerWriter) getStoredHashStates() ([]hashStateEntry, error) {
	uploadHashStatePathPrefix, err := lw.layerStore.repository.registry.pm.path(uploadHashStatePathSpec{
		name: lw.layerStore.repository.Name(),
		uuid: lw.uuid,
		alg:  lw.resumableDigester.Digest().Algorithm(),
		list: true,
	})
	if err != nil {
		return nil, err
	}

	paths, err := lw.driver.List(uploadHashStatePathPrefix)
	if err != nil {
		if _, ok := err.(storagedriver.PathNotFoundError); !ok {
			return nil, err
		}
		// Treat PathNotFoundError as no entries.
		paths = nil
	}

	hashStateEntries := make([]hashStateEntry, 0, len(paths))

	for _, p := range paths {
		pathSuffix := path.Base(p)
		// The suffix should be the offset.
		offset, err := strconv.ParseInt(pathSuffix, 0, 64)
		if err != nil {
			logrus.Errorf("unable to parse offset from upload state path %q: %s", p, err)
			// Skip entries whose offset cannot be parsed rather than
			// recording them with a zero offset.
			continue
		}

		hashStateEntries = append(hashStateEntries, hashStateEntry{offset: offset, path: p})
	}

	return hashStateEntries, nil
}

// resumeHashAt attempts to restore the state of the internal hash function
// by loading the most recent saved hash state less than or equal to the given
// offset. Any unhashed bytes remaining less than the given offset are hashed
// from the content uploaded so far.
func (lw *layerWriter) resumeHashAt(offset int64) error {
	if offset < 0 {
		return fmt.Errorf("cannot resume hash at negative offset: %d", offset)
	}

	if offset == int64(lw.resumableDigester.Len()) {
		// State of digester is already at the requested offset.
		return nil
	}

	// List hash states from storage backend.
	var hashStateMatch hashStateEntry
	hashStates, err := lw.getStoredHashStates()
	if err != nil {
		return fmt.Errorf("unable to get stored hash states with offset %d: %s", offset, err)
	}

	// Find the highest stored hashState with offset less than or equal to
	// the requested offset.
	for _, hashState := range hashStates {
		if hashState.offset == offset {
			hashStateMatch = hashState
			break // Found an exact offset match.
		} else if hashState.offset < offset && hashState.offset > hashStateMatch.offset {
			// This offset is closer to the requested offset.
			hashStateMatch = hashState
		} else if hashState.offset > offset {
			// Remove any stored hash state with offsets higher than this one
			// as writes to this resumed hasher will make those invalid. This
			// is probably okay to skip for now since we don't expect anyone to
			// use the API in this way. For that reason, we don't treat an
			// error here as a fatal error, but only log it.
			if err := lw.driver.Delete(hashState.path); err != nil {
				logrus.Errorf("unable to delete stale hash state %q: %s", hashState.path, err)
			}
		}
	}

	if hashStateMatch.offset == 0 {
		// No need to load any state, just reset the hasher.
		lw.resumableDigester.Reset()
	} else {
		storedState, err := lw.driver.GetContent(hashStateMatch.path)
		if err != nil {
			return err
		}

		if err = lw.resumableDigester.Restore(storedState); err != nil {
			return err
		}
	}

	// Mind the gap.
	if gapLen := offset - int64(lw.resumableDigester.Len()); gapLen > 0 {
		// Need to read content from the upload to catch up to the desired
		// offset.
		fr, err := newFileReader(lw.driver, lw.path)
		if err != nil {
			return err
		}

		if _, err = fr.Seek(int64(lw.resumableDigester.Len()), os.SEEK_SET); err != nil {
			return fmt.Errorf("unable to seek to layer reader offset %d: %s", lw.resumableDigester.Len(), err)
		}

		if _, err := io.CopyN(lw.resumableDigester, fr, gapLen); err != nil {
			return err
		}
	}

	return nil
}
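
The selection step in resumeHashAt (pick the stored state closest to, but not past, the requested offset, then hash the remaining gap) can be illustrated in isolation. This is a simplified sketch under stated assumptions: checkpoint and nearestCheckpoint are hypothetical names standing in for hashStateEntry and the loop above, and the sketch does not touch any storage driver or delete stale states.

```go
package main

import "fmt"

// checkpoint is a hypothetical stand-in for hashStateEntry.
type checkpoint struct {
	offset int64
	path   string
}

// nearestCheckpoint returns the checkpoint with the greatest offset that does
// not exceed target, together with the number of bytes that still have to be
// re-hashed to reach target. ok is false when no checkpoint qualifies, in
// which case the whole range [0, target) has to be hashed again.
func nearestCheckpoint(cps []checkpoint, target int64) (best checkpoint, gap int64, ok bool) {
	for _, cp := range cps {
		if cp.offset > target {
			continue // Saved past the requested offset; unusable here.
		}
		if !ok || cp.offset > best.offset {
			best, ok = cp, true
		}
	}
	if !ok {
		return checkpoint{}, target, false
	}
	return best, target - best.offset, true
}

func main() {
	cps := []checkpoint{
		{offset: 1024, path: "a"},
		{offset: 4096, path: "b"},
		{offset: 8192, path: "c"},
	}
	best, gap, ok := nearestCheckpoint(cps, 5000)
	fmt.Println(best.path, gap, ok) // prints "b 904 true"
}
```

The gap value corresponds to gapLen in resumeHashAt: the bytes that must be read back from the upload to catch the hasher up.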

// storeHashState saves the current state of the resumable digester to the
// storage backend, under a path keyed by the number of bytes hashed so far.
func (lw *layerWriter) storeHashState() error {
	uploadHashStatePath, err := lw.layerStore.repository.registry.pm.path(uploadHashStatePathSpec{
		name:   lw.layerStore.repository.Name(),
		uuid:   lw.uuid,
		alg:    lw.resumableDigester.Digest().Algorithm(),
		offset: int64(lw.resumableDigester.Len()),
	})
	if err != nil {
		return err
	}

	hashState, err := lw.resumableDigester.State()
	if err != nil {
		return err
	}

	return lw.driver.PutContent(uploadHashStatePath, hashState)
}

// validateLayer checks the layer data against the digest, returning an error
// if it does not match. The canonical digest is returned.
func (lw *layerWriter) validateLayer(dgst digest.Digest) (digest.Digest, error) {
	var (
		verified, fullHash bool
		canonical          digest.Digest
	)

	if lw.resumableDigester != nil {
		// Restore the hasher state to the end of the upload.
		if err := lw.resumeHashAt(lw.size); err != nil {
			return "", err
		}

		canonical = lw.resumableDigester.Digest()

		if canonical.Algorithm() == dgst.Algorithm() {
			// Common case: client and server prefer the same canonical digest
			// algorithm - currently SHA256.
			verified = dgst == canonical
		} else {
			// The client wants to use a different digest algorithm. They'll just
			// have to be patient and wait for us to download and re-hash the
			// uploaded content using that digest algorithm.
			fullHash = true
		}
	} else {
		// Not using resumable digests, so we need to hash the entire layer.
		fullHash = true
	}

	if fullHash {
		digester := digest.NewCanonicalDigester()

		digestVerifier, err := digest.NewDigestVerifier(dgst)
		if err != nil {
			return "", err
		}

		// Read the file from the backend driver and validate it.
		fr, err := newFileReader(lw.bufferedFileWriter.driver, lw.path)
		if err != nil {
			return "", err
		}

		tr := io.TeeReader(fr, digester)

		if _, err = io.Copy(digestVerifier, tr); err != nil {
			return "", err
		}

		canonical = digester.Digest()
		verified = digestVerifier.Verified()
	}

	if !verified {
		ctxu.GetLoggerWithField(lw.layerStore.repository.ctx, "canonical", dgst).
			Errorf("canonical digest does not match provided digest")
		return "", distribution.ErrLayerInvalidDigest{
			Digest: dgst,
			Reason: fmt.Errorf("content does not match digest"),
		}
	}

	return canonical, nil
}
|
|
|
|
|
2015-01-08 22:24:02 +00:00
|
|
|
// moveLayer moves the data into its final, hash-qualified destination,
|
2014-12-05 04:55:59 +00:00
|
|
|
// identified by dgst. The layer should be validated before commencing the
|
2015-01-08 22:24:02 +00:00
|
|
|
// move.
|
2015-03-03 16:57:52 +00:00
|
|
|
func (lw *layerWriter) moveLayer(dgst digest.Digest) error {
|
|
|
|
blobPath, err := lw.layerStore.repository.registry.pm.path(blobDataPathSpec{
|
2014-11-25 00:21:02 +00:00
|
|
|
digest: dgst,
|
2014-11-18 00:29:42 +00:00
|
|
|
})
|
|
|
|
|
|
|
|
if err != nil {
|
2015-01-08 22:24:02 +00:00
|
|
|
return err
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Check for existence
|
2015-03-03 16:57:52 +00:00
|
|
|
if _, err := lw.driver.Stat(blobPath); err != nil {
|
2014-11-18 00:29:42 +00:00
|
|
|
switch err := err.(type) {
|
|
|
|
case storagedriver.PathNotFoundError:
|
|
|
|
break // ensure that it doesn't exist.
|
|
|
|
default:
|
2015-01-08 22:24:02 +00:00
|
|
|
return err
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
2015-01-08 22:24:02 +00:00
|
|
|
} else {
|
|
|
|
// If the path exists, we can assume that the content has already
|
|
|
|
// been uploaded, since the blob storage is content-addressable.
|
|
|
|
// While it may be corrupted, detection of such corruption belongs
|
|
|
|
// elsewhere.
|
|
|
|
return nil
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
|
|
|
|
2015-02-02 21:01:49 +00:00
|
|
|
// If no data was received, we may not actually have a file on disk. Check
|
|
|
|
// the size here and write a zero-length file to blobPath if this is the
|
|
|
|
// case. For the most part, this should only ever happen with zero-length
|
|
|
|
// tars.
|
2015-03-03 16:57:52 +00:00
|
|
|
if _, err := lw.driver.Stat(lw.path); err != nil {
|
2015-02-02 21:01:49 +00:00
|
|
|
switch err := err.(type) {
|
|
|
|
case storagedriver.PathNotFoundError:
|
|
|
|
// HACK(stevvooe): This is slightly dangerous: if we verify above,
|
|
|
|
// get a hash, then the underlying file is deleted, we risk moving
|
|
|
|
// a zero-length blob into a nonzero-length blob location. To
|
|
|
|
// prevent this horrid thing, we employ the hack of only allowing
|
|
|
|
// this to happen for the zero tarsum.
|
2015-03-05 04:26:56 +00:00
|
|
|
if dgst == digest.DigestSha256EmptyTar {
|
2015-03-03 16:57:52 +00:00
|
|
|
return lw.driver.PutContent(blobPath, []byte{})
|
2015-02-02 21:01:49 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// We let this fail during the move below.
|
|
|
|
logrus.
|
2015-03-03 16:57:52 +00:00
|
|
|
WithField("upload.uuid", lw.UUID()).
|
2015-02-02 21:01:49 +00:00
|
|
|
WithField("digest", dgst).Warnf("attempted to move zero-length content with non-zero digest")
|
|
|
|
default:
|
|
|
|
return err // unrelated error
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-03 16:57:52 +00:00
|
|
|
return lw.driver.Move(lw.path, blobPath)
|
2014-11-18 00:29:42 +00:00
|
|
|
}
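The control flow of moveLayer (skip the move when the content-addressed path already exists, otherwise move the upload into place) can be sketched in isolation. This is an illustration only: `memDriver`, `errNotFound`, and `moveBlob` are hypothetical stand-ins for the storage driver, `storagedriver.PathNotFoundError`, and the method above, and the zero-length special case is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for storagedriver.PathNotFoundError.
var errNotFound = errors.New("path not found")

// memDriver is a hypothetical in-memory driver keyed by path.
type memDriver map[string][]byte

func (d memDriver) stat(p string) error {
	if _, ok := d[p]; !ok {
		return errNotFound
	}
	return nil
}

func (d memDriver) move(src, dst string) error {
	data, ok := d[src]
	if !ok {
		return errNotFound
	}
	d[dst] = data
	delete(d, src)
	return nil
}

// moveBlob moves an upload to its content-addressed path. If a blob is
// already stored there, the content is assumed identical and nothing moves.
func moveBlob(d memDriver, uploadPath, blobPath string) error {
	err := d.stat(blobPath)
	switch {
	case err == nil:
		return nil // already present: content-addressable, so nothing to do
	case errors.Is(err, errNotFound):
		return d.move(uploadPath, blobPath)
	default:
		return err // unrelated driver error
	}
}

func main() {
	d := memDriver{"uploads/1/data": []byte("layer")}
	fmt.Println(moveBlob(d, "uploads/1/data", "blobs/sha256/abc"))
	fmt.Println(string(d["blobs/sha256/abc"]))
}
```

As in the original, an existing blob is never overwritten, so a concurrent upload of the same content is harmless.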
|
|
|
|
|
2014-11-25 00:21:02 +00:00
|
|
|
// linkLayer links a valid, written layer blob into the registry under the
|
|
|
|
// named repository for the upload controller.
|
2015-03-03 16:57:52 +00:00
|
|
|
func (lw *layerWriter) linkLayer(canonical digest.Digest, aliases ...digest.Digest) error {
|
2015-03-05 00:31:31 +00:00
|
|
|
dgsts := append([]digest.Digest{canonical}, aliases...)
|
2014-11-18 00:29:42 +00:00
|
|
|
|
2015-03-05 00:31:31 +00:00
|
|
|
// Don't make duplicate links.
|
|
|
|
seenDigests := make(map[digest.Digest]struct{}, len(dgsts))
|
|
|
|
|
|
|
|
for _, dgst := range dgsts {
|
|
|
|
if _, seen := seenDigests[dgst]; seen {
|
|
|
|
continue
|
|
|
|
}
|
|
|
|
seenDigests[dgst] = struct{}{}
|
|
|
|
|
2015-03-03 16:57:52 +00:00
|
|
|
layerLinkPath, err := lw.layerStore.repository.registry.pm.path(layerLinkPathSpec{
|
|
|
|
name: lw.layerStore.repository.Name(),
|
2015-03-05 00:31:31 +00:00
|
|
|
digest: dgst,
|
|
|
|
})
|
|
|
|
|
|
|
|
if err != nil {
|
|
|
|
return err
|
|
|
|
}
|
|
|
|
|
2015-03-03 16:57:52 +00:00
|
|
|
if err := lw.layerStore.repository.registry.driver.PutContent(layerLinkPath, []byte(canonical)); err != nil {
|
2015-03-05 00:31:31 +00:00
|
|
|
return err
|
|
|
|
}
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
|
|
|
|
2015-03-05 00:31:31 +00:00
|
|
|
return nil
|
2014-11-18 00:29:42 +00:00
|
|
|
}
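The duplicate-link guard in linkLayer is the usual set-of-seen-keys idiom. A minimal sketch, using plain strings instead of `digest.Digest` and a hypothetical helper name `uniqueDigests`:

```go
package main

import "fmt"

// uniqueDigests returns canonical followed by aliases, dropping repeats
// while preserving first-seen order.
func uniqueDigests(canonical string, aliases ...string) []string {
	dgsts := append([]string{canonical}, aliases...)

	// An empty struct value takes no space, so the map acts as a set.
	seen := make(map[string]struct{}, len(dgsts))
	out := make([]string, 0, len(dgsts))

	for _, d := range dgsts {
		if _, ok := seen[d]; ok {
			continue
		}
		seen[d] = struct{}{}
		out = append(out, d)
	}
	return out
}

func main() {
	fmt.Println(uniqueDigests("sha256:aaa", "tarsum:bbb", "sha256:aaa"))
}
```

Because the canonical digest is placed first, it is always linked even when an alias repeats it.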
|
|
|
|
|
2015-01-08 22:24:02 +00:00
|
|
|
// removeResources should clean up all resources associated with the upload
|
|
|
|
// instance. An error will be returned if the clean up cannot proceed. If the
|
|
|
|
// resources are already not present, no error will be returned.
|
2015-03-03 16:57:52 +00:00
|
|
|
func (lw *layerWriter) removeResources() error {
|
|
|
|
dataPath, err := lw.layerStore.repository.registry.pm.path(uploadDataPathSpec{
|
|
|
|
name: lw.layerStore.repository.Name(),
|
|
|
|
uuid: lw.uuid,
|
2015-01-08 22:24:02 +00:00
|
|
|
})
|
2014-11-18 00:29:42 +00:00
|
|
|
|
|
|
|
if err != nil {
|
2015-01-08 22:24:02 +00:00
|
|
|
return err
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
|
|
|
|
2015-01-08 22:24:02 +00:00
|
|
|
// Resolve and delete the containing directory, which should include any
|
|
|
|
// upload related files.
|
|
|
|
dirPath := path.Dir(dataPath)
|
2015-01-05 07:59:29 +00:00
|
|
|
|
2015-03-03 16:57:52 +00:00
|
|
|
if err := lw.driver.Delete(dirPath); err != nil {
|
2015-01-08 22:24:02 +00:00
|
|
|
switch err := err.(type) {
|
|
|
|
case storagedriver.PathNotFoundError:
|
|
|
|
break // already gone!
|
|
|
|
default:
|
|
|
|
// This should be uncommon enough such that returning an error
|
|
|
|
// should be okay. At this point, the upload should be mostly
|
|
|
|
// complete, but perhaps the backend became inaccessible.
|
|
|
|
logrus.Errorf("unable to delete layer upload resources %q: %v", dirPath, err)
|
|
|
|
return err
|
2014-11-18 00:29:42 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return nil
|
|
|
|
}
|