[#170] Support tar.gz archives #177
No reviewers
Labels
No labels
P0
P1
P2
P3
good first issue
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-http-gw#177
Loading…
Add table
Reference in a new issue
No description provided.
Delete branch "nzinkevich/frostfs-http-gw:tar_download"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Add handler DownloadTar for downloading tar.gz archives with objects by prefix (same as
zip
). Make methods more universal for using in both implementations.Add feature for unpacking archive during upload via
/upload
handler (to do thatExplode-Archive
http header should be set)[#170] Support downloading tar.gz archiveto [#170] Support tar.gz archives downloading@ -130,2 +130,2 @@
// Zip compression.
cfgZipCompression = "zip.compression"
// Archive compression.
cfgArchiveCompression = "archive.compression"
I'm not sure if we can remove
zip.compression
parameter. Probably we should keep it for one release.cc @alexvanin
@ -65,0 +61,4 @@
| Common headers | See [bearer token](#bearer-token). |
| `X-Attribute-System-*` | Used to set system FrostFS object attributes <br/> (e.g. use "X-Attribute-System-Expiration-Epoch" to set `__SYSTEM__EXPIRATION_EPOCH` attribute). |
| `X-Attribute-*` | Used to set regular object attributes <br/> (e.g. use "X-Attribute-My-Tag" to set `My-Tag` attribute). |
| `X-Attribute-Explode-Archive` | If this header is set, gate reads files from uploaded `tar.gz` archive and creates object for each file in it. Set FilePath attribute as relative path from archive root. |
This should be
X-Explode-Archive
@ -115,3 +140,1 @@
c.Response.Header.Set(fasthttp.HeaderContentType, "application/zip")
c.Response.Header.Set(fasthttp.HeaderContentDisposition, "attachment; filename=\"archive.zip\"")
c.Response.SetStatusCode(http.StatusOK)
c.Response.Header.Set(fasthttp.HeaderContentType, "application/x-gzip")
Shouldn't it be
application/x-gtar
?https://en.wikipedia.org/wiki/List_of_archive_formats
According to this only
application/gzip
is presented. So i think it's more common@ -165,1 +197,3 @@
}
func (h *Handler) putObjectToArchive(ctx context.Context, log *zap.Logger, cnrID cid.ID, bufZip *[]byte, createArchiveHeader func(obj *object.Object) (io.Writer, error)) func(id oid.ID) bool {
return func(id oid.ID) bool {
log := log.With(zap.String("oid", id.EncodeToString()))
Should be
@ -86,2 +88,4 @@
}
if header := c.Request.Header.Peek(utils.UserAttributeHeaderPrefix + explodeArchiveHeader); header != nil {
h.explodeGzip(c, log, bktInfo, file)
Is this different from #176 ?
If no than we should either:
If yes than why do we ever need #176 ?
I accidently include changes from #176 here. For now I think I should remain this PR with commits for download/upload
967ed1d6ae
to8281a02d3a
8281a02d3a
to6886fc6f92
6886fc6f92
toe7238eff9c
I'm not sure what should return upload handler with Explode-Archive option. Should it be only status code or array of object addresses (it seems that the latter would be inefficient if archive contains huge amount of files) @dkirillov @alexvanin
e7238eff9c
to27d9b02be4
27d9b02be4
tof8290c3603
[#170] Support tar.gz archives downloadingto [#170] Support tar.gz archivesf8290c3603
to6166505ccd
@ -656,6 +656,7 @@ func (a *app) configureRouter(handler *handler.Handler) {
r.OPTIONS("/get_by_attribute/{cid}/{attr_key}/{attr_val:*}", a.addPreflight())
a.log.Info(logs.AddedPathGetByAttributeCidAttrKeyAttrVal)
r.GET("/zip/{cid}/{prefix:*}", a.addMiddlewares(handler.DownloadZipped))
r.GET("/tar/{cid}/{prefix:*}", a.addMiddlewares(handler.DownloadTar))
Please add
r.OPTIONS
also@ -68,0 +95,4 @@
if errIter != nil {
log.Error(logs.IteratingOverSelectedObjectsFailed, zap.Error(errIter))
} else if bufZip == nil {
log.Error(logs.ObjectsNotFound)
Why do we need log this? I don't think this is error by the way.
And probably there is more straightforward way to know if no objects were found:
The same for
tar
I can change log to Warn, but I think creating new variable for counting is redundant whilst I can only check for
bufzip == nil
(orlen(bufzip) == 0
for more readability)UPD: I forgot that I can omit usage of *[]byte in case of using separate counter, so this suggestion seems valid
@ -68,0 +97,4 @@
} else if bufZip == nil {
log.Error(logs.ObjectsNotFound)
}
if err := zipWriter.Close(); err != nil {
We shouldn't invoke
zipWriter.Close
on writing error. This methodfinishes writing the zip file by writing the central directory
I suppose this in't what we want when writing zipped object to client failed@ -123,0 +151,4 @@
if h.config.ArchiveCompression() {
gzipWriter, _ = gzip.NewWriterLevel(w, gzip.DefaultCompression)
} else {
gzipWriter, _ = gzip.NewWriterLevel(w, gzip.NoCompression)
Matter of taste, but can we use something like this:
@ -163,3 +197,1 @@
},
Address: addr,
}
func (h *Handler) putObjectToArchive(ctx context.Context, log *zap.Logger, cnrID cid.ID, bufZip *[]byte, createArchiveHeader func(obj *object.Object) (io.Writer, error)) func(id oid.ID) bool {
Do we really need
bufZip *[]byte
?It seem we can use just
bufZip []byte
and init this buffer ingetZipResponseWriter/getTarResponseWriter
@ -176,1 +254,4 @@
return resSearch, nil
}
func (h *Handler) writeToArchive(resGet *Object, objWriter io.Writer, bufZip []byte) error {
It seems this we don't use
h
. So function should be@ -152,0 +224,4 @@
// default reader - without gzip decompression
accReader := newAccumulatedReader(formFile)
var reader io.Reader
if gzipReader, err := gzip.NewReader(accReader); err == nil {
Maybe we can use
Content-Encoding
header for this purpose?https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding
@ -0,0 +1 @@
package handler
Empty file
Still
Good feature for this PR is to support asynchronous object upload. We can implement a pool of go-routines that are accessible by channels. During unzipping uploader can extract file and send it to the channel for asynchronous upload. Then wait acknowledge signal from all goroutines to finish upload.
6166505ccd
toa589a33a1b
a589a33a1b
to4c50d54ead
4c50d54ead
tob353b2cd81
Unfortunately, we cannot get several Readers simultaneously. In
http-gw
we calltarReader.Next()
and pass it asio.Reader
to storage. But callingtarReader.Next()
once again will reset Reader and previous file upload would be interrupted. Thus, the uploading of files directly to storage can be made only sequentially.We can try to temporary upload files to http-gw (sequentially) and then upload them to storage (via workerpool). But I'm not sure in performance benefits from it.
@ -179,0 +239,4 @@
return
}
// add filepath attribute
attributes = append(attributes, *newAttribute(utils.UserAttributeHeaderPrefix+object.AttributeFilePath, obj.Name))
The prefix "X-Attribute-" isn't needed for the FilePath attribute
We also should check (and probably drop if it exists) if the
FilePath
attribute already presence inattributes
. Otherwise (as I remember) we will get errorduplicate attributes
from storage.@ -95,0 +133,4 @@
return
} else if objectsWritten == 0 {
log.Warn(logs.ObjectsNotFound)
return
Probably we shouldn't
return
here. I would expect that empty archive be downloaded@ -213,3 +300,1 @@
return fmt.Errorf("flush zip writer: %v", err)
}
// TODO: проверить zip.Flush
Debug?
If not we should add link to issue and write comment in English
@ -15,2 +16,4 @@
ObjectsNotFound = "objects not found" // Error in ../../downloader/download.go
CloseZipWriter = "close zip writer" // Error in ../../downloader/download.go
CloseGzipWriter = "close gzip writer" // Error in ../../downloader/download.go
CloseTarWriter = "close tar writer" // Error in ../../downloader/download.go
Please don't add comments at the line end
Shall we remove the rest of the comments in file?
Yes, we should remove this either:
@ -65,0 +61,4 @@
| Common headers | See [bearer token](#bearer-token). |
| `X-Attribute-System-*` | Used to set system FrostFS object attributes <br/> (e.g. use "X-Attribute-System-Expiration-Epoch" to set `__SYSTEM__EXPIRATION_EPOCH` attribute). |
| `X-Attribute-*` | Used to set regular object attributes <br/> (e.g. use "X-Attribute-My-Tag" to set `My-Tag` attribute). |
| `X-Explode-Archive` | If set, gate tries to read files from uploading `tar` archive and creates object for each file in it. Uploading `tar` could be compressed via Gzip. Sets FilePath attribute as relative path from archive root. |
We should mention behavior with
Content-Encoding
header@ -179,0 +232,4 @@
}
fileName := filepath.Base(obj.Name)
attributes, err := h.extractAttributes(c, log, fileName)
Probably we can extract attributes (at least common) once before the loop.
b353b2cd81
to6cf5fe295f
6cf5fe295f
toed65df3533
ed65df3533
to9b403da852
9b403da852
to6d0398ef39
@ -93,1 +100,4 @@
}
// setIfNotExist sets key value to map if key is not present yet.
func setIfNotExist(m map[string]string, key, value string) {
thoughts: I'm not sure if we need this function. But this matter of taste. I'm ok to keep it
@ -179,0 +187,4 @@
return attributes, nil
}
func newAttribute(key string, val string) *object.Attribute {
Let's use
@ -179,0 +217,4 @@
attrlen := len(attributes)
filePathIndex, fileNameIndex := attrlen-1, attrlen-2
var reader = file
Please, write:
@ -179,0 +212,4 @@
}
attributes = append(attributes,
*newAttribute(object.AttributeFileName, ""),
*newAttribute(object.AttributeFilePath, ""),
Suggestion:
@ -150,3 +191,2 @@
var bufZip []byte
var addr oid.Address
gzipWriter, _ := gzip.NewWriterLevel(w, compressionLevel)
Can we add comment why do we ignore the error?
Overall LGTM, but see comment #177 (comment)
cc @alexvanin
6d0398ef39
toe20b2bf22d
New commits pushed, approval review dismissed automatically according to repository settings
0f8b77b67b
tod42465f494
d42465f494
to5373f156c3
@ -179,3 +179,3 @@
func (s *appSettings) update(v *viper.Viper, l *zap.Logger) {
defaultTimestamp := v.GetBool(cfgUploaderHeaderEnableDefaultTimestamp)
zipCompression := v.GetBool(cfgZipCompression)
archiveCompression := v.GetBool(cfgArchiveCompression)
I had a chat with developers who configure gateways and received a feedback that deprecated version of old settings will be nice for one minor release. This also was mentioned by @dkirillov in the comments.
I suggest:
cfgZipCompression
constant and declare it deprecated in the commentif v.IsSet(cfgZipCompression)
then use it as value forarchiveCompression
variablezip
section in gate-configuration.md but mention this is deprecated@ -189,3 +234,1 @@
},
Address: addr,
}
func (h *Handler) putObjectToArchive(ctx context.Context, log *zap.Logger, cnrID cid.ID, bufZip []byte, createArchiveHeader func(obj *object.Object) (io.Writer, error)) func(id oid.ID) bool {
bufZip
may confuse in the future. I think it's just abuf
now for both tar ang zip archives.@ -266,3 +266,3 @@
addr.SetObject(objID)
handler(ctx, h.newRequest(c, log), addr)
handler(ctx, newRequest(c, log), addr)
Can you describe this change? In my opinion we should either:
newRequest
and remove(h *Handler) newRequest
newRequest
at all.Both are okay for me.
5373f156c3
to617b150845
617b150845
to36bd3e2d43
New commits pushed, approval review dismissed automatically according to repository settings