[#151] Index page for FrostFS containers #153

Merged
alexvanin merged 1 commit from nzinkevich/frostfs-http-gw:feature/index_page_native into master 2024-11-20 06:32:34 +00:00
Member

Closes #151. Make index page work not only via s3-like requests like /get/<bucketName>/<prefix> but also with ordinary http-gw requests with cid and oid: /get/<cid>/<oid>. Native protocol index page returns when cid (as a FrostFS container ID) parses correctly but oid is empty, incorrect or not found

Closes #151. Make index page work not only via s3-like requests like `/get/<bucketName>/<prefix>` but also with ordinary http-gw requests with `cid` and `oid`: `/get/<cid>/<oid>`. Native protocol index page returns when `cid` (as a FrostFS container ID) parses correctly but `oid` is empty, incorrect or not found
nzinkevich added 2 commits 2024-10-11 13:46:41 +00:00
Signed-off-by: Nikita Zinkevich <n.zinkevich@yadro.com>
[#151] index page: Move getDirObjects handlers to browse.go
Some checks failed
/ DCO (pull_request) Failing after 1m2s
/ Vulncheck (pull_request) Successful in 1m30s
/ Builds (pull_request) Successful in 1m0s
/ Lint (pull_request) Successful in 2m20s
/ Tests (pull_request) Successful in 1m7s
7d4c03712a
nzinkevich changed title from feature/index_page_native to [#151] feature/index_page_native 2024-10-11 13:46:59 +00:00
nzinkevich requested review from storage-services-committers 2024-10-11 13:47:44 +00:00
nzinkevich requested review from storage-services-developers 2024-10-11 13:47:46 +00:00
nzinkevich changed title from [#151] feature/index_page_native to [#151] Index page for FrostFS containers 2024-10-11 13:49:13 +00:00
nzinkevich force-pushed feature/index_page_native from 7d4c03712a to 9c1842f899 2024-10-14 06:27:57 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 9c1842f899 to 33b3c8ac9d 2024-10-14 06:40:34 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 33b3c8ac9d to 675c76a9c8 2024-10-14 06:45:40 +00:00 Compare
alexvanin reviewed 2024-10-14 13:54:02 +00:00
@ -366,0 +369,4 @@
| Parameter | Type | SIGHUP reload | Default value | Description |
|------------------------|----------|---------------|---------------|----------------------------------------------------------------------------------------------|
| `enabled` | `bool` | yes | `false` | Flag to enable index_page return if no object with specified S3-name was found. |
| `template_path.s3` | `string` | yes | `""` | Path to .gotmpl file with html templates for S3 index_page (which is using tree service fr . |
Owner

Unfinished sentence in description?

Unfinished sentence in description?
Author
Member

I placed this info above the table, but forgot to delete it here. Fixed

I placed this info above the table, but forgot to delete it here. Fixed
alexvanin marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from 675c76a9c8 to 8845571b2c 2024-10-15 06:34:27 +00:00 Compare
dkirillov reviewed 2024-10-16 11:19:28 +00:00
@ -209,2 +210,3 @@
v.SetDefault(cfgIndexPageEnabled, false)
v.SetDefault(cfgIndexPageTemplatePath, "")
v.SetDefault(cfgIndexPageS3TemplatePath, "")
v.SetDefault(cfgIndexPageNativeTemplatePath, "")
Member

Actually we can skip setting such defaults.

Actually we can skip setting such defaults.
dkirillov marked this conversation as resolved
@ -41,0 +58,4 @@
func newListObjectsResponseNative(attrs map[string]string) ResponseObject {
return ResponseObject{
OID: attrs[attrOID],
Created: attrs[object.AttributeTimestamp] + "000",
Member

Maybe it's better to save to fields value in appropriate format and don't invoke any functions from template?

Maybe it's better to save to fields value in appropriate format and don't invoke any functions from template?
Member

Will we change this? It still invokes formatTimestamp from template

Will we change this? It still invokes `formatTimestamp` from template
dkirillov marked this conversation as resolved
@ -117,0 +172,4 @@
basePath := strings.TrimRightFunc(prefix, func(r rune) bool {
return r != '/'
})
objectIDs, err := h.search(ctx, bucketInfo.CID, object.AttributeFilePath, prefix, object.MatchCommonPrefix)
Member

Can we search all object if prefix is empty to be able to handle objects that have FileName rather than FilePath?

Can we search all object if prefix is empty to be able to handle objects that have `FileName` rather than `FilePath`?
Author
Member

Fixed

Fixed
Member

I meant write something like this:

filters := object.NewSearchFilters()
filters.AddRootFilter()

if prefix != "" {
    filters.AddFilter(object.AttributeFilePath, prefix, object.MatchCommonPrefix)
}
I meant write something like this: ```golang filters := object.NewSearchFilters() filters.AddRootFilter() if prefix != "" { filters.AddFilter(object.AttributeFilePath, prefix, object.MatchCommonPrefix) } ```
dkirillov marked this conversation as resolved
@ -117,0 +189,4 @@
const initialSliceCapacity = 100
var (
log = h.log.With(
Member

Please use := syntax where possible

Please use `:=` syntax where possible
dkirillov marked this conversation as resolved
@ -117,0 +222,4 @@
go func() {
defer wg.Done()
addr.SetObject(id)
Member

We cannot use addr from other goroutine.

We cannot use `addr` from other goroutine.
dkirillov marked this conversation as resolved
@ -117,0 +240,4 @@
dirname := getNextDir(attrs[object.AttributeFilePath], basePath)
if dirname == "" {
mu.Lock()
objects = append(objects, newListObjectsResponseNative(attrs))
Member

Why don't we use channel for this?

Why don't we use channel for this?
dkirillov marked this conversation as resolved
@ -117,0 +254,4 @@
}
}()
return false
Member

If I understand correctly, we stop iteration if any error is occurred. So here we should return true if context is canceled

If I understand correctly, we stop iteration if any error is occurred. So here we should `return true` if context is canceled
dkirillov marked this conversation as resolved
@ -28,2 +26,2 @@
err := id.DecodeString(test)
if err != nil {
testCID, _ := c.UserValue("cid").(string)
var cntID cid.ID
Member

We usually use cnrID name. So let's write:

cnrIDStr, _ := c.UserValue("cid").(string)
var cnrID cid.ID
We usually use `cnrID` name. So let's write: ```golang cnrIDStr, _ := c.UserValue("cid").(string) var cnrID cid.ID ```
dkirillov marked this conversation as resolved
@ -205,6 +218,28 @@ func (h *Handler) byAddress(c *fasthttp.RequestCtx, f func(context.Context, requ
addr.SetContainer(bktInfo.CID)
addr.SetObject(*objID)
_, err = h.frostfs.GetObject(ctx, PrmObjectGet{
Member

We do the same GetObject request in f function. It's better to use HeadObject.

By the way, do we really need this? If oid is actual object id why do we list objects?

We do the same `GetObject` request in `f` function. It's better to use `HeadObject`. By the way, do we really need this? If `oid` is actual object id why do we list objects?
Author
Member

We only checked before that idObjstring is decodable to oid type, but I think we should also output index page if no object found. That's why I try to GetObject (Head is better, I agree)

We only checked before that `idObj`string is decodable to oid type, but I think we should also output index page if no object found. That's why I try to `GetObject` (Head is better, I agree)
Member

I think we should also output index page if no object found

Could you explain why?

> I think we should also output index page if no object found Could you explain why?
Author
Member

Well, I thought the S3 index page worked similarly, but I forgot that I removed this feature in the final implementation. Apache also does not return an index page if page not found. So I agree, it shouldn't return index page in that case

Well, I thought the S3 index page worked similarly, but I forgot that I removed this feature in the final implementation. Apache also does not return an index page if page not found. So I agree, it shouldn't return index page in that case
dkirillov marked this conversation as resolved
@ -0,0 +1,101 @@
{{$container := .BucketInfo.CID}}
Member

Do we really need separate template?

Do we really need separate template?
dkirillov marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from 8845571b2c to 4fe34909aa 2024-10-18 08:14:48 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 4fe34909aa to be8b89c487 2024-10-18 08:18:38 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from be8b89c487 to e0a0f53fb8 2024-10-18 08:21:13 +00:00 Compare
dkirillov reviewed 2024-10-21 12:23:12 +00:00
@ -222,2 +221,4 @@
path := a.cfg.GetString(cfgIndexPageTemplatePath)
tmpl, err := a.readTemplate(path)
if err != nil {
a.settings.setIndexTemplate("")
Member

By the way, why do we set default template if we couldn't load new one? Maybe it's worth don't change current template?

By the way, why do we set default template if we couldn't load new one? Maybe it's worth don't change current template?
dkirillov marked this conversation as resolved
@ -117,0 +191,4 @@
zap.String("cid", bucketInfo.CID.EncodeToString()),
zap.String("prefix", prefix),
)
basePath := strings.TrimRightFunc(prefix, func(r rune) bool {
Member

Matter of taste but

	var basePath string
	if ind := strings.LastIndex(prefix, "/"); ind != -1 {
		basePath = prefix[:ind+1]
	}
Matter of taste but ```golang var basePath string if ind := strings.LastIndex(prefix, "/"); ind != -1 { basePath = prefix[:ind+1] } ```
dkirillov marked this conversation as resolved
@ -117,0 +250,4 @@
func (h *Handler) headDirObjects(ctx context.Context, p headDirParams) {
wg := sync.WaitGroup{}
dirs := sync.Map{}
Member

Please, write:

var wg sync.WaitGroup
var dirs sync.Map

Or

var (
    wg sync.WaitGroup
    dirs sync.Map
)
Please, write: ```golang var wg sync.WaitGroup var dirs sync.Map ``` Or ```golang var ( wg sync.WaitGroup dirs sync.Map ) ```
dkirillov marked this conversation as resolved
@ -117,0 +256,4 @@
err := p.objectIDs.Iterate(func(id oid.ID) bool {
wg.Add(1)
go func(id oid.ID) {
Member

It's better to restrict number of goroutines. See for example s3-gw listing

It's better to restrict number of goroutines. See for example s3-gw listing https://git.frostfs.info/TrueCloudLab/frostfs-s3-gw/src/commit/e35b582fe25a97c018367ef9244dfe93d4500ad9/api/layer/listing.go#L530
dkirillov marked this conversation as resolved
@ -194,3 +194,3 @@
idCnr, _ = c.UserValue("cid").(string)
idObj, _ = c.UserValue("oid").(string)
idObj, _ = url.PathUnescape(c.UserValue("oid").(string))
log = h.log.With(zap.String("cid", idCnr), zap.String("oid", idObj))
Member

Please, write:

		idCnr, _ := c.UserValue("cid").(string)
		idObj, _ := url.PathUnescape(c.UserValue("oid").(string))
		log      := h.log.With(zap.String("cid", idCnr), zap.String("oid", idObj))
Please, write: ```golang idCnr, _ := c.UserValue("cid").(string) idObj, _ := url.PathUnescape(c.UserValue("oid").(string)) log := h.log.With(zap.String("cid", idCnr), zap.String("oid", idObj)) ```
dkirillov marked this conversation as resolved
@ -207,2 +207,4 @@
objID := new(oid.ID)
if err = objID.DecodeString(idObj); err != nil {
if h.config.IndexPageEnabled() {
var addr oid.Address
Member

This variable is unused

This variable is unused
dkirillov marked this conversation as resolved
@ -61,3 +61,3 @@
return ""
}
func (c *configMock) IndexPageTemplate() string {
func (c *configMock) IndexPageNativeTemplate() string {
Member

Please, add new line

Please, add new line
dkirillov marked this conversation as resolved
@ -59,4 +72,3 @@
</tr>
{{end}}
{{range .Objects}}
<tr>
Member

I don't understand why do we need conditiond If $isNative. We show objects and dirs, in both cases objects and dirs have the same attibutes (name, size, timestamp, OID)

I don't understand why do we need conditiond `If $isNative`. We show objects and dirs, in both cases objects and dirs have the same attibutes (name, size, timestamp, OID)
Member

I meant something like

diff --git a/internal/handler/browse.go b/internal/handler/browse.go
index a2373d1..b55d349 100644
--- a/internal/handler/browse.go
+++ b/internal/handler/browse.go
@@ -45,12 +45,14 @@ type (
 		FilePath string
 		Size     string
 		IsDir    bool
+		GetURL   string
 	}
 )
 
 func newListObjectsResponseS3(attrs map[string]string) ResponseObject {
 	return ResponseObject{
 		Created:  formatTimestamp(attrs[attrCreated]),
+		OID:      attrs[attrOID],
 		FileName: attrs[attrFileName],
 		Size:     attrs[attrSize],
 		IsDir:    attrs[attrOID] == "",
@@ -171,6 +173,7 @@ func (h *Handler) getDirObjectsS3(ctx context.Context, bucketInfo *data.BucketIn
 		}
 		obj := newListObjectsResponseS3(attrs)
 		obj.FilePath = prefix + obj.FileName
+		obj.GetURL = "/get/" + bucketInfo.Name + "/" + urlencode(obj.FilePath)
 		objects = append(objects, obj)
 	}
 
@@ -202,7 +205,15 @@ func (h *Handler) getDirObjectsNative(ctx context.Context, bucketInfo *data.Buck
 	}
 	defer objectIDs.Close()
 
-	return h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath)
+	res, err := h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath)
+	if err != nil {
+		return nil, err
+	}
+	for i := range res {
+		res[i].GetURL = "/get/" + bucketInfo.CID.EncodeToString() + "/" + res[i].OID
+	}
+
+	return res, nil
 }
 
 type workerParams struct {
diff --git a/internal/templates/index.gotmpl b/internal/templates/index.gotmpl
index a7ea5a9..d20fb13 100644
--- a/internal/templates/index.gotmpl
+++ b/internal/templates/index.gotmpl
@@ -71,17 +71,12 @@
             <td>
                 {{if .IsDir}}
                     🗀
-                    <a href="/get/{{$container}}{{ urlencode .FilePath }}/">
+                    <a href="{{ .GetURL }}/">
                         {{.FileName}}/
                     </a>
-                {{else if .OID}}
-                    🗎
-                    <a href="/get/{{$container}}/{{ .OID }}">
-                        {{.FileName}}
-                    </a>
                 {{else}}
                     🗎
-                    <a href="/get/{{$container}}{{ urlencode .FilePath }}">
+                    <a href="{{ .GetURL }}">
                         {{.FileName}}
                     </a>
                 {{end}}
@@ -90,15 +85,9 @@
             <td>{{if .Size}}{{ formatSize .Size }}{{end}}</td>
             <td>{{ .Created }}</td>
             <td>
-                {{if .OID}}
-                    <a href="/get/{{$container}}/{{ .OID }}?download=true">
-                        Link
-                    </a>
-                {{else}}
-                    <a href="/get/{{$container}}{{ urlencode .FilePath }}?download=true">
-                        Link
-                    </a>
-                {{ end }}
+                <a href="{{ .GetURL }}?download=true">
+                    Link
+                </a>
             </td>
         </tr>
     {{end}}

I meant something like ```diff diff --git a/internal/handler/browse.go b/internal/handler/browse.go index a2373d1..b55d349 100644 --- a/internal/handler/browse.go +++ b/internal/handler/browse.go @@ -45,12 +45,14 @@ type ( FilePath string Size string IsDir bool + GetURL string } ) func newListObjectsResponseS3(attrs map[string]string) ResponseObject { return ResponseObject{ Created: formatTimestamp(attrs[attrCreated]), + OID: attrs[attrOID], FileName: attrs[attrFileName], Size: attrs[attrSize], IsDir: attrs[attrOID] == "", @@ -171,6 +173,7 @@ func (h *Handler) getDirObjectsS3(ctx context.Context, bucketInfo *data.BucketIn } obj := newListObjectsResponseS3(attrs) obj.FilePath = prefix + obj.FileName + obj.GetURL = "/get/" + bucketInfo.Name + "/" + urlencode(obj.FilePath) objects = append(objects, obj) } @@ -202,7 +205,15 @@ func (h *Handler) getDirObjectsNative(ctx context.Context, bucketInfo *data.Buck } defer objectIDs.Close() - return h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath) + res, err := h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath) + if err != nil { + return nil, err + } + for i := range res { + res[i].GetURL = "/get/" + bucketInfo.CID.EncodeToString() + "/" + res[i].OID + } + + return res, nil } type workerParams struct { diff --git a/internal/templates/index.gotmpl b/internal/templates/index.gotmpl index a7ea5a9..d20fb13 100644 --- a/internal/templates/index.gotmpl +++ b/internal/templates/index.gotmpl @@ -71,17 +71,12 @@ <td> {{if .IsDir}} 🗀 - <a href="/get/{{$container}}{{ urlencode .FilePath }}/"> + <a href="{{ .GetURL }}/"> {{.FileName}}/ </a> - {{else if .OID}} - 🗎 - <a href="/get/{{$container}}/{{ .OID }}"> - {{.FileName}} - </a> {{else}} 🗎 - <a href="/get/{{$container}}{{ urlencode .FilePath }}"> + <a href="{{ .GetURL }}"> {{.FileName}} </a> {{end}} @@ -90,15 +85,9 @@ <td>{{if .Size}}{{ formatSize .Size }}{{end}}</td> <td>{{ .Created }}</td> <td> - {{if .OID}} - <a href="/get/{{$container}}/{{ .OID }}?download=true"> - Link - </a> - {{else}} - <a href="/get/{{$container}}{{ urlencode .FilePath }}?download=true"> - Link - </a> - {{ end }} + <a href="{{ .GetURL }}?download=true"> + Link + </a> </td> </tr> {{end}} ```
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-10-21 13:55:42 +00:00
@ -117,0 +198,4 @@
if basePath == "" {
filters = append(filters, object.MatchNotPresent)
}
objCh := make(chan ResponseObject)
Member

It's better to create such channel somewhere in headDirObjects and close it there when no more objects be available

It's better to create such channel somewhere in `headDirObjects` and close it there when no more objects be available
dkirillov marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from e0a0f53fb8 to 3f421c6d12 2024-10-23 08:41:32 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 3f421c6d12 to b900a915dd 2024-10-23 09:19:29 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from b900a915dd to f437d74273 2024-10-23 09:31:07 +00:00 Compare
nzinkevich self-assigned this 2024-10-24 07:19:19 +00:00
dkirillov reviewed 2024-10-28 06:44:51 +00:00
@ -230,3 +231,1 @@
a.settings.setIndexTemplate("")
a.log.Warn(logs.FailedToReadIndexPageTemplate, zap.Error(err))
return
a.log.Fatal(logs.FailedToReadIndexPageTemplate, zap.Error(err))
Member

Please, don't use fatal. Otherwise by sighup server can be stopped

Please, don't use fatal. Otherwise by sighup server can be stopped
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-10-28 06:51:03 +00:00
@ -116,0 +252,4 @@
done <- struct{}{}
}()
pool, err := ants.NewPool(runtime.NumCPU())
Member

Pool size should be configurable. And default should be at least 100 (maybe 1000)

Pool size should be configurable. And default should be at least 100 (maybe 1000)
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-10-28 08:15:17 +00:00
@ -116,0 +276,4 @@
close(p.errCh)
close(p.objCh)
<-done
<-done
Member

This 4 lines seems confusing to me.
Can we rewrite this code to make it more strightforward?
For example:

diff --git a/internal/handler/browse.go b/internal/handler/browse.go
index a2373d1..d3f84ae 100644
--- a/internal/handler/browse.go
+++ b/internal/handler/browse.go
@@ -12,7 +12,6 @@ import (
 	"time"
 
 	"git.frostfs.info/TrueCloudLab/frostfs-http-gw/internal/data"
-	"git.frostfs.info/TrueCloudLab/frostfs-http-gw/internal/logs"
 	"git.frostfs.info/TrueCloudLab/frostfs-http-gw/utils"
 	cid "git.frostfs.info/TrueCloudLab/frostfs-sdk-go/container/id"
 	"git.frostfs.info/TrueCloudLab/frostfs-sdk-go/object"
@@ -202,116 +201,118 @@ func (h *Handler) getDirObjectsNative(ctx context.Context, bucketInfo *data.Buck
 	}
 	defer objectIDs.Close()
 
-	return h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath)
-}
+	ctx, cancel := context.WithCancel(ctx)
+	defer cancel()
 
-type workerParams struct {
-	cnrID     cid.ID
-	objectIDs ResObjectSearch
-	basePath  string
-	errCh     chan error
-	objCh     chan ResponseObject
-	cancel    context.CancelFunc
-}
+	resp, err := h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath)
+	if err != nil {
+		return nil, err
+	}
 
-func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) ([]ResponseObject, error) {
-	const initialSliceCapacity = 100
+	log := utils.GetReqLogOrDefault(ctx, h.log)
 
-	var wg sync.WaitGroup
-	var dirs sync.Map
-	log := h.log.With(
-		zap.String("cid", cnrID.EncodeToString()),
-		zap.String("path", basePath),
-	)
-	done := make(chan struct{})
-	objects := make([]ResponseObject, 0, initialSliceCapacity)
-	ctx, cancel := context.WithCancel(ctx)
-	p := workerParams{
-		cnrID:     cnrID,
-		objectIDs: objectIDs,
-		basePath:  basePath,
-		errCh:     make(chan error, 1),
-		objCh:     make(chan ResponseObject, 1),
-		cancel:    cancel,
-	}
-	defer cancel()
+	dirs := make(map[string]struct{})
+	objects := make([]ResponseObject, 0, 100)
+	for objExt := range resp {
+		if objExt.Error != nil {
+			log.Error("error", zap.Error(objExt.Error))
+			cancel()
+			continue
+		}
 
-	go func() {
-		for err := range p.errCh {
-			if err != nil {
-				log.Error(logs.FailedToHeadObject, zap.Error(err))
+		if objExt.Object.IsDir {
+			if _, ok := dirs[objExt.Object.FileName]; ok {
+				continue
 			}
+			dirs[objExt.Object.FileName] = struct{}{}
 		}
-		done <- struct{}{}
-	}()
 
-	go func() {
-		for obj := range p.objCh {
-			objects = append(objects, obj)
-		}
-		done <- struct{}{}
-	}()
+		objects = append(objects, objExt.Object)
+	}
 
+	return objects, nil
+}
+
+type ResponseObjectExtended struct {
+	Object ResponseObject
+	Error  error
+}
+
+func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (chan ResponseObjectExtended, error) {
 	pool, err := ants.NewPool(runtime.NumCPU())
 	if err != nil {
 		return nil, err
 	}
-	defer pool.Release()
-	err = objectIDs.Iterate(func(id oid.ID) bool {
-		wg.Add(1)
-		if err = pool.Submit(func() {
-			defer wg.Done()
-			h.headDirObject(ctx, id, &dirs, p)
-		}); err != nil {
-			p.errCh <- err
-		}
-		select {
-		case <-ctx.Done():
-			return true
-		default:
-			return false
+
+	res := make(chan ResponseObjectExtended)
+
+	go func() {
+		defer func() {
+			pool.Release()
+			close(res)
+		}()
+
+		log := utils.GetReqLogOrDefault(ctx, h.log).With(
+			zap.String("cid", cnrID.EncodeToString()),
+			zap.String("path", basePath),
+		)
+
+		var wg sync.WaitGroup
+		err = objectIDs.Iterate(func(id oid.ID) bool {
+			wg.Add(1)
+			err = pool.Submit(func() {
+				defer wg.Done()
+
+				var obj ResponseObjectExtended
+				obj.Object, obj.Error = h.headDirObject(ctx, cnrID, id, basePath)
+				res <- obj
+			})
+			if err != nil {
+				wg.Done()
+				log.Warn("FailedToSubmitTaskToPool", zap.Error(err))
+			}
+
+			select {
+			case <-ctx.Done():
+				return true
+			default:
+				return false
+			}
+		})
+		if err != nil {
+			log.Error("iterate", zap.Error(err))
 		}
-	})
-	wg.Wait()
-	close(p.errCh)
-	close(p.objCh)
-	<-done
-	<-done
 
-	if err != nil {
-		return nil, err
-	}
+		wg.Wait()
+	}()
 
-	return objects, nil
+	return res, nil
 }
 
-func (h *Handler) headDirObject(ctx context.Context, objID oid.ID, dirs *sync.Map, p workerParams) {
-	addr := newAddress(p.cnrID, objID)
+func (h *Handler) headDirObject(ctx context.Context, cnrID cid.ID, objID oid.ID, basePath string) (ResponseObject, error) {
+	addr := newAddress(cnrID, objID)
 	obj, err := h.frostfs.HeadObject(ctx, PrmObjectHead{
 		PrmAuth: PrmAuth{BearerToken: bearerToken(ctx)},
 		Address: addr,
 	})
 	if err != nil {
-		p.errCh <- err
-		p.cancel()
-		return
+		return ResponseObject{}, err
 	}
 
 	attrs := loadAttributes(obj.Attributes())
 	attrs[attrOID] = objID.EncodeToString()
 	attrs[attrSize] = strconv.FormatUint(obj.PayloadSize(), 10)
 
-	dirname := getNextDir(attrs[object.AttributeFilePath], p.basePath)
+	dirname := getNextDir(attrs[object.AttributeFilePath], basePath)
 	if dirname == "" {
-		p.objCh <- newListObjectsResponseNative(attrs)
-	} else if _, ok := dirs.Load(dirname); !ok {
-		p.objCh <- ResponseObject{
-			FileName: dirname,
-			FilePath: p.basePath + dirname,
-			IsDir:    true,
-		}
-		dirs.Store(dirname, true)
+		return newListObjectsResponseNative(attrs), nil
 	}
+
+	return ResponseObject{
+		FileName: dirname,
+		FilePath: basePath + dirname,
+		IsDir:    true,
+	}, nil
 }
 
 type browseParams struct {

This 4 lines seems confusing to me. Can we rewrite this code to make it more strightforward? For example: ```diff diff --git a/internal/handler/browse.go b/internal/handler/browse.go index a2373d1..d3f84ae 100644 --- a/internal/handler/browse.go +++ b/internal/handler/browse.go @@ -12,7 +12,6 @@ import ( "time" "git.frostfs.info/TrueCloudLab/frostfs-http-gw/internal/data" - "git.frostfs.info/TrueCloudLab/frostfs-http-gw/internal/logs" "git.frostfs.info/TrueCloudLab/frostfs-http-gw/utils" cid "git.frostfs.info/TrueCloudLab/frostfs-sdk-go/container/id" "git.frostfs.info/TrueCloudLab/frostfs-sdk-go/object" @@ -202,116 +201,118 @@ func (h *Handler) getDirObjectsNative(ctx context.Context, bucketInfo *data.Buck } defer objectIDs.Close() - return h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath) -} + ctx, cancel := context.WithCancel(ctx) + defer cancel() -type workerParams struct { - cnrID cid.ID - objectIDs ResObjectSearch - basePath string - errCh chan error - objCh chan ResponseObject - cancel context.CancelFunc -} + resp, err := h.headDirObjects(ctx, bucketInfo.CID, objectIDs, basePath) + if err != nil { + return nil, err + } -func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) ([]ResponseObject, error) { - const initialSliceCapacity = 100 + log := utils.GetReqLogOrDefault(ctx, h.log) - var wg sync.WaitGroup - var dirs sync.Map - log := h.log.With( - zap.String("cid", cnrID.EncodeToString()), - zap.String("path", basePath), - ) - done := make(chan struct{}) - objects := make([]ResponseObject, 0, initialSliceCapacity) - ctx, cancel := context.WithCancel(ctx) - p := workerParams{ - cnrID: cnrID, - objectIDs: objectIDs, - basePath: basePath, - errCh: make(chan error, 1), - objCh: make(chan ResponseObject, 1), - cancel: cancel, - } - defer cancel() + dirs := make(map[string]struct{}) + objects := make([]ResponseObject, 0, 100) + for objExt := range resp { + if objExt.Error != nil { + log.Error("error", zap.Error(objExt.Error)) + cancel() + continue + } - go func() { - for err := range p.errCh { - if err != nil { - log.Error(logs.FailedToHeadObject, zap.Error(err)) + if objExt.Object.IsDir { + if _, ok := dirs[objExt.Object.FileName]; ok { + continue } + dirs[objExt.Object.FileName] = struct{}{} } - done <- struct{}{} - }() - go func() { - for obj := range p.objCh { - objects = append(objects, obj) - } - done <- struct{}{} - }() + objects = append(objects, objExt.Object) + } + return objects, nil +} + +type ResponseObjectExtended struct { + Object ResponseObject + Error error +} + +func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (chan ResponseObjectExtended, error) { pool, err := ants.NewPool(runtime.NumCPU()) if err != nil { return nil, err } - defer pool.Release() - err = objectIDs.Iterate(func(id oid.ID) bool { - wg.Add(1) - if err = pool.Submit(func() { - defer wg.Done() - h.headDirObject(ctx, id, &dirs, p) - }); err != nil { - p.errCh <- err - } - select { - case <-ctx.Done(): - return true - default: - return false + + res := make(chan ResponseObjectExtended) + + go func() { + defer func() { + pool.Release() + close(res) + }() + + log := utils.GetReqLogOrDefault(ctx, h.log).With( + zap.String("cid", cnrID.EncodeToString()), + zap.String("path", basePath), + ) + + var wg sync.WaitGroup + err = objectIDs.Iterate(func(id oid.ID) bool { + wg.Add(1) + err = pool.Submit(func() { + defer wg.Done() + + var obj ResponseObjectExtended + obj.Object, obj.Error = h.headDirObject(ctx, cnrID, id, basePath) + res <- obj + }) + if err != nil { + wg.Done() + log.Warn("FailedToSubmitTaskToPool", zap.Error(err)) + } + + select { + case <-ctx.Done(): + return true + default: + return false + } + }) + if err != nil { + log.Error("iterate", zap.Error(err)) } - }) - wg.Wait() - close(p.errCh) - close(p.objCh) - <-done - <-done - if err != nil { - return nil, err - } + wg.Wait() + }() - return objects, nil + return res, nil } -func (h *Handler) headDirObject(ctx context.Context, objID oid.ID, dirs *sync.Map, p workerParams) { - addr := newAddress(p.cnrID, objID) +func (h *Handler) headDirObject(ctx context.Context, cnrID cid.ID, objID oid.ID, basePath string) (ResponseObject, error) { + addr := newAddress(cnrID, objID) obj, err := h.frostfs.HeadObject(ctx, PrmObjectHead{ PrmAuth: PrmAuth{BearerToken: bearerToken(ctx)}, Address: addr, }) if err != nil { - p.errCh <- err - p.cancel() - return + return ResponseObject{}, err } attrs := loadAttributes(obj.Attributes()) attrs[attrOID] = objID.EncodeToString() attrs[attrSize] = strconv.FormatUint(obj.PayloadSize(), 10) - dirname := getNextDir(attrs[object.AttributeFilePath], p.basePath) + dirname := getNextDir(attrs[object.AttributeFilePath], basePath) if dirname == "" { - p.objCh <- newListObjectsResponseNative(attrs) - } else if _, ok := dirs.Load(dirname); !ok { - p.objCh <- ResponseObject{ - FileName: dirname, - FilePath: p.basePath + dirname, - IsDir: true, - } - dirs.Store(dirname, true) + return newListObjectsResponseNative(attrs), nil } + + return ResponseObject{ + FileName: dirname, + FilePath: basePath + dirname, + IsDir: true, + }, nil } type browseParams struct { ```
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-10-28 08:15:46 +00:00
@ -116,0 +263,4 @@
defer wg.Done()
h.headDirObject(ctx, id, &dirs, p)
}); err != nil {
p.errCh <- err
Member

Here we also should add wg.Done

Here we also should add `wg.Done`
dkirillov marked this conversation as resolved
Member

Please, rebase branch

Please, rebase branch
nzinkevich force-pushed feature/index_page_native from f437d74273 to c56eafac0e 2024-10-31 12:42:08 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from c56eafac0e to 1972f69099 2024-10-31 12:49:40 +00:00 Compare
alexvanin reviewed 2024-11-02 09:04:42 +00:00
@ -29,1 +26,3 @@
if err != nil {
cnrIDStr, _ := c.UserValue("cid").(string)
var cnrID cid.ID
if err := cnrID.DecodeString(cnrIDStr); err != nil {
Owner

What's the difference between checking oid before and cid now?

What's the difference between checking `oid` before and `cid` now?
alexvanin requested changes 2024-11-02 09:30:31 +00:00
Dismissed
@ -116,0 +242,4 @@
}
func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (chan ResponseObjectExtended, error) {
pool, err := ants.NewPool(h.config.IndexPagePoolSize())
Owner

As far as I understand, every simultaneous browse request will produce new pool with up to 1000 routines by default. That might increase goroutine usage very very fast under the load.

I suggest to use one pool of workers in the handler.

As far as I understand, every simultaneous browse request will produce new pool with up to 1000 routines by default. That might increase goroutine usage very very fast under the load. I suggest to use one pool of workers in the handler.
alexvanin marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from 1972f69099 to f8369145c9 2024-11-02 10:24:03 +00:00 Compare
nzinkevich requested review from alexvanin 2024-11-02 10:27:08 +00:00
nzinkevich force-pushed feature/index_page_native from f8369145c9 to 558609d311 2024-11-02 10:30:42 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 558609d311 to 6622110f8c 2024-11-02 10:44:13 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 6622110f8c to 12819b6bd3 2024-11-02 10:47:10 +00:00 Compare
alexvanin approved these changes 2024-11-02 13:57:08 +00:00
Dismissed
@ -22,2 +28,4 @@
attrFileName = "FileName"
attrSize = "Size"
defaultPoolSize = 1000
Owner

I think this is basically unused and can be moved to test file. In the app you defined default pool size in settings.go

I think this is basically unused and can be moved to test file. In the app you defined default pool size in settings.go
dkirillov reviewed 2024-11-05 08:25:38 +00:00
@ -76,2 +84,2 @@
<td>{{if not .IsDir}}{{ formatSize .Size }}{{end}}</td>
<td>{{if not .IsDir}}{{ formatTimestamp .Created }}{{end}}</td>
<td>{{.OID}}</td>
<td>{{if .Size}}{{ formatSize .Size }}{{end}}</td>
Member

Why do we change condition from if not .IsDir to if .Size?

Why do we change condition from `if not .IsDir` to `if .Size`?
Author
Member

Oh, .IsDir would be better to mark empty files with '0B' size

Oh, .IsDir would be better to mark empty files with '0B' size
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-05 08:28:48 +00:00
@ -27,3 +26,1 @@
var id oid.ID
err := id.DecodeString(test)
if err != nil {
cnrIDStr, _ := c.UserValue("cid").(string)
Member

Why did we change oid to cid?

Why did we change `oid` to `cid`?
Member

Oh. It seems I got it. Then we should rename h.byObjectName and h.byAddress so that it reflect reality.
And/or be able to handle urls like get/<bucket-name>/<oid>

Oh. It seems I got it. Then we should rename `h.byObjectName` and `h.byAddress` so that it reflect reality. And/or be able to handle urls like `get/<bucket-name>/<oid>`
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-05 08:29:25 +00:00
@ -31,6 +32,7 @@ type Config interface {
ZipCompression() bool
ClientCut() bool
IndexPageEnabled() bool
WorkerPoolSize() int
Member

This is unused

This is unused
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-05 08:40:19 +00:00
@ -245,0 +248,4 @@
func (s *appSettings) WorkerPoolSize() int {
s.mu.RLock()
defer s.mu.RUnlock()
return s.workerPoolSize
Member

Is this parameter reloadable? It seems not. Then we don't need this getter. And we should place the s.workerPoolSize field above s.mu field in appSettings struct

Is this parameter reloadable? It seems not. Then we don't need this getter. And we should place the `s.workerPoolSize` field above `s.mu` field in `appSettings` struct
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-05 08:56:23 +00:00
@ -116,0 +240,4 @@
Error error
}
func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (chan ResponseObjectExtended, error) {
Member

Let's write

func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (<-chan ResponseObjectExtended, error) {
Let's write ```golang func (h *Handler) headDirObjects(ctx context.Context, cnrID cid.ID, objectIDs ResObjectSearch, basePath string) (<-chan ResponseObjectExtended, error) { ```
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-05 08:59:59 +00:00
@ -116,0 +217,4 @@
objects := make([]ResponseObject, 0, 100)
for objExt := range resp {
if objExt.Error != nil {
log.Error(logs.FailedToHeadObject, zap.Error(objExt.Error))
Member

What do we want to do on such error?

  • Return error to client?
  • Ignore and list other?
  • Stop further heading and return client current objects?

cc @alexvanin

What do we want to do on such error? * Return error to client? * Ignore and list other? * Stop further heading and return client current objects? cc @alexvanin
dkirillov marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from 12819b6bd3 to 264e095137 2024-11-05 13:13:28 +00:00 Compare
nzinkevich dismissed alexvanin's review 2024-11-05 13:13:29 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

nzinkevich force-pushed feature/index_page_native from 264e095137 to b9e77b64c1 2024-11-05 13:18:55 +00:00 Compare
alexvanin approved these changes 2024-11-06 07:35:24 +00:00
Dismissed
nzinkevich force-pushed feature/index_page_native from b9e77b64c1 to e18e5b4fc5 2024-11-07 15:13:19 +00:00 Compare
nzinkevich dismissed alexvanin's review 2024-11-07 15:13:19 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

nzinkevich force-pushed feature/index_page_native from e18e5b4fc5 to 181568f442 2024-11-07 15:16:04 +00:00 Compare
dkirillov reviewed 2024-11-08 10:55:41 +00:00
@ -96,6 +97,7 @@ type (
clientCut bool
returnIndexPage bool
indexPageTemplate string
workerPoolSize int
Member

Please place this field right below dialerSource

Please place this field right below `dialerSource`
dkirillov marked this conversation as resolved
dkirillov reviewed 2024-11-08 11:01:12 +00:00
@ -116,0 +165,4 @@
return nil, err
}
var result = &GetObjectsResponse{
Member

Please don't use var when we initialize struct:

result:= &GetObjectReponse{
    objects:   make([]ResponseObject, 0, len(nodes)),
    hasErrors: false,
}

Additionally, we can skip setting hasErrors: false

Please don't use `var` when we initialize struct: ```golang result:= &GetObjectReponse{ objects: make([]ResponseObject, 0, len(nodes)), hasErrors: false, } ``` Additionally, we can skip setting `hasErrors: false`
dkirillov approved these changes 2024-11-08 11:03:06 +00:00
Dismissed
dkirillov left a comment
Member

Overall LGTM (see minor comments)

Overall LGTM (see minor comments)
alexvanin requested changes 2024-11-08 11:45:09 +00:00
Dismissed
@ -191,2 +193,3 @@
// byNativeAddress is a wrapper for function (e.g. request.headObject, request.receiveFile) that
// prepares request and object address to it.
func (h *Handler) byAddress(c *fasthttp.RequestCtx, f func(context.Context, request, oid.Address)) {
func (h *Handler) byNativeAddress(c *fasthttp.RequestCtx, f func(context.Context, request, oid.Address)) {
Owner

After internal demo, let's modify this code a bit.

We want to determine which browse function to use based on tree data.
If tree data is present, then use h.getDirObjectsS3,
If tree data is absent, then use h.getDirObjectsNative.

After internal demo, let's modify this code a bit. We want to determine which browse function to use based on tree data. If tree data is present, then use `h.getDirObjectsS3`, If tree data is absent, then use `h.getDirObjectsNative`.
alexvanin marked this conversation as resolved
nzinkevich added 1 commit 2024-11-14 10:26:35 +00:00
[#151] index page: Add browse via native protocol
All checks were successful
/ DCO (pull_request) Successful in 1m50s
/ Vulncheck (pull_request) Successful in 1m58s
/ Builds (pull_request) Successful in 1m24s
/ Lint (pull_request) Successful in 2m40s
/ Tests (pull_request) Successful in 1m24s
339d944bef
Signed-off-by: Nikita Zinkevich <n.zinkevich@yadro.com>
nzinkevich dismissed dkirillov's review 2024-11-14 10:26:35 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

nzinkevich force-pushed feature/index_page_native from 339d944bef to 3b26cf921c 2024-11-14 10:27:31 +00:00 Compare
nzinkevich force-pushed feature/index_page_native from 3b26cf921c to ec30e9c008 2024-11-14 10:32:06 +00:00 Compare
alexvanin approved these changes 2024-11-18 11:32:03 +00:00
Dismissed
dkirillov reviewed 2024-11-18 13:41:51 +00:00
@ -33,0 +27,4 @@
downloadQueryParam := c.QueryArgs().GetBool("download")
switch {
case isObjectID(oidURLParam):
Member

Why don't we check cid also?

Why don't we check `cid` also?
Author
Member

There may be also container identifier from NNS, and we want to resolve them too. So I check only oid, as it was before

There may be also container identifier from NNS, and we want to resolve them too. So I check only `oid`, as it was before
Member

Then if we create object 5rnoerMhK3kSxDEUh2FrSQvSW86P2deahSifFLxFBF9v into bucket my-bucket and I wanted to get it by http-gw like curl http://<http-gw>/get/my-bucket/5rnoerMhK3kSxDEUh2FrSQvSW86P2deahSifFLxFBF9v I wil got 404. Is it ok?

Then if we create object `5rnoerMhK3kSxDEUh2FrSQvSW86P2deahSifFLxFBF9v` into bucket `my-bucket` and I wanted to get it by http-gw like `curl http://<http-gw>/get/my-bucket/5rnoerMhK3kSxDEUh2FrSQvSW86P2deahSifFLxFBF9v` I wil got `404`. Is it ok?
dkirillov reviewed 2024-11-18 13:49:40 +00:00
@ -415,3 +414,3 @@
}
return objects, nil
c.SetStatusCode(fasthttp.StatusOK)
Member

Why do we need this if we have the same on line 433 ?

Why do we need this if we have the same on line 433 ?
Member

And why we set status here rather than in h.browseObjects?

And why we set status here rather than in `h.browseObjects`?
Member

By the way, why do we ever need to set success status?

By the way, why do we ever need to set success status?
dkirillov marked this conversation as resolved
nzinkevich force-pushed feature/index_page_native from ec30e9c008 to 6c21977ccb 2024-11-19 12:00:51 +00:00 Compare
nzinkevich dismissed alexvanin's review 2024-11-19 12:00:51 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

nzinkevich force-pushed feature/index_page_native from 6c21977ccb to 8ef4ac1f9b 2024-11-19 12:01:24 +00:00 Compare
alexvanin approved these changes 2024-11-19 12:45:29 +00:00
Dismissed
mbiryukova reviewed 2024-11-19 14:03:15 +00:00
@ -75,18 +75,21 @@ request_timeout: 5s
rebalance_timer: 30s
pool_error_threshold: 100
reconnect_interval: 1m
worker_pool_size: 1000
Member

It should be added to config files too I suppose

It should be added to config files too I suppose
nzinkevich force-pushed feature/index_page_native from 8ef4ac1f9b to aa0a7d86c6 2024-11-19 14:07:25 +00:00 Compare
nzinkevich dismissed alexvanin's review 2024-11-19 14:07:26 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

nzinkevich force-pushed feature/index_page_native from aa0a7d86c6 to 43764772aa 2024-11-19 14:33:30 +00:00 Compare
alexvanin approved these changes 2024-11-19 14:42:56 +00:00
alexvanin merged commit 43764772aa into master 2024-11-20 06:32:34 +00:00
alexvanin deleted branch feature/index_page_native 2024-11-20 06:32:49 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-services-developers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-http-gw#153
No description provided.