Implement S3 lifecycle job fetcher #3

Closed
opened 2024-06-27 10:53:44 +00:00 by alexvanin · 1 comment
Owner

Overview

Implement a package that returns the list of containers that have to be processed in the current epoch. To do that, the package must:

  1. Find all available users from all available namespaces in the FrostFS ID contract
  2. Pick the subset of users to process in this epoch
  3. Find all available containers for these users
  4. Iterate over each container to find tree nodes with lifecycle data in the FrostFS tree service
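
The four steps above can be sketched as a small pipeline. This is only an illustration under assumptions: `User`, `Container`, `Job`, `Fetcher` and its function fields are hypothetical names that do not come from the issue, and the calls to the FrostFS ID contract, container listing and tree service are left as pluggable functions rather than real SDK calls.

```go
package main

import "fmt"

// All types and names below are hypothetical and only illustrate the flow.
type User struct {
	Namespace string
	PublicKey string
}

type Container struct {
	ID    string
	Owner User
}

// Job is one container that carries lifecycle data and must be processed.
type Job struct {
	Container Container
	Config    []byte
}

// Fetcher wires the four steps together; each field stands in for a real client.
type Fetcher struct {
	ListUsers      func() ([]User, error)                  // step 1: FrostFS ID contract
	PickUsers      func(all []User) []User                 // step 2: subset for this instance
	ListContainers func(owner User) ([]Container, error)   // step 3: containers of a user
	FindConfig     func(c Container) ([]byte, bool, error) // step 4: tree service lookup
}

// Fetch returns the containers that have lifecycle data for the picked users.
func (f Fetcher) Fetch() ([]Job, error) {
	users, err := f.ListUsers()
	if err != nil {
		return nil, fmt.Errorf("list users: %w", err)
	}
	var jobs []Job
	for _, u := range f.PickUsers(users) {
		containers, err := f.ListContainers(u)
		if err != nil {
			return nil, fmt.Errorf("list containers of %s: %w", u.PublicKey, err)
		}
		for _, c := range containers {
			cfg, found, err := f.FindConfig(c)
			if err != nil {
				return nil, fmt.Errorf("find lifecycle data in %s: %w", c.ID, err)
			}
			if !found {
				continue // container has no lifecycle configuration
			}
			jobs = append(jobs, Job{Container: c, Config: cfg})
		}
	}
	return jobs, nil
}
```

Keeping each step behind a function value makes it possible to test the flow with in-memory fakes before the real contract and tree service clients exist.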

Pick subset of users

The lifecycler should support sharding. In a large storage network, a single lifecycler would not be able to process all events within one epoch, or even within several epochs.

The lifecycler is going to use HRW to split all tasks across all lifecycler instances without any communication between them. To do so, each lifecycler must read the set of public keys of all other instances from its config.

func PickUsers(instancePrivateKey, []allPublicKeys, []users) -> []users
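
A minimal sketch of the HRW (rendezvous hashing) idea behind `PickUsers`, under stated assumptions: keys are passed as raw public key bytes (the own key would be derived from the instance private key beforehand), users are plain strings, and the score is the first 8 bytes of SHA-256 over key and user. None of these choices come from the issue; any deterministic score that every instance computes the same way would serve.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
)

// score computes the rendezvous weight of an (instance, user) pair.
// The hash choice is an assumption made for this sketch.
func score(instanceKey []byte, user string) uint64 {
	h := sha256.New()
	h.Write(instanceKey)
	h.Write([]byte(user))
	return binary.BigEndian.Uint64(h.Sum(nil)[:8])
}

// PickUsers returns the users whose highest-scoring instance is ownKey.
// allKeys holds the public keys of the instances taken from config; it may
// or may not include ownKey. Ties are broken by the smallest key so that
// every instance reaches the same decision.
func PickUsers(ownKey []byte, allKeys [][]byte, users []string) []string {
	var picked []string
	for _, u := range users {
		best, bestScore := ownKey, score(ownKey, u)
		for _, k := range allKeys {
			s := score(k, u)
			if s > bestScore || (s == bestScore && bytes.Compare(k, best) < 0) {
				best, bestScore = k, s
			}
		}
		if bytes.Equal(best, ownKey) {
			picked = append(picked, u)
		}
	}
	return picked
}
```

As long as every instance is configured with the same key set, each user lands on exactly one instance, and adding or removing an instance only moves the users that were owned by, or are newly won by, that instance.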

Find tree nodes in FrostFS tree service

Structure

The lifecycler should define a public constant for the tree node name, a public structure for the lifecycle data stored in the tree node, and data converters.

The lifecycle data structure must be extensible with respect to lifecycle events and their settings.
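
One possible shape for such an extensible structure, sketched in Go with the S3 `LifecycleRule` field names as the starting set. The node name value, the JSON encoding and the `Encode`/`Decode` helper names are assumptions made for illustration; the issue does not fix any of them.

```go
package main

import "encoding/json"

// NodeName is a hypothetical tree node name; the real value is not given in the issue.
const NodeName = "lifecycle-configuration"

// Configuration is the lifecycle data stored in the tree node.
type Configuration struct {
	Rules []Rule `json:"rules"`
}

// Rule mirrors a subset of the S3 LifecycleRule. Every action is an optional
// pointer, so new actions can be added later without breaking stored data.
type Rule struct {
	ID                             string                          `json:"id"`
	Status                         string                          `json:"status"` // "Enabled" or "Disabled"
	Filter                         *Filter                         `json:"filter,omitempty"`
	Expiration                     *Expiration                     `json:"expiration,omitempty"`
	NoncurrentVersionExpiration    *NoncurrentVersionExpiration    `json:"noncurrentVersionExpiration,omitempty"`
	AbortIncompleteMultipartUpload *AbortIncompleteMultipartUpload `json:"abortIncompleteMultipartUpload,omitempty"`
}

type Filter struct {
	Prefix string `json:"prefix,omitempty"`
}

type Expiration struct {
	Days int    `json:"days,omitempty"`
	Date string `json:"date,omitempty"`
}

type NoncurrentVersionExpiration struct {
	NoncurrentDays int `json:"noncurrentDays,omitempty"`
}

type AbortIncompleteMultipartUpload struct {
	DaysAfterInitiation int `json:"daysAfterInitiation,omitempty"`
}

// Encode converts the configuration into the bytes stored in the tree node.
func Encode(c Configuration) ([]byte, error) {
	return json.Marshal(c)
}

// Decode parses stored bytes; unknown fields are ignored, so data written by
// a newer version with extra actions can still be read.
func Decode(data []byte) (Configuration, error) {
	var c Configuration
	err := json.Unmarshal(data, &c)
	return c, err
}
```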

For the initial implementation, the lifecycle structure should support these [rule fields](https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleRule.html#AmazonS3-Type-LifecycleRule-AbortIncompleteMultipartUpload):

  • Status
  • [AbortIncompleteMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortIncompleteMultipartUpload.html)
  • [Expiration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleExpiration.html)
  • [Filter](https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleRuleFilter.html)
  • ID
  • [NoncurrentVersionExpiration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_NoncurrentVersionExpiration.html)

Access control

The lifecycler needs appropriate policies to access the tree service. To make these requests, the lifecycler must obtain the private key of the user (container owner) and create a bearer token with overridden policies that allow the specific operations (get, delete).

The credential source must be hidden behind an interface:

interface Source {
    Credentials(publicKey) -> (privateKey, error)
}

In this implementation, the credential source must be a set of wallet files. It could also be a separate service that manages secrets, etc.
dkirillov self-assigned this 2024-06-28 13:36:27 +00:00
Member

Some clarifications:

  • HRW: this is about the approach, not a particular algorithm or dependency.
  • Lifecycle data structure: just a representation of the lifecycle configuration payload from the [spec](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html#API_PutBucketLifecycleConfiguration_RequestSyntax). The [type from s3-gw](https://git.frostfs.info/mbiryukova/frostfs-s3-gw/src/commit/d54b26dbfdbae4b06b536a7264d4623b7656c2c7/api/data/lifecycle.go#L11) can be used.
  • Access control interface: just an internal interface, so that anyone can fork this repo and painlessly add their own implementation. The method returns a private key as the more generic approach. Even so, requests to FrostFS storage must use a bearer token, because it can carry allow rules that take priority over container policies.
alexvanin referenced this issue from a commit 2024-07-22 12:40:57 +00:00