frostfs-sdk-go/doc/policy.md

445 lines
16 KiB
Markdown
Raw Permalink Normal View History

# Placement Policy
This document describes placement policies, their purpose, syntax and semantics.
## Index
- [Introduction](#introduction)
- [Operations](#operations)
- [Basic Expressions](#basic-expressions)
- [`FILTER`](#filter)
- [`SELECT`](#select)
- [`REP`](#rep)
- [Policies](#policies)
- [The policy playground](#the-policy-playground)
- [`CBF`](#cbf)
- [`UNIQUE`](#unique)
- [More examples](#more-examples)
- [Appendix 1: Operators](#appendix-1-operators)
- [Appendix 2: Policy playground commands](#appendix-2-policy-playground-commands)
## Introduction
The purpose of a **placement policy** is to determine whichs node(s) of a frostfs system will store an object. Namely, given a **netmap** (a set of nodes) and a placement policy, a subset of those nodes is selected to store a given object. An important aspect is that since nodes in a netmap come and go due to distributed nature of the system, this selection must be deterministic and consistent, i.e. different nodes must come to the same conclusion as long as they share the same view of the netmap.
> ℹ️ Throughout this document, we will consider each node as a dictionary of attributes and a global unique ID.
One way to think about the placement policy, is as a pipeline of operations which begins with a set of nodes (the entire netmap) and gradually refine it into only the nodes that will be in charge of storing the object. More specifically, each operation in this pipeline takes a set of nodes and transforms it into a subset of those nodes. The transformation is done purely on the basis of the node attributes.
![Placement policy as a pipeline](./image/placement_policy.svg)
The three main operations are:
1. `FILTER`: filters a set of nodes based on their attributes.
2. `SELECT`: selects a specific amount of nodes from a set of nodes based on certain conditions.
3. `REP`: specifies how many nodes (and which ones) from a set of nodes are used to store an object.
In the next sections, we will explore each of them in detail.
## Operations
### Basic Expressions
Before exploring the operations in detail, we must get acquainted with the basic expressions that appear in a placement policy. As mentioned above, the placement policy operates solely on the basis of node attributes, and as such, basic expressions mostly revolve around node attribute comparison.
A comparison expression expresses whether a node attribute equals a specified value:
```
AttributeName Operation AttributeValue
```
For example, the following expression
```sql
City EQ 'Moscow'
```
asserts that the node attribute `City` equals the value `Moscow`.
Comparison expressions can be nested via boolean operators and parentheses. For example, the following expression:
```sql
(City EQ 'Moscow') AND (Disks GT 2)
```
asserts that the node attribute `City` equals the value `Moscow` and the node attribute `Disks` must be greater than `2`. Note that the arguments can be either a string or a number.
See [Appendix 1](#appendix-1-operators) for a complete list of supported operators.
### `FILTER`
A `FILTER` operation takes as input a set of nodes and returns a subset of those nodes. It's useful for selecting nodes that have (or lack) specific attributes. Its basic syntax is as follows:
```bnf
FILTER <expr> AS <id>
```
For example, the following filter
```sql
FILTER Color EQ 'Red' AS RedNodes
```
selects those nodes for which the `Color` attribute equals `Red`, and discards the rest. The filter's identifier is `RedNodes`, which can be used to reference it in other parts of the placement policy. For example, you could reference the above filter in another filter as follows
```sql
FILTER @RedNodes AND (City EQ 'Moscow') AS RedMoscowNodes
```
which would select those nodes for which the `Color` attribute equals `Red` and the `City` attribute equals `Moscow`. You can think of the `@` operator as embedding the referenced filter expression verbatim where it's used. This makes it easy to compose filters. However, filters can be referenced via `@` only within filter expressions. In other places you can simply use the filter identifier directly.
> ⚠️ Every filter requires a unique identifier. What would be the use of a filter that you cannot reference?
The following diagram illustrates the filter operation
![FILTER](./image/filter_illustration.svg)
where the nodes are represented as colored circles, with their color representing the value of their `Color` attribute, respectively.
> ℹ️ A filter referring to all nodes in the netmap always exists and can be referenced by `*`.
### `SELECT`
A `SELECT` operation specifies how many and which nodes from a subset previously obtained from a `FILTER` will be available to build replica groups for object storage. It's not that different from a `FILTER` in that it transforms a set of nodes into a subset of those, but while a `FILTER` cannot control the size of the resulting subset and other characteristics, a `SELECT` can.
Its basic syntax is as follows:
```bnf
SELECT <count> {IN (SAME|DISTINCT) <attribute>} FROM <filter> {AS <id>}
```
In a nutshell, a `SELECT` takes a filter result as input and outputs a specific number of nodes, optionally enforcing that all output nodes must either share or differ in a specific attribute. Note that only the output node count and the source filter are required.
Let's see some examples
```sql
-- Selects exactly one node from the entire netmap
SELECT 1 FROM *
-- Same as above, but with an identifier for the selection
SELECT 1 FROM * AS ONE
-- Selects two nodes from the RedOrBlueNodes filter, such that both selected nodes
-- share the same value for the Color attribute, i.e. both red or both blue.
SELECT 2 IN SAME Color FROM RedOrBlueNodes
-- Selects two nodes from the RedOrBlueNodes filter, such that the selected nodes
-- have distinct values for the Color attribute, i.e. one red and one blue.
-- The selection is also given an identifier.
SELECT 2 IN DISTINCT Color FROM RedOrBlueNodes AS MyNodes
```
The last example is illustrated in the following diagram:
![SELECT](./image/select_illustration.svg)
> ℹ️ At this point, notice that while `FILTER`'s output is always unique (namely, every node in the input is either filtered in or out), that is not always the case for `SELECT`. In the last example above, there is more than one way to select two nodes with distinct `Color` attribute. Because we require the output to be deterministic and consistent (so that all nodes agree on which nodes to store a given object without having to commmunicate with each other), we need a way to reach this consensus efficiently. Internally, the policy engine uses [Rendezvouz Hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing) to ensure this. If you want more control over what nodes are actually selected, you can always use narrower filters/selections to ensure this.
### `REP`
A `REP` operation specifies how many copies of an object need to be stored (`REP` stands for "replica"). A placement policy can contain multiple replica operations, with each of them representing a replica group, i.e. a group of objects associated with the same replica. Following our analogy with a pipeline, `REP` operations are the sink or output nodes.
Its basic syntax is as follows:
```bnf
REP <count> {IN <select>}
```
If a select is not specified, then the entire netmap is used as input. The resulting nodes will be used to actually store objects and they constitute a replica group (or simply, "a replica").
Examples
```sql
-- A replica consisting of a single copy, stored in an arbitrary node of the netmap.
REP 1
-- A replica consisting of three copies, each stored in a different node from the selection
-- identified as 'MyNodes'.
REP 3 IN MyNodes
```
The following diagram illustrates the `REP` operation:
![REP](./image/rep_illustration.svg)
> ⚠️ Notice that although we use `REP 1` in the examples, in real life scenarios you almost always want to have more than a single node in each replica for redundancy.
## Policies
In order to specify a complete placement policy, we just need to assemble it from the operations described above.
Its basic (simplified) syntax is as follows:
```bnf
<rep>+ <select>* <filter>*
```
We begin by stating all our `REP` operations, followed by all the `SELECT` operations and finally all the `FILTER` operations. Note that this is the reverse order in which they are applied. Also note that at least one `REP` operation is required.
Here's a complete example:
```sql
REP 1 IN MyNodes
SELECT 2 IN DISTINCT Color FROM RedOrBlueNodes AS MyNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
```
In additional to this basic syntax, there are a couple of additional useful options to specify which nodes and how many nodes are actually selected to store objects. We explore these in the next sections.
### The policy playground
> ℹ️ This section assumes you have an up-to-date version of the `frostfs-cli`.
While simple placement policies have predictable results that can be understood at a glance, more complex ones need careful consideration before deployment. In order to simplify understanding a policy's outcome and experimenting while learning, a builtin tool is provided as part of the `frostfs-cli` for this purpose: the policy playground.
For the remainder of this guide, we will use the policy playground to setup a virtual netmap (that is, one that doesn't require any networking or deployment) and test various policies. In order to visualize this netmap easily, each node will have three attributes: a character, a shape and a color
![Sample Netmap](./image/sample_netmap.svg)
We can start the policy playground as follows:
```sh
$ frostfs-cli container policy-playground
>
```
Since we didn't pass any endpoint, the initial netmap is empty, which we can verify with the `ls` command (to list the nodes in the netmap):
```sh
> ls
>
```
Nows let's add virtual nodes to represent our test netmap in the figure above
```sh
> add 01 Char:A Shape:Circle Color:Blue
> add 02 Char:B Shape:Circle Color:Green
> add 03 Char:C Shape:Circle Color:Red
> add 04 Char:D Shape:Square Color:Blue
> add 05 Char:E Shape:Square Color:Green
> add 06 Char:F Shape:Square Color:Red
> add 07 Char:G Shape:Diamond Color:Blue
> add 08 Char:H Shape:Diamond Color:Green
> add 09 Char:I Shape:Diamond Color:Red
```
and verify that the netmap now contains what we expect
```sh
> ls
1: id=06 attrs={Char:F Shape:Square Color:Red}
2: id=08 attrs={Char:H Shape:Diamond Color:Green}
3: id=01 attrs={Char:A Shape:Circle Color:Blue}
4: id=04 attrs={Char:D Shape:Square Color:Blue}
5: id=05 attrs={Char:E Shape:Square Color:Green}
6: id=09 attrs={Char:I Shape:Diamond Color:Red}
7: id=02 attrs={Char:B Shape:Circle Color:Green}
8: id=03 attrs={Char:C Shape:Circle Color:Red}
9: id=07 attrs={Char:G Shape:Diamond Color:Blue}
```
With our sample netmap setup, we can now continue.
### `CBF`
Consider the following policy:
```sql
REP 1
```
It builds a replica consisting of one copy, selected from the entire netmap. If we evaluate this policy in our sample netmap, we obtain a result which is probably unexpected:
```sh
> eval REP 1
1: [06 05 02]
```
The `eval` commands evaluates a policy and lists in a separate line the nodes selected for each `REP` operation, in the order they appear in the policy. We were expecting a single node, but we got three instead. The reason is that there's a policy-wide parameter called **container backup factor** (or CBF). This parameter is a multiplier which controls the maximum number of storage nodes: for example, if a policy requires 10 nodes and the CBF is 3, it means that the policy can store an object in up to 10 × 3 nodes. However, if there are not enough nodes and fewer (but at least 10) are used, this is not considered an error.
The default value for CBF is `3`, which explains our result above, given than every node in the netmap agrees with the policy. The CBF can be explicitly set in the policy right after the `REP` operations. For example
```sh
> eval REP 1 CBF 1
1: [06]
```
results in what we expected in the first example. On the other hand
```sh
> eval REP 1 IN MyNodes SELECT 1 IN SAME Char FROM * AS MyNodes
1: [01]
```
results in a single node despite the default CBF, because there are not enough nodes compatible with the selection.
### `UNIQUE`
Consider the following policy:
```sql
REP 1
REP 1
CBF 2
```
If we evaluate it
```sh
> eval REP 1 REP 1 CBF 2
1: [06 05]
2: [06 05]
```
we find that each replica gets two nodes, in accordance with the CBF. However, these nodes are exactly the same and this might not be desirable. In order to force the policy engine to select different nodes for each replica, we can use the `UNIQUE` option, which is specified right before the `REP` operations.
In our example, if we change it to
```sql
UNIQUE
REP 1
REP 1
CBF 2
```
and evaluate it
```sh
> eval UNIQUE REP 1 REP 1 CBF 2
1: [06 05]
2: [02 03]
```
we now find that the nodes selected for each replica are now distinct from each other.
### More examples
This section presents some more examples of placement policies and their result when applied to the sample netmap. Try to figure out the result before looking at it or evaluating the policy.
#### Example #1
```sql
REP 1 IN TwoRedNodes
SELECT 2 FROM RedNodes AS TwoRedNodes
FILTER Color EQ 'Red' AS RedNodes
```
<details>
<summary>Result</summary>
```sh
> eval REP 1 IN TwoRedNodes SELECT 2 FROM RedNodes AS TwoRedNodes FILTER Color EQ 'Red' AS RedNodes
1: [06 09 03]
```
</details>
#### Example #2
```sql
REP 1 IN TwoRedNodes
REP 1 IN TwoRedNodes
SELECT 2 FROM RedNodes AS TwoRedNodes
FILTER Color EQ 'Red' AS RedNodes
```
<details>
<summary>Result</summary>
```sh
> eval REP 1 REP 1 IN TwoRedNodes SELECT 2 FROM RedNodes AS TwoRedNodes FILTER Color EQ 'Red' AS RedNodes
1: [06 09 03]
2: [06 09 03]
```
</details>
#### Example #3
```sql
REP 2 IN MyNodes
REP 2 IN MyNodes
SELECT 2 FROM RedOrBlueNodes AS MyNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
```
<details>
<summary>Result</summary>
```sh
> eval REP 2 IN MyNodes REP 2 IN MyNodes SELECT 2 FROM RedOrBlueNodes AS MyNodes FILTER Color EQ 'Red' AS RedNodes FILTER Color EQ 'Blue' AS BlueNodes FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
1: [06 01 04 03 09 07]
2: [06 01 04 03 09 07]
```
</details>
#### Example #4
```sql
REP 2 IN MyRedNodes
REP 2 IN MyBlueNodes
CBF 1
SELECT 2 FROM RedNodes AS MyRedNodes
SELECT 2 FROM BlueNodes AS MyBlueNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
```
<details>
<summary>Result</summary>
```sh
> eval REP 2 IN MyRedNodes REP 2 IN MyBlueNodes CBF 1 SELECT 2 FROM RedNodes AS MyRedNodes SELECT 2 FROM BlueNodes AS MyBlueNodes FILTER Color EQ 'Red' AS RedNodes FILTER Color EQ 'Blue' AS BlueNodes
1: [06 03]
2: [01 04]
```
</details>
#### Example #5
```sql
UNIQUE
REP 1 IN MyGreenNodes
REP 1 IN MyGreenNodes
REP 1 IN MyGreenNodes
CBF 1
SELECT 1 FROM GreenNodes AS MyGreenNodes
FILTER Color EQ 'Green' AS GreenNodes
```
<details>
<summary>Result</summary>
```sh
> eval UNIQUE REP 1 IN MyGreenNodes REP 1 IN MyGreenNodes REP 1 IN MyGreenNodes CBF 1 SELECT 1 FROM GreenNodes AS MyGreenNodes FILTER Color EQ 'Green' AS GreenNodes
1: [05]
2: [02]
3: [08]
```
</details>
#### Example #6
```sql
REP 1 IN MyNodes
REP 2
CBF 2
SELECT 1 FROM CuteNodes AS MyNodes
FILTER (Color EQ 'Blue') AND NOT (Shape EQ 'Circle' OR Shape EQ 'Square') AS CuteNodes
```
<details>
<summary>Result</summary>
```sh
eval REP 1 IN MyNodes REP 2 CBF 2 SELECT 1 FROM CuteNodes AS MyNodes FILTER (Color EQ 'Blue') AND NOT (Shape EQ 'Circle' OR Shape EQ 'Square') AS CuteNodes
1: [07]
2: [06 05 02 03]
```
</details>
## Appendix 1: Operators
Comparison operators (all binary):
- `EQ`: equals
- `NE`: not equal
- `GE`: greater or equal
- `GT`: greater than
- `LE`: less or equal
- `LT`: less than
Logical operators:
- `NOT`: negation (unary)
- `AND`: conjunction (binary)
- `OR`: disjunction (binary)
Others:
- `@`: filter reference
- `(`: left parenthesis
- `)`: right parenthesis
## Appendix 2: Policy playground commands
- `ls`: list nodes in the current netmap and their attributes
- `add`: add a node to the current netmap. If it already exists, it will be overwritten.
- `remove`: remove a node from the current netmap.
- `eval`: evaluate a placement policy on the current netmap.