17 KiB
Placement Policy
This document describes placement policies, their purpose, syntax and semantics.
Index
Introduction
The purpose of a placement policy is to determine whichs node(s) of a frostfs system will store an object. Namely, given a netmap (a set of nodes) and a placement policy, a subset of those nodes is selected to store a given object. An important aspect is that since nodes in a netmap come and go due to distributed nature of the system, this selection must be deterministic and consistent, i.e. different nodes must come to the same conclusion as long as they share the same view of the netmap.
ℹ️ Throughout this document, we will consider each node as a dictionary of attributes and a global unique ID.
One way to think about the placement policy, is as a pipeline of operations which begins with a set of nodes (the entire netmap) and gradually refine it into only the nodes that will be in charge of storing the object. More specifically, each operation in this pipeline takes a set of nodes and transforms it into a subset of those nodes. The transformation is done purely on the basis of the node attributes.
The three main operations are:
FILTER
: filters a set of nodes based on their attributes.SELECT
: selects a specific amount of nodes from a set of nodes based on certain conditions.REP
: specifies how many nodes (and which ones) from a set of nodes are used to store an object.
In the next sections, we will explore each of them in detail.
Operations
Basic Expressions
Before exploring the operations in detail, we must get acquainted with the basic expressions that appear in a placement policy. As mentioned above, the placement policy operates solely on the basis of node attributes, and as such, basic expressions mostly revolve around node attribute comparison.
A comparison expression expresses whether a node attribute equals a specified value:
AttributeName Operation AttributeValue
For example, the following expression
City EQ 'Moscow'
asserts that the node attribute City
equals the value Moscow
.
Comparison expressions can be nested via boolean operators and parentheses. For example, the following expression:
(City EQ 'Moscow') AND (Disks GT 2)
asserts that the node attribute City
equals the value Moscow
and the node attribute Disks
must be greater than 2
. Note that the arguments can be either a string or a number.
See Appendix 1 for a complete list of supported operators.
FILTER
A FILTER
operation takes as input a set of nodes and returns a subset of those nodes. It's useful for selecting nodes that have (or lack) specific attributes. Its basic syntax is as follows:
FILTER <expr> AS <id>
For example, the following filter
FILTER Color EQ 'Red' AS RedNodes
selects those nodes for which the Color
attribute equals Red
, and discards the rest. The filter's identifier is RedNodes
, which can be used to reference it in other parts of the placement policy. For example, you could reference the above filter in another filter as follows
FILTER @RedNodes AND (City EQ 'Moscow') AS RedMoscowNodes
which would select those nodes for which the Color
attribute equals Red
and the City
attribute equals Moscow
. You can think of the @
operator as embedding the referenced filter expression verbatim where it's used. This makes it easy to compose filters. However, filters can be referenced via @
only within filter expressions. In other places you can simply use the filter identifier directly.
⚠️ Every filter requires a unique identifier. What would be the use of a filter that you cannot reference?
The following diagram illustrates the filter operation
where the nodes are represented as colored circles, with their color representing the value of their Color
attribute, respectively.
ℹ️ A filter referring to all nodes in the netmap always exists and can be referenced by
*
.
SELECT
A SELECT
operation specifies how many and which nodes from a subset previously obtained from a FILTER
will be available to build replica groups for object storage. It's not that different from a FILTER
in that it transforms a set of nodes into a subset of those, but while a FILTER
cannot control the size of the resulting subset and other characteristics, a SELECT
can.
Its basic syntax is as follows:
SELECT <count> {IN (SAME|DISTINCT) <attribute>} FROM <filter> {AS <id>}
In a nutshell, a SELECT
takes a filter result as input and outputs a specific number of nodes, optionally enforcing that all output nodes must either share or differ in a specific attribute. Note that only the output node count and the source filter are required.
Let's see some examples
-- Selects exactly one node from the entire netmap
SELECT 1 FROM *
-- Same as above, but with an identifier for the selection
SELECT 1 FROM * AS ONE
-- Selects two nodes from the RedOrBlueNodes filter, such that both selected nodes
-- share the same value for the Color attribute, i.e. both red or both blue.
SELECT 2 IN SAME Color FROM RedOrBlueNodes
-- Selects two nodes from the RedOrBlueNodes filter, such that the selected nodes
-- have distinct values for the Color attribute, i.e. one red and one blue.
-- The selection is also given an identifier.
SELECT 2 IN DISTINCT Color FROM RedOrBlueNodes AS MyNodes
The last example is illustrated in the following diagram:
ℹ️ At this point, notice that while
FILTER
's output is always unique (namely, every node in the input is either filtered in or out), that is not always the case forSELECT
. In the last example above, there is more than one way to select two nodes with distinctColor
attribute. Because we require the output to be deterministic and consistent (so that all nodes agree on which nodes to store a given object without having to commmunicate with each other), we need a way to reach this consensus efficiently. Internally, the policy engine uses Rendezvouz Hashing to ensure this. If you want more control over what nodes are actually selected, you can always use narrower filters/selections to ensure this.
REP
A REP
operation specifies how many copies of an object need to be stored (REP
stands for "replica"). A placement policy can contain multiple replica operations, with each of them representing a replica group, i.e. a group of objects associated with the same replica. Following our analogy with a pipeline, REP
operations are the sink or output nodes.
Its basic syntax is as follows:
REP <count> {IN <select>}
If a select is not specified, then the entire netmap is used as input. The only exception to this rule is when exactly 1 replica and 1 selector are being present: in this case the only selector is being used instead of the whole netmap. The resulting nodes will be used to actually store objects and they constitute a replica group (or simply, "a replica").
Examples
-- A replica consisting of a single copy, stored in an arbitrary node of the netmap.
REP 1
-- A replica consisting of three copies, each stored in a different node from the selection
-- identified as 'MyNodes'.
REP 3 IN MyNodes
The following diagram illustrates the REP
operation:
⚠️ Notice that although we use
REP 1
in the examples, in real life scenarios you almost always want to have more than a single node in each replica for redundancy.
Policies
In order to specify a complete placement policy, we just need to assemble it from the operations described above.
Its basic (simplified) syntax is as follows:
<rep>+ <select>* <filter>*
We begin by stating all our REP
operations, followed by all the SELECT
operations and finally all the FILTER
operations. Note that this is the reverse order in which they are applied. Also note that at least one REP
operation is required.
Here's a complete example:
REP 1 IN MyNodes
SELECT 2 IN DISTINCT Color FROM RedOrBlueNodes AS MyNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
In additional to this basic syntax, there are a couple of additional useful options to specify which nodes and how many nodes are actually selected to store objects. We explore these in the next sections.
The policy playground
ℹ️ This section assumes you have an up-to-date version of the
frostfs-cli
.
While simple placement policies have predictable results that can be understood at a glance, more complex ones need careful consideration before deployment. In order to simplify understanding a policy's outcome and experimenting while learning, a builtin tool is provided as part of the frostfs-cli
for this purpose: the policy playground.
For the remainder of this guide, we will use the policy playground to setup a virtual netmap (that is, one that doesn't require any networking or deployment) and test various policies. In order to visualize this netmap easily, each node will have three attributes: a character, a shape and a color
We can start the policy playground as follows:
$ frostfs-cli container policy-playground
>
Since we didn't pass any endpoint, the initial netmap is empty, which we can verify with the ls
command (to list the nodes in the netmap):
> ls
>
Nows let's add virtual nodes to represent our test netmap in the figure above
> add 01 Char:A Shape:Circle Color:Blue
> add 02 Char:B Shape:Circle Color:Green
> add 03 Char:C Shape:Circle Color:Red
> add 04 Char:D Shape:Square Color:Blue
> add 05 Char:E Shape:Square Color:Green
> add 06 Char:F Shape:Square Color:Red
> add 07 Char:G Shape:Diamond Color:Blue
> add 08 Char:H Shape:Diamond Color:Green
> add 09 Char:I Shape:Diamond Color:Red
and verify that the netmap now contains what we expect
> ls
1: id=06 attrs={Char:F Shape:Square Color:Red}
2: id=08 attrs={Char:H Shape:Diamond Color:Green}
3: id=01 attrs={Char:A Shape:Circle Color:Blue}
4: id=04 attrs={Char:D Shape:Square Color:Blue}
5: id=05 attrs={Char:E Shape:Square Color:Green}
6: id=09 attrs={Char:I Shape:Diamond Color:Red}
7: id=02 attrs={Char:B Shape:Circle Color:Green}
8: id=03 attrs={Char:C Shape:Circle Color:Red}
9: id=07 attrs={Char:G Shape:Diamond Color:Blue}
With our sample netmap setup, we can now continue.
CBF
Consider the following policy:
REP 1
It builds a replica consisting of one copy, selected from the entire netmap. If we evaluate this policy in our sample netmap, we obtain a result which is probably unexpected:
> eval REP 1
1: [06 05 02]
The eval
commands evaluates a policy and lists in a separate line the nodes selected for each REP
operation, in the order they appear in the policy. We were expecting a single node, but we got three instead. The reason is that there's a policy-wide parameter called container backup factor (or CBF). This parameter is a multiplier which controls the maximum number of storage nodes: for example, if a policy requires 10 nodes and the CBF is 3, it means that the policy can store an object in up to 10 × 3 nodes. However, if there are not enough nodes and fewer (but at least 10) are used, this is not considered an error.
The default value for CBF is 3
, which explains our result above, given than every node in the netmap agrees with the policy. The CBF can be explicitly set in the policy right after the REP
operations. For example
> eval REP 1 CBF 1
1: [06]
results in what we expected in the first example. On the other hand
> eval REP 1 IN MyNodes SELECT 1 IN SAME Char FROM * AS MyNodes
1: [01]
results in a single node despite the default CBF, because there are not enough nodes compatible with the selection.
UNIQUE
Consider the following policy:
REP 1
REP 1
CBF 2
If we evaluate it
> eval REP 1 REP 1 CBF 2
1: [06 05]
2: [06 05]
we find that each replica gets two nodes, in accordance with the CBF. However, these nodes are exactly the same and this might not be desirable. In order to force the policy engine to select different nodes for each replica, we can use the UNIQUE
option, which is specified right before the REP
operations.
In our example, if we change it to
UNIQUE
REP 1
REP 1
CBF 2
and evaluate it
> eval UNIQUE REP 1 REP 1 CBF 2
1: [06 05]
2: [02 03]
we now find that the nodes selected for each replica are now distinct from each other.
More examples
This section presents some more examples of placement policies and their result when applied to the sample netmap. Try to figure out the result before looking at it or evaluating the policy.
Example #1
REP 1 IN TwoRedNodes
SELECT 2 FROM RedNodes AS TwoRedNodes
FILTER Color EQ 'Red' AS RedNodes
Result
> eval REP 1 IN TwoRedNodes SELECT 2 FROM RedNodes AS TwoRedNodes FILTER Color EQ 'Red' AS RedNodes
1: [06 09 03]
Example #2
REP 1 IN TwoRedNodes
REP 1 IN TwoRedNodes
SELECT 2 FROM RedNodes AS TwoRedNodes
FILTER Color EQ 'Red' AS RedNodes
Result
> eval REP 1 REP 1 IN TwoRedNodes SELECT 2 FROM RedNodes AS TwoRedNodes FILTER Color EQ 'Red' AS RedNodes
1: [06 09 03]
2: [06 09 03]
Example #3
REP 2 IN MyNodes
REP 2 IN MyNodes
SELECT 2 FROM RedOrBlueNodes AS MyNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
Result
> eval REP 2 IN MyNodes REP 2 IN MyNodes SELECT 2 FROM RedOrBlueNodes AS MyNodes FILTER Color EQ 'Red' AS RedNodes FILTER Color EQ 'Blue' AS BlueNodes FILTER @RedNodes OR @BlueNodes AS RedOrBlueNodes
1: [06 01 04 03 09 07]
2: [06 01 04 03 09 07]
Example #4
REP 2 IN MyRedNodes
REP 2 IN MyBlueNodes
CBF 1
SELECT 2 FROM RedNodes AS MyRedNodes
SELECT 2 FROM BlueNodes AS MyBlueNodes
FILTER Color EQ 'Red' AS RedNodes
FILTER Color EQ 'Blue' AS BlueNodes
Result
> eval REP 2 IN MyRedNodes REP 2 IN MyBlueNodes CBF 1 SELECT 2 FROM RedNodes AS MyRedNodes SELECT 2 FROM BlueNodes AS MyBlueNodes FILTER Color EQ 'Red' AS RedNodes FILTER Color EQ 'Blue' AS BlueNodes
1: [06 03]
2: [01 04]
Example #5
UNIQUE
REP 1 IN MyGreenNodes
REP 1 IN MyGreenNodes
REP 1 IN MyGreenNodes
CBF 1
SELECT 1 FROM GreenNodes AS MyGreenNodes
FILTER Color EQ 'Green' AS GreenNodes
Result
> eval UNIQUE REP 1 IN MyGreenNodes REP 1 IN MyGreenNodes REP 1 IN MyGreenNodes CBF 1 SELECT 1 FROM GreenNodes AS MyGreenNodes FILTER Color EQ 'Green' AS GreenNodes
1: [05]
2: [02]
3: [08]
Example #6
REP 1 IN MyNodes
REP 2
CBF 2
SELECT 1 FROM CuteNodes AS MyNodes
FILTER (Color EQ 'Blue') AND NOT (Shape EQ 'Circle' OR Shape EQ 'Square') AS CuteNodes
Result
eval REP 1 IN MyNodes REP 2 CBF 2 SELECT 1 FROM CuteNodes AS MyNodes FILTER (Color EQ 'Blue') AND NOT (Shape EQ 'Circle' OR Shape EQ 'Square') AS CuteNodes
1: [07]
2: [06 05 02 03]
Appendix 1: Operators
Comparison operators (all binary):
EQ
: equalsNE
: not equalGE
: greater or equalGT
: greater thanLE
: less or equalLT
: less than
Pattern operator:
LIKE
: specifies pattern for an attribute. Uses as a wildcard symbol*
.... ATTR LIKE "VAL"
- the behaviour is equal toEQ
... ATTR LIKE "VAL*"
- matches all which starts withVAL
... ATTR LIKE "*VAL"
- matches all which ends withVAL
... ATTR LIKE "*VAL*"
- matches all which containsVAL
Logical operators:
NOT
: negation (unary)AND
: conjunction (binary)OR
: disjunction (binary)
Others:
@
: filter reference(
: left parenthesis)
: right parenthesis
Appendix 2: Policy playground commands
ls
: list nodes in the current netmap and their attributesadd
: add a node to the current netmap. If it already exists, it will be overwritten.remove
: remove a node from the current netmap.eval
: evaluate a placement policy on the current netmap.