[#421] Create common interface for kv repositories #454

Closed
ale64bit wants to merge 1 commits from ale64bit/frostfs-node:feature/kvio into master
Collaborator

Add a common interface for key-value repositories.
This makes it easier to keep implementation-specific details from leaking into our codebase and swap underlying implementations with ease for the various components.
In this particular PR, writecache is rewritten using the interfaces and a small example is provided on how to use e.g. a badger-backed repository instead.

Signed-off-by: Alejandro Lopez a.lopez@yadro.com

Add a common interface for key-value repositories. This makes it easier to keep implementation-specific details from leaking into our codebase and swap underlying implementations with ease for the various components. In this particular PR, `writecache` is rewritten using the interfaces and a small example is provided on how to use e.g. a badger-backed repository instead. Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>
ale64bit added the
discussion
label 2023-06-20 09:48:13 +00:00
ale64bit force-pushed feature/kvio from 9eae170c80 to 00c8712dc6 2023-06-20 09:55:21 +00:00 Compare
ale64bit requested review from storage-core-committers 2023-06-20 09:55:55 +00:00
ale64bit requested review from storage-core-developers 2023-06-20 09:56:02 +00:00
fyrchik reviewed 2023-06-20 10:25:02 +00:00
fyrchik left a comment
Owner

I am not sure I see the usefulness of introducing new interface:

  1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple.
  2. In some places the interface does not reflect the DB structure: for example for bolt Write uses Batch, but ReadWrite uses Update -- the distinction between Write/ReadWrite seems unrelated to the Batch/Update distinction. We would like to control such things on the client, the interface makes this harder.
  3. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value.
  4. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead).

So what about implementing some interface for just the write-cache?
I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions.

cc @TrueCloudLab/storage-core-developers @realloc

I am not sure I see the usefulness of introducing new interface: 1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple. 2. In some places the interface does not reflect the DB structure: for example for bolt `Write` uses `Batch`, but `ReadWrite` uses `Update` -- the distinction between `Write`/`ReadWrite` seems unrelated to the `Batch`/`Update` distinction. We would like to control such things on the client, the interface makes this harder. 3. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value. 4. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead). So what about implementing some interface for just the write-cache? I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions. cc @TrueCloudLab/storage-core-developers @realloc
@ -28,2 +30,2 @@
func openWC(cmd *cobra.Command) *bbolt.DB {
db, err := writecache.OpenDB(vPath, true, os.OpenFile)
func openWC(cmd *cobra.Command) kvio.Repository {
db, err := bboltrepo.Open(vPath, &bboltrepo.Options{

That is an implementation detail, at some point it can stop being bolt, for example.

That is an implementation detail, at some point it can stop being `bolt`, for example.
Poster
Collaborator

I'm aware. I just replaced the original call, which was already using an implementation detail (i.e. depending literally on bbolt).

I'm aware. I just replaced the original call, which was already using an implementation detail (i.e. depending literally on bbolt).
Poster
Collaborator

I am not sure I see the usefulness of introducing new interface:

  1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple.

This is intentional. Nothing wrong with thin interfaces, though? I would argue those are preferred in Go vs fat ones (such as our own common.Storage which is more of a virtual class than an interface).

  1. In some places the interface does not reflect the DB structure: for example for bolt Write uses Batch, but ReadWrite uses Update -- the distinction between Write/ReadWrite seems unrelated to the Batch/Update distinction. We would like to control such things on the client, the interface makes this harder.

This is also intentional, but can be refined. The distinction between Batch/Update in bbolt is only meaningful for concurrent calls. While we can make the interface reflect this, I don't think we depend anywhere on actually blocking and waiting for flush with Update.
At any rate, I don't think this is an issue with the interface itself.

  1. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value.

How is the interface preventing this? It provides the means to do the same type of transactions as both repository implementations (including copy-less value reading and batched reads/writes)

  1. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead).

I don't see how is this related to having an interface or not.

So what about implementing some interface for just the write-cache?
I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions.

cc @TrueCloudLab/storage-core-developers @realloc

From my viewpoint, an interface that is specific to write-cache serves no purpose or is not as useful as having a layer isolating this common piece of utility (a key-value repository). We use these all over our codebase for different purposes (write-cache, metabase, pilorama, blobovnicza, testing registries, etc.). Currently we are just spamming it with bbolt-specifics when we could use an interface to make it easy to switch or compose specific components easily. The argument could be equally applied to every thin interface in the standard io package. A key-value repository is pretty much the basic building block of our products.

I would love to hear other's opinions as well.

> I am not sure I see the usefulness of introducing new interface: > 1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple. This is intentional. Nothing wrong with thin interfaces, though? I would argue those are preferred in Go vs fat ones (such as our own `common.Storage` which is more of a virtual class than an interface). > 2. In some places the interface does not reflect the DB structure: for example for bolt `Write` uses `Batch`, but `ReadWrite` uses `Update` -- the distinction between `Write`/`ReadWrite` seems unrelated to the `Batch`/`Update` distinction. We would like to control such things on the client, the interface makes this harder. This is also intentional, but can be refined. The distinction between `Batch`/`Update` in bbolt is only meaningful for concurrent calls. While we can make the interface reflect this, I don't think we depend anywhere on actually blocking and waiting for flush with `Update`. At any rate, I don't think this is an issue with the interface itself. > 3. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value. How is the interface preventing this? It provides the means to do the same type of transactions as both repository implementations (including copy-less value reading and batched reads/writes) > 4. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead). I don't see how is this related to having an interface or not. > > So what about implementing some interface for just the write-cache? > I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions. > > cc @TrueCloudLab/storage-core-developers @realloc From my viewpoint, an interface that is specific to write-cache serves no purpose or is not as useful as having a layer isolating this common piece of utility (a key-value repository). We use these all over our codebase for different purposes (write-cache, metabase, pilorama, blobovnicza, testing registries, etc.). Currently we are just spamming it with bbolt-specifics when we could use an interface to make it easy to switch or compose specific components easily. The argument could be equally applied to every thin interface in the standard `io` package. A key-value repository is pretty much the basic building block of our products. I would love to hear other's opinions as well.
dstepanov-yadro approved these changes 2023-06-21 07:15:58 +00:00
fyrchik requested review from storage-core-committers 2023-06-21 07:19:34 +00:00
fyrchik requested review from storage-core-developers 2023-06-21 07:19:34 +00:00

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache.
In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache. In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.
Poster
Collaborator

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases.

This applies to any interface. When would the right time be?

For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache.
In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.

The idea is not to abstract what pilorama and writecache and others do with the key value repository, but to abstract what realistic implementations of key value repositories provide (bbolt, badger, others).

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But perhaps I don't really understand what @fyrchik means by "So what about implementing some interface for just the write-cache?". If you can provide some minimal example, that would help and I will happily oblige if everyone thinks that's better.

> I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. This applies to any interface. When would the right time be? > For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache. > In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca. The idea is not to abstract what pilorama and writecache and others do with the key value repository, but to abstract what realistic implementations of key value repositories provide (bbolt, badger, others). Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. But perhaps I don't really understand what @fyrchik means by *"So what about implementing some interface for just the write-cache?"*. If you can provide some minimal example, that would help and I will happily oblige if everyone thinks that's better.
aarifullin approved these changes 2023-06-21 09:36:23 +00:00
aarifullin left a comment
Collaborator

LGTM

LGTM

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But when we will have duplicates then we can merge they into one, if they are the same.

> Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. But when we will have duplicates then we can merge they into one, if they are the same.
Poster
Collaborator

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But when we will have duplicates then we can merge they into one, if they are the same.

How is that different/better than splitting them when they diverge instead?

> > Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. > > But when we will have duplicates then we can merge they into one, if they are the same. > How is that different/better than splitting them when they diverge instead?

How is that different/better than splitting them when they diverge instead?

Well, it is just my point of view.

> How is that different/better than splitting them when they diverge instead? Well, it is just my point of view.
dstepanov-yadro approved these changes 2023-06-21 15:16:53 +00:00
ale64bit closed this pull request 2023-06-22 07:34:34 +00:00
All checks were successful
ci/woodpecker/pr/pre-commit Pipeline was successful
Required
Details

Pull request closed

Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#454
There is no content yet.