[#421] Create common interface for kv repositories #454

ale64bit · 2023-06-20T09:48:13Z

ale64bit commented

2023-06-20 09:48:13 +00:00

Add a common interface for key-value repositories.
This makes it easier to keep implementation-specific details from leaking into our codebase and swap underlying implementations with ease for the various components.
In this particular PR, writecache is rewritten using the interfaces and a small example is provided on how to use e.g. a badger-backed repository instead.

Signed-off-by: Alejandro Lopez a.lopez@yadro.com

Add a common interface for key-value repositories. This makes it easier to keep implementation-specific details from leaking into our codebase and swap underlying implementations with ease for the various components. In this particular PR, `writecache` is rewritten using the interfaces and a small example is provided on how to use e.g. a badger-backed repository instead. Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>

ale64bit added the

discussion

label 2023-06-20 09:48:13 +00:00

ale64bit force-pushed feature/kvio from 9eae170c80 to 00c8712dc6

2023-06-20 09:55:21 +00:00

Compare

requested reviews from storage-core-committers, storage-core-developers

2023-06-20 09:55:55 +00:00

fyrchik reviewed 2023-06-20 10:25:02 +00:00

fyrchik left a comment

I am not sure I see the usefulness of introducing new interface:

The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple.
In some places the interface does not reflect the DB structure: for example for bolt Write uses Batch, but ReadWrite uses Update -- the distinction between Write/ReadWrite seems unrelated to the Batch/Update distinction. We would like to control such things on the client, the interface makes this harder.
This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value.
We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead).

So what about implementing some interface for just the write-cache?
I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions.

cc @TrueCloudLab/storage-core-developers @realloc

I am not sure I see the usefulness of introducing new interface: 1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple. 2. In some places the interface does not reflect the DB structure: for example for bolt `Write` uses `Batch`, but `ReadWrite` uses `Update` -- the distinction between `Write`/`ReadWrite` seems unrelated to the `Batch`/`Update` distinction. We would like to control such things on the client, the interface makes this harder. 3. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value. 4. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead). So what about implementing some interface for just the write-cache? I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions. cc @TrueCloudLab/storage-core-developers @realloc

cmd/frostfs-lens/internal/writecache/root.go

					
				@ -28,2 +30,2 @@

				func openWC(cmd *cobra.Command) *bbolt.DB {

					db, err := writecache.OpenDB(vPath, true, os.OpenFile)

				func openWC(cmd *cobra.Command) kvio.Repository {

					db, err := bboltrepo.Open(vPath, &bboltrepo.Options{

fyrchik commented

2023-06-20 10:15:24 +00:00

That is an implementation detail, at some point it can stop being bolt, for example.

That is an implementation detail, at some point it can stop being `bolt`, for example.

ale64bit commented

2023-06-20 10:46:42 +00:00

I'm aware. I just replaced the original call, which was already using an implementation detail (i.e. depending literally on bbolt).

ale64bit commented

2023-06-20 11:01:03 +00:00

I am not sure I see the usefulness of introducing new interface:

The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple.

This is intentional. Nothing wrong with thin interfaces, though? I would argue those are preferred in Go vs fat ones (such as our own common.Storage which is more of a virtual class than an interface).

In some places the interface does not reflect the DB structure: for example for bolt Write uses Batch, but ReadWrite uses Update -- the distinction between Write/ReadWrite seems unrelated to the Batch/Update distinction. We would like to control such things on the client, the interface makes this harder.

This is also intentional, but can be refined. The distinction between Batch/Update in bbolt is only meaningful for concurrent calls. While we can make the interface reflect this, I don't think we depend anywhere on actually blocking and waiting for flush with Update.
At any rate, I don't think this is an issue with the interface itself.

This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value.

How is the interface preventing this? It provides the means to do the same type of transactions as both repository implementations (including copy-less value reading and batched reads/writes)

We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead).

I don't see how is this related to having an interface or not.

So what about implementing some interface for just the write-cache?
I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions.

cc @TrueCloudLab/storage-core-developers @realloc

From my viewpoint, an interface that is specific to write-cache serves no purpose or is not as useful as having a layer isolating this common piece of utility (a key-value repository). We use these all over our codebase for different purposes (write-cache, metabase, pilorama, blobovnicza, testing registries, etc.). Currently we are just spamming it with bbolt-specifics when we could use an interface to make it easy to switch or compose specific components easily. The argument could be equally applied to every thin interface in the standard io package. A key-value repository is pretty much the basic building block of our products.

I would love to hear other's opinions as well.

> I am not sure I see the usefulness of introducing new interface: > 1. The interface is only used where raw KV pairs are stored and most of the wrappers are rather simple. This is intentional. Nothing wrong with thin interfaces, though? I would argue those are preferred in Go vs fat ones (such as our own `common.Storage` which is more of a virtual class than an interface). > 2. In some places the interface does not reflect the DB structure: for example for bolt `Write` uses `Batch`, but `ReadWrite` uses `Update` -- the distinction between `Write`/`ReadWrite` seems unrelated to the `Batch`/`Update` distinction. We would like to control such things on the client, the interface makes this harder. This is also intentional, but can be refined. The distinction between `Batch`/`Update` in bbolt is only meaningful for concurrent calls. While we can make the interface reflect this, I don't think we depend anywhere on actually blocking and waiting for flush with `Update`. At any rate, I don't think this is an issue with the interface itself. > 3. This also prevents us from doing selective optimizations (like we MUST read in batches for boltDB (because of remapping), and could perform more optimally for badger (maybe there is no need to accumulate batches in memory?) Same for copying the key/value. How is the interface preventing this? It provides the means to do the same type of transactions as both repository implementations (including copy-less value reading and batched reads/writes) > 4. We can change the schema while moving to badger, for example do not encode address to string (it does 3 allocations, so we could use 64-byte fixed key instead). I don't see how is this related to having an interface or not. > > So what about implementing some interface for just the write-cache? > I think it would be easier to maintain, albeit bigger in terms of LOC, would like to hear other opinions. > > cc @TrueCloudLab/storage-core-developers @realloc From my viewpoint, an interface that is specific to write-cache serves no purpose or is not as useful as having a layer isolating this common piece of utility (a key-value repository). We use these all over our codebase for different purposes (write-cache, metabase, pilorama, blobovnicza, testing registries, etc.). Currently we are just spamming it with bbolt-specifics when we could use an interface to make it easy to switch or compose specific components easily. The argument could be equally applied to every thin interface in the standard `io` package. A key-value repository is pretty much the basic building block of our products. I would love to hear other's opinions as well.

dstepanov-yadro approved these changes 2023-06-21 07:15:58 +00:00

requested reviews from storage-core-committers, storage-core-developers

2023-06-21 07:19:34 +00:00

dstepanov-yadro commented

2023-06-21 07:26:24 +00:00

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache.
In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache. In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.

ale64bit commented

2023-06-21 07:38:35 +00:00

I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases.

This applies to any interface. When would the right time be?

For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache.
In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca.

The idea is not to abstract what pilorama and writecache and others do with the key value repository, but to abstract what realistic implementations of key value repositories provide (bbolt, badger, others).

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But perhaps I don't really understand what @fyrchik means by "So what about implementing some interface for just the write-cache?". If you can provide some minimal example, that would help and I will happily oblige if everyone thinks that's better.

> I think it's premature to make one interface for key-value storage right now, since it's possible that you won't be able to use it in all cases. This applies to any interface. When would the right time be? > For example, it is possible that the key-value implementation of the repository for pilorama will differ from the implementation for writecache. > In addition, this interface will no longer be as thin as we would like, since the backend for writecache may differ from the backend for pilorama or blobovnizca. The idea is not to abstract what pilorama and writecache and others do with the key value repository, but to abstract what realistic implementations of key value repositories provide (bbolt, badger, others). Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. But perhaps I don't really understand what @fyrchik means by *"So what about implementing some interface for just the write-cache?"*. If you can provide some minimal example, that would help and I will happily oblige if everyone thinks that's better.

aarifullin approved these changes 2023-06-21 09:36:23 +00:00

aarifullin left a comment

LGTM

dstepanov-yadro commented

2023-06-21 11:38:08 +00:00

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But when we will have duplicates then we can merge they into one, if they are the same.

> Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. But when we will have duplicates then we can merge they into one, if they are the same.

ale64bit commented

2023-06-21 11:58:51 +00:00

Consider the alternative: we make separate interfaces WriteCacheKVRepo, PiloramaKVRepo, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better.

But when we will have duplicates then we can merge they into one, if they are the same.

How is that different/better than splitting them when they diverge instead?

> > Consider the alternative: we make separate interfaces `WriteCacheKVRepo`, `PiloramaKVRepo`, etc. to hide bbolt or badger and these interfaces will end up being mostly identical to the one provided in this PR, for no further benefit. We will have to have duplicate wrappers for bbolt, badger to satisfy those interfaces. I don't see how this is any better. > > But when we will have duplicates then we can merge they into one, if they are the same. > How is that different/better than splitting them when they diverge instead?

dstepanov-yadro commented

2023-06-21 15:16:36 +00:00